MarkLogic structured query performance

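I am working on an application where up to 10,000 documents need to be fetched within 2 hours. We are using the Java Client API and structured queries to perform the search. However, the queries are still slow.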

The code is as follows:

def fetchPostMessages(dbParam: DbParam): Page = {
  val queryManager = dbClient.newQueryManager()
  val sqb: StructuredQueryBuilder = queryManager.newStructuredQueryBuilder()
  log.info(s"Fetching post messages from database for params: {}", dbParam)
  val modifiedQueryDef = dbParam.param.map {
    param => {
      sqb.and(
        sqb.word(sqb.jsonProperty(status), toBeReported),
        sqb.word(sqb.jsonProperty(dataCategory), "dataCategory1"),
        sqb.range(sqb.jsonProperty(creationDate), marklogicDateFormat.name, Operator.LE, DateUtil.printFpmlDateTime(param.messagesTime)))
    }
  }.getOrElse(sqb.and(sqb.word(sqb.jsonProperty(status.name), toBeReported.name)))
  modifiedQueryDef.setCollections(XmlConstants.ItracMessageTypes.OUTPUT_MESSAGE.name)
  modifiedQueryDef.setOptionsName(sortOption)
  search(modifiedQueryDef, dbParam.pageNum, dbParam.batchSize)
}

private def search(queryDef: QueryDefinition, startIndex: Int, batchSize: Int): Page = {
  val dataList: ListBuffer[Document] = new ListBuffer()
  val jsonDocManager = dbClient.newJSONDocumentManager()
  jsonDocManager.setMetadataCategories(Metadata.ALL)
  jsonDocManager.setPageLength(
    if (batchSize < pageLength) batchSize
    else pageLength)
  val documentPage = jsonDocManager.search(queryDef, startIndex)
  dataList ++= extractContent(documentPage)
  val totalSize = documentPage.getTotalSize
  log.info(s"Total documents to be reported : ${totalSize}")
  var pageSize = documentPage.getPageSize
  while (pageSize < batchSize && pageSize <= totalSize) {
    if (batchSize - pageSize < pageSize)
      jsonDocManager.setPageLength(batchSize - pageSize)
    var newDocPage = jsonDocManager.search(queryDef, pageSize + 1)
    dataList ++= extractContent(newDocPage)
    pageSize = pageSize + newDocPage.getPageSize
  }
  log.info("Total messages fetched are : {}", dataList.size)
  Page(startIndex, totalSize - batchSize, dataList.to[collection.immutable.Seq])
}

Sort options:

<search:options xmlns:search="http://marklogic.com/appservices/search">
  <search:sort-order type="xs:string" direction="ascending">
    <search:json-property>subdomLvl1</search:json-property>
  </search:sort-order>
  <search:sort-order type="xs:string" direction="ascending">
    <search:json-property>trdId</search:json-property>
  </search:sort-order>
  <search:sort-order type="xs:string" direction="ascending">
    <search:json-property>validStartDate</search:json-property>
  </search:sort-order>
  <search:sort-order type="xs:string" direction="ascending">
    <search:json-property>ver</search:json-property>
  </search:sort-order>
  <search:sort-order type="xs:string" direction="ascending">
    <search:json-property>reportStatus</search:json-property>
  </search:sort-order>
</search:options>

The database indexes are as follows:

Element range indexes on status, dataCategory, and creationDate, and on all of the sort option properties.

Answer:

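If the process doesn't need the document metadata, consider configuring the document manager with jsonDocManager.clearMetadataCategories() instead of jsonDocManager.setMetadataCategories(Metadata.ALL). That reduces the work done on both the server and the client, and it reduces the amount of data transferred.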

The loop can be simplified by testing newDocPage.hasNextPage() - see:

http://docs.marklogic.com/guide/java/bulk#id_21619
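
To make the hasNextPage() suggestion concrete, here is a minimal, library-free sketch of the paging loop in Scala. ResultPage, fetchBatch and the search function parameter are hypothetical stand-ins, not part of the MarkLogic API: in the real code the function would wrap jsonDocManager.setPageLength plus jsonDocManager.search, and the page would be the returned DocumentPage. It assumes 1-based start positions, and each request starts where the previous page ended, counted from startIndex.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the parts of a result page that the loop needs.
final case class ResultPage(start: Long, docs: Seq[String], totalSize: Long) {
  // True while results remain after the last item on this page (start is 1-based).
  def hasNextPage: Boolean = start + docs.size - 1 < totalSize
}

// Collects up to batchSize documents, beginning at the 1-based startIndex.
// search(start, pageLength) stands in for setPageLength followed by search.
def fetchBatch(search: (Long, Int) => ResultPage,
               startIndex: Long,
               batchSize: Int,
               pageLength: Int): Seq[String] = {
  val collected = ListBuffer.empty[String]
  var nextStart = startIndex
  var more = batchSize > 0
  while (more) {
    // Never ask for more than is still missing from the batch.
    val wanted = math.min(pageLength, batchSize - collected.size)
    val page = search(nextStart, wanted)
    collected ++= page.docs
    nextStart += page.docs.size
    // Stop on the last page, on an empty page, or once the batch is full.
    more = page.hasNextPage && page.docs.nonEmpty && collected.size < batchSize
  }
  collected.toList
}
```

With the real API, the loop condition would read the flag from the page returned by the last search call, so there is no need to compare running counts against the total size by hand.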

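Instead of accumulating all 10,000 documents in a list, could the client stream the documents to the consuming process as they arrive? That would certainly improve throughput.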
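
One way to picture the streaming idea is a lazy iterator that fetches a page only when the consumer asks for more. This is only a sketch under assumptions: FetchedPage and streamDocs are hypothetical names, and the search function stands in for a call to jsonDocManager.search at a given 1-based start position with a fixed page length.

```scala
// Hypothetical stand-in for one page of search results.
final case class FetchedPage(start: Long, docs: Seq[String], totalSize: Long) {
  // True while results remain after the last item on this page (start is 1-based).
  def hasNextPage: Boolean = start + docs.size - 1 < totalSize
}

// Lazily walks the result set page by page. No page is requested until the
// consumer calls hasNext or next, so at most one page is held in memory.
def streamDocs(search: Long => FetchedPage, startIndex: Long): Iterator[String] =
  new Iterator[String] {
    private var current: Iterator[String] = Iterator.empty
    private var nextStart: Long = startIndex
    private var morePages: Boolean = true

    def hasNext: Boolean = {
      // Fetch the next page only when the current one is used up.
      while (!current.hasNext && morePages) {
        val page = search(nextStart)
        current = page.docs.iterator
        nextStart += page.docs.size
        morePages = page.hasNextPage && page.docs.nonEmpty
      }
      current.hasNext
    }

    def next(): String =
      if (hasNext) current.next()
      else throw new NoSuchElementException("no more documents")
  }
```

The consumer can then process each document as it is produced, for example with streamDocs(search, 1).foreach(process), where process is whatever the reporting step does with one document.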

You could also consider using the Data Movement SDK to read the documents in multiple threads:

http://docs.marklogic.com/guide/java/data-movement#id_60613

http://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/QueryBatcher.html
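
A rough sketch of what a QueryBatcher read could look like from Scala follows. It assumes a Java Client API version that ships the Data Movement SDK (4.x or later), Scala 2.12 or later so that lambdas convert to the Java listener interfaces, and a live MarkLogic server behind dbClient. The names exportMatching and handle are hypothetical, and the batch size and thread count are arbitrary starting points to tune. Documents arrive from several threads, so the sort order from the search options should not be relied on here.

```scala
import com.marklogic.client.DatabaseClient
import com.marklogic.client.datamovement.ExportListener
import com.marklogic.client.io.StringHandle
import com.marklogic.client.query.StructuredQueryDefinition

// Reads every document matching queryDef and passes its content to `handle`.
// `handle` is called from several threads, so it must be thread-safe.
def exportMatching(dbClient: DatabaseClient,
                   queryDef: StructuredQueryDefinition,
                   handle: String => Unit): Unit = {
  val dmm = dbClient.newDataMovementManager()
  val batcher = dmm.newQueryBatcher(queryDef)
    .withBatchSize(100)       // assumed value, tune for document size
    .withThreadCount(4)       // assumed value, tune for the cluster
    .withConsistentSnapshot() // query all batches at the same point in time
    .onUrisReady(
      new ExportListener()
        .withConsistentSnapshot()
        .onDocumentReady(doc => handle(doc.getContent(new StringHandle()).get())))
    .onQueryFailure(failure => failure.printStackTrace())

  dmm.startJob(batcher)
  batcher.awaitCompletion() // block until every batch has been processed
  dmm.stopJob(batcher)
}
```

The structured query built with sqb.and(...) in the question could be passed as queryDef, with handle forwarding each document to the reporting step.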

Hope that helps.
