How do I write an Elasticsearch query that filters on the maximum value of a field?

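I want to be able to run a text query but get back only the results that have the maximum value of some integer field in the data. I have read the documentation on aggregations and filters, but I can't quite work out what I'm looking for.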

For example, some duplicated data gets indexed where the documents are identical except for one integer field; let's call that field lastseen.

So, as an example, suppose this data is put into Elasticsearch:

# these two are the same except for the "lastseen" field
curl -XPOST localhost:9200/myindex/myobject -d '{
  "field1": "dinner carrot potato broccoli",
  "field2": "something here",
  "lastseen": 1000
}'

curl -XPOST localhost:9200/myindex/myobject -d '{
  "field1": "dinner carrot potato broccoli",
  "field2": "something here",
  "lastseen": 100
}'

# and these two are the same except for the "lastseen" field
curl -XPOST localhost:9200/myindex/myobject -d '{
  "field1": "fish chicken something",
  "field2": "dinner",
  "lastseen": 2000
}'

curl -XPOST localhost:9200/myindex/myobject -d '{
  "field1": "fish chicken something",
  "field2": "dinner",
  "lastseen": 200
}'

If I query for "dinner":

curl -XPOST localhost:9200/myindex/_search -d '{
  "query": {
    "query_string": {
      "query": "dinner"
    }
  }
}'

I get 4 results back. I would like a filter so that I get only two results: just the items with the largest lastseen value.

This is not a working query, but hopefully it gives you an idea of what I am after:

{
  "query": {
    "query_string": {
      "query": "dinner"
    }
  },
  "filter": {
    "max": "lastseen"
  }
}

The results would look like this:

"hits": [
  {
    ...
    "_source": {
      "field1": "dinner carrot potato broccoli",
      "field2": "something here",
      "lastseen": 1000
    }
  },
  {
    ...
    "_source": {
      "field1": "fish chicken something",
      "field2": "dinner",
      "lastseen": 2000
    }
  }
]

I tried creating a mapping that does not include lastseen in the index. That did not work; I still get all 4 results back.

curl -XPOST localhost:9200/myindex -d '{
  "mappings": {
    "myobject": {
      "properties": {
        "lastseen": {
          "type": "long",
          "store": "yes",
          "include_in_all": false
        }
      }
    }
  }
}'

I tried deduplicating with the agg approach listed here, but it did not work, and more importantly I could not find a way to combine it with a keyword search.

Answer:

It is not ideal, but I think it gives you what you need.

Assuming field1 is the field you use to define "duplicate" documents, change its mapping as follows:

PUT /lastseen
{
  "mappings": {
    "test": {
      "properties": {
        "field1": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "field2": {
          "type": "string"
        },
        "lastseen": {
          "type": "long"
        }
      }
    }
  }
}

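In other words, you add a .raw subfield that is not_analyzed, which means the value is indexed exactly as it is, without being analyzed and split into terms. This is what makes a rough form of duplicate-document detection possible.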
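To make the analyzed versus not_analyzed distinction concrete, here is a rough Python simulation. It is only an approximation under stated assumptions: the helper names analyzed_terms and raw_terms are made up for illustration, and real analysis is done by Elasticsearch's analyzers, not by lowercasing and whitespace splitting.

```python
# Rough simulation only; real indexing is performed by Elasticsearch analyzers.

def analyzed_terms(text):
    """Approximate an analyzed string field: lowercase, then split on whitespace."""
    return text.lower().split()


def raw_terms(text):
    """Approximate a not_analyzed field: the whole value is kept as one term."""
    return [text]


value = "dinner carrot potato broccoli"

# The analyzed field is searchable word by word.
print(analyzed_terms(value))  # ['dinner', 'carrot', 'potato', 'broccoli']

# The raw subfield keeps the full string, so identical values share one term.
print(raw_terms(value))  # ['dinner carrot potato broccoli']
```

Because the raw subfield keeps the full string as a single term, two documents with the same field1 value land on the same term, and that is what a terms aggregation can group on. On Elasticsearch 5 and later the string type was replaced by text and keyword, so the equivalent subfield there would be declared with type keyword.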

Then you need a terms aggregation on field1.raw (to group the duplicates) with a top_hits sub-aggregation to get a single document for each field1 value:

GET /lastseen/test/_search
{
  "size": 0,
  "query": {
    "query_string": {
      "query": "dinner"
    }
  },
  "aggs": {
    "field1_unique": {
      "terms": {
        "field": "field1.raw",
        "size": 2
      },
      "aggs": {
        "first_one": {
          "top_hits": {
            "size": 1,
            "sort": [{"lastseen": {"order":"desc"}}]
          }
        }
      }
    }
  }
}

Also, the single document returned by top_hits is the one with the highest lastseen (this is what "sort": [{"lastseen": {"order":"desc"}}] makes possible).
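The logic of the terms plus top_hits combination can be sketched in plain Python. This is an illustration of what the aggregation computes, not how Elasticsearch implements it. The functions matches and latest_per_group are hypothetical names, the word matching is a crude stand-in for query_string, and the sample data assumes every document carries a lastseen value.

```python
# Sample data modeled on the question; every document is assumed to have "lastseen".
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here", "lastseen": 1000},
    {"field1": "dinner carrot potato broccoli", "field2": "something here", "lastseen": 100},
    {"field1": "fish chicken something", "field2": "dinner", "lastseen": 2000},
    {"field1": "fish chicken something", "field2": "dinner", "lastseen": 200},
]


def matches(doc, word):
    """Crude stand-in for query_string: does any text field contain the word?"""
    return any(
        word in value.lower().split()
        for value in doc.values()
        if isinstance(value, str)
    )


def latest_per_group(docs, word, key="field1", sort_field="lastseen"):
    """Group matching docs by `key` (the terms aggregation) and keep the doc
    with the largest `sort_field` in each group (top_hits, size 1, sort desc)."""
    buckets = {}
    for doc in docs:
        if not matches(doc, word):
            continue
        best = buckets.get(doc[key])
        if best is None or doc[sort_field] > best[sort_field]:
            buckets[doc[key]] = doc
    return list(buckets.values())


result = latest_per_group(docs, "dinner")
for doc in result:
    print(doc["field1"], doc["lastseen"])
```

One difference worth keeping in mind: the sketch returns every group, whereas the terms aggregation in the query returns at most "size" buckets (2 in the answer), so that value would need to be raised if more distinct field1 values can match.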

The results come back under aggregations rather than hits (size is 0, so hits itself is empty):

...
"aggregations": {
  "field1_unique": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "dinner carrot potato broccoli",
        "doc_count": 2,
        "first_one": {
          "hits": {
            "total": 2,
            "max_score": null,
            "hits": [
              {
                "_index": "lastseen",
                "_type": "test",
                "_id": "AU60ZObtjKWeJgeyudI-",
                "_score": null,
                "_source": {
                  "field1": "dinner carrot potato broccoli",
                  "field2": "something here",
                  "lastseen": 1000
                },
                "sort": [
                  1000
                ]
              }
            ]
          }
        }
      },
      {
        "key": "fish chicken something",
        "doc_count": 2,
        "first_one": {
          "hits": {
            "total": 2,
            "max_score": null,
            "hits": [
              {
                "_index": "lastseen",
                "_type": "test",
                "_id": "AU60ZObtjKWeJgeyudJA",
                "_score": null,
                "_source": {
                  "field1": "fish chicken something",
                  "field2": "dinner",
                  "lastseen": 2000
                },
                "sort": [
                  2000
                ]
              }
            ]
          }
        }
      }
    ]
  }
}
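Since the documents are nested inside the aggregation buckets, a client has to unwrap them. The following is a minimal sketch, assuming the response has already been parsed from JSON into a Python dict with the shape shown above. The function name top_documents is hypothetical, and the sample response is trimmed to the fields the function reads.

```python
def top_documents(response, agg_name="field1_unique", hits_name="first_one"):
    """Collect the _source of the first top_hits document in each terms bucket."""
    documents = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        if hits:  # skip a bucket if its top_hits list is empty
            documents.append(hits[0]["_source"])
    return documents


# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "field1_unique": {
            "buckets": [
                {
                    "key": "dinner carrot potato broccoli",
                    "doc_count": 2,
                    "first_one": {"hits": {"total": 2, "hits": [
                        {"_source": {"field1": "dinner carrot potato broccoli",
                                     "field2": "something here",
                                     "lastseen": 1000}}
                    ]}},
                },
                {
                    "key": "fish chicken something",
                    "doc_count": 2,
                    "first_one": {"hits": {"total": 2, "hits": [
                        {"_source": {"field1": "fish chicken something",
                                     "field2": "dinner",
                                     "lastseen": 2000}}
                    ]}},
                },
            ]
        }
    }
}

for doc in top_documents(response):
    print(doc["field1"], doc["lastseen"])
```

This yields the two documents the question asked for, in the flat list form that the hypothetical "hits" output in the question had.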

Source: utcz.com/qa/436356.html
