How do I write an Elasticsearch query that filters on the maximum value of a field?
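I'd like to be able to query for text, but retrieve only the results that have the maximum value of a certain integer field in the data. I have read the docs on aggregations and filters, but I can't quite see what I'm looking for.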
For instance, I have some duplicated data getting indexed that is identical except for one integer field; let's call this field lastseen.
So, as an example, given this data put into Elasticsearch:
# these two the same except "lastseen" field
curl -XPOST localhost:9200/myindex/myobject -d '{
"field1": "dinner carrot potato broccoli",
"field2": "something here",
"lastseen": 1000
}'
curl -XPOST localhost:9200/myindex/myobject -d '{
"field1": "dinner carrot potato broccoli",
"field2": "something here",
"lastseen": 100
}'
# and these two the same except "lastseen" field
curl -XPOST localhost:9200/myindex/myobject -d '{
"field1": "fish chicken something",
"field2": "dinner",
"lastseen": 2000
}'
curl -XPOST localhost:9200/myindex/myobject -d '{
"field1": "fish chicken something",
"field2": "dinner",
"lastseen": 200
}'
If I query for "dinner":
curl -XPOST localhost:9200/myindex/_search -d '{
"query": {
"query_string": {
"query": "dinner"
}
}
}'
I get 4 results back. I'd like a filter so that I get only two results: just the items with the maximum lastseen value.
This is not a valid query, but hopefully it gives you an idea of what I'm after:
{ "query": {
"query_string": {
"query": "dinner"
}
},
"filter": {
"max": "lastseen"
}
}
The results would look like this:
"hits": [ {
...
"_source": {
"field1": "dinner carrot potato broccoli",
"field2": "something here",
"lastseen": 1000
}
},
{
...
"_source": {
"field1": "fish chicken something",
"field2": "dinner",
"lastseen": 2000
}
}
]
I tried creating a mapping that does not include lastseen in the index. That didn't work; I still get all 4 results back.
curl -XPOST localhost:9200/myindex -d '{ "mappings": {
"myobject": {
"properties": {
"lastseen": {
"type": "long",
"store": "yes",
"include_in_all": false
}
}
}
}
}'
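I also tried the aggregation approach to de-duplication listed here, but it didn't work, and more importantly I couldn't find a way to combine it with a keyword search.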
Answer:
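It's not ideal, but I think it gets you what you need.
Assuming field1 is the field you use to define "duplicate" documents, change the mapping of that field as follows: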
PUT /lastseen
{
"mappings": {
"test": {
"properties": {
"field1": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"field2": {
"type": "string"
},
"lastseen": {
"type": "long"
}
}
}
}
}
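That is, you add a .raw sub-field that is not_analyzed, meaning the value is indexed as-is, without being analyzed and broken into terms. This is what makes a rough form of "duplicate document detection" possible.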
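To see why the not_analyzed sub-field matters for grouping, here is a small local sketch in Python. It does not talk to Elasticsearch; `analyzed_terms` is a hypothetical stand-in that only lowercases and splits on whitespace, which is a rough approximation of what an analyzer does, not the real analysis chain.

```python
from collections import Counter

# The field1 values from the four example documents in the question.
field1_values = [
    "dinner carrot potato broccoli",
    "dinner carrot potato broccoli",
    "fish chicken something",
    "fish chicken something",
]


def analyzed_terms(value):
    """Hypothetical, simplified analyzer: lowercase and split on whitespace."""
    return value.lower().split()


# Grouping on the analyzed field: one bucket per individual term,
# counting each document once per distinct term it contains.
analyzed_buckets = Counter(
    term for value in field1_values for term in set(analyzed_terms(value))
)

# Grouping on the not_analyzed sub-field: the whole value is one term,
# so documents with identical field1 values land in the same bucket.
raw_buckets = Counter(field1_values)

print(sorted(analyzed_buckets))
print(dict(raw_buckets))
```

With the analyzed values, the buckets are single words such as "dinner" or "fish", which says nothing about which documents are duplicates. With the raw values, each bucket corresponds to one full field1 string.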
Then you need a terms aggregation on field1.raw (for the duplicates) with a top_hits sub-aggregation, to get a single document for each field1 value:
GET /lastseen/test/_search
{
"size": 0,
"query": {
"query_string": {
"query": "dinner"
}
},
"aggs": {
"field1_unique": {
"terms": {
"field": "field1.raw",
"size": 2
},
"aggs": {
"first_one": {
"top_hits": {
"size": 1,
"sort": [{"lastseen": {"order":"desc"}}]
}
}
}
}
}
}
Also, the single document returned by top_hits is the one with the highest lastseen (made possible by "sort": [{"lastseen": {"order":"desc"}}]).
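If you send the request from code rather than typing the JSON by hand, the same body can be assembled as a plain dictionary. The following is a minimal sketch; the function name `build_latest_per_group_query` and its parameters are made up for illustration, and it only builds the body without sending it anywhere.

```python
import json


def build_latest_per_group_query(text, group_field="field1.raw",
                                 sort_field="lastseen", max_groups=2):
    """Build the search body: text query plus terms/top_hits aggregation.

    Hypothetical helper; the aggregation names mirror the request above.
    """
    return {
        # No regular hits; the documents come back inside the aggregation.
        "size": 0,
        "query": {"query_string": {"query": text}},
        "aggs": {
            "field1_unique": {
                # One bucket per distinct value of the not_analyzed field.
                "terms": {"field": group_field, "size": max_groups},
                "aggs": {
                    "first_one": {
                        # Keep one document per bucket: the highest sort_field.
                        "top_hits": {
                            "size": 1,
                            "sort": [{sort_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


body = build_latest_per_group_query("dinner")
print(json.dumps(body, indent=2))
```

Note that the terms "size" caps the number of groups returned, so for real data it would need to be at least the number of distinct field1 values you expect to match.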
The results you get are these (under aggregations, not hits):
... "aggregations": {
"field1_unique": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "dinner carrot potato broccoli",
"doc_count": 2,
"first_one": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "lastseen",
"_type": "test",
"_id": "AU60ZObtjKWeJgeyudI-",
"_score": null,
"_source": {
"field1": "dinner carrot potato broccoli",
"field2": "something here",
"lastseen": 1000
},
"sort": [
1000
]
}
]
}
}
},
{
"key": "fish chicken something",
"doc_count": 2,
"first_one": {
"hits": {
"total": 2,
"max_score": null,
"hits": [
{
"_index": "lastseen",
"_type": "test",
"_id": "AU60ZObtjKWeJgeyudJA",
"_score": null,
"_source": {
"field1": "fish chicken something",
"field2": "dinner",
"lastseen": 2000
},
"sort": [
2000
]
}
]
}
}
}
]
}
}
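Since the documents arrive nested inside the aggregation buckets instead of the usual hits list, a client has to unwrap them. Below is a minimal Python sketch of that step, assuming the response has already been parsed into a dictionary; `latest_documents` is a hypothetical helper, and `sample` is a trimmed copy of the response shown above.

```python
def latest_documents(response, agg_name="field1_unique",
                     top_hits_name="first_one"):
    """Return the _source of the top hit from every terms bucket."""
    documents = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        hits = bucket[top_hits_name]["hits"]["hits"]
        if hits:  # a bucket's top_hits holds at most one hit here (size 1)
            documents.append(hits[0]["_source"])
    return documents


# Trimmed copy of the aggregation response above (ids and scores omitted).
sample = {
    "aggregations": {
        "field1_unique": {
            "buckets": [
                {
                    "key": "dinner carrot potato broccoli",
                    "doc_count": 2,
                    "first_one": {"hits": {"total": 2, "hits": [
                        {"_source": {
                            "field1": "dinner carrot potato broccoli",
                            "field2": "something here",
                            "lastseen": 1000,
                        }}
                    ]}},
                },
                {
                    "key": "fish chicken something",
                    "doc_count": 2,
                    "first_one": {"hits": {"total": 2, "hits": [
                        {"_source": {
                            "field1": "fish chicken something",
                            "field2": "dinner",
                            "lastseen": 2000,
                        }}
                    ]}},
                },
            ]
        }
    }
}

latest = latest_documents(sample)
for doc in latest:
    print(doc["field1"], doc["lastseen"])
```

This yields one document per group, which is the two-result list the question asked for.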