Query the latest document of each type on Elasticsearch

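I'm trying to run what looks like a simple query on Elasticsearch, but I can't seem to get the result I want.

Here is a short example of what I'm trying to do:

I have a database of news. Each news item has a source, a headline, a timestamp, and a user.

I want to get the latest headline (based on the timestamp) from every available source for a given user.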

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/news" -d '{
  "mappings": {
    "news": {
      "properties": {
        "source": { "type": "string", "index": "not_analyzed" },
        "headline": { "type": "object" },
        "timestamp": { "type": "date", "format": "date_hour_minute_second_millis" },
        "user": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "CNN", "headline": "Great news", "timestamp": "2015-07-28T00:07:29.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "CNN", "headline": "More great news", "timestamp": "2015-07-28T00:08:23.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "ESPN", "headline": "Sports news", "timestamp": "2015-07-28T00:09:32.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "John", "source": "ESPN", "headline": "More sports news", "timestamp": "2015-07-28T00:10:35.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "Mary", "source": "Yahoo", "headline": "More news", "timestamp": "2015-07-28T00:11:54.000"}
{"index":{"_index":"news","_type":"news"}}
{"user": "Mary", "source": "Yahoo", "headline": "Crazy news", "timestamp": "2015-07-28T00:12:31.000"}
'

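So, for example, how do I get the latest CNN headline and the latest ESPN headline for John?

I've been looking at the Multi Search API, but that would mean I need to know all the sources in advance (CNN and ESPN in this case).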

Answer:

First, note that I had to change your mapping for the headline field to string, because in your sample documents the headline is a string, not an object.
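For reference, the corrected index-creation body might look like the sketch below. It keeps the index name, type name, and other fields from the question and only changes headline to a string. The helper name news_mapping is made up for illustration, and the curl call is left commented out because it needs a running cluster.

```shell
# Hypothetical helper: prints the index-creation body from the question,
# with "headline" mapped as a string instead of an object.
news_mapping() {
  cat <<'EOF'
{
  "mappings": {
    "news": {
      "properties": {
        "source": { "type": "string", "index": "not_analyzed" },
        "headline": { "type": "string" },
        "timestamp": { "type": "date", "format": "date_hour_minute_second_millis" },
        "user": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
EOF
}

# Usage (needs a running cluster, so it is left commented out):
# curl -XPUT "$ELASTICSEARCH_ENDPOINT/news" -d "$(news_mapping)"
```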

With that change, a query like the following retrieves the results you expect:

# Filter on user=John, bucket the matching documents by source, and for each
# bucket return only the latest hit (size 1, sorted by timestamp descending),
# limited to the headline and timestamp fields.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/news/_search" -d '{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "user": "John"
        }
      }
    }
  },
  "aggs": {
    "sources": {
      "terms": {
        "field": "source"
      },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "_source": [
              "headline",
              "timestamp"
            ],
            "sort": {
              "timestamp": "desc"
            }
          }
        }
      }
    }
  }
}'
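The filtered query used above was deprecated in Elasticsearch 2.0 and removed in 5.0; on newer clusters a bool query with a filter clause is the usual replacement. The sketch below shows what the same request body might look like there. It assumes user and source are mapped as keyword fields, and the helper name latest_per_source_query is hypothetical. From 6.0 onward the request also needs an explicit Content-Type header.

```shell
# Hypothetical helper: prints a search body that returns the latest headline
# per source for the user given as $1.
# Assumes the user name contains no characters that need JSON escaping.
latest_per_source_query() {
  cat <<EOF
{
  "size": 0,
  "query": {
    "bool": {
      "filter": { "term": { "user": "$1" } }
    }
  },
  "aggs": {
    "sources": {
      "terms": { "field": "source" },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "_source": ["headline", "timestamp"],
            "sort": [ { "timestamp": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}
EOF
}

# Usage (needs a running cluster, so it is left commented out):
# curl -s -XPOST "$ELASTICSEARCH_ENDPOINT/news/_search" \
#      -H 'Content-Type: application/json' \
#      -d "$(latest_per_source_query John)"
```

Building the body in a function keeps the user name as the only variable part; the aggregation itself is unchanged from the answer's query.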

This yields something like the following:

{
  ...
  "aggregations" : {
    "sources" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "CNN",
        "doc_count" : 2,
        "latest" : {
          "hits" : {
            "total" : 2,
            "max_score" : null,
            "hits" : [ {
              "_index" : "news",
              "_type" : "news",
              "_id" : "AU7Sh3VDGDddn2ZNuDVl",
              "_score" : null,
              "_source":{
                "headline": "More great news",
                "timestamp": "2015-07-28T00:08:23.000"
              },
              "sort" : [ 1438042103000 ]
            } ]
          }
        }
      }, {
        "key" : "ESPN",
        "doc_count" : 2,
        "latest" : {
          "hits" : {
            "total" : 2,
            "max_score" : null,
            "hits" : [ {
              "_index" : "news",
              "_type" : "news",
              "_id" : "AU7Sh3VDGDddn2ZNuDVn",
              "_score" : null,
              "_source":{
                "headline": "More sports news",
                "timestamp": "2015-07-28T00:10:35.000"
              },
              "sort" : [ 1438042235000 ]
            } ]
          }
        }
      } ]
    }
  }
}
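To pull just the source and headline out of a response shaped like this one, a small jq filter is one option. This is only a sketch: it assumes jq is installed, the function name latest_headlines is made up for illustration, and query.json in the usage line is a hypothetical file holding the search body.

```shell
# Hypothetical helper: reads a search response on stdin and prints one
# "source: headline" line per bucket of the "sources" aggregation.
# Assumes jq is installed and that every bucket contains at least one hit.
latest_headlines() {
  jq -r '.aggregations.sources.buckets[]
         | "\(.key): \(.latest.hits.hits[0]._source.headline)"'
}

# Usage (needs a running cluster, so it is left commented out):
# curl -s -XPOST "$ELASTICSEARCH_ENDPOINT/news/_search" -d @query.json | latest_headlines
```

For the two buckets in the response, this prints "CNN: More great news" and "ESPN: More sports news", one per line.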

