Elasticsearch: aggregation, count by field

I have inserted this data into Elasticsearch:

[
{ "name": "Cassandra Irwin", "location": "Monzon de Campos" .. },
{ "name": "Gayle Mooney", "location": "Villarroya del Campo" .. },
{ "name": "Angelita Charles", "location": "Revenga de Campos" .. },
{ "name": "Sheppard Sweet", "location": "Santiago del Campo" .. },
..
..

Side note, to reproduce:

1) Download: http://wmo.co/20160928_es_query/bulk.json

2) Run: curl -s -XPOST 'http://localhost:9200/testing/external/_bulk?pretty' --data-binary @bulk.json

I want a count of how many records there are for each "location".

curl -s -XPOST 'localhost:9200/testing/_search?pretty' -d '
{
  "aggs": { "location_count": { "terms": { "field":"location", "size":100 }}}
}' | jq '.aggregations'

Result:

{"location_count":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,
"buckets":[
{"key":"campo", "doc_count":47},
{"key":"del", "doc_count":47},
{"key":"campos", "doc_count":29},
{"key":"de", "doc_count":29},
{"key":"villarroya","doc_count":15},
{"key":"torre", "doc_count":12},
{"key":"monzon", "doc_count":11},
{"key":"santiago", "doc_count":11},
{"key":"pina", "doc_count":9},
{"key":"revenga", "doc_count":9},
{"key":"uleila", "doc_count":9}
]}}

Problem: it splits the "location" field into words and returns a document count for each word, not for each full location.

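I can get the per-location counts with the following query, which fetches every location and does the aggregation in jq (a handy command-line JSON tool), but this could become a performance nightmare on large amounts of data: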

curl -s -XPOST 'localhost:9200/testing/_search?pretty' -d '
{
  "query": { "wildcard": { "location": "*" } }, "size":1000,
  "_source": ["location"]
}' | jq '[.hits.hits[] |
  {location:._source.location,"count":1}] |
  group_by(.location) |
  map({ key: .[0].location, value: map(.count)|add })'

Result:

[
{ "key": "Monzon de Campos", "value": 11 },
{ "key": "Pina de Campos", "value": 9 },
{ "key": "Revenga de Campos", "value": 9 },
{ "key": "Santiago del Campo", "value": 11 },
{ "key": "Torre del Campo", "value": 12 },
{ "key": "Uleila del Campo", "value": 9 },
{ "key": "Villarroya del Campo", "value": 15 }
]

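This is exactly the result I want, but I would like Elasticsearch to produce it (i.e. have the aggregation handled by Elasticsearch rather than by jq).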

Answer:

You need to add a not_analyzed sub-field to your location field.
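To see why the buckets came out as single words: an analyzed string field is lowercased and broken into separate terms at index time, and the terms aggregation counts those terms rather than the whole values. The sketch below imitates the two behaviours in plain shell. It is only a rough approximation (the real standard analyzer does more than lowercase and split on spaces), and `term_counts`, `value_counts` and `sample` are hypothetical helper names, not part of Elasticsearch.

```shell
# Hypothetical helper: rough stand-in for an analyzed string field.
# Reads one field value per line, lowercases it, splits it on spaces,
# and prints how often each resulting term occurs (most frequent first).
term_counts() {
  tr '[:upper:]' '[:lower:]' |
    tr -s ' ' '\n' |
    sort | uniq -c | sort -k1,1nr -k2,2
}

# Hypothetical helper: rough stand-in for a not_analyzed field.
# Each whole value is kept as a single term and counted as-is.
value_counts() {
  sort | uniq -c | sort -k1,1nr -k2,2
}

# Three sample values taken from the question.
sample() {
  printf '%s\n' "Monzon de Campos" "Revenga de Campos" "Monzon de Campos"
}

sample | term_counts    # per-word counts, like the unwanted result
sample | value_counts   # per-value counts, like the wanted result
```

The first call counts words such as "de" and "campos" across all values, while the second keeps "Monzon de Campos" intact, which is the behaviour the not_analyzed sub-field provides.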

First, modify your mapping like this:

curl -XPOST 'http://localhost:9200/testing/_mapping/external' -d '{
  "properties": {
    "location": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'
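The mapping above uses the `string` type with `"index": "not_analyzed"`, which is the Elasticsearch 1.x/2.x syntax. From 5.0 onward `string` is split into `text` and `keyword`, and a `keyword` sub-field plays the role of the not_analyzed one. Here is a sketch of the equivalent request, assuming a 7.x or later cluster (no mapping type in the URL, explicit Content-Type header). The helper names and the `ES_URL` variable are hypothetical.

```shell
# Hypothetical helper: prints a mapping body that adds a "raw" keyword
# sub-field to "location" (assumes Elasticsearch 7.x or later).
keyword_mapping_body() {
  cat <<'EOF'
{
  "properties": {
    "location": {
      "type": "text",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }
  }
}
EOF
}

# Hypothetical helper: sends the body to the typeless mapping endpoint.
# Not invoked here, because it needs a running cluster.
put_keyword_mapping() {
  keyword_mapping_body |
    curl -s -XPUT "${ES_URL:-localhost:9200}/testing/_mapping" \
      -H 'Content-Type: application/json' --data-binary @-
}

# Print the body for inspection.
keyword_mapping_body
```

With a cluster running, calling `put_keyword_mapping` would send the request. The aggregation on `location.raw` stays the same in either version.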

Then index your data again:

curl -s -XPOST 'http://localhost:9200/testing/external/_bulk?pretty' --data-binary @bulk.json

Finally, you will be able to run your query like this (on the location.raw field) and get the results you expect:

curl -s -XPOST 'localhost:9200/testing/_search?pretty' -d '
{
  "aggs": { "location_count": { "terms": { "field":"location.raw", "size":100 }}}
}' | jq '.aggregations'
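The search above also returns the default page of hits, which jq then throws away. When only the bucket counts matter, setting the top-level `size` to 0 asks Elasticsearch to skip the hits altogether. A minimal sketch follows; `agg_body`, `count_by_location` and `ES_URL` are hypothetical names, and the Content-Type header is only required on 6.0 and later.

```shell
# Hypothetical helper: builds a terms-aggregation request on the given field.
# The top-level "size": 0 suppresses the hits; the inner "size": 100 caps
# the number of buckets returned.
agg_body() {
  printf '{ "size": 0, "aggs": { "location_count": { "terms": { "field": "%s", "size": 100 } } } }\n' "$1"
}

# Hypothetical helper: runs the aggregation against the "testing" index.
# Not invoked here, because it needs a running cluster.
count_by_location() {
  agg_body "location.raw" |
    curl -s -XPOST "${ES_URL:-localhost:9200}/testing/_search?pretty" \
      -H 'Content-Type: application/json' --data-binary @-
}

# Print the request body for inspection.
agg_body "location.raw"
```

With a cluster running, `count_by_location | jq '.aggregations'` would give the same view of the buckets as the command above.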
