Elasticsearch - Count duplicate and unique values
I have the following JSON:
[ {"firstname": "john", "lastname": "doe"},
{"firstname": "john", "lastname": "smith"},
{"firstname": "jane", "lastname": "smith"},
{"firstname": "jane", "lastname": "doe"},
{"firstname": "joe", "lastname": "smith"},
{"firstname": "joe", "lastname": "doe"},
{"firstname": "steve", "lastname": "smith"},
{"firstname": "jack", "lastname": "doe"}
]
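I want to count how many first names are duplicated (expected: 3)
and how many first names are not duplicated (expected: 2).
I tried counting the number of buckets, but the result seems to include every bucket, whether the name is duplicated or not: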
GET mynames/_search
{
"aggs" : {
"name_count" : {
"terms" : {
"field" : "firstname.keyword",
"min_doc_count": 2
}
},
"count":{
"cardinality": {
"field": "firstname.keyword"
}
}
}
}
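A likely reason the attempt above does not give the two numbers: "count" is a sibling of "name_count", so the cardinality aggregation sees every distinct first name (5 here), while min_doc_count only filters the buckets that "name_count" returns. The following Python sketch mimics both aggregations on the sample first names client-side; it is an illustration only and ignores that cardinality is an approximate count in general.

```python
from collections import Counter

# First names from the sample documents in the question
firstnames = ["john", "john", "jane", "jane", "joe", "joe", "steve", "jack"]
counts = Counter(firstnames)

# terms + min_doc_count: 2 -> only buckets with doc_count >= 2 come back
name_count_buckets = {name: n for name, n in counts.items() if n >= 2}

# cardinality is a sibling aggregation, so it counts every distinct name,
# not only the buckets that survived min_doc_count
cardinality = len(set(firstnames))

print(len(name_count_buckets), cardinality)  # 3 5
```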
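Answer:
Well, I used several aggregations here. Here is the list, in the order they appear in the query:
- Terms aggregation (duplicate_aggs)
- Stats bucket aggregation (duplicate_bucketcount)
- Terms aggregation (nonduplicate_aggs)
- Bucket selector (equal_one, as a sub-aggregation)
- Sum bucket aggregation (nonduplicate_bucketcount)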
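To make the data flow through these five steps concrete, here is a plain-Python simulation of the pipeline on the sample first names. It is a sketch of what each aggregation computes, not of how Elasticsearch executes it, and it ignores terms ordering and the terms size limit.

```python
from collections import Counter

firstnames = ["john", "john", "jane", "jane", "joe", "joe", "steve", "jack"]
counts = Counter(firstnames)

# 1. terms (duplicate_aggs): one bucket per name, min_doc_count 2 drops singletons
duplicate_buckets = [{"key": k, "doc_count": c} for k, c in counts.items() if c >= 2]

# 2. stats_bucket over duplicate_aggs._count: "count" is the number of buckets
dup_counts = [b["doc_count"] for b in duplicate_buckets]
duplicate_bucketcount = {
    "count": len(dup_counts),
    "min": min(dup_counts),
    "max": max(dup_counts),
    "avg": sum(dup_counts) / len(dup_counts),
    "sum": sum(dup_counts),
}

# 3. terms (nonduplicate_aggs): one bucket per name, no filtering yet
all_buckets = [{"key": k, "doc_count": c} for k, c in counts.items()]

# 4. bucket_selector (equal_one): keep only buckets whose _count == 1
nonduplicate_buckets = [b for b in all_buckets if b["doc_count"] == 1]

# 5. sum_bucket over nonduplicate_aggs._count: every kept bucket has
#    doc_count 1, so the sum equals the number of kept buckets
nonduplicate_bucketcount = sum(b["doc_count"] for b in nonduplicate_buckets)

print(duplicate_bucketcount["count"], nonduplicate_bucketcount)  # 3 2
```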
Query:
POST <your_index_name>/_search
{
"size":0,
"aggs":{
"duplicate_aggs":{
"terms":{
"field":"firstname.keyword",
"min_doc_count":2
}
},
"duplicate_bucketcount":{
"stats_bucket":{
"buckets_path":"duplicate_aggs._count"
}
},
"nonduplicate_aggs":{
"terms":{
"field":"firstname.keyword"
},
"aggs":{
"equal_one":{
"bucket_selector":{
"buckets_path":{
"count":"_count"
},
"script":"params.count == 1"
}
}
}
},
"nonduplicate_bucketcount":{
"sum_bucket":{
"buckets_path":"nonduplicate_aggs._count"
}
}
}
}
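If you need the same query for other fields, a small builder function can generate the body. The sketch below uses a hypothetical helper name, build_duplicate_count_query, and adds one assumption not in the query above: an explicit "size" on both terms aggregations. terms returns at most 10 buckets by default and orders them by doc_count descending, so with more than 10 distinct names the singleton buckets would be cut before the bucket selector and sum bucket ever see them.

```python
import json


def build_duplicate_count_query(field: str, size: int = 10) -> dict:
    """Hypothetical helper: build the request body above for `field`.

    `size` is passed to both terms aggregations; raise it so that every
    distinct value gets a bucket (terms defaults to 10 buckets).
    """
    return {
        "size": 0,  # no hits needed, only aggregations
        "aggs": {
            "duplicate_aggs": {
                "terms": {"field": field, "min_doc_count": 2, "size": size}
            },
            "duplicate_bucketcount": {
                "stats_bucket": {"buckets_path": "duplicate_aggs._count"}
            },
            "nonduplicate_aggs": {
                "terms": {"field": field, "size": size},
                "aggs": {
                    "equal_one": {
                        "bucket_selector": {
                            "buckets_path": {"count": "_count"},
                            "script": "params.count == 1",
                        }
                    }
                },
            },
            "nonduplicate_bucketcount": {
                "sum_bucket": {"buckets_path": "nonduplicate_aggs._count"}
            },
        },
    }


body = build_duplicate_count_query("firstname.keyword", size=1000)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after the POST line or sent with any HTTP client; sending it is left out here.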
Response:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 8,
"max_score": 0,
"hits": []
},
"aggregations": {
"duplicate_aggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "jane",
"doc_count": 2
},
{
"key": "joe",
"doc_count": 2
},
{
"key": "john",
"doc_count": 2
}
]
},
"nonduplicate_aggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "jack",
"doc_count": 1
},
{
"key": "steve",
"doc_count": 1
}
]
},
"duplicate_bucketcount": {
"count": 3,
"min": 2,
"max": 2,
"avg": 2,
"sum": 6
},
"nonduplicate_bucketcount": {
"value": 2
}
}
}
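Note that in the response above, the key duplicate_bucketcount.count has the value 3: it is the number of buckets, i.e. the number of first names that are duplicated.
Likewise, nonduplicate_bucketcount.value is 2: it sums the doc counts of the buckets that remain after the bucket selector, and since each of those buckets has doc_count 1, the sum equals the number of non-duplicated names.
Let me know if this helps!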
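On the client side, the two numbers can be read straight out of the response. This Python sketch parses a trimmed copy of the "aggregations" section shown above; the int() cast is a precaution, since sum_bucket values may come back as floats such as 2.0 depending on the version.

```python
import json

# Trimmed copy of the "aggregations" part of the response above
response = json.loads("""
{
  "aggregations": {
    "duplicate_bucketcount": {"count": 3, "min": 2, "max": 2, "avg": 2, "sum": 6},
    "nonduplicate_bucketcount": {"value": 2}
  }
}
""")

aggs = response["aggregations"]

# stats_bucket reports how many buckets it saw as "count"
duplicate_names = aggs["duplicate_bucketcount"]["count"]

# sum_bucket reports the summed doc counts as "value"; each surviving
# bucket has doc_count 1, so this is the number of unique names
unique_names = int(aggs["nonduplicate_bucketcount"]["value"])

print(duplicate_names, unique_names)  # 3 2
```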
Source: utcz.com/qa/431219.html