Elasticsearch: counting duplicate and unique values

I have the following JSON:

[
  {"firstname": "john", "lastname": "doe"},
  {"firstname": "john", "lastname": "smith"},
  {"firstname": "jane", "lastname": "smith"},
  {"firstname": "jane", "lastname": "doe"},
  {"firstname": "joe", "lastname": "smith"},
  {"firstname": "joe", "lastname": "doe"},
  {"firstname": "steve", "lastname": "smith"},
  {"firstname": "jack", "lastname": "doe"}
]

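I want to count the first names that are duplicated:

Duplicate count: 3

and the first names that are not duplicated:

Non-duplicate count: 2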
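To make the expected numbers concrete, here is a minimal client-side sketch in Python (not Elasticsearch) that counts them for the sample data. The variable names are only illustrative. A first name counts as "duplicated" when it appears in two or more documents.

```python
from collections import Counter

# First names taken from the sample documents above.
first_names = ["john", "john", "jane", "jane", "joe", "joe", "steve", "jack"]

occurrences = Counter(first_names)

# Names that appear at least twice versus names that appear exactly once.
duplicate_count = sum(1 for n in occurrences.values() if n >= 2)
non_duplicate_count = sum(1 for n in occurrences.values() if n == 1)
distinct_count = len(occurrences)

print(duplicate_count, non_duplicate_count, distinct_count)
```

The distinct count (5) is the sum of the two numbers being asked for (3 and 2).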

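I tried counting the buckets, but the cardinality aggregation counts every distinct first name (5 for this data), whether it is duplicated or not: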

GET mynames/_search
{
  "aggs": {
    "name_count": {
      "terms": {
        "field": "firstname.keyword",
        "min_doc_count": 2
      }
    },
    "count": {
      "cardinality": {
        "field": "firstname.keyword"
      }
    }
  }
}

Answer:

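I used several aggregations here. They are listed below in the order in which they execute.

  • Terms aggregation (with min_doc_count 2, so only duplicated names get a bucket)
  • Stats bucket aggregation (its count is the number of those buckets)
  • Terms aggregation
    • Bucket selector (as a sub-aggregation, keeping only buckets with a count of 1)
  • Sum bucket aggregation (sums the counts of the remaining buckets)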
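The steps in this list can be imitated in plain Python to see what each one contributes. This is only a rough simulation of the aggregation logic on the sample data, not how Elasticsearch runs it internally, and the variable names are made up for illustration.

```python
from collections import Counter

docs = [
    {"firstname": "john", "lastname": "doe"},
    {"firstname": "john", "lastname": "smith"},
    {"firstname": "jane", "lastname": "smith"},
    {"firstname": "jane", "lastname": "doe"},
    {"firstname": "joe", "lastname": "smith"},
    {"firstname": "joe", "lastname": "doe"},
    {"firstname": "steve", "lastname": "smith"},
    {"firstname": "jack", "lastname": "doe"},
]

# Terms aggregation: one bucket per distinct firstname with its doc_count.
buckets = Counter(doc["firstname"] for doc in docs)

# Terms aggregation with min_doc_count=2: keep only duplicated names.
duplicate_buckets = {key: n for key, n in buckets.items() if n >= 2}

# Stats bucket over the bucket counts; "count" is the number of buckets.
# (This sketch assumes there is at least one duplicated name.)
counts = list(duplicate_buckets.values())
duplicate_stats = {
    "count": len(counts),
    "min": min(counts),
    "max": max(counts),
    "avg": sum(counts) / len(counts),
    "sum": sum(counts),
}

# Second terms aggregation plus bucket selector: keep buckets with count == 1.
nonduplicate_buckets = {key: n for key, n in buckets.items() if n == 1}

# Sum bucket: every remaining bucket has a count of 1, so the sum
# equals the number of non-duplicated names.
nonduplicate_sum = sum(nonduplicate_buckets.values())

print(duplicate_stats)
print(nonduplicate_sum)
```

The sum bucket step works as a bucket counter only because the selector leaves buckets whose count is exactly 1.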

Aggregation query:

POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "duplicate_aggs": {
      "terms": {
        "field": "firstname.keyword",
        "min_doc_count": 2
      }
    },
    "duplicate_bucketcount": {
      "stats_bucket": {
        "buckets_path": "duplicate_aggs._count"
      }
    },
    "nonduplicate_aggs": {
      "terms": {
        "field": "firstname.keyword"
      },
      "aggs": {
        "equal_one": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count"
            },
            "script": "params.count == 1"
          }
        }
      }
    },
    "nonduplicate_bucketcount": {
      "sum_bucket": {
        "buckets_path": "nonduplicate_aggs._count"
      }
    }
  }
}
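One caveat: a terms aggregation returns only its top 10 buckets by default, and the sibling pipeline aggregations see only the buckets that are returned. As far as I understand, the bucket selector is also applied to that truncated list, so on an index with more than 10 distinct first names both counts could come out too low. The sketch below builds the same request body in Python with an explicit terms "size". The function name and the value 1000 are assumptions for illustration, not part of the original answer.

```python
import json


def build_duplicate_count_query(field: str, size: int = 10) -> dict:
    """Build the request body above, with the terms `size` made explicit."""
    return {
        "size": 0,  # no hits needed, only aggregations
        "aggs": {
            "duplicate_aggs": {
                "terms": {"field": field, "min_doc_count": 2, "size": size}
            },
            "duplicate_bucketcount": {
                "stats_bucket": {"buckets_path": "duplicate_aggs._count"}
            },
            "nonduplicate_aggs": {
                "terms": {"field": field, "size": size},
                "aggs": {
                    "equal_one": {
                        "bucket_selector": {
                            "buckets_path": {"count": "_count"},
                            "script": "params.count == 1",
                        }
                    }
                },
            },
            "nonduplicate_bucketcount": {
                "sum_bucket": {"buckets_path": "nonduplicate_aggs._count"}
            },
        },
    }


# 1000 is an arbitrary illustrative upper bound on distinct first names.
body = build_duplicate_count_query("firstname.keyword", size=1000)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same _search request. For fields with a very large number of distinct values, raising "size" gets expensive and a different approach may be needed.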

Response:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "duplicate_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "jane",
          "doc_count": 2
        },
        {
          "key": "joe",
          "doc_count": 2
        },
        {
          "key": "john",
          "doc_count": 2
        }
      ]
    },
    "nonduplicate_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "jack",
          "doc_count": 1
        },
        {
          "key": "steve",
          "doc_count": 1
        }
      ]
    },
    "duplicate_bucketcount": {
      "count": 3,
      "min": 2,
      "max": 2,
      "avg": 2,
      "sum": 6
    },
    "nonduplicate_bucketcount": {
      "value": 2
    }
  }
}

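Note that in the response above, duplicate_bucketcount.count is 3. That is the number of buckets in duplicate_aggs, in other words the number of duplicated first names. Likewise, nonduplicate_bucketcount.value is 2, the number of first names that occur only once.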
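As a small sketch of reading those two values on the client side, assuming the response has already been parsed into a Python dict (the helper name is hypothetical, and the dict below is a trimmed copy of the response shown above):

```python
from typing import Tuple


def extract_counts(response: dict) -> Tuple[int, int]:
    """Return (duplicated names, non-duplicated names) from the response."""
    aggs = response["aggregations"]
    # stats_bucket "count" is the number of buckets in duplicate_aggs.
    duplicates = int(aggs["duplicate_bucketcount"]["count"])
    # sum_bucket "value" is numeric and may arrive as a float, hence int().
    non_duplicates = int(aggs["nonduplicate_bucketcount"]["value"])
    return duplicates, non_duplicates


# Trimmed copy of the aggregations section of the response above.
response = {
    "aggregations": {
        "duplicate_bucketcount": {"count": 3, "min": 2, "max": 2, "avg": 2, "sum": 6},
        "nonduplicate_bucketcount": {"value": 2},
    }
}

duplicates, non_duplicates = extract_counts(response)
print(duplicates, non_duplicates)
```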

Let me know if this helps!

