I have the following JSON:
[
  {"firstname": "john", "lastname": "doe"},
  {"firstname": "john", "lastname": "smith"},
  {"firstname": "jane", "lastname": "smith"},
  {"firstname": "jane", "lastname": "doe"},
  {"firstname": "joe", "lastname": "smith"},
  {"firstname": "joe", "lastname": "doe"},
  {"firstname": "steve", "lastname": "smith"},
  {"firstname": "jack", "lastname": "doe"}
]
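I want to count the first names that are duplicated:
Duplicate count: 3
and the first names that are not duplicated:
Non-duplicate count: 2
I tried counting the buckets, but that seems to count every bucket, whether it is a duplicate or not.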
GET mynames/_search
{
  "aggs": {
    "name_count": {
      "terms": {
        "field": "firstname.keyword",
        "min_doc_count": 2
      }
    },
    "count": {
      "cardinality": {
        "field": "firstname.keyword"
      }
    }
  }
}
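For reference, here is a minimal Python sketch of what this request computes on the sample data. It is plain Python rather than an Elasticsearch client call, and the variable names are illustrative. The `cardinality` aggregation is mimicked with a set, which is exact here even though Elasticsearch's cardinality is approximate on large data sets. The result covers every distinct first name, duplicated or not, which matches the behaviour described above.

```python
from collections import Counter

# First names taken from the sample documents in the question.
first_names = ["john", "john", "jane", "jane", "joe", "joe", "steve", "jack"]

# What "cardinality" reports: the number of distinct values.
distinct_count = len(set(first_names))

# What "terms" with min_doc_count=2 returns: the buckets themselves,
# but no single number saying how many buckets there are.
counts = Counter(first_names)
duplicate_buckets = sorted(name for name, n in counts.items() if n >= 2)

print(distinct_count)     # 5
print(duplicate_buckets)  # ['jane', 'joe', 'john']
```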
Well, I have made use of several aggregations here. Below is the list of the ones I used, in the order in which they execute.
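For duplicates: terms aggregation (with min_doc_count set to 2), then stats bucket aggregation.
For non-duplicates: terms aggregation, then bucket selector aggregation, then sum bucket aggregation.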
POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "duplicate_aggs": {
      "terms": {
        "field": "firstname.keyword",
        "min_doc_count": 2
      }
    },
    "duplicate_bucketcount": {
      "stats_bucket": {
        "buckets_path": "duplicate_aggs._count"
      }
    },
    "nonduplicate_aggs": {
      "terms": {
        "field": "firstname.keyword"
      },
      "aggs": {
        "equal_one": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count"
            },
            "script": "params.count == 1"
          }
        }
      }
    },
    "nonduplicate_bucketcount": {
      "sum_bucket": {
        "buckets_path": "nonduplicate_aggs._count"
      }
    }
  }
}
{
  "took": 10,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": { "total": 8, "max_score": 0, "hits": [] },
  "aggregations": {
    "duplicate_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "jane", "doc_count": 2 },
        { "key": "joe", "doc_count": 2 },
        { "key": "john", "doc_count": 2 }
      ]
    },
    "nonduplicate_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "jack", "doc_count": 1 },
        { "key": "steve", "doc_count": 1 }
      ]
    },
    "duplicate_bucketcount": { "count": 3, "min": 2, "max": 2, "avg": 2, "sum": 6 },
    "nonduplicate_bucketcount": { "value": 2 }
  }
}
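To make the aggregation steps easier to follow, here is a rough Python sketch that mimics them on the sample documents. It is only an illustration in plain Python and not how Elasticsearch runs the query internally. The variable names mirror the aggregation names in the query, and everything else is an assumption made for the example.

```python
from collections import Counter

# Sample documents from the question.
docs = [
    {"firstname": "john", "lastname": "doe"},
    {"firstname": "john", "lastname": "smith"},
    {"firstname": "jane", "lastname": "smith"},
    {"firstname": "jane", "lastname": "doe"},
    {"firstname": "joe", "lastname": "smith"},
    {"firstname": "joe", "lastname": "doe"},
    {"firstname": "steve", "lastname": "smith"},
    {"firstname": "jack", "lastname": "doe"},
]

# terms aggregation: one bucket per first name, holding its doc_count.
buckets = Counter(doc["firstname"] for doc in docs)

# duplicate_aggs: min_doc_count=2 keeps names that occur at least twice.
duplicate_counts = [n for n in buckets.values() if n >= 2]

# duplicate_bucketcount: stats_bucket over the _count of each kept bucket.
duplicate_stats = {
    "count": len(duplicate_counts),
    "min": min(duplicate_counts),
    "max": max(duplicate_counts),
    "avg": sum(duplicate_counts) / len(duplicate_counts),
    "sum": sum(duplicate_counts),
}

# nonduplicate_aggs: bucket_selector keeps buckets where params.count == 1.
nonduplicate_counts = [n for n in buckets.values() if n == 1]

# nonduplicate_bucketcount: sum_bucket adds up _count. Every kept bucket has
# a _count of 1, so the sum equals the number of non-duplicated names.
nonduplicate_sum = sum(nonduplicate_counts)

print(duplicate_stats["count"])  # 3
print(nonduplicate_sum)          # 2
```

One thing to keep in mind: a terms aggregation returns only the top buckets (10 by default), so on an index with many distinct first names the `size` parameter would probably need to be raised for these counts to cover all of them.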
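Note that in the response above, the key duplicate_bucketcount.count has the value 3. That is the bucket count, i.e. the number of first names that are duplicated. Likewise, nonduplicate_bucketcount.value is 2, the number of first names that occur only once.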
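If you are reading the result from code, the two numbers can be pulled out of the response roughly like this. This is a small sketch that assumes the response has already been parsed into a Python dict (for example by a client library). `extract_counts` is a hypothetical helper name, and the dict below is a trimmed copy of the response shown above.

```python
# Trimmed copy of the "aggregations" part of the response shown above.
response = {
    "aggregations": {
        "duplicate_bucketcount": {"count": 3, "min": 2, "max": 2, "avg": 2, "sum": 6},
        "nonduplicate_bucketcount": {"value": 2},
    }
}


def extract_counts(resp):
    """Return (duplicate_count, nonduplicate_count) from a parsed response."""
    aggs = resp["aggregations"]
    # stats_bucket reports the number of buckets under "count".
    duplicate_count = aggs["duplicate_bucketcount"]["count"]
    # sum_bucket reports its result under "value"; it may be a float.
    nonduplicate_count = int(aggs["nonduplicate_bucketcount"]["value"])
    return duplicate_count, nonduplicate_count


print(extract_counts(response))  # (3, 2)
```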
Let me know if this helps!