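I have a temporary index containing documents that I need to review. I want to group these documents by the words they contain.
For example, I have the following documents:
1 - "aaa bbb ccc ddd eee fff"
2 - "bbb mmm aaa fff xxx"
3 - "hhh aaa fff"
So I would like to get the most popular words, preferably with counts: "aaa" - 3, "fff" - 3, "bbb" - 2, and so on.
Is this possible with Elasticsearch?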
A simple terms aggregation search will do what you need (where mydata is the name of your field):
curl -XGET 'http://localhost:9200/test/data/_search?search_type=count&pretty' -d '{
  "query": { "match_all": {} },
  "aggs": {
    "mydata_agg": {
      "terms": { "field": "mydata" }
    }
  }
}'
This returns:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 3, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "mydata_agg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "aaa", "doc_count" : 3 },
        { "key" : "fff", "doc_count" : 3 },
        { "key" : "bbb", "doc_count" : 2 },
        { "key" : "ccc", "doc_count" : 1 },
        { "key" : "ddd", "doc_count" : 1 },
        { "key" : "eee", "doc_count" : 1 },
        { "key" : "hhh", "doc_count" : 1 },
        { "key" : "mmm", "doc_count" : 1 },
        { "key" : "xxx", "doc_count" : 1 }
      ]
    }
  }
}
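
If it helps to sanity-check those numbers, here is a rough local sketch in Python of what doc_count represents: the number of documents that contain a term at least once, not the total number of occurrences. It does not talk to Elasticsearch. The function name doc_counts is made up for illustration, and splitting on whitespace is an assumption standing in for whatever analyzer is configured on the field.

```python
from collections import Counter

# The three sample documents from the question.
docs = [
    "aaa bbb ccc ddd eee fff",
    "bbb mmm aaa fff xxx",
    "hhh aaa fff",
]


def doc_counts(documents):
    """Count, for each word, how many documents contain it at least once.

    Hypothetical helper: whitespace splitting is an assumption and only
    approximates what an Elasticsearch analyzer would produce.
    """
    counts = Counter()
    for text in documents:
        # set() so a word repeated inside one document is counted once,
        # which mirrors doc_count rather than total term frequency.
        counts.update(set(text.split()))
    # Highest count first, ties broken alphabetically to match the
    # bucket order in the response shown above.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


buckets = doc_counts(docs)
for word, count in buckets:
    print(word, count)
```

For the three sample documents this should list "aaa" and "fff" with 3, "bbb" with 2, and the remaining six words with 1 each, the same as the buckets in the response.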