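I'm trying to retrieve the documents from the past year, with each document placed into a 1-month-wide bucket. I'll pull out the documents for each 1-month bucket and then analyze them further (that part is outside the scope of this question). From the description, a bucket aggregation seems like the way to go, but in the bucket response I only get the document count for each bucket, not the raw documents themselves. What am I missing?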
GET request:
```json
{
  "aggs" : {
    "DateHistogram" : {
      "date_histogram" : {
        "field" : "timestamp",
        "interval": "month"
      }
    }
  },
  "size" : 0
}
```
Resulting output:
```json
{
  "took" : 138,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 1313058, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "DateHistogram" : {
      "buckets" : [
        { "key_as_string" : "2015-02-01T00:00:00.000Z", "key" : 1422748800000, "doc_count" : 270 },
        { "key_as_string" : "2015-03-01T00:00:00.000Z", "key" : 1425168000000, "doc_count" : 459 },
        (...and all the other months...)
        { "key_as_string" : "2016-03-01T00:00:00.000Z", "key" : 1456790400000, "doc_count" : 136009 }
      ]
    }
  }
}
```
You're almost there. You just need to add a `top_hits` sub-aggregation inside `DateHistogram` to retrieve some documents for each bucket:
```
POST /your_index/_search
{
  "aggs" : {
    "DateHistogram" : {
      "date_histogram" : {
        "field" : "timestamp",
        "interval": "month"
      },
      "aggs": {
        "docs": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  },
  "size" : 0
}
```
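With the sub-aggregation in place, each bucket in the response carries a `docs` object whose `hits.hits` array holds the matching documents, each with its `_source`, in the same shape as a normal search response. Below is a minimal sketch of pulling those documents out per month, assuming the response has already been parsed into a Python dict (by whatever HTTP client you use) and that the aggregations are named `DateHistogram` and `docs` as in the query above. The helper name `docs_per_month` and the sample data are hypothetical. Keep in mind that `top_hits` returns at most `size` hits per bucket, so this samples each month rather than exporting every document in it.

```python
from typing import Any, Dict, List


def docs_per_month(response: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Group the _source of each top_hits document by its month bucket.

    Assumes (hypothetically) the aggregation names from the query above:
    "DateHistogram" for the date_histogram and "docs" for top_hits.
    """
    result: Dict[str, List[Dict[str, Any]]] = {}
    buckets = response["aggregations"]["DateHistogram"]["buckets"]
    for bucket in buckets:
        # key_as_string is the bucket's start date, e.g. "2015-02-01T00:00:00.000Z"
        month = bucket["key_as_string"]
        # top_hits nests its documents under hits.hits, like a regular search response
        hits = bucket["docs"]["hits"]["hits"]
        result[month] = [hit["_source"] for hit in hits]
    return result


# Usage with a trimmed, hand-written response of the same shape (illustrative data only)
sample = {
    "aggregations": {
        "DateHistogram": {
            "buckets": [
                {
                    "key_as_string": "2015-02-01T00:00:00.000Z",
                    "key": 1422748800000,
                    "doc_count": 270,
                    "docs": {
                        "hits": {
                            "total": 270,
                            "max_score": 1.0,
                            "hits": [
                                {
                                    "_index": "your_index",
                                    "_id": "1",
                                    "_source": {"timestamp": "2015-02-03T10:00:00Z"},
                                }
                            ],
                        }
                    },
                }
            ]
        }
    }
}

print(docs_per_month(sample))
```

Keying the result on `key_as_string` keeps the month labels readable; the numeric `key` (epoch milliseconds) would work equally well if you prefer sorting on numbers.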