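I am trying to run what looks like a simple query on Elasticsearch, but I can't seem to get the result I want.
Here is a short example of what I am trying to do:
I have a database of news items. Each item has a source, a headline, a timestamp and a user.
For a given user, I want to get the latest headline (based on the timestamp) for each available source.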
    #!/bin/bash

    export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

    # Create indexes
    curl -XPUT "$ELASTICSEARCH_ENDPOINT/news" -d '{
      "mappings": {
        "news": {
          "properties": {
            "source": { "type": "string", "index": "not_analyzed" },
            "headline": { "type": "object" },
            "timestamp": { "type": "date", "format": "date_hour_minute_second_millis" },
            "user": { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }'

    # Index documents
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
    {"index":{"_index":"news","_type":"news"}}
    {"user": "John", "source": "CNN", "headline": "Great news", "timestamp": "2015-07-28T00:07:29.000"}
    {"index":{"_index":"news","_type":"news"}}
    {"user": "John", "source": "CNN", "headline": "More great news", "timestamp": "2015-07-28T00:08:23.000"}
    {"index":{"_index":"news","_type":"news"}}
    {"user": "John", "source": "ESPN", "headline": "Sports news", "timestamp": "2015-07-28T00:09:32.000"}
    {"index":{"_index":"news","_type":"news"}}
    {"user": "John", "source": "ESPN", "headline": "More sports news", "timestamp": "2015-07-28T00:10:35.000"}
    {"index":{"_index":"news","_type":"news"}}
    {"user": "Mary", "source": "Yahoo", "headline": "More news", "timestamp": "2015-07-28T00:11:54.000"}
    {"index":{"_index":"news","_type":"news"}}
    {"user": "Mary", "source": "Yahoo", "headline": "Crazy news", "timestamp": "2015-07-28T00:12:31.000"}
    '
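So, for example, how do I get John's latest CNN headline and his latest ESPN headline?
I have been looking at the Multi Search API, but that would require me to know all the sources in advance (CNN and ESPN in this example).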
First, note that I had to change your mapping for the headline field to string, because in your sample documents the headline is a string, not an object.
With that change, a query like the following retrieves the results you expect:
    # The filter keeps only documents with user=John, the terms aggregation
    # creates one bucket per source, and top_hits (size 1, sorted by timestamp
    # descending) returns only the latest hit in each bucket, limited to the
    # headline and timestamp fields.
    curl -XPOST "$ELASTICSEARCH_ENDPOINT/news/_search" -d '{
      "size": 0,
      "query": {
        "filtered": {
          "filter": {
            "term": { "user": "John" }
          }
        }
      },
      "aggs": {
        "sources": {
          "terms": { "field": "source" },
          "aggs": {
            "latest": {
              "top_hits": {
                "size": 1,
                "_source": ["headline", "timestamp"],
                "sort": { "timestamp": "desc" }
              }
            }
          }
        }
      }
    }'
This yields something like the following:
    {
      ...
      "aggregations" : {
        "sources" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [ {
            "key" : "CNN",
            "doc_count" : 2,
            "latest" : {
              "hits" : {
                "total" : 2,
                "max_score" : null,
                "hits" : [ {
                  "_index" : "news",
                  "_type" : "news",
                  "_id" : "AU7Sh3VDGDddn2ZNuDVl",
                  "_score" : null,
                  "_source":{ "headline": "More great news", "timestamp": "2015-07-28T00:08:23.000" },
                  "sort" : [ 1438042103000 ]
                } ]
              }
            }
          }, {
            "key" : "ESPN",
            "doc_count" : 2,
            "latest" : {
              "hits" : {
                "total" : 2,
                "max_score" : null,
                "hits" : [ {
                  "_index" : "news",
                  "_type" : "news",
                  "_id" : "AU7Sh3VDGDddn2ZNuDVn",
                  "_score" : null,
                  "_source":{ "headline": "More sports news", "timestamp": "2015-07-28T00:10:35.000" },
                  "sort" : [ 1438042235000 ]
                } ]
              }
            }
          } ]
        }
      }
    }
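The query above hard-codes John and leaves the terms aggregation at its default of 10 buckets, so a user with more than 10 sources would get only some of them back. As a rough sketch, the same body could be produced by a small shell function. The name `latest_per_source_query` and the default of 100 buckets are my own assumptions, not part of the original setup, and the user name is spliced in without JSON escaping, so it must not contain quotes or backslashes.

```shell
#!/bin/bash

# Hypothetical helper: prints the search body for one user.
#   $1 = user name (assumed to contain no quotes or backslashes,
#        because it is inserted into the JSON without escaping)
#   $2 = maximum number of source buckets to return (assumed default: 100)
latest_per_source_query() {
  local user="$1"
  local max_sources="${2:-100}"
  cat <<EOF
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": { "term": { "user": "$user" } }
    }
  },
  "aggs": {
    "sources": {
      "terms": { "field": "source", "size": $max_sources },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "_source": ["headline", "timestamp"],
            "sort": { "timestamp": "desc" }
          }
        }
      }
    }
  }
}
EOF
}

# Usage (needs a running cluster at $ELASTICSEARCH_ENDPOINT):
# curl -XPOST "$ELASTICSEARCH_ENDPOINT/news/_search" -d "$(latest_per_source_query Mary)"
```

Called with `Mary`, this should return a single Yahoo bucket for the sample data, since Yahoo is Mary's only source.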