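I have been trying to use a facet to get the term frequency of a field. My query returns only one hit, so I would like the facet to return the terms that occur most often in a particular field.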
My mapping:
{
  "mappings": {
    "document": {
      "properties": {
        "tags": {
          "type": "object",
          "properties": {
            "title": {
              "fields": {
                "partial": {
                  "search_analyzer": "main",
                  "index_analyzer": "partial",
                  "type": "string",
                  "index": "analyzed"
                },
                "title": {
                  "type": "string",
                  "analyzer": "main",
                  "index": "analyzed"
                }
              },
              "type": "multi_field"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "name_ngrams": {
          "side": "front",
          "max_gram": 50,
          "min_gram": 2,
          "type": "edgeNGram"
        }
      },
      "analyzer": {
        "main": {
          "filter": ["standard", "lowercase", "asciifolding"],
          "type": "custom",
          "tokenizer": "standard"
        },
        "partial": {
          "filter": ["standard", "lowercase", "asciifolding", "name_ngrams"],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
Test data:
curl -XPUT localhost:9200/testindex/document -d '{"tags": {"title": "people also kill people"}}'
Query:
curl -XGET 'localhost:9200/testindex/document/_search?pretty=1' -d ' { "query": { "term": { "tags.title": "people" } }, "facets": { "popular_tags": { "terms": {"field": "tags.title"}} } }'
This is the result:
"hits" : {
  "total" : 1,
  "max_score" : 0.99381393,
  "hits" : [ {
    "_index" : "testindex",
    "_type" : "document",
    "_id" : "uI5k0wggR9KAvG9o7S7L2g",
    "_score" : 0.99381393,
    "_source" : {"tags": {"title": "people also kill people"}}
  } ]
},
"facets" : {
  "popular_tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 3,
    "other" : 0,
    "terms" : [ {
      "term" : "people",
      "count" : 1 // I expect this to be 2
    }, {
      "term" : "kill",
      "count" : 1
    }, {
      "term" : "also",
      "count" : 1
    } ]
  }
}
The result above is not what I want. I want the count for "people" to be 2:
"hits" : {
  "total" : 1,
  "max_score" : 0.99381393,
  "hits" : [ {
    "_index" : "testindex",
    "_type" : "document",
    "_id" : "uI5k0wggR9KAvG9o7S7L2g",
    "_score" : 0.99381393,
    "_source" : {"tags": {"title": "people also kill people"}}
  } ]
},
"facets" : {
  "popular_tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 3,
    "other" : 0,
    "terms" : [ {
      "term" : "people",
      "count" : 2
    }, {
      "term" : "kill",
      "count" : 1
    }, {
      "term" : "also",
      "count" : 1
    } ]
  }
}
How can I achieve this? Are facets the wrong approach?
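Facets count documents, not occurrences of a term within a document. You get 1 because only one document contains the term, so it does not matter how many times the term occurs in it. I don't know of an out-of-the-box way to return term frequencies, and a facet is not a good choice for this. If you enable term vectors, that information can be stored in the index, but there is currently no way to read term vectors back from elasticsearch.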
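To make the difference concrete, here is a minimal Python sketch of the two kinds of counting. It is not Elasticsearch code and not a workaround. The function names (`tokenize`, `document_counts`, `term_frequencies`) are made up for illustration, and the tokenizer is only a rough stand-in for the "main" analyzer (lowercase plus whitespace split).

```python
from collections import Counter


def tokenize(text):
    # Rough stand-in for the "main" analyzer (assumption): lowercase, split on whitespace.
    return text.lower().split()


def document_counts(docs):
    # For each term, count how many documents contain it at least once.
    # This mirrors what a terms facet reports.
    counts = Counter()
    for text in docs:
        counts.update(set(tokenize(text)))
    return counts


def term_frequencies(docs):
    # For each term, count every occurrence across all documents.
    # This is the number the question is asking for.
    counts = Counter()
    for text in docs:
        counts.update(tokenize(text))
    return counts


docs = ["people also kill people"]
facet_like = document_counts(docs)
tf = term_frequencies(docs)

print(facet_like["people"])  # 1: one document contains "people"
print(tf["people"])          # 2: "people" occurs twice in that document
```

With the single test document from the question, the document count for "people" is 1 and the occurrence count is 2, which is exactly the gap between what the facet returns and what was expected.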