I'm trying to figure out how the More Like This query (ES 2.x) works. I created the following index with term vectors enabled.
PUT /test_index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "doc": {
      "properties": {
        "text": { "type": "string", "term_vector": "yes" }
      }
    }
  }
}

PUT /test_index/doc/1
{ "text": ["Hello", "World"] }

PUT /test_index/doc/2
{ "text": ["This", "is", "me"] }

PUT /test_index/doc/3
{ "text": ["Hello", "World"] }

PUT /test_index/doc/4
{ "text": ["Hello", "World", "World"] }
Why do the following queries return no results? For the second query, I expected to retrieve at least doc 3, which has the same values as doc 1.
POST /test_index/doc/_search
{
  "query": {
    "more_like_this": {
      "like": "Hello",
      "min_term_freq": 1
    }
  }
}

POST /test_index/doc/_search
{
  "query": {
    "more_like_this": {
      "fields": ["text"],
      "like": [
        { "_index": "test_index", "_type": "doc", "_id": "1" }
      ]
    }
  }
}
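By default, min_doc_freq is 5, so your query returns nothing: a term from the input is used only if it occurs in at least 5 documents in the index, and "Hello" occurs in just 3. Set min_doc_freq to 1 in the query and it should work.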
{
  "query": {
    "more_like_this": {
      "like": "Hello",
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
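
The second query from the question, which uses doc 1 as the input, most likely fails for the same reason plus one more: min_term_freq defaults to 2, and each term in doc 1 occurs only once in that document, so no terms are selected. A minimal sketch of that query with both thresholds lowered, assuming the index and documents from the question (it has not been run against a live cluster):

```
# Assumes the test_index and the four documents created in the question.
POST /test_index/doc/_search
{
  "query": {
    "more_like_this": {
      "fields": ["text"],
      "like": [
        { "_index": "test_index", "_type": "doc", "_id": "1" }
      ],
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
```

With both values set to 1, docs 3 and 4 would be expected to match. Doc 1 itself is left out of the results unless include is set to true.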