I have a set of words extracted from text by an NLP algorithm, along with a relevance score for each word in each document.
For example:
    document 1:
    {
      "vocab": [
        {"wtag": "James Bond", "rscore": 2.14},
        {"wtag": "world", "rscore": 0.86},
        ....,
        {"wtag": "somemore", "rscore": 3.15}
      ]
    }

    document 2:
    {
      "vocab": [
        {"wtag": "hiii", "rscore": 1.34},
        {"wtag": "world", "rscore": 0.94},
        ....,
        {"wtag": "somemore", "rscore": 3.23}
      ]
    }
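I want the rscore of each matching wtag in a document to affect the _score that ES assigns to that document, either multiplied with or added to _score, so that it influences the final ordering of the result documents. Is there any way to do this?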
Another way to approach this problem is to use nested documents:
First, set up the mapping so that vocab is a nested field, which means each wtag/rscore entry is indexed internally as a separate document:
    curl -XPUT "http://localhost:9200/myindex/" -d'
    {
      "settings": {"number_of_shards": 1},
      "mappings": {
        "mytype": {
          "properties": {
            "vocab": {
              "type": "nested",
              "properties": {
                "wtag": {
                  "type": "string"
                },
                "rscore": {
                  "type": "float"
                }
              }
            }
          }
        }
      }
    }'
Then index your documents:
    curl -XPUT "http://localhost:9200/myindex/mytype/1" -d'
    {
      "vocab": [
        { "wtag": "James Bond", "rscore": 2.14 },
        { "wtag": "world", "rscore": 0.86 },
        { "wtag": "somemore", "rscore": 3.15 }
      ]
    }'

    curl -XPUT "http://localhost:9200/myindex/mytype/2" -d'
    {
      "vocab": [
        { "wtag": "hiii", "rscore": 1.34 },
        { "wtag": "world", "rscore": 0.94 },
        { "wtag": "somemore", "rscore": 3.23 }
      ]
    }'
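Then run a nested query to find the nested documents that match. The script_score brings each matching entry's rscore into that nested document's score, and score_mode "sum" adds those scores up for the parent document: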
    curl -XGET "http://localhost:9200/myindex/mytype/_search" -d'
    {
      "query": {
        "nested": {
          "path": "vocab",
          "score_mode": "sum",
          "query": {
            "function_score": {
              "query": {
                "match": {
                  "vocab.wtag": "james bond world"
                }
              },
              "script_score": {
                "script": "doc[\"vocab.rscore\"].value"
              }
            }
          }
        }
      }
    }'
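
To make the arithmetic behind that query easier to follow, here is a small Python sketch of how the pieces are expected to combine. It is a toy model, not Elasticsearch code. The per-entry match scores are hypothetical placeholders (real text-match scores come from Lucene and will differ), and the helper names `combine` and `document_score` are made up for illustration. It assumes the documented function_score behaviour: the match score and the script score are combined according to `boost_mode`, which defaults to "multiply"; "sum" adds them, and "replace" keeps only the script score. This is where the "multiply or add" choice from the question would be made.

```python
# Toy model of the scoring described above; this is not Elasticsearch code.
# The match scores used below are made-up placeholders, not real Lucene scores.


def combine(match_score, rscore, boost_mode="multiply"):
    """Combine the text-match score and the script score of one nested doc."""
    if boost_mode == "multiply":  # function_score default
        return match_score * rscore
    if boost_mode == "sum":
        return match_score + rscore
    if boost_mode == "replace":  # ignore the match score, keep rscore only
        return rscore
    raise ValueError("unsupported boost_mode: " + boost_mode)


def document_score(vocab, match_scores, boost_mode="multiply"):
    """Add up the scores of the nested docs that matched (score_mode "sum")."""
    return sum(
        combine(match_scores[entry["wtag"]], entry["rscore"], boost_mode)
        for entry in vocab
        if entry["wtag"] in match_scores
    )


doc1 = [
    {"wtag": "James Bond", "rscore": 2.14},
    {"wtag": "world", "rscore": 0.86},
    {"wtag": "somemore", "rscore": 3.15},
]
doc2 = [
    {"wtag": "hiii", "rscore": 1.34},
    {"wtag": "world", "rscore": 0.94},
    {"wtag": "somemore", "rscore": 3.23},
]

# Hypothetical match scores for the query "james bond world":
# only the entries whose wtag contains a query term are listed.
hypothetical_match_scores = {"James Bond": 1.0, "world": 1.0}

for name, vocab in (("document 1", doc1), ("document 2", doc2)):
    for mode in ("multiply", "sum", "replace"):
        score = document_score(vocab, hypothetical_match_scores, mode)
        print(name, mode, round(score, 2))
```

In this model document 1 ranks above document 2 in every mode, because two of its entries match the query and only one of document 2's does. If only the rscore values should count, adding "boost_mode": "replace" next to script_score in the function_score body is the likely setting to try; whether that fits depends on how much the text-match score should matter.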