I have an Elasticsearch index with some data in it. I implemented a did-you-mean feature, so that when a user misspells something they get a suggestion with the correct words.
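I used the phrase suggester because I need suggestions for short phrases (names, for example). The problem is that some of the suggestions do not exist in the index.
Example: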
document in the index: coding like a master
search: Codning like a boss
suggestion: <em>coding</em> like a boss
search result: not found
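My problem is that no phrase in the index matches the suggestion, so I am being offered a phrase that does not exist, and searching for it returns nothing.
What can I do about this? Shouldn't the phrase suggester only suggest phrases that actually exist in the index?
I'll leave the corresponding query, mapping and settings here in case they are needed.
Settings and mappings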
{ "settings": { "index": { "number_of_shards": 3, "number_of_replicas": 1, "search.slowlog.threshold.fetch.warn": "2s", "index.analysis.analyzer.default.filter.0": "standard", "index.analysis.analyzer.default.tokenizer": "standard", "index.analysis.analyzer.default.filter.1": "lowercase", "index.analysis.analyzer.default.filter.2": "asciifolding", "index.priority": 3, "analysis": { "analyzer": { "suggests_analyzer": { "tokenizer": "lowercase", "filter": [ "lowercase", "asciifolding", "shingle_filter" ], "type": "custom" } }, "filter": { "shingle_filter": { "min_shingle_size": 2, "max_shingle_size": 3, "type": "shingle" } } } } }, "mappings": { "my_type": { "properties": { "suggest_field": { "analyzer": "suggests_analyzer", "type": "string" } } } } }
Query
{ "DidYouMean": { "text": "Codning like a boss", "phrase": { "field": "suggest_field", "size": 1, "gram_size": 1, "confidence": 2.0 } } }
Thanks for your help.
This is actually expected. If you analyze your document with the analyze API, you will get a better idea of what is happening.
GET suggest_index/_analyze?text=coding like a master&analyzer=suggests_analyzer
This is the output
{ "tokens": [ { "token": "coding", "start_offset": 0, "end_offset": 6, "type": "word", "position": 1 }, { "token": "coding like", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 1 }, { "token": "coding like a", "start_offset": 0, "end_offset": 13, "type": "shingle", "position": 1 }, { "token": "like", "start_offset": 7, "end_offset": 11, "type": "word", "position": 2 }, { "token": "like a", "start_offset": 7, "end_offset": 13, "type": "shingle", "position": 2 }, { "token": "like a master", "start_offset": 7, "end_offset": 20, "type": "shingle", "position": 2 }, { "token": "a", "start_offset": 12, "end_offset": 13, "type": "word", "position": 3 }, { "token": "a master", "start_offset": 12, "end_offset": 20, "type": "shingle", "position": 3 }, { "token": "master", "start_offset": 14, "end_offset": 20, "type": "word", "position": 4 } ] }
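As you can see, a token "coding" is generated for the text, so it is in your index. The suggester is not suggesting something that is missing from your index; it corrects individual terms, and each of those terms exists. If you strictly want whole-phrase suggestions, you might want to consider the keyword tokenizer. For example, if you change your mapping to something like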
{ "settings": { "index": { "analysis": { "analyzer": { "suggests_analyzer": { "tokenizer": "lowercase", "filter": [ "lowercase", "asciifolding", "shingle_filter" ], "type": "custom" }, "raw_analyzer": { "tokenizer": "keyword", "filter": [ "lowercase", "asciifolding" ] } }, "filter": { "shingle_filter": { "min_shingle_size": 2, "max_shingle_size": 3, "type": "shingle" } } } } }, "mappings": { "my_type": { "properties": { "suggest_field": { "analyzer": "suggests_analyzer", "type": "string", "fields": { "raw": { "analyzer": "raw_analyzer", "type": "string" } } } } } } }
then this query will give you the expected results
{ "DidYouMean": { "text": "codning lke a master", "phrase": { "field": "suggest_field.raw", "size": 1, "gram_size": 1 } } }
It won't show anything for "coding like a boss".
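To see why the two analyzers behave so differently, here is a minimal Python sketch that imitates them offline. It is not Elasticsearch code: `shingles` and `keyword_tokens` are hypothetical helper names, and splitting on whitespace is only a rough stand-in for the `lowercase` tokenizer, which is close enough for this plain example text.

```python
def shingles(text, min_size=2, max_size=3, output_unigrams=True):
    """Rough stand-in for the lowercase tokenizer plus the shingle filter."""
    words = text.lower().split()  # assumption: whitespace split is enough here
    tokens = []
    for start in range(len(words)):
        if output_unigrams:
            # Single words are emitted alongside the shingles.
            tokens.append(words[start])
        for size in range(min_size, max_size + 1):
            if start + size <= len(words):
                tokens.append(" ".join(words[start:start + size]))
    return tokens


def keyword_tokens(text):
    """Rough stand-in for the keyword tokenizer plus lowercase: one token."""
    return [text.lower()]


doc = "coding like a master"
print(shingles(doc))        # includes "coding" as a term of its own
print(keyword_tokens(doc))  # the whole phrase is the only term
```

With the shingle analyzer, "coding" exists as a standalone term, so a term-level correction from "codning" to "coding" is valid even when the rest of the phrase matches no document. With the keyword-based field the only term is the full phrase, so only full phrases can be suggested.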
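EDIT 1
From your comment, and from running some phrase suggestions on my own dataset, I feel a better approach is to use the collate option that the phrase suggester provides. With it, each suggestion is checked against a query and is returned only if that query matches at least one document in the index. I also added stemmers to the mapping so that only root words are considered. I am using light_english because it is less aggressive.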
The analysis section of the settings now looks like this
"analysis": { "analyzer": { "suggests_analyzer": { "tokenizer": "standard", "filter": [ "lowercase", "english_possessive_stemmer", "light_english_stemmer", "asciifolding", "shingle_filter" ], "type": "custom" } }, "filter": { "light_english_stemmer": { "type": "stemmer", "language": "light_english" }, "english_possessive_stemmer": { "type": "stemmer", "language": "possessive_english" }, "shingle_filter": { "min_shingle_size": 2, "max_shingle_size": 4, "type": "shingle" } } }
Now this query will give you the desired results.
{ "suggest" : { "text" : "appel on the tabel", "simple_phrase" : { "phrase" : { "field" : "suggest_field", "size" : 5, "collate": { "query": { "inline" : { "match_phrase": { "{{field_name}}" : "{{suggestion}}" } } }, "params": {"field_name" : "suggest_field"}, "prune": false } } } }, "size": 0 }
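This will give you back "apple on the table". A match_phrase query is used here, and it is run against the index for each suggested phrase. You could set "prune": true to see all of the suggestions regardless of whether they match. You might also want to consider using a stop filter to avoid stopwords.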
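As a rough mental model of what collate and prune do, here is a small Python sketch. It is a simplification and not how Elasticsearch implements it: `contains_phrase` and `collate` are hypothetical names, the documents are a plain list of strings, and phrase matching is done on whitespace-split lowercase words with no analysis or stemming.

```python
def contains_phrase(doc, phrase):
    """True if the words of `phrase` occur consecutively, in order, in `doc`."""
    doc_words = doc.lower().split()
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    return any(doc_words[i:i + n] == phrase_words
               for i in range(len(doc_words) - n + 1))


def collate(suggestions, docs, prune=False):
    """Check every candidate suggestion against the documents.

    prune=False: keep only candidates that match at least one document.
    prune=True:  keep every candidate and flag it with collate_match.
    """
    results = []
    for text in suggestions:
        matched = any(contains_phrase(doc, text) for doc in docs)
        if prune:
            results.append({"text": text, "collate_match": matched})
        elif matched:
            results.append({"text": text})
    return results


docs = ["coding like a master"]
candidates = ["coding like a boss", "coding like a master"]
print(collate(candidates, docs))              # only the phrase that exists
print(collate(candidates, docs, prune=True))  # both, each with a match flag
```

In this model, "coding like a boss" is dropped when prune is false because no document contains that phrase, which is the behavior the question was asking for.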
Hope this helps!!