Elasticsearch 5.x introduced some (breaking) changes to the Suggester API (documentation). The most notable change is the following:
Completion suggester is document-oriented
Suggestions are aware of the document they belong to. Now, associated documents (_source) are returned as part of completion suggestions.
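In short, every completion query returns all matching documents rather than just the matched words. And here lies the problem: an autocompleted word is duplicated if it occurs in more than one document.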
Let's say we have this simple mapping:
{
  "my-index": {
    "mappings": {
      "users": {
        "properties": {
          "firstName": { "type": "text" },
          "lastName": { "type": "text" },
          "suggest": { "type": "completion", "analyzer": "simple" }
        }
      }
    }
  }
}
With a few test documents:
{
  "_index": "my-index",
  "_type": "users",
  "_id": "1",
  "_source": {
    "firstName": "John",
    "lastName": "Doe",
    "suggest": [ { "input": [ "John", "Doe" ] } ]
  }
},
{
  "_index": "my-index",
  "_type": "users",
  "_id": "2",
  "_source": {
    "firstName": "John",
    "lastName": "Smith",
    "suggest": [ { "input": [ "John", "Smith" ] } ]
  }
}
And a by-the-book query:
POST /my-index/_suggest?pretty
{
  "my-suggest": {
    "text": "joh",
    "completion": { "field": "suggest" }
  }
}
Which yields the following result:
{
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "my-suggest": [
    {
      "text": "joh",
      "offset": 0,
      "length": 3,
      "options": [
        {
          "text": "John",
          "_index": "my-index",
          "_type": "users",
          "_id": "1",
          "_score": 1,
          "_source": {
            "firstName": "John",
            "lastName": "Doe",
            "suggest": [ { "input": [ "John", "Doe" ] } ]
          }
        },
        {
          "text": "John",
          "_index": "my-index",
          "_type": "users",
          "_id": "2",
          "_score": 1,
          "_source": {
            "firstName": "John",
            "lastName": "Smith",
            "suggest": [ { "input": [ "John", "Smith" ] } ]
          }
        }
      ]
    }
  ]
}
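In short, for a completion suggestion on the text "joh", two (2) documents were returned: both are Johns, and both carry the same value in the text property.
However, I would like to receive one (1) word. Something as simple as this: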
{
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "my-suggest": [
    {
      "text": "joh",
      "offset": 0,
      "length": 3,
      "options": [ "John" ]
    }
  ]
}
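To make the desired shape concrete, here is a minimal client-side sketch that collapses the options of the response above into distinct words. It is only an illustration of the transformation I am after, not something Elasticsearch does for me; the helper name distinct_suggestions is hypothetical, and the response dict is a trimmed copy of the result shown above.

```python
def distinct_suggestions(response, suggester_name):
    """Collect the unique option texts of one suggester, in first-seen order."""
    seen = set()
    words = []
    for entry in response.get(suggester_name, []):
        for option in entry.get("options", []):
            text = option["text"]
            if text not in seen:
                seen.add(text)
                words.append(text)
    return words


# Trimmed copy of the response above: only the fields this helper reads.
response = {
    "my-suggest": [
        {
            "text": "joh",
            "offset": 0,
            "length": 3,
            "options": [
                {"text": "John", "_id": "1"},
                {"text": "John", "_id": "2"},
            ],
        }
    ]
}

print(distinct_suggestions(response, "my-suggest"))
```

Note that this only hides the duplicates after the fact: the suggester still counts each document as a separate option, so a request limited to a given number of options may yield fewer distinct words than that.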
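Question: how can I implement a word-based completion suggester? There is no need to return any document-related data, since I don't need it at this point.
Is the "Completion Suggester" even appropriate for my scenario? Or should I use a completely different approach?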
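EDIT: As many of you pointed out, an additional completion-only index would be a viable solution. However, I can see multiple issues with this approach. For example, with the words "John", "Doe", "David", "Smith" in the additional index, a query for "John D" should complete the unfinished word to "Doe", not to "Doe", "David".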
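To overcome the second point, indexing single words alone would not be enough, because you would also need to map every word to its documents in order to properly narrow down the completion of subsequent words. With that, you have the same problem as when querying the original index, so the additional index no longer makes sense.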
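The narrowing problem can be sketched in a few lines of plain Python (no Elasticsearch involved). The two documents and both function names are hypothetical, chosen only to match the example words above: a word-only index completes the last token against every known word, while narrowing correctly requires knowing which words share a document.

```python
# Hypothetical documents, chosen to match the example words above.
documents = [
    {"firstName": "John", "lastName": "Doe"},
    {"firstName": "David", "lastName": "Smith"},
]


def complete_from_word_index(query, docs):
    """Word-only index: complete the last token against every indexed word."""
    vocabulary = sorted({word for doc in docs for word in doc.values()})
    prefix = query.split()[-1].lower()
    return [word for word in vocabulary if word.lower().startswith(prefix)]


def complete_with_document_context(query, docs):
    """Complete the last token using only documents that contain the earlier tokens."""
    *earlier, last = query.lower().split()
    results = set()
    for doc in docs:
        lowered = [word.lower() for word in doc.values()]
        if all(token in lowered for token in earlier):
            results.update(
                word
                for word in doc.values()
                if word.lower().startswith(last) and word.lower() not in earlier
            )
    return sorted(results)


print(complete_from_word_index("John D", documents))        # global completion
print(complete_with_document_context("John D", documents))  # narrowed completion
```

The second function needs the word-to-document mapping, which is exactly the information the original index already holds.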
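As hinted at in the comments, another way of achieving this without getting duplicate documents is to create a sub-field containing the edge n-grams of the field you want to complete on. First, you define your mapping like this: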
PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "completion_analyzer": {
          "type": "custom",
          "filter": [ "lowercase", "completion_filter" ],
          "tokenizer": "keyword"
        }
      },
      "filter": {
        "completion_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 24
        }
      }
    }
  },
  "mappings": {
    "users": {
      "properties": {
        "autocomplete": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" },
            "completion": {
              "type": "text",
              "analyzer": "completion_analyzer",
              "search_analyzer": "standard"
            }
          }
        },
        "firstName": { "type": "text" },
        "lastName": { "type": "text" }
      }
    }
  }
}
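To see what ends up in the autocomplete.completion sub-field, the analyzer chain can be approximated in plain Python. This is only a sketch of the keyword tokenizer, lowercase filter and edge_ngram filter as configured above, not Elasticsearch's own implementation; the function name completion_analyzer is borrowed from the settings for readability.

```python
def completion_analyzer(text, min_gram=1, max_gram=24):
    """Approximate the custom analyzer: keyword tokenizer, lowercase, edge_ngram."""
    # The keyword tokenizer keeps the whole input as a single token.
    token = text.lower()
    # edge_ngram emits every prefix whose length is between min_gram and max_gram.
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


tokens = completion_analyzer("John Doe")
print(tokens)
```

Because the whole value is a single token, prefixes that span the space (such as "john d") are indexed as well, which is what lets a multi-word prefix match.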
Then you index a few documents:
POST my-index/users/_bulk
{"index":{}}
{ "firstName": "John", "lastName": "Doe", "autocomplete": "John Doe" }
{"index":{}}
{ "firstName": "John", "lastName": "Deere", "autocomplete": "John Deere" }
{"index":{}}
{ "firstName": "Johnny", "lastName": "Cash", "autocomplete": "Johnny Cash" }
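Then you can query for a prefix such as john d and get one result for John Doe and another one for John Deere (Johnny Cash does not match this prefix):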
{
  "size": 0,
  "query": {
    "term": { "autocomplete.completion": "john d" }
  },
  "aggs": {
    "suggestions": {
      "terms": { "field": "autocomplete.raw" }
    }
  }
}
Results:
{
  "aggregations": {
    "suggestions": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "John Doe", "doc_count": 1 },
        { "key": "John Deere", "doc_count": 1 }
      ]
    }
  }
}
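The combination of the term query and the terms aggregation can be mimicked in plain Python to show why duplicates disappear. This is a rough simulation under the assumption that the sub-field is analyzed as configured above; edge_ngrams and suggest are hypothetical helper names, and the list holds the autocomplete values of the three indexed documents.

```python
from collections import Counter

# The autocomplete values of the three documents indexed above.
documents = ["John Doe", "John Deere", "Johnny Cash"]


def edge_ngrams(value, min_gram=1, max_gram=24):
    """Lowercased prefixes of the whole value, as the completion sub-field stores them."""
    token = value.lower()
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}


def suggest(term, docs):
    """Term query on the n-gram sub-field, then one bucket per distinct raw value."""
    matching = [doc for doc in docs if term in edge_ngrams(doc)]
    return Counter(matching)


print(suggest("john d", documents))
print(suggest("joh", documents))
```

Identical values fall into the same bucket, so a value that occurs in several documents is returned once, with doc_count telling how many documents carry it. The term query is not analyzed, so the search text has to be lowercased by the caller to match the indexed prefixes.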
UPDATE (June 25th, 2019):
ES 7.2 introduced a new data type called search_as_you_type that allows this kind of behavior natively. Read more at: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/search-as-you-type.html
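As a rough sketch of what that could look like, the snippet below builds the two request bodies as Python dicts: a mapping with a search_as_you_type field and a multi_match query of type bool_prefix over the field and its _2gram and _3gram sub-fields. Reusing the field name autocomplete is my assumption, and build_query is a hypothetical helper; check the linked page for the exact options supported by your version.

```python
import json

# Assumption: reuse the "autocomplete" field name from the mapping above.
FIELD = "autocomplete"

# ES 7.x mappings have no type name, so properties sit directly under "mappings".
mapping_body = {
    "mappings": {
        "properties": {
            FIELD: {"type": "search_as_you_type"}
        }
    }
}


def build_query(text, field=FIELD):
    """Build a bool_prefix multi_match over the field and its shingle sub-fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                "fields": [field, f"{field}._2gram", f"{field}._3gram"],
            }
        }
    }


print(json.dumps(mapping_body, indent=2))
print(json.dumps(build_query("john d"), indent=2))
```

The first body would be sent when creating the index and the second to the index's _search endpoint.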