I am sending this request:
curl -XGET 'host/process_test_3/14/_search' -d '{ "query" : { "query_string" : { "query" : "\"*cor interface*\"", "fields" : ["title", "obj_id"] } } }'
and I get the correct result:
{ "took": 12, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 5.421598, "hits": [ { "_index": "process_test_3", "_type": "14", "_id": "141_dashboard_14", "_score": 5.421598, "_source": { "obj_type": "dashboard", "obj_id": "141", "title": "Cor Interface Monitoring" } } ] } }
But when I try to search by part of a word, for example:
curl -XGET 'host/process_test_3/14/_search' -d ' { "query" : { "query_string" : { "query" : "\"*cor inter*\"", "fields" : ["title", "obj_id"] } } }'
I get no results:
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [] } }
What am I doing wrong?
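This is because your title field has probably been analyzed by the standard analyzer (the default), so the title Cor Interface Monitoring has been tokenized into three tokens: cor, interface and monitoring.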
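To make the tokenization concrete, here is a minimal Python sketch. It is only a rough approximation of the standard analyzer (split on non-word characters, then lowercase), not Elasticsearch's actual implementation, and the function name `approx_standard_analyze` is made up for this illustration.

```python
import re

def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real one):
    split on non-word characters and lowercase each piece."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

tokens = approx_standard_analyze("Cor Interface Monitoring")
print(tokens)             # ['cor', 'interface', 'monitoring']
print("inter" in tokens)  # False: only the whole word "interface" is indexed
```

Since only whole words end up in the index, the fragment inter has no token of its own to match.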
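To be able to search for any substring of a word, you need to create a custom analyzer that uses the ngram token filter to index all substrings of each token.
You can create your index like this: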
curl -XPUT localhost:9200/process_test_3 -d '{ "settings": { "analysis": { "analyzer": { "substring_analyzer": { "tokenizer": "standard", "filter": ["lowercase", "substring"] } }, "filter": { "substring": { "type": "nGram", "min_gram": 2, "max_gram": 15 } } } }, "mappings": { "14": { "properties": { "title": { "type": "string", "analyzer": "substring_analyzer" } } } } }'
Then you can reindex your data. The title Cor Interface Monitoring will now be tokenized into tokens such as:
cor: co, cor, or
interface: in, int, inte, inter, interf, and so on
monitoring: mo, mon, moni, and so on
Your second search query will now return the document you expect, because the tokens cor and inter now match.
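The effect of the ngram filter can be sketched in a few lines of Python. This is an illustration under assumptions, not Elasticsearch code: the helper names `ngrams` and `substring_analyze` are hypothetical, the tokenizer is the same rough regex approximation of the standard tokenizer, and the order in which Elasticsearch emits the grams may differ.

```python
import re

def ngrams(token, min_gram=2, max_gram=15):
    """All substrings of `token` whose length is between min_gram and max_gram."""
    return [
        token[start:start + size]
        for start in range(len(token))
        for size in range(min_gram, max_gram + 1)
        if start + size <= len(token)
    ]

def substring_analyze(text, min_gram=2, max_gram=15):
    """Approximate the substring_analyzer: tokenize, lowercase, then ngram each token."""
    grams = []
    for tok in re.findall(r"\w+", text):
        grams.extend(ngrams(tok.lower(), min_gram, max_gram))
    return grams

print(ngrams("cor"))  # ['co', 'cor', 'or']

indexed = set(substring_analyze("Cor Interface Monitoring"))
print("cor" in indexed and "inter" in indexed)  # True
```

Indexing every substring makes the index noticeably larger, so min_gram and max_gram are worth tuning to the shortest and longest fragments you actually expect people to search for.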