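I could not find a perfect solution for the following case, either on Google or in the ES documentation, and I hope someone here can help.
Suppose five documents are stored, each with email addresses in the "email" field: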
1. {"email": "john.doe@gmail.com"}
2. {"email": "john.doe@gmail.com, john.doe@outlook.com"}
3. {"email": "hello-john.doe@outlook.com"}
4. {"email": "john.doe@outlook.com"}
5. {"email": "john@yahoo.com"}
I want the following search scenarios to work:
[search term -> documents returned]
"john.doe@gmail.com" -> 1, 2
"john.doe@outlook.com" -> 2, 4
"john@yahoo.com" -> 5
"john.doe" -> 1, 2, 3, 4
"john" -> 1, 2, 3, 4, 5
"gmail.com" -> 1, 2
"outlook.com" -> 2, 3, 4
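The first three matches are a must; for the others, the more precise the better. I have already tried different combinations of index/search analyzers, tokenizers, and filters. I also tried working with the options of the match query, but did not find an ideal solution. Any ideas are welcome, and there are no restrictions on the mapping, the analyzers, or which query to use. Thanks.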
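To make the requirement concrete, the expected results above can be restated as a small executable table. This is only a sketch in Python, not Elasticsearch code: `reference_search` is a hypothetical helper, and its matching rule (exact address match when the search term contains "@", substring match otherwise) is my own assumption about one rule that happens to reproduce all seven rows.

```python
# The five documents and the expected results, copied from the lists above.
DOCS = {
    1: "john.doe@gmail.com",
    2: "john.doe@gmail.com, john.doe@outlook.com",
    3: "hello-john.doe@outlook.com",
    4: "john.doe@outlook.com",
    5: "john@yahoo.com",
}

EXPECTED = {
    "john.doe@gmail.com": [1, 2],
    "john.doe@outlook.com": [2, 4],
    "john@yahoo.com": [5],
    "john.doe": [1, 2, 3, 4],
    "john": [1, 2, 3, 4, 5],
    "gmail.com": [1, 2],
    "outlook.com": [2, 3, 4],
}


def reference_search(query):
    """Hypothetical matching rule (an assumption, not Elasticsearch behaviour)."""
    q = query.strip().lower()
    hits = []
    for doc_id, field in DOCS.items():
        # A field may hold several comma-separated addresses (document 2).
        addresses = [a.strip().lower() for a in field.split(",")]
        if "@" in q:
            # A full address must equal one stored address exactly.
            matched = q in addresses
        else:
            # A partial term may appear anywhere inside an address.
            matched = any(q in a for a in addresses)
        if matched:
            hits.append(doc_id)
    return hits


for query, wanted in EXPECTED.items():
    print(query, "->", reference_search(query), "| expected", wanted)
```

The exact-match branch is what keeps document 3 out of the results for "john.doe@outlook.com", even though that string is contained in "hello-john.doe@outlook.com".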
Mapping:
PUT /test
{
  "settings": {
    "analysis": {
      "filter": {
        "email": {
          "type": "pattern_capture",
          "preserve_original": 1,
          "patterns": [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)",
            "([^-@]+)"
          ]
        }
      },
      "analyzer": {
        "email": {
          "tokenizer": "uax_url_email",
          "filter": [ "email", "lowercase", "unique" ]
        }
      }
    }
  },
  "mappings": {
    "emails": {
      "properties": {
        "email": {
          "type": "string",
          "analyzer": "email"
        }
      }
    }
  }
}
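To reason about what this analyzer would put into the index, here is a rough Python approximation of the filter chain for a single address. It is a sketch under assumptions, not the actual Lucene implementation: `email_tokens` is a hypothetical helper, Python's `re` module has no `\p{L}` so `[^\W\d_]+` stands in for it, token positions and offsets are not modelled, and splitting a field into separate addresses (the job of the `uax_url_email` tokenizer) is left out.

```python
import re

# The five patterns from the mapping above. Python's re module does not
# support \p{L}, so [^\W\d_]+ is used as a stand-in for "one or more letters".
PATTERNS = [r"([^@]+)", r"([^\W\d_]+)", r"(\d+)", r"@(.+)", r"([^-@]+)"]


def email_tokens(address):
    """Approximate the terms the 'email' analyzer would emit for one address."""
    tokens = [address]  # preserve_original keeps the whole address as a token
    for pattern in PATTERNS:
        # Every match of every pattern contributes its capture groups.
        for match in re.finditer(pattern, address):
            tokens.extend(group for group in match.groups() if group)
    # Mimic the 'lowercase' and 'unique' filters, keeping first occurrences.
    seen = set()
    result = []
    for token in (t.lower() for t in tokens):
        if token not in seen:
            seen.add(token)
            result.append(token)
    return result


for address in ("john.doe@gmail.com", "hello-john.doe@outlook.com"):
    print(address, "->", email_tokens(address))
```

Under this approximation, "hello-john.doe@outlook.com" yields "john.doe" and "outlook.com" as terms but not "john.doe@outlook.com". The real tokens can be checked against a running cluster with the `_analyze` API.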
Test data:
POST /test/emails/_bulk
{"index":{"_id":"1"}}
{"email": "john.doe@gmail.com"}
{"index":{"_id":"2"}}
{"email": "john.doe@gmail.com, john.doe@outlook.com"}
{"index":{"_id":"3"}}
{"email": "hello-john.doe@outlook.com"}
{"index":{"_id":"4"}}
{"email": "john.doe@outlook.com"}
{"index":{"_id":"5"}}
{"email": "john@yahoo.com"}
Query to use:
GET /test/emails/_search
{
  "query": {
    "term": {
      "email": "john.doe@gmail.com"
    }
  }
}