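I have an index full of keywords, and I want to extract the keywords that appear in an input text.
Below is a sample of the keyword index. Note that a keyword can also consist of several words; essentially, each keyword is a unique tag.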
{
  "hits": {
    "total": 2000,
    "hits": [
      { "id": 1, "keyword": "thousand eyes" },
      { "id": 2, "keyword": "facebook" },
      { "id": 3, "keyword": "superdoc" },
      { "id": 4, "keyword": "quora" },
      { "id": 5, "keyword": "your story" },
      { "id": 6, "keyword": "Surgery" },
      { "id": 7, "keyword": "lending club" },
      { "id": 8, "keyword": "ad roll" },
      { "id": 9, "keyword": "the honest company" },
      { "id": 10, "keyword": "Draft kings" }
    ]
  }
}
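Now, if the input text is "I saw the news of lending club on facebook, your story and quora", the search result should be ["lending club", "facebook", "your story", "quora"]. The search should also be case insensitive.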
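There is only one real way to do this. You have to index your data as whole keywords and analyze the search text with shingles:
See this reproduction:
First, we create two custom analyzers, one based on the keyword tokenizer and one that adds the shingle filter: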
PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["asciifolding", "lowercase"]
        },
        "my_analyzer_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase", "shingle"]
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "keyword": {
          "type": "string",
          "index_analyzer": "my_analyzer_keyword",
          "search_analyzer": "my_analyzer_shingle"
        }
      }
    }
  }
}
Now let's create some sample documents from the data you provided:
POST /test/your_type/1
{ "id": 1, "keyword": "thousand eyes" }
POST /test/your_type/2
{ "id": 2, "keyword": "facebook" }
POST /test/your_type/3
{ "id": 3, "keyword": "superdoc" }
POST /test/your_type/4
{ "id": 4, "keyword": "quora" }
POST /test/your_type/5
{ "id": 5, "keyword": "your story" }
POST /test/your_type/6
{ "id": 6, "keyword": "Surgery" }
POST /test/your_type/7
{ "id": 7, "keyword": "lending club" }
POST /test/your_type/8
{ "id": 8, "keyword": "ad roll" }
POST /test/your_type/9
{ "id": 9, "keyword": "the honest company" }
POST /test/your_type/10
{ "id": 10, "keyword": "Draft kings" }
And finally, the query that runs the search:
POST /test/your_type/_search
{
  "query": {
    "match": {
      "keyword": "I saw the news of lending club on facebook, your story and quora"
    }
  }
}
This is the result:
{
  "took": 6,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4,
    "max_score": 0.009332742,
    "hits": [
      { "_index": "test", "_type": "your_type", "_id": "2", "_score": 0.009332742, "_source": { "id": 2, "keyword": "facebook" } },
      { "_index": "test", "_type": "your_type", "_id": "7", "_score": 0.009332742, "_source": { "id": 7, "keyword": "lending club" } },
      { "_index": "test", "_type": "your_type", "_id": "4", "_score": 0.009207102, "_source": { "id": 4, "keyword": "quora" } },
      { "_index": "test", "_type": "your_type", "_id": "5", "_score": 0.0014755741, "_source": { "id": 5, "keyword": "your story" } }
    ]
  }
}
So what does it do behind the scenes?
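At index time, my_analyzer_keyword uses the keyword tokenizer, so each document's keyword is emitted as one whole token. The asciifolding filter normalizes accented letters (é becomes e), and the lowercase filter makes the search case insensitive, so Draft kings is indexed as draft kings.
At search time, my_analyzer_shingle applies the same filters, but its standard tokenizer splits the input text into words and the shingle filter then combines adjacent words into multi-word tokens. Those single words and word combinations are what match the keywords indexed in the first step.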
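To make the interplay of the two analyzers in the mapping above easier to see, here is a minimal Python sketch that imitates them outside Elasticsearch. It is an approximation under stated assumptions, not Elasticsearch's implementation: the function names (normalize, keyword_tokens, shingle_tokens, extract_keywords) are hypothetical, accent folding is approximated with Unicode decomposition, word splitting with a simple regular expression, and the default of two-word shingles is assumed to mirror the shingle filter's default settings.

```python
import re
import unicodedata


def normalize(text):
    """Rough stand-in for the asciifolding + lowercase filters (assumption)."""
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.lower()


def keyword_tokens(text):
    """Imitates the keyword tokenizer: the whole string becomes one token."""
    return [normalize(text)]


def shingle_tokens(text, max_size=2):
    """Imitates standard tokenizer + shingle filter: single words plus
    combinations of up to max_size adjacent words."""
    words = re.findall(r"\w+", normalize(text))
    tokens = []
    for size in range(1, max_size + 1):
        for start in range(len(words) - size + 1):
            tokens.append(" ".join(words[start:start + size]))
    return tokens


def extract_keywords(text, keywords, max_size=2):
    """Return the original keywords whose index token equals a search token."""
    indexed = {keyword_tokens(k)[0]: k for k in keywords}
    found = []
    for token in shingle_tokens(text, max_size):
        if token in indexed and indexed[token] not in found:
            found.append(indexed[token])
    return found


KEYWORDS = [
    "thousand eyes", "facebook", "superdoc", "quora", "your story",
    "Surgery", "lending club", "ad roll", "the honest company", "Draft kings",
]

if __name__ == "__main__":
    text = "I saw the news of lending club on facebook, your story and quora"
    print(extract_keywords(text, KEYWORDS))
```

One consequence the sketch makes visible: with two-word shingles, a three-word keyword such as "the honest company" never produces a matching search token. In the sketch, passing max_size=3 fixes that; in Elasticsearch the corresponding knob should be the shingle filter's max_shingle_size setting, which would need a custom filter definition rather than the built-in "shingle" name used above.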