I want to change the scoring in Elasticsearch so that multiple occurrences of a term are not counted. For example, I want:
"Texas Texas"
and
"Texas"
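to get the same score. I found this mapping, which Elasticsearch says disables term frequency counting, but my searches still do not come out with equal scores:

        "mappings": {
            "business": {
                "properties": {
                    "name": {
                        "type": "string",
                        "index_options": "docs",
                        "norms": { "enabled": false }
                    }
                }
            }
        }
    }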
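Any help would be appreciated; I have not been able to find much information on this.
Edit:
I am adding my search code and what is returned when I use explain.
My search code: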
    Settings settings = ImmutableSettings.settingsBuilder()
            .put("cluster.name", "escluster")
            .build();
    Client client = new TransportClient(settings)
            .addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));
    SearchRequest request = Requests.searchRequest("businesses")
            .source(SearchSourceBuilder.searchSource().query(QueryBuilders.boolQuery()
                    .should(QueryBuilders.matchQuery("name", "Texas")
                            .minimumShouldMatch("1"))))
            .searchType(SearchType.DFS_QUERY_THEN_FETCH);
    ExplainRequest request2 = client.prepareIndex("businesses", "business")
When I search with explain, I get:

    "took" : 14,
    "timed_out" : false,
    "_shards" : { "total" : 3, "successful" : 3, "failed" : 0 },
    "hits" : {
      "total" : 2,
      "max_score" : 1.0,
      "hits" : [ {
        "_shard" : 1,
        "_node" : "BTqBPVDET5Kr83r-CYPqfA",
        "_index" : "businesses",
        "_type" : "business",
        "_id" : "AU9U5KBks4zEorv9YI4n",
        "_score" : 1.0,
        "_source" : { "name" : "texas" },
        "_explanation" : {
          "value" : 1.0,
          "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
          "details" : [ {
            "value" : 1.0,
            "description" : "fieldWeight in 0, product of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "tf(freq=1.0), with freq of:",
              "details" : [ { "value" : 1.0, "description" : "termFreq=1.0" } ]
            }, {
              "value" : 1.0,
              "description" : "idf(docFreq=2, maxDocs=3)"
            }, {
              "value" : 1.0,
              "description" : "fieldNorm(doc=0)"
            } ]
          } ]
        }
      }, {
        "_shard" : 1,
        "_node" : "BTqBPVDET5Kr83r-CYPqfA",
        "_index" : "businesses",
        "_type" : "business",
        "_id" : "AU9U5K6Ks4zEorv9YI4o",
        "_score" : 0.8660254,
        "_source" : { "name" : "texas texas texas" },
        "_explanation" : {
          "value" : 0.8660254,
          "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
          "details" : [ {
            "value" : 0.8660254,
            "description" : "fieldWeight in 0, product of:",
            "details" : [ {
              "value" : 1.7320508,
              "description" : "tf(freq=3.0), with freq of:",
              "details" : [ { "value" : 3.0, "description" : "termFreq=3.0" } ]
            }, {
              "value" : 1.0,
              "description" : "idf(docFreq=2, maxDocs=3)"
            }, {
              "value" : 0.5,
              "description" : "fieldNorm(doc=0)"
            } ]
          } ]
        }
      } ]
    }
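It still seems to take term frequency and document frequency into account. Any ideas? Sorry about the formatting; I don't know why it looks so odd.
Edit 2:
The result when I search in the browser with http://localhost:9200/businesses/business/_search?pretty=true&qname=texas is: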
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : { "total" : 3, "successful" : 3, "failed" : 0 },
      "hits" : {
        "total" : 4,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9YcCKjKvtg8NgyozGK",
          "_score" : 1.0,
          "_source" : { "business" : { "name" : "texas texas texas texas" } }
        }, {
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9YateBKvtg8Ngyoy-p",
          "_score" : 1.0,
          "_source" : { "name" : "texas" }
        }, {
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9YavVnKvtg8Ngyoy-4",
          "_score" : 1.0,
          "_source" : { "name" : "texas texas texas" }
        }, {
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9Yb7NgKvtg8NgyozFf",
          "_score" : 1.0,
          "_source" : { "business" : { "name" : "texas texas texas" } }
        } ]
      }
    }
It finds all 4 objects I have in there, and they all have the same score. When I run the Java API search with explain, I get:
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : { "total" : 3, "successful" : 3, "failed" : 0 },
      "hits" : {
        "total" : 2,
        "max_score" : 1.287682,
        "hits" : [ {
          "_shard" : 1,
          "_node" : "BTqBPVDET5Kr83r-CYPqfA",
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9YateBKvtg8Ngyoy-p",
          "_score" : 1.287682,
          "_source" : { "name" : "texas" },
          "_explanation" : {
            "value" : 1.287682,
            "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
            "details" : [ {
              "value" : 1.287682,
              "description" : "fieldWeight in 0, product of:",
              "details" : [ {
                "value" : 1.0,
                "description" : "tf(freq=1.0), with freq of:",
                "details" : [ { "value" : 1.0, "description" : "termFreq=1.0" } ]
              }, {
                "value" : 1.287682,
                "description" : "idf(docFreq=2, maxDocs=4)"
              }, {
                "value" : 1.0,
                "description" : "fieldNorm(doc=0)"
              } ]
            } ]
          }
        }, {
          "_shard" : 1,
          "_node" : "BTqBPVDET5Kr83r-CYPqfA",
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9YavVnKvtg8Ngyoy-4",
          "_score" : 1.1151654,
          "_source" : { "name" : "texas texas texas" },
          "_explanation" : {
            "value" : 1.1151654,
            "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
            "details" : [ {
              "value" : 1.1151654,
              "description" : "fieldWeight in 0, product of:",
              "details" : [ {
                "value" : 1.7320508,
                "description" : "tf(freq=3.0), with freq of:",
                "details" : [ { "value" : 3.0, "description" : "termFreq=3.0" } ]
              }, {
                "value" : 1.287682,
                "description" : "idf(docFreq=2, maxDocs=4)"
              }, {
                "value" : 0.5,
                "description" : "fieldNorm(doc=0)"
              } ]
            } ]
          }
        } ]
      }
    }
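As a sanity check on the explain output, the reported scores can be reproduced by hand. The sketch below assumes Lucene's classic TF-IDF similarity, where tf is the square root of termFreq and idf is 1 + ln(maxDocs / (docFreq + 1)). The class and method names are made up for illustration, and the fieldNorm values are copied from the explain output rather than computed.

```java
class ClassicTfIdf {
    // Assumed classic-similarity tf: square root of the raw term frequency.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Assumed classic-similarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // fieldWeight is the product shown in the explain tree: tf * idf * fieldNorm.
    // fieldNorm is passed in as reported by explain (it is not derived here).
    static double fieldWeight(double freq, long docFreq, long maxDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // "texas": explain reports 1.287682
        System.out.println(fieldWeight(1, 2, 4, 1.0));
        // "texas texas texas": explain reports 1.1151654
        System.out.println(fieldWeight(3, 2, 4, 0.5));
        // Earlier run with maxDocs=3: explain reports 0.8660254
        System.out.println(fieldWeight(3, 2, 3, 0.5));
    }
}
```

If `index_options: docs` and disabled norms were really in effect for the indexed field, one would expect tf and fieldNorm to be 1.0 for every document, so termFreq=3.0 and fieldNorm=0.5 suggest the mapping was not applied to this data.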
It looks like the index options of a field cannot be overridden once the field has been initially set in a mapping.
Example:
    put test

    put test/business/_mapping
    {
       "properties": {
          "name": {
             "type": "string",
             "index_options": "freqs",
             "norms": { "enabled": false }
          }
       }
    }

    put test/business/_mapping
    {
       "properties": {
          "name": {
             "type": "string",
             "index_options": "docs",
             "norms": { "enabled": false }
          }
       }
    }

    get test/business/_mapping

    {
       "test": {
          "mappings": {
             "business": {
                "properties": {
                   "name": {
                      "type": "string",
                      "norms": { "enabled": false },
                      "index_options": "freqs"
                   }
                }
             }
          }
       }
    }
You will have to recreate the index to pick up the new mapping.
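Recreating the index could look roughly like the sketch below. It assumes Java 11 or later (for `java.net.http`), a node on http://localhost:9200, and the `test` index from the example. The class and method names are hypothetical, and it talks to the REST API directly instead of going through the Java client. Because deleting an index is destructive, it only prints the requests unless it is started with the argument `send`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RecreateIndex {
    // Mapping from the question, supplied at index creation time so that
    // index_options is "docs" from the start.
    static final String MAPPING =
        "{ \"mappings\": { \"business\": { \"properties\": { \"name\": {"
        + " \"type\": \"string\", \"index_options\": \"docs\","
        + " \"norms\": { \"enabled\": false } } } } } }";

    // DELETE /<index>: drops the index together with its old mapping and data.
    static HttpRequest deleteIndex(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
            .DELETE()
            .build();
    }

    // PUT /<index> with the mapping in the request body.
    static HttpRequest createIndex(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(MAPPING))
            .build();
    }

    public static void main(String[] args) throws Exception {
        String base = "http://localhost:9200"; // assumed local node
        HttpRequest[] steps = { deleteIndex(base, "test"), createIndex(base, "test") };
        boolean send = args.length > 0 && args[0].equals("send");
        HttpClient http = HttpClient.newHttpClient();
        for (HttpRequest step : steps) {
            System.out.println(step.method() + " " + step.uri());
            if (send) {
                HttpResponse<String> resp =
                    http.send(step, HttpResponse.BodyHandlers.ofString());
                System.out.println("  -> HTTP " + resp.statusCode());
            }
        }
    }
}
```

Deleting the index also removes its documents, so they have to be indexed again afterwards before the new mapping can affect scoring.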