I am trying to count documents with unique nested field values (and also fetch the documents themselves). Fetching the unique documents seems to work. However, when I execute the count request, I get the following error:
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/package/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true], status line [HTTP/1.1 400 Bad Request] {"error":{"root_cause":[{"type":"parsing_exception","reason":"request does not support [collapse]","line":1,"col":216}],"type":"parsing_exception","reason":"request does not support [collapse]","line":1,"col":216},"status":400}
Code:
    BoolQueryBuilder innerTemplNestedBuilder = QueryBuilders.boolQuery();
    NestedQueryBuilder templatesNestedQuery = QueryBuilders.nestedQuery("attachment", innerTemplNestedBuilder, ScoreMode.None);
    BoolQueryBuilder mainQueryBuilder = QueryBuilders.boolQuery().must(templatesNestedQuery);

    if (!isEmpty(templateName)) {
        innerTemplNestedBuilder.filter(QueryBuilders.termQuery("attachment.name", templateName));
    }

    SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.searchSource()
            .collapse(new CollapseBuilder("attachment.uuid"))
            .query(mainQueryBuilder);

    // THE NEXT LINE CAUSES THE ERROR
    long count = client.count(new CountRequest("package").source(searchSourceBuilder), RequestOptions.DEFAULT).getCount(); // <<<<<<<<<< ERROR HERE

    // THIS WORKS
    SearchResponse searchResponse = client.search(
            new SearchRequest(
                    new String[] {"package"},
                    searchSourceBuilder.timeout(new TimeValue(20, TimeUnit.SECONDS)).from(offset).size(limit)
            ).indices("package").searchType(SearchType.DFS_QUERY_THEN_FETCH),
            RequestOptions.DEFAULT
    );

    return ....;
The overall intent of the method is to fetch one page of documents together with the total number of all such documents. There may already be another way to do this. If I try to count using aggregations and cardinality, I get zero as the result, so it looks like that does not work for nested fields.
Count request:
    {
      "query": {
        "bool": {
          "must": [
            {
              "nested": {
                "query": {
                  "bool": {
                    "adjust_pure_negative": true,
                    "boost": 1.0
                  }
                },
                "path": "attachment",
                "ignore_unmapped": false,
                "score_mode": "none",
                "boost": 1.0
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1.0
        }
      },
      "collapse": {
        "field": "attachment.uuid"
      }
    }
How the mapping is created:
    curl -X DELETE "localhost:9200/package?pretty"

    curl -X PUT "localhost:9200/package?include_type_name=true&pretty" -H 'Content-Type: application/json' -d '{
      "settings" : { "number_of_shards" : 1, "number_of_replicas" : 1 }
    }'

    curl -X PUT "localhost:9200/package/_mappings?pretty" -H 'Content-Type: application/json' -d'
    {
      "dynamic": false,
      "properties" : {
        "attachment": {
          "type": "nested",
          "properties": {
            "uuid" : { "type" : "keyword" },
            "name" : { "type" : "text" }
          }
        },
        "uuid" : { "type" : "keyword" }
      }
    }
    '
The query generated by the code should look like this:
    curl -X POST "localhost:9200/package/_count?&pretty" -H 'Content-Type: application/json' -d'
    {
      "query" : {
        "bool": {
          "must": [
            {
              "nested": {
                "query": {
                  "bool": {
                    "adjust_pure_negative": true,
                    "boost": 1.0
                  }
                },
                "path": "attachment",
                "ignore_unmapped": false,
                "score_mode": "none",
                "boost": 1.0
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1.0
        }
      },
      "collapse": {
        "field": "attachment.uuid"
      }
    }'
collapse can only be used in the _search context, not in _count.
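As a rough illustration of that restriction, here is a minimal sketch in plain Java (no Elasticsearch client involved) that derives a _count-compatible body from a _search body by keeping only the "query" entry. The class and method names (`CountBodySketch`, `toCountBody`) are hypothetical, and the bodies are modelled as simple maps purely for demonstration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CountBodySketch {

    /**
     * Returns a new body that keeps only the "query" entry of a search body.
     * Everything else (for example "collapse") is dropped, because the
     * error in the question shows that _count rejects it.
     */
    static Map<String, Object> toCountBody(Map<String, Object> searchBody) {
        Map<String, Object> countBody = new LinkedHashMap<>();
        if (searchBody.containsKey("query")) {
            countBody.put("query", searchBody.get("query"));
        }
        return countBody;
    }

    public static void main(String[] args) {
        // Simplified stand-in for the search body from the question.
        Map<String, Object> searchBody = new LinkedHashMap<>();
        searchBody.put("query", Map.of("match_all", Map.of()));
        searchBody.put("collapse", Map.of("field", "attachment.uuid"));

        Map<String, Object> countBody = toCountBody(searchBody);
        System.out.println(countBody.keySet()); // prints [query]
    }
}
```

With the high-level client from the question, the analogous move would presumably be to give the CountRequest its own SearchSourceBuilder that only has .query(...) set. Note that such a count is the number of matching top-level documents, not the number of distinct attachment.uuid values.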
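Secondly, what does your query even do? You have a lot of redundant parameters in there, such as boost: 1 etc. You might as well write:
    POST /package/_count?&pretty
    {
      "query": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "attachment",
                "query": {
                  "match_all": {}
                }
              }
            }
          ]
        }
      }
    }
which is effectively doing nothing :)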
Let's say there are 3 documents, 2 of which have the same attachment.uuid value:
    [
      {
        "attachment":{
          "uuid":"04144e14-62c3-11ea-bc55-0242ac130003"
        }
      },
      {
        "attachment":{
          "uuid":"04144e14-62c3-11ea-bc55-0242ac130003"
        }
      },
      {
        "attachment":{
          "uuid":"100b9632-62c3-11ea-bc55-0242ac130003"
        }
      }
    ]
To get the terms breakdown of the uuids, run
    GET package/_search
    {
      "size": 0,
      "aggs": {
        "nested_uniques": {
          "nested": {
            "path": "attachment"
          },
          "aggs": {
            "subagg": {
              "terms": {
                "field": "attachment.uuid"
              }
            }
          }
        }
      }
    }
yielding
    ...
    {
      "aggregations":{
        "nested_uniques":{
          "doc_count":3,
          "subagg":{
            "doc_count_error_upper_bound":0,
            "sum_other_doc_count":0,
            "buckets":[
              {
                "key":"04144e14-62c3-11ea-bc55-0242ac130003",
                "doc_count":2
              },
              {
                "key":"100b9632-62c3-11ea-bc55-0242ac130003",
                "doc_count":1
              }
            ]
          }
        }
      }
    }
To get only the number of unique uuid values, run
    GET package/_search
    {
      "size": 0,
      "aggs": {
        "nested_uniques": {
          "nested": {
            "path": "attachment"
          },
          "aggs": {
            "scripted_uniques": {
              "scripted_metric": {
                "init_script": "state.my_map = [:];",
                "map_script": """
                  if (doc.containsKey('attachment.uuid')) {
                    state.my_map[doc['attachment.uuid'].value.toString()] = 1;
                  }
                """,
                "combine_script": """
                  def sum = 0;
                  for (c in state.my_map.entrySet()) {
                    sum += 1
                  }
                  return sum
                """,
                "reduce_script": """
                  def sum = 0;
                  for (agg in states) {
                    sum += agg;
                  }
                  return sum;
                """
              }
            }
          }
        }
      }
    }
which returns
    ...
    {
      "aggregations":{
        "nested_uniques":{
          "doc_count":3,
          "scripted_uniques":{
            "value":2
          }
        }
      }
    }
And that scripted_uniques: 2 is exactly what you're after.
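To make the map / combine / reduce phases of that scripted metric easier to follow, here is a small plain-Java sketch of the same logic over in-memory lists. It does not talk to Elasticsearch; the class and method names are hypothetical, and each inner list stands in for the uuids seen on one shard. It also illustrates a caveat that follows from the scripts as written: reduce_script adds up per-shard counts, so a uuid that occurs on more than one shard would be counted once per shard. With number_of_shards set to 1, as in the question's index settings, the two approaches agree.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ScriptedUniquesSketch {

    /** map_script analogue: the distinct uuids collected on a single shard. */
    static Set<String> distinctOnShard(List<String> uuids) {
        return new HashSet<>(uuids);
    }

    /** combine_script + reduce_script analogue: add up per-shard distinct counts. */
    static int sumOfShardCounts(List<List<String>> shards) {
        int sum = 0;
        for (List<String> shard : shards) {
            sum += distinctOnShard(shard).size();
        }
        return sum;
    }

    /** Variant (an assumption, not the script above): union the shard sets, then count. */
    static int countOfMergedSets(List<List<String>> shards) {
        Set<String> all = new HashSet<>();
        for (List<String> shard : shards) {
            all.addAll(distinctOnShard(shard));
        }
        return all.size();
    }

    public static void main(String[] args) {
        String a = "04144e14-62c3-11ea-bc55-0242ac130003";
        String b = "100b9632-62c3-11ea-bc55-0242ac130003";

        // The three sample documents on a single shard.
        List<List<String>> oneShard = List.of(List.of(a, a, b));
        System.out.println(sumOfShardCounts(oneShard));   // prints 2
        System.out.println(countOfMergedSets(oneShard));  // prints 2

        // Hypothetical two-shard layout where uuid "a" lives on both shards.
        List<List<String>> twoShards = List.of(List.of(a, b), List.of(a));
        System.out.println(sumOfShardCounts(twoShards));  // prints 3
        System.out.println(countOfMergedSets(twoShards)); // prints 2
    }
}
```

If the index ever grows beyond one shard, returning the map itself from combine_script and merging the key sets in reduce_script would be one way to keep the count exact.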
Note: I solved this use case with a nested scripted metric aggregation, but if you know a cleaner approach, I'm more than happy to learn!
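One candidate for a cleaner approach, offered as an untested sketch rather than a verified fix: a cardinality aggregation placed inside the same nested aggregation. The question reports that cardinality returned zero; one possible explanation (a guess) is that it was run at the top level instead of under a nested aggregation on the attachment path. The Java below only builds the request body as a string. The class name and the aggregation name "unique_uuids" are hypothetical, and cardinality is an approximate count, so results may deviate for very large numbers of distinct values.

```java
public class NestedCardinalitySketch {

    /**
     * Builds a _search body with a cardinality aggregation under a nested
     * aggregation. The path and field are inserted verbatim, so they are
     * assumed to contain no characters that need JSON escaping.
     */
    static String buildBody(String path, String field) {
        return String.join("\n",
            "{",
            "  \"size\": 0,",
            "  \"aggs\": {",
            "    \"nested_uniques\": {",
            "      \"nested\": { \"path\": \"" + path + "\" },",
            "      \"aggs\": {",
            "        \"unique_uuids\": {",
            "          \"cardinality\": { \"field\": \"" + field + "\" }",
            "        }",
            "      }",
            "    }",
            "  }",
            "}");
    }

    public static void main(String[] args) {
        // Body intended for: GET package/_search
        System.out.println(buildBody("attachment", "attachment.uuid"));
    }
}
```

The main query from the question can sit alongside "aggs" in the same _search request, so the page of hits and the unique count could come back in a single round trip.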