I have the following mapping for a field in Elasticsearch (defined in YAML):
    my_analyzer:
      type: custom
      tokenizer: keyword
      filter: lowercase
    products_filter:
      type: "nested"
      properties:
        filter_name: {"type" : "string", analyzer: "my_analyzer"}
        filter_value: {"type" : "string", analyzer: "my_analyzer"}
Each document has many filters, which look like this:
"products_filter": [ { "filter_name": "Rahmengröße", "filter_value": "33,5 cm" } , { "filter_name": "color", "filter_value": "gelb" } , { "filter_name": "Rahmengröße", "filter_value": "39,5 cm" } , { "filter_name": "Rahmengröße", "filter_value": "45,5 cm" }]
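I am trying to get a list of the unique filter names and, for each filter name, a list of its unique filter values.
In other words, I want a structure like this: Rahmengröße: 39,5 cm, 45,5 cm, 33,5 cm; color: gelb
To get it, I tried several variants of aggregations, for example: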
{ "aggs": { "bla": { "terms": { "field": "products_filter.filter_name" }, "aggs": { "bla2": { "terms": { "field": "products_filter.filter_value" } } } } } }
This request is wrong.
It returns the list of unique filter names, but each of them contains a list of all filter_values:
"bla": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 103, "buckets": [ { "key": "color", "doc_count": 9, "bla2": { "doc_count_error_upper_bound": 4, "sum_other_doc_count": 366, "buckets": [ { "key": "100", "doc_count": 5 } , { "key": "cm", "doc_count": 5 } , { "key": "unisex", "doc_count": 5 } , { "key": "11", "doc_count": 4 } , { "key": "160", "doc_count": 4 } , { "key": "22", "doc_count": 4 } , { "key": "a", "doc_count": 4 } , { "key": "alu", "doc_count": 4 } , { "key": "aluminium", "doc_count": 4 } , { "key": "aus", "doc_count": 4 } ] } } ,
I also tried a reverse nested aggregation, but that did not help.
So I think there is a logical error in my attempt?
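As I said, your problem is that your text is analyzed, and Elasticsearch always aggregates at the token level. To fix it, the field values must be indexed as single tokens. There are two options: do not analyze them at all, or index them with a keyword analyzer plus lowercasing (for case-insensitive aggregations).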
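So this would be the setup: a custom keyword analyzer with a lowercase filter that also removes accents (ö => o and ß => ss), plus additional sub-fields on your fields to use for aggregations (raw and keyword):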
PUT /test { "settings": { "analysis": { "analyzer": { "my_analyzer_keyword": { "type": "custom", "tokenizer": "keyword", "filter": [ "asciifolding", "lowercase" ] } } } }, "mappings": { "data": { "properties": { "products_filter": { "type": "nested", "properties": { "filter_name": { "type": "string", "analyzer": "standard", "fields": { "raw": { "type": "string", "index": "not_analyzed" }, "keyword": { "type": "string", "analyzer": "my_analyzer_keyword" } } }, "filter_value": { "type": "string", "analyzer": "standard", "fields": { "raw": { "type": "string", "index": "not_analyzed" }, "keyword": { "type": "string", "analyzer": "my_analyzer_keyword" } } } } } } } } }
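To make the difference between the field variants concrete, here is a rough Python sketch of the tokens each one would emit. It is an approximation, not Elasticsearch's implementation: the function names are hypothetical, whitespace splitting plus lowercasing merely stands in for the standard analyzer, and the folding step only covers the kind of characters mentioned above.

```python
import unicodedata


def rough_standard_tokens(text):
    # Crude stand-in for an analyzed string field (assumption: the real
    # standard analyzer is more elaborate): split on whitespace, lowercase.
    return [token.lower() for token in text.split()]


def raw_tokens(text):
    # "not_analyzed": the whole value is indexed unchanged as a single token.
    return [text]


def keyword_tokens(text):
    # Approximates keyword tokenizer + asciifolding + lowercase:
    # the whole value stays one token, but it is normalized.
    text = text.replace("ß", "ss")  # NFKD does not decompose ß, so map it by hand
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [folded.lower()]


if __name__ == "__main__":
    for value in ("Rahmengröße", "33,5 cm"):
        print(value, rough_standard_tokens(value), raw_tokens(value), keyword_tokens(value))
```

A terms aggregation builds one bucket per token, which is why the analyzed field produces buckets such as "cm", while the raw and keyword variants produce one bucket per complete value.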
The test document you gave us:
PUT /test/data/1 { "products_filter": [ { "filter_name": "Rahmengröße", "filter_value": "33,5 cm" }, { "filter_name": "color", "filter_value": "gelb" }, { "filter_name": "Rahmengröße", "filter_value": "39,5 cm" }, { "filter_name": "Rahmengröße", "filter_value": "45,5 cm" } ] }
This would be the query to aggregate using the raw field:
GET /test/_search { "size": 0, "aggs": { "Nesting": { "nested": { "path": "products_filter" }, "aggs": { "raw_names": { "terms": { "field": "products_filter.filter_name.raw", "size": 0 }, "aggs": { "raw_values": { "terms": { "field": "products_filter.filter_value.raw", "size": 0 } } } } } } } }
It does produce the expected result (buckets with the filter names, and sub-buckets with their values):
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0, "hits": [] }, "aggregations": { "Nesting": { "doc_count": 4, "raw_names": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "Rahmengröße", "doc_count": 3, "raw_values": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "33,5 cm", "doc_count": 1 }, { "key": "39,5 cm", "doc_count": 1 }, { "key": "45,5 cm", "doc_count": 1 } ] } }, { "key": "color", "doc_count": 1, "raw_values": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "gelb", "doc_count": 1 } ] } } ] } } } }
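If you want the "name: values" structure from the question on the client side, the response above can be flattened with a few lines of code. This is a minimal sketch, assuming the response has already been parsed into a Python dict; the helper name `names_to_values` and the trimmed `sample` dict are made up for illustration, and the aggregation names default to the ones used in the query above.

```python
def names_to_values(response, nested_agg="Nesting",
                    names_agg="raw_names", values_agg="raw_values"):
    # Walk the outer terms buckets (filter names) and collect the keys
    # of each inner terms aggregation (filter values).
    result = {}
    name_buckets = response["aggregations"][nested_agg][names_agg]["buckets"]
    for name_bucket in name_buckets:
        value_buckets = name_bucket[values_agg]["buckets"]
        result[name_bucket["key"]] = [b["key"] for b in value_buckets]
    return result


# Trimmed copy of the response shown above (only the fields the helper reads).
sample = {
    "aggregations": {
        "Nesting": {
            "doc_count": 4,
            "raw_names": {
                "buckets": [
                    {"key": "Rahmengröße", "doc_count": 3,
                     "raw_values": {"buckets": [
                         {"key": "33,5 cm", "doc_count": 1},
                         {"key": "39,5 cm", "doc_count": 1},
                         {"key": "45,5 cm", "doc_count": 1}]}},
                    {"key": "color", "doc_count": 1,
                     "raw_values": {"buckets": [
                         {"key": "gelb", "doc_count": 1}]}},
                ]
            },
        }
    }
}

if __name__ == "__main__":
    print(names_to_values(sample))
```

For a response from the keyword variant of the query, pass `names_agg="keyword_names"` and `values_agg="keyword_values"` instead.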
Alternatively, you can use the field with the keyword analyzer (and some normalization) to get a more general, case-insensitive result:
GET /test/_search { "size": 0, "aggs": { "Nesting": { "nested": { "path": "products_filter" }, "aggs": { "keyword_names": { "terms": { "field": "products_filter.filter_name.keyword", "size": 0 }, "aggs": { "keyword_values": { "terms": { "field": "products_filter.filter_value.keyword", "size": 0 } } } } } } } }
The result is:
{ "took": 1, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0, "hits": [] }, "aggregations": { "Nesting": { "doc_count": 4, "keyword_names": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "rahmengrosse", "doc_count": 3, "keyword_values": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "33,5 cm", "doc_count": 1 }, { "key": "39,5 cm", "doc_count": 1 }, { "key": "45,5 cm", "doc_count": 1 } ] } }, { "key": "color", "doc_count": 1, "keyword_values": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "gelb", "doc_count": 1 } ] } } ] } } } }