I have an index mapping with two string fields, field1 and field2, both declared with copy_to pointing to another field named all_fields. all_fields is indexed as "not_analyzed".
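When I run a bucket aggregation on all_fields, I expect distinct buckets whose keys are the values of field1 and field2 concatenated together. Instead, I get separate buckets keyed by the individual, unconcatenated values of field1 and field2.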
Example mapping:
{ "mappings": { "myobject": { "properties": { "field1": { "type": "string", "index": "analyzed", "copy_to": "all_fields" }, "field2": { "type": "string", "index": "analyzed", "copy_to": "all_fields" }, "all_fields": { "type": "string", "index": "not_analyzed" } } } } }
Data:
{ "field1": "dinner carrot potato broccoli", "field2": "something here" }
and
{ "field1": "fish chicken something", "field2": "dinner" }
Aggregation:
{ "aggs": { "t": { "terms": { "field": "all_fields" } } } }
Result:
... "aggregations": { "t": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "dinner", "doc_count": 1 }, { "key": "dinner carrot potato broccoli", "doc_count": 1 }, { "key": "fish chicken something", "doc_count": 1 }, { "key": "something here", "doc_count": 1 } ] } }
I was expecting only 2 buckets, fish chicken somethingdinner and dinner carrot potato broccolisomethinghere
What am I doing wrong?
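What you are looking for is the concatenation of two strings. copy_to does not do that, even though it may look as if it does. Conceptually, copy_to gives all_fields a set of values taken from field1 and field2; it does not join them into a single value.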
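To make the difference concrete, here is a rough plain-Python sketch (not Elasticsearch code). The helper names copy_to_values and terms_buckets are made up for this illustration, and the model assumes each document simply counts once per distinct value it holds. Under those assumptions it reproduces the four buckets from the question:

```python
from collections import Counter

# The two documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]

def copy_to_values(doc):
    """Model copy_to: all_fields holds each source value separately."""
    return [doc["field1"], doc["field2"]]

def terms_buckets(values_per_doc):
    """Model a terms aggregation: count documents per distinct value."""
    counts = Counter()
    for values in values_per_doc:
        counts.update(set(values))  # a document counts once per distinct key
    return dict(counts)

buckets = terms_buckets(copy_to_values(d) for d in docs)
print(sorted(buckets))
```

Each document contributes two separate keys, so two documents yield four buckets rather than two.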
For your use case, you have two options: use a _source transformation, or perform a script aggregation.
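I'd recommend the _source transformation, as I think it is more efficient than scripting. That is, you pay a small price once at indexing time instead of running a heavy scripted aggregation on every query.
For the _source transformation: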
PUT /lastseen { "mappings": { "test": { "transform": { "script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']" }, "properties": { "field1": { "type": "string" }, "field2": { "type": "string" }, "lastseen": { "type": "long" }, "all_fields": { "type": "string", "index": "not_analyzed" } } } } }
And the query:
GET /lastseen/test/_search { "aggs": { "NAME": { "terms": { "field": "all_fields", "size": 10 } } } }
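As a sanity check on what to expect from this, the following Python sketch mimics the transform script on the two documents from the question and then counts the resulting all_fields values. It is only a model of the behaviour, not Elasticsearch code, and the function name transform is chosen here for illustration:

```python
from collections import Counter

docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]

def transform(source):
    """Mirror the mapping's transform script: field1 + ' ' + field2."""
    indexed = dict(source)  # leave the input document untouched
    indexed["all_fields"] = source["field1"] + " " + source["field2"]
    return indexed

indexed_docs = [transform(d) for d in docs]

# all_fields is not_analyzed, so each whole string is one term, one bucket.
all_fields_buckets = Counter(d["all_fields"] for d in indexed_docs)
for key, doc_count in sorted(all_fields_buckets.items()):
    print(key, doc_count)
```

This gives one bucket per document. Note that the script joins the two values with a space, so the keys contain that separator, unlike the keys written in the question.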
For the script aggregation, to keep it cheap to execute (meaning using doc['field'].value rather than the more expensive _source.field), add .raw sub-fields to field1 and field2:
PUT /lastseen { "mappings": { "test": { "properties": { "field1": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } }, "field2": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" } } }, "lastseen": { "type": "long" } } } } }
The script will then use these .raw sub-fields:
{ "aggs": { "NAME": { "terms": { "script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value", "size": 10, "lang": "groovy" } } } }
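Without the .raw sub-fields (which are deliberately created as not_analyzed), you would need to do something like the following, which is more expensive: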
{ "aggs": { "NAME": { "terms": { "script": "_source.field1 + ' ' + _source.field2", "size": 10, "lang": "groovy" } } } }
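The reason the .raw sub-fields matter is that doc['...'] reads indexed terms, and an analyzed field is indexed as individual tokens rather than as the original string. The sketch below illustrates this in plain Python. It assumes, as a simplification, that analysis just lowercases and splits on whitespace, which is much cruder than a real analyzer, and both function names are hypothetical:

```python
def analyzed_terms(text):
    """Rough stand-in for an analyzed string field: lowercase whitespace tokens."""
    return sorted(set(text.lower().split()))

def not_analyzed_terms(text):
    """A not_analyzed field keeps the whole string as a single term."""
    return [text]

field1 = "dinner carrot potato broccoli"

tokens = analyzed_terms(field1)      # several short terms, word order lost
whole = not_analyzed_terms(field1)   # exactly one term: the original string

print(tokens)
print(whole)
```

With only the tokens available, a script has no single value that equals the original string, so it either reads the not_analyzed .raw sub-field or falls back to loading _source.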