Elasticsearch copy_to field does not behave as expected in aggregations
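I have an index mapping with two string fields, field1 and field2, both declared with copy_to pointing at another field named all_fields. all_fields is indexed as "not_analyzed".
When I create a bucket aggregation on all_fields, I expect distinct buckets whose keys are the values of field1 and field2 concatenated together. Instead, I get separate buckets, with the field1 and field2 values as individual, unconcatenated keys.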
Example mapping:
{ "mappings": {
"myobject": {
"properties": {
"field1": {
"type": "string",
"index": "analyzed",
"copy_to": "all_fields"
},
"field2": {
"type": "string",
"index": "analyzed",
"copy_to": "all_fields"
},
"all_fields": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Data:
{ "field1": "dinner carrot potato broccoli",
"field2": "something here"
}
and
{ "field1": "fish chicken something",
"field2": "dinner"
}
Aggregation:
{ "aggs": {
"t": {
"terms": {
"field": "all_fields"
}
}
}
}
Result:
..."aggregations": {
"t": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "dinner",
"doc_count": 1
},
{
"key": "dinner carrot potato broccoli",
"doc_count": 1
},
{
"key": "fish chicken something",
"doc_count": 1
},
{
"key": "something here",
"doc_count": 1
}
]
}
}
I expected only two buckets: "fish chicken somethingdinner" and "dinner carrot potato broccolisomethinghere".
What am I doing wrong?
Answer:
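What you are looking for is a concatenation of two strings. copy_to does not do that, even if it looks like it does. Conceptually, copy_to builds a set of values from both field1 and field2; it does not join them into one string.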
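To make the difference concrete, here is a small Python model (not Elasticsearch code; the helper names copy_to_all_fields and terms_agg are hypothetical) of what happens to the two sample documents. It assumes, in simplified form, that copy_to adds each source value as a separate value of all_fields, and that a terms aggregation on a not_analyzed field turns every distinct value into a bucket counted at most once per document.

```python
from collections import Counter

# The two sample documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]


def copy_to_all_fields(doc):
    """Simplified model of copy_to: each source value becomes a separate
    value of the multi-valued all_fields field; nothing is concatenated."""
    return [doc["field1"], doc["field2"]]


def terms_agg(values_per_doc):
    """Simplified model of a terms aggregation on a not_analyzed field:
    each distinct value is a bucket, counted at most once per document."""
    counts = Counter()
    for values in values_per_doc:
        counts.update(set(values))
    return dict(counts)


buckets = terms_agg(copy_to_all_fields(d) for d in docs)
for key in sorted(buckets):
    print(f"{key!r}: {buckets[key]}")
```

It prints four buckets, one per original value, which is the same set of keys as the result shown in the question.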
For your use case you have two options:
- use a _source transform
- use a scripted aggregation
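I recommend the _source transform, because I think it is more efficient than scripting: you pay a small price once at indexing time instead of running a heavy scripted aggregation on every search.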
For the _source transform:
PUT /lastseen
{
"mappings": {
"test": {
"transform": {
"script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']"
},
"properties": {
"field1": {
"type": "string"
},
"field2": {
"type": "string"
},
"lastseen": {
"type": "long"
},
"all_fields": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
And the query:
GET /lastseen/test/_search
{
"aggs": {
"NAME": {
"terms": {
"field": "all_fields",
"size": 10
}
}
}
}
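A minimal Python sketch of the effect, assuming the transform runs once per document at index time and writes a single concatenated string into all_fields. The transform function and the Counter step are hypothetical stand-ins for the Groovy transform script and the terms aggregation, not Elasticsearch APIs.

```python
from collections import Counter

docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]


def transform(source):
    """Stand-in for the index-time script
    ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']"""
    out = dict(source)
    out["all_fields"] = source["field1"] + " " + source["field2"]
    return out


# Index time: every document gets exactly one all_fields value.
indexed = [transform(d) for d in docs]

# Query time: a terms aggregation on the single-valued, not_analyzed all_fields.
buckets = Counter(d["all_fields"] for d in indexed)
for key in sorted(buckets):
    print(f"{key!r}: {buckets[key]}")
```

Because each document now carries one concatenated value, the aggregation yields two buckets instead of four.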
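For the scripted aggregation, to make it cheaper to execute (meaning the script can use doc['field'].value instead of the more expensive _source.field), add a .raw sub-field to both field1 and field2:
PUT /lastseen
{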
"mappings": {
"test": {
"properties": {
"field1": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"field2": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"lastseen": {
"type": "long"
}
}
}
}
}
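The .raw sub-fields matter because of analysis. The sketch below uses a deliberately simplified analyzer (lowercase, then split on non-alphanumeric characters), which is an assumption standing in for Elasticsearch's standard analyzer, to show that an analyzed field stores several terms per value while a not_analyzed field keeps the original string as a single term. A script reading doc values from the analyzed field would therefore see individual terms rather than the full string.

```python
import re


def simple_analyze(text):
    """Rough stand-in for an analyzed string field: lowercase and split on
    runs of non-alphanumeric characters (the real analyzer does more)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def not_analyzed(text):
    """not_analyzed: the whole value is indexed as one term."""
    return [text]


value = "dinner carrot potato broccoli"
print(simple_analyze(value))  # ['dinner', 'carrot', 'potato', 'broccoli']
print(not_analyzed(value))    # ['dinner carrot potato broccoli']
```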
The script then uses these .raw sub-fields:
{ "aggs": {
"NAME": {
"terms": {
"script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value",
"size": 10,
"lang": "groovy"
}
}
}
}
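The scripted aggregation produces the same concatenated keys, but computes them at query time for every matching document. Here is a hedged Python model of that: the doc_values dictionaries are a hypothetical stand-in for the per-document values of the .raw sub-fields, and the script function mirrors the Groovy expression above.

```python
from collections import Counter

# Hypothetical per-document values of the not_analyzed .raw sub-fields.
doc_values = [
    {"field1.raw": ["dinner carrot potato broccoli"], "field2.raw": ["something here"]},
    {"field1.raw": ["fish chicken something"], "field2.raw": ["dinner"]},
]


def script(doc):
    # Mirrors: doc['field1.raw'].value + ' ' + doc['field2.raw'].value
    return doc["field1.raw"][0] + " " + doc["field2.raw"][0]


# The script runs once per matching document on every search.
buckets = Counter(script(d) for d in doc_values)
for key in sorted(buckets):
    print(f"{key!r}: {buckets[key]}")
```

The buckets match the transform approach; the difference is that the concatenation is repeated on each search instead of being done once at indexing time, which is the trade-off behind recommending the transform.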
Without the .raw sub-fields (which are deliberately not_analyzed), you would need to do the following, which is considerably more expensive:
{ "aggs": {
"NAME": {
"terms": {
"script": "_source.field1 + ' ' + _source.field2",
"size": 10,
"lang": "groovy"
}
}
}
}