Elasticsearch copy_to field does not behave as expected in aggregations

I have an index mapping with two string fields, field1 and field2, both declared with copy_to into another field named all_fields.

all_fields is indexed as "not_analyzed".

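When I create a bucket aggregation on all_fields, I expect distinct buckets whose keys are the field1 and field2 values concatenated together. Instead, I get separate buckets keyed by the unconcatenated field1 and field2 values.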

Example mapping:

{
  "mappings": {
    "myobject": {
      "properties": {
        "field1": {
          "type": "string",
          "index": "analyzed",
          "copy_to": "all_fields"
        },
        "field2": {
          "type": "string",
          "index": "analyzed",
          "copy_to": "all_fields"
        },
        "all_fields": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

Data:

{
  "field1": "dinner carrot potato broccoli",
  "field2": "something here"
}

{
  "field1": "fish chicken something",
  "field2": "dinner"
}

Aggregation:

{
  "aggs": {
    "t": {
      "terms": {
        "field": "all_fields"
      }
    }
  }
}

Result:

...
"aggregations": {
  "t": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "dinner",
        "doc_count": 1
      },
      {
        "key": "dinner carrot potato broccoli",
        "doc_count": 1
      },
      {
        "key": "fish chicken something",
        "doc_count": 1
      },
      {
        "key": "something here",
        "doc_count": 1
      }
    ]
  }
}

I was expecting only 2 buckets: "fish chicken somethingdinner" and "dinner carrot potato broccolisomething here".

What am I doing wrong?

Answer:

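What you are looking for is a concatenation of the two strings. copy_to does not do that, even though it may look as if it does. Conceptually, copy_to builds a set of values from both field1 and field2; it does not join them together. Each copied value becomes a separate exact term in the not_analyzed all_fields field, which is why the terms aggregation returns one bucket per original value.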
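The difference can be modeled in a few lines of Python. This is not Elasticsearch code: it assumes copy_to simply adds each source value to the target field as a separate value, and that a terms aggregation on a not_analyzed field counts documents per exact value. The helper names (copy_to_values, concatenated_value, terms_buckets) are hypothetical.

```python
from collections import Counter

# The two sample documents from the question.
DOCS = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]


def copy_to_values(doc, sources=("field1", "field2")):
    """Model of copy_to: every source value becomes a separate value of the target field."""
    return [doc[name] for name in sources if name in doc]


def concatenated_value(doc, sources=("field1", "field2"), sep=" "):
    """What the question expected: one combined string per document."""
    return sep.join(doc[name] for name in sources if name in doc)


def terms_buckets(values_per_doc):
    """Rough model of a terms aggregation on a not_analyzed field:
    one bucket per distinct exact value, doc_count = number of docs holding it."""
    counts = Counter()
    for values in values_per_doc:
        counts.update(set(values))  # a document counts once per distinct value
    return dict(counts)


copy_to_buckets = terms_buckets(copy_to_values(d) for d in DOCS)
concat_buckets = terms_buckets([concatenated_value(d)] for d in DOCS)
print(sorted(copy_to_buckets))  # four single-value keys, as in the question's result
print(sorted(concat_buckets))   # two joined keys, one per document
```

Under these assumptions the copy_to model reproduces the four buckets from the question, while joining the strings yields one bucket per document.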

For your use case you have two options:

  1. Use a _source transform
  2. Use a script aggregation

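I recommend the _source transform because I believe it is more efficient than scripting: you pay a small cost once at indexing time instead of running a heavy script aggregation at query time.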
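The trade-off between the two options is about when the concatenation runs, not what it produces. In this hypothetical Python model, transform() stands in for the mapping transform script and runs once per document at indexing time, while terms_on_script() stands in for the script aggregation and recomputes the key for every document on every search. Both assume field1 and field2 are always present.

```python
from collections import Counter

DOCS = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]


def transform(source):
    """Index-time option (hypothetical model): mirrors
    ctx._source['all_fields'] = field1 + ' ' + field2, run once per document."""
    out = dict(source)
    out["all_fields"] = source["field1"] + " " + source["field2"]
    return out


def terms_on_field(docs, field):
    """Terms aggregation on a not_analyzed field holding one value per document."""
    return Counter(doc[field] for doc in docs)


def terms_on_script(docs, script):
    """Query-time option: the bucket key is computed for every doc on every search."""
    return Counter(script(doc) for doc in docs)


indexed = [transform(d) for d in DOCS]  # cost paid once, when documents are indexed
by_field = terms_on_field(indexed, "all_fields")
by_script = terms_on_script(DOCS, lambda d: d["field1"] + " " + d["field2"])  # cost paid per query
print(by_field == by_script)  # same buckets either way
```

Both paths produce identical buckets; the transform simply moves the string building out of the search path.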

For the _source transform:

PUT /lastseen
{
  "mappings": {
    "test": {
      "transform": {
        "script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']"
      },
      "properties": {
        "field1": {
          "type": "string"
        },
        "field2": {
          "type": "string"
        },
        "lastseen": {
          "type": "long"
        },
        "all_fields": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

And the query:

GET /lastseen/test/_search
{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "all_fields",
        "size": 10
      }
    }
  }
}

For the script aggregation, to keep it cheap to execute (that is, reading doc['field'].value instead of the more expensive _source.field), add .raw sub-fields to field1 and field2:

PUT /lastseen
{
  "mappings": {
    "test": {
      "properties": {
        "field1": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "field2": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "lastseen": {
          "type": "long"
        }
      }
    }
  }
}

The script then uses these .raw sub-fields:

{
  "aggs": {
    "NAME": {
      "terms": {
        "script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value",
        "size": 10,
        "lang": "groovy"
      }
    }
  }
}
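Why the script reads field1.raw rather than field1: on an analyzed field, doc['field1'].value returns a single indexed term, not the original string. The sketch below models this with hypothetical helpers and a simplified analyzer (lowercase plus whitespace split), and assumes, as an approximation of fielddata, that .value yields the alphabetically first term.

```python
DOC = {"field1": "dinner carrot potato broccoli", "field2": "something here"}


def analyzed_values(text):
    """Simplified analyzed field: lowercase, split on whitespace,
    keep distinct terms in sorted order (an approximation, see lead-in)."""
    return sorted(set(text.lower().split()))


def not_analyzed_values(text):
    """A not_analyzed (.raw) field keeps the whole string as one term."""
    return [text]


def doc_value(values):
    """Model of doc['x'].value: the first of the document's values."""
    return values[0]


def script_key(doc, values_fn):
    """Model of doc['field1...'].value + ' ' + doc['field2...'].value."""
    return doc_value(values_fn(doc["field1"])) + " " + doc_value(values_fn(doc["field2"]))


print(script_key(DOC, analyzed_values))      # -> broccoli here
print(script_key(DOC, not_analyzed_values))  # -> dinner carrot potato broccoli something here
```

In this model, scripting against the analyzed fields would bucket on stray terms, which is why the not_analyzed .raw sub-fields are needed.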

Without the .raw sub-fields (which are deliberately not_analyzed), you would need the following, which is considerably more expensive:

{
  "aggs": {
    "NAME": {
      "terms": {
        "script": "_source.field1 + ' ' + _source.field2",
        "size": 10,
        "lang": "groovy"
      }
    }
  }
}

Source: utcz.com/qa/413563.html
