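ElasticSearch aggregation + sorting on a non-numeric field (5.3)
I want to aggregate the data on one field, and I also want to get the aggregated data sorted by name.
My data is: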
{
  "_index": "testing-aggregation",
  "_type": "employee",
  "_id": "emp001_local000000000000001",
  "_score": 10.0,
  "_source": {
    "name": ["Person 01"],
    "groupbyid": ["group0001"],
    "ranking": ["2.0"]
  }
},
{
  "_index": "testing-aggregation",
  "_type": "employee",
  "_id": "emp002_local000000000000001",
  "_score": 85146.375,
  "_source": {
    "name": ["Person 02"],
    "groupbyid": ["group0001"],
    "ranking": ["10.0"]
  }
},
{
  "_index": "testing-aggregation",
  "_type": "employee",
  "_id": "emp003_local000000000000001",
  "_score": 20.0,
  "_source": {
    "name": ["Person 03"],
    "groupbyid": ["group0002"],
    "ranking": ["-1.0"]
  }
},
{
  "_index": "testing-aggregation",
  "_type": "employee",
  "_id": "emp004_local000000000000001",
  "_score": 5.0,
  "_source": {
    "name": ["Person 04"],
    "groupbyid": ["group0002"],
    "ranking": ["2.0"]
  }
}
My query:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "name:emp*^1000.0"
          }
        }
      ]
    }
  },
  "aggs": {
    "groupbyid": {
      "terms": {
        "field": "groupbyid.raw",
        "order": {
          "top_hit_agg": "desc"
        },
        "size": 10
      },
      "aggs": {
        "top_hit_agg": {
          "terms": {
            "field": "name"
          }
        }
      }
    }
  }
}
My mapping is:
{
  "name": {
    "type": "text",
    "fielddata": true,
    "fields": {
      "lower_case_sort": {
        "type": "text",
        "fielddata": true,
        "analyzer": "case_insensitive_sort"
      }
    }
  },
  "groupbyid": {
    "type": "text",
    "fielddata": true,
    "index": "analyzed",
    "fields": {
      "raw": {
        "type": "keyword",
        "index": "not_analyzed"
      }
    }
  }
}
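I am fetching data based on the average relevance of the grouped records. Now, what I want is to first group the records by groupbyid, and then sort the data in each bucket by the name field.
In other words, I want to group on one field and then, after grouping, sort on another field. The data above is sample data.
There are other fields as well, such as created_on and updated_on. I also want to get the data sorted on those fields, and to get it in alphabetical order.
I want to sort on a non-numeric data type (string). I can already do it for numeric data types.
I can do this for the ranking field, but not for the name field. It gives the following error:
Expected numeric type on field [name], but got [text];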
Answer:
You are asking several things, so I will try to answer them in turn.
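"I am fetching data based on the average relevance of the grouped records."
If that is what you are trying to do, it is not what the aggregation you wrote does. By default, a terms aggregation orders its buckets by the number of documents in each bucket, in descending order. To order the groups by "average relevance" (which I take to mean "the average _score of the documents in the group"), add a sub-aggregation on the score and order the terms aggregation by it: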
"aggregations": {
  "most_relevant_groups": {
    "terms": {
      "field": "groupbyid.raw",
      "order": {
        "average_score": "desc"
      }
    },
    "aggs": {
      "average_score": {
        "avg": {
          "script": {
            "inline": "_score",
            "lang": "painless"
          }
        }
      }
    }
  }
}
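To make the ordering concrete, here is a minimal Python sketch that reproduces the logic locally instead of calling Elasticsearch: it groups the sample hits by groupbyid, averages their _score values (copied from the sample data in the question) and orders the buckets by that average, highest first. The helper name order_by_average_score is hypothetical, and this only illustrates the semantics of "order by an avg sub-aggregation"; it is not how Elasticsearch computes it internally.

```python
from collections import defaultdict

# Sample hits from the question: (groupbyid, _score).
HITS = [
    ("group0001", 10.0),
    ("group0001", 85146.375),
    ("group0002", 20.0),
    ("group0002", 5.0),
]


def order_by_average_score(hits):
    """Group hits by key and order the buckets by mean score, descending.

    Hypothetical local stand-in for a terms aggregation ordered by an
    avg sub-aggregation on _score; it does not talk to Elasticsearch.
    """
    scores = defaultdict(list)
    for key, score in hits:
        scores[key].append(score)
    buckets = [
        {"key": key, "doc_count": len(vals), "average_score": sum(vals) / len(vals)}
        for key, vals in scores.items()
    ]
    # Equivalent of "order": {"average_score": "desc"}
    buckets.sort(key=lambda b: b["average_score"], reverse=True)
    return buckets


if __name__ == "__main__":
    for bucket in order_by_average_score(HITS):
        print(bucket)
```

Both groups contain two documents, so ordering by doc count alone cannot separate them; ordering by the average score puts group0001 first.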
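"Now, what I want is to first group the records by groupbyid, and then sort the data in each bucket by the name field."
To sort the documents within each bucket, you can use a top_hits aggregation: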
"aggregations": {
  "most_relevant_groups": {
    "terms": {
      "field": "groupbyid.raw",
      "order": {
        "average_score": "desc" // needs the average_score sub-aggregation from the previous snippet
      }
    },
    "aggs": {
      "employees": {
        "top_hits": {
          "size": 10, // defaults to 3 if omitted; change as needed
          "sort": [
            {
              "name.lower_case_sort": {
                "order": "asc"
              }
            }
          ]
        }
      }
    }
  }
}
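A minimal local Python sketch of what the top_hits sort does inside each bucket. It assumes (the question does not show the analyzer definition) that case_insensitive_sort emits a single lowercased token per name, for example a keyword tokenizer plus a lowercase filter, so str.lower() stands in for it. top_hits_by_name is a hypothetical helper, and its default size=3 mirrors the top_hits default.

```python
# Sample hits from the question: (groupbyid, name).
HITS = [
    ("group0001", "Person 01"),
    ("group0001", "Person 02"),
    ("group0002", "Person 03"),
    ("group0002", "Person 04"),
]


def top_hits_by_name(hits, size=3):
    """Per bucket, return up to `size` names sorted case-insensitively (asc).

    Assumption: the "case_insensitive_sort" analyzer behaves like
    str.lower() on the whole value (one lowercased token per name).
    """
    buckets = {}
    for key, name in hits:
        buckets.setdefault(key, []).append(name)
    return {key: sorted(names, key=str.lower)[:size] for key, names in buckets.items()}


if __name__ == "__main__":
    print(top_hits_by_name(HITS, size=10))
    # Why lowercase: plain code-point order puts "Bob" before "alice".
    print(sorted(["Bob", "alice"]), sorted(["Bob", "alice"], key=str.lower))
```

The lowercased sort key is the point of the name.lower_case_sort sub-field: sorting on the raw value would place every capitalized name ahead of every lowercase one.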
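Putting the two together, the following aggregation should do what you need. Note that I used a function_score query to simulate "relevance" based on ranking; your query can be anything, as long as it produces the relevance you want: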
POST /testing-aggregation/employee/_search
{
  "size": 0,
  "query": {
    "function_score": {
      "functions": [
        {
          "field_value_factor": {
            "field": "ranking"
          }
        }
      ]
    }
  },
  "aggs": {
    "groupbyid": {
      "terms": {
        "field": "groupbyid.raw",
        "size": 10,
        "order": {
          "average_score": "desc"
        }
      },
      "aggs": {
        "average_score": {
          "avg": {
            "script": {
              "inline": "_score",
              "lang": "painless"
            }
          }
        },
        "employees": {
          "top_hits": {
            "size": 10,
            "sort": [
              {
                "name.lower_case_sort": {
                  "order": "asc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
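As a rough local check of what the combined request is meant to return, here is a Python sketch that reproduces the response shape on the sample data. It assumes each document's _score simply equals its ranking (field_value_factor with the default factor 1 and no modifier, multiplied into a match_all score of 1.0); real Elasticsearch scores may differ. grouped_and_sorted is a hypothetical helper, and str.lower() again stands in for the case_insensitive_sort analyzer.

```python
from collections import defaultdict

# Sample documents from the question: (groupbyid, name, ranking).
DOCS = [
    ("group0001", "Person 01", 2.0),
    ("group0001", "Person 02", 10.0),
    ("group0002", "Person 03", -1.0),
    ("group0002", "Person 04", 2.0),
]


def grouped_and_sorted(docs, bucket_size=10, hits_size=10):
    """Local sketch of the combined request's response shape.

    Assumption: _score == ranking for every document (see lead-in).
    """
    groups = defaultdict(list)
    for key, name, ranking in docs:
        groups[key].append((name, ranking))
    buckets = []
    for key, members in groups.items():
        scores = [score for _, score in members]
        buckets.append({
            "key": key,
            # avg sub-aggregation on _score
            "average_score": sum(scores) / len(scores),
            # top_hits sorted by name.lower_case_sort asc
            "employees": sorted((n for n, _ in members), key=str.lower)[:hits_size],
        })
    # terms order: average_score desc, then keep the first bucket_size buckets
    buckets.sort(key=lambda b: b["average_score"], reverse=True)
    return buckets[:bucket_size]


if __name__ == "__main__":
    for bucket in grouped_and_sorted(DOCS):
        print(bucket)
```

The design keeps the two concerns separate, as the request does: the bucket order comes only from the average score, while the order of names inside a bucket comes only from the top_hits sort.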
That is all for "ElasticSearch aggregation + sorting on a non-numeric field (5.3)". Source: utcz.com/qa/430111.html