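Elasticsearch: Influencing the score with a custom score field in the document
I have a set of words extracted from text by an NLP algorithm, along with a relevance score for each word in each document.
For example: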
document 1:
{
  "vocab": [
    {"wtag": "James Bond", "rscore": 2.14},
    {"wtag": "world", "rscore": 0.86},
    ....,
    {"wtag": "somemore", "rscore": 3.15}
  ]
}
document 2:
{
  "vocab": [
    {"wtag": "hiii", "rscore": 1.34},
    {"wtag": "world", "rscore": 0.94},
    ....,
    {"wtag": "somemore", "rscore": 3.23}
  ]
}
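I want the rscore of every matched wtag in a document to influence the _score that ES assigns to that document, either by multiplying it with the _score or by adding it to the _score, so that it changes the final _score (and therefore the ordering) of the resulting documents. Is there any way to do this?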
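Answer:
Another way to solve this is to use nested documents.
First, set up the mapping so that vocab is a nested type, which means each wtag/rscore pair is indexed internally as a separate document: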
curl -XPUT "http://localhost:9200/myindex/" -d'{
  "settings": {"number_of_shards": 1},
  "mappings": {
    "mytype": {
      "properties": {
        "vocab": {
          "type": "nested",
          "properties": {
            "wtag": {
              "type": "string"
            },
            "rscore": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}'
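To see why the nested type matters here, the following Python sketch contrasts the two layouts. It only mimics the index layout and does not talk to Elasticsearch; the helper names flatten_as_object and split_as_nested are made up for illustration. Without nested, an array of objects is flattened into multi-valued fields, so the link between a wtag and its own rscore is lost.

```python
# Illustration only: mimics how an array of objects is laid out with the
# default "object" mapping versus the "nested" mapping. No Elasticsearch calls.

doc = {
    "vocab": [
        {"wtag": "James Bond", "rscore": 2.14},
        {"wtag": "world", "rscore": 0.86},
    ]
}


def flatten_as_object(document):
    # Default "object" mapping: every sub-field becomes one multi-valued
    # field, so we can no longer tell which rscore belongs to which wtag.
    flat = {}
    for entry in document["vocab"]:
        for key, value in entry.items():
            flat.setdefault("vocab." + key, []).append(value)
    return flat


def split_as_nested(document):
    # "nested" mapping: every array element is kept as its own hidden
    # document, so each wtag stays paired with its rscore.
    return [
        {"vocab." + key: value for key, value in entry.items()}
        for entry in document["vocab"]
    ]


flat = flatten_as_object(doc)
nested = split_as_nested(doc)
print(flat)
print(nested)
```

In the flattened form a script could only see the list of all rscore values for the document, whereas in the nested form it sees exactly the rscore of the wtag that matched.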
Then index your documents:
curl -XPUT "http://localhost:9200/myindex/mytype/1" -d'{
  "vocab": [
    {
      "wtag": "James Bond",
      "rscore": 2.14
    },
    {
      "wtag": "world",
      "rscore": 0.86
    },
    {
      "wtag": "somemore",
      "rscore": 3.15
    }
  ]
}'
curl -XPUT "http://localhost:9200/myindex/mytype/2" -d'{
  "vocab": [
    {
      "wtag": "hiii",
      "rscore": 1.34
    },
    {
      "wtag": "world",
      "rscore": 0.94
    },
    {
      "wtag": "somemore",
      "rscore": 3.23
    }
  ]
}'
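Then run a nested query against the nested documents. With score_mode set to sum, the scores of all matching nested documents are added up to give the score of the parent document. Inside the nested query, function_score combines the match score of each nested document with its rscore (by multiplication, since multiply is the default boost_mode):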
curl -XGET "http://localhost:9200/myindex/mytype/_search" -d'{
  "query": {
    "nested": {
      "path": "vocab",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "match": {
              "vocab.wtag": "james bond world"
            }
          },
          "script_score": {
            "script": "doc[\"vocab.rscore\"].value"
          }
        }
      }
    }
  }
}'
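The effect of this query on the two sample documents can be sketched in plain Python. This is a rough model under stated assumptions, not what Elasticsearch computes exactly: matching is reduced to lowercase token overlap, and the relevance score of the match query is fixed at a hypothetical 1.0 (the match_score parameter), so each matching nested document contributes just its rscore. The function names are made up for this example.

```python
# Rough model of: nested query + score_mode "sum" + function_score with a
# script_score that reads rscore. Assumption: the match query's own score
# is a constant (match_score); real Elasticsearch uses its relevance score.

docs = {
    "1": {"vocab": [
        {"wtag": "James Bond", "rscore": 2.14},
        {"wtag": "world", "rscore": 0.86},
        {"wtag": "somemore", "rscore": 3.15},
    ]},
    "2": {"vocab": [
        {"wtag": "hiii", "rscore": 1.34},
        {"wtag": "world", "rscore": 0.94},
        {"wtag": "somemore", "rscore": 3.23},
    ]},
}


def tokens(text):
    # Simplified analyzer: lowercase and split on whitespace.
    return set(text.lower().split())


def nested_sum_score(document, query_text, match_score=1.0):
    # A nested entry matches if its wtag shares a token with the query.
    # Each match contributes match_score * rscore (boost_mode "multiply"),
    # and the contributions are summed (score_mode "sum").
    query_terms = tokens(query_text)
    total = 0.0
    for entry in document["vocab"]:
        if tokens(entry["wtag"]) & query_terms:
            total += match_score * entry["rscore"]
    return total


scores = {
    doc_id: nested_sum_score(document, "james bond world")
    for doc_id, document in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Under these assumptions document 1 matches both "James Bond" and "world", while document 2 matches only "world", so document 1 ranks first. The "somemore" entries do not match the query and add nothing to either score.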