Elasticsearch: using a custom score field stored in documents to influence scoring

I have a set of words extracted from text by an NLP algorithm, along with an associated score for each word in each document.

For example:

document 1: {
  "vocab": [
    {"wtag": "James Bond", "rscore": 2.14},
    {"wtag": "world", "rscore": 0.86},
    ....,
    {"wtag": "somemore", "rscore": 3.15}
  ]
}

document 2: {
  "vocab": [
    {"wtag": "hiii", "rscore": 1.34},
    {"wtag": "world", "rscore": 0.94},
    ....,
    {"wtag": "somemore", "rscore": 3.23}
  ]
}

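For every wtag that matches the query, I want that entry's rscore in each document to influence the _score Elasticsearch assigns, either by multiplying it with _score or by adding it to _score, so that it affects the final _score of the result documents (and therefore their order). Is there any way to do this?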

Answer:

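Another way to solve this is to use nested documents.

First, set up the mapping so that vocab is a nested field. This means each wtag/rscore pair is indexed internally as a separate (hidden) document: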

curl -XPUT "http://localhost:9200/myindex/" -d'
{
  "settings": {"number_of_shards": 1},
  "mappings": {
    "mytype": {
      "properties": {
        "vocab": {
          "type": "nested",
          "properties": {
            "wtag":   {"type": "string"},
            "rscore": {"type": "float"}
          }
        }
      }
    }
  }
}'
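To see why the nested mapping matters, here is a minimal Python sketch (an illustration, not Elasticsearch code) of the difference. With a plain object mapping, an array of objects is flattened into independent multi-valued fields, so a wtag can no longer be tied to its own rscore; with a nested mapping, each entry stays its own hidden document. The helper names flatten_object_array and split_nested are hypothetical.

```python
# Hypothetical illustration of how an array of objects is stored
# under a plain "object" mapping vs. a "nested" mapping.

vocab = [
    {"wtag": "James Bond", "rscore": 2.14},
    {"wtag": "world", "rscore": 0.86},
    {"wtag": "somemore", "rscore": 3.15},
]


def flatten_object_array(items):
    """Object mapping: each sub-field becomes one multi-valued field,
    so the pairing between a wtag and its rscore is lost."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"vocab.{key}", []).append(value)
    return flat


def split_nested(items):
    """Nested mapping: each object is kept as its own document,
    so wtag and rscore stay paired."""
    return [{f"vocab.{key}": value for key, value in item.items()} for item in items]


print(flatten_object_array(vocab))
print(split_nested(vocab))
```

In the flattened form, a query on "world" can only see that the parent document contains the rscore values 2.14, 0.86 and 3.15 somewhere; the nested form keeps 0.86 attached to "world".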

Then index your documents:

curl -XPUT "http://localhost:9200/myindex/mytype/1" -d'
{
  "vocab": [
    {"wtag": "James Bond", "rscore": 2.14},
    {"wtag": "world", "rscore": 0.86},
    {"wtag": "somemore", "rscore": 3.15}
  ]
}'

curl -XPUT "http://localhost:9200/myindex/mytype/2" -d'
{
  "vocab": [
    {"wtag": "hiii", "rscore": 1.34},
    {"wtag": "world", "rscore": 0.94},
    {"wtag": "somemore", "rscore": 3.23}
  ]
}'
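With more than a couple of documents, one PUT per document gets tedious. Below is a minimal sketch, assuming the same type-based Elasticsearch version as the commands above, that builds a newline-delimited _bulk request body from the question's data. build_bulk_body is a hypothetical helper, and the elided "...." entries from the question are simply left out.

```python
import json

# Vocab data from the question; the elided "...." entries are omitted.
documents = {
    "1": [
        {"wtag": "James Bond", "rscore": 2.14},
        {"wtag": "world", "rscore": 0.86},
        {"wtag": "somemore", "rscore": 3.15},
    ],
    "2": [
        {"wtag": "hiii", "rscore": 1.34},
        {"wtag": "world", "rscore": 0.94},
        {"wtag": "somemore", "rscore": 3.23},
    ],
}


def build_bulk_body(docs):
    """Return an NDJSON bulk body: one action line plus one source line per document."""
    lines = []
    for doc_id, vocab in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps({"vocab": vocab}))            # document source
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


body = build_bulk_body(documents)
print(body, end="")
```

The output could be saved as bulk.json and sent with curl -XPOST "http://localhost:9200/myindex/mytype/_bulk" --data-binary @bulk.json. Use --data-binary rather than -d, because -d strips the newlines the bulk format depends on; newer Elasticsearch versions also expect a Content-Type: application/x-ndjson header.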

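Then run a nested query against the vocab entries. For each nested document whose wtag matches, function_score multiplies the match score by that entry's rscore (multiply is function_score's default boost_mode), and "score_mode": "sum" adds those values together to form the parent document's _score: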

curl -XGET "http://localhost:9200/myindex/mytype/_search" -d'
{
  "query": {
    "nested": {
      "path": "vocab",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "match": {"vocab.wtag": "james bond world"}
          },
          "script_score": {
            "script": "doc[\"vocab.rscore\"].value"
          }
        }
      }
    }
  }
}'
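The arithmetic the query performs can be modelled in a few lines of Python. This is a simplified sketch, not Elasticsearch's scoring. It assumes every matching entry receives the same hypothetical relevance score MATCH_SCORE, whereas real Elasticsearch computes a per-term similarity score. It also treats a vocab entry as matching when any query term equals one of its lowercased words. The boost_mode parameter mirrors two of function_score's options: "multiply" (the default) and "sum", which adds rscore to the match score instead. These correspond to the question's two alternatives.

```python
# Simplified model of the nested + function_score query above.
# ASSUMPTIONS: whitespace/lowercase matching and a constant, hypothetical
# MATCH_SCORE per matching entry; real Elasticsearch analyzes the text and
# computes relevance with its similarity model.
MATCH_SCORE = 1.0

docs = {
    "1": [("James Bond", 2.14), ("world", 0.86), ("somemore", 3.15)],
    "2": [("hiii", 1.34), ("world", 0.94), ("somemore", 3.23)],
}


def nested_sum_score(vocab, query, boost_mode="multiply"):
    """Score one parent document from its nested (wtag, rscore) entries."""
    terms = set(query.lower().split())
    total = 0.0
    for wtag, rscore in vocab:
        if not terms & set(wtag.lower().split()):
            continue  # non-matching nested documents contribute nothing
        if boost_mode == "multiply":  # function_score's default
            nested_score = MATCH_SCORE * rscore
        elif boost_mode == "sum":
            nested_score = MATCH_SCORE + rscore
        else:
            raise ValueError(f"unsupported boost_mode: {boost_mode}")
        total += nested_score  # score_mode "sum": add up matching nested scores
    return total


def rank(query, boost_mode="multiply"):
    """Return (doc_id, score) pairs, highest score first."""
    scores = {doc_id: nested_sum_score(vocab, query, boost_mode) for doc_id, vocab in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(rank("james bond world"))
print(rank("james bond world", boost_mode="sum"))
```

Under these assumptions, document 1 scores about 3.0 (2.14 for "James Bond" plus 0.86 for "world") and document 2 about 0.94, so document 1 ranks first with either boost_mode.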

Source: utcz.com/qa/430564.html
