Elasticsearch: Influence scoring with custom score field in document, pt. 2

Given these documents:

{
  "created_at" : "2017-07-31T20:30:14-04:00",
  "description" : null,
  "height" : 3213,
  "id" : "1",
  "tags" : [
    { "confidence" : 65.48948436785749, "tag" : "beach" },
    { "confidence" : 57.31950504425406, "tag" : "sea" },
    { "confidence" : 43.58207236617374, "tag" : "coast" },
    { "confidence" : 35.6857910950816, "tag" : "sand" },
    { "confidence" : 33.660057321079655, "tag" : "landscape" },
    { "confidence" : 32.53252312423727, "tag" : "sky" }
  ],
  "width" : 5712,
  "color" : "#0C0A07",
  "boost_multiplier" : 1
}

{
  "created_at" : "2017-07-31T20:43:17-04:00",
  "description" : null,
  "height" : 4934,
  "id" : "2",
  "tags" : [
    { "confidence" : 84.09123410403951, "tag" : "mountain" },
    { "confidence" : 56.412795342449456, "tag" : "valley" },
    { "confidence" : 48.36547551196872, "tag" : "landscape" },
    { "confidence" : 40.51100450186575, "tag" : "mountains" },
    { "confidence" : 33.14263528292239, "tag" : "sky" },
    { "confidence" : 31.064394646169404, "tag" : "peak" },
    { "confidence" : 29.372, "tag" : "natural elevation" }
  ],
  "width" : 4016,
  "color" : "#FEEBF9",
  "boost_multiplier" : 1
}

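I want to get a _score that is calculated from the confidence value of each tag. For example, if you search for "mountain", only the document with id 2 should be returned, since it is the only one with that tag. If you search for "landscape", document 2 should score higher than document 1, because the confidence of landscape is higher in 2 than in 1 (48.36 vs 33.66). If you search for "coast landscape", document 1 should this time score higher than 2, because document 1 has both coast and landscape in its tags array. I also want to multiply the score by "boost_multiplier" so that I can boost some documents over others.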
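To make the intended ranking concrete, here is a minimal Python sketch of the scoring I have in mind. It is only a model of the desired behaviour, not how Elasticsearch computes _score: the function name desired_score is hypothetical, the tag lists are abbreviated to the tags relevant to the examples, and text relevance is ignored entirely.

```python
# Hypothetical model of the desired ranking, not Elasticsearch's real scoring.
def desired_score(doc, search_terms):
    """Sum the confidence of every tag that matches a search term,
    then multiply the sum by the document's boost_multiplier."""
    matched = sum(
        tag["confidence"] for tag in doc["tags"] if tag["tag"] in search_terms
    )
    return matched * doc["boost_multiplier"]


# Tag lists abbreviated to the tags used in the examples above.
doc1 = {
    "id": "1",
    "boost_multiplier": 1,
    "tags": [
        {"confidence": 43.58207236617374, "tag": "coast"},
        {"confidence": 33.660057321079655, "tag": "landscape"},
    ],
}
doc2 = {
    "id": "2",
    "boost_multiplier": 1,
    "tags": [
        {"confidence": 84.09123410403951, "tag": "mountain"},
        {"confidence": 48.36547551196872, "tag": "landscape"},
    ],
}

for terms in (["mountain"], ["landscape"], ["coast", "landscape"]):
    print(terms, desired_score(doc1, terms), desired_score(doc2, terms))
```

Under this model "mountain" only scores for document 2, "landscape" ranks document 2 first, and "coast landscape" ranks document 1 first, which is the ordering described above.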

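I found this question: "Elasticsearch: Influence scoring with custom score field in document".

However, when I try the accepted solution (I have enabled scripting on my ES server), it returns both documents with a _score of 1.0, regardless of the search term. This is the query I tried: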

{
  "query": {
    "nested": {
      "path": "tags",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "match": {
              "tags.tag": "coast landscape"
            }
          },
          "script_score": {
            "script": "doc[\"confidence\"].value"
          }
        }
      }
    }
  }
}

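I also tried what @yahermann suggested in the comments, replacing "script_score" with "field_value_factor": {"field": "confidence"}, and the result is still the same. Any idea why it fails, or is there a better way to do this?

Just to give the full picture, this is the mapping definition I used: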

{
  "mappings": {
    "photo": {
      "properties": {
        "created_at": {
          "type": "date"
        },
        "description": {
          "type": "text"
        },
        "height": {
          "type": "short"
        },
        "id": {
          "type": "keyword"
        },
        "tags": {
          "type": "nested",
          "properties": {
            "tag": { "type": "string" },
            "confidence": { "type": "float"}
          }
        },
        "width": {
          "type": "short"
        },
        "color": {
          "type": "string"
        },
        "boost_multiplier": {
          "type": "float"
        }
      }
    }
  },
  "settings": {
    "number_of_shards": 1
  }
}

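After @Joanna's answer below, I tried her query, but no matter what I put in the match query (coast, foo, bar), it always returns both documents with a _score of 1.0. I tried it on Elasticsearch 2.4.6, 5.3, and 5.5.1 in Docker. This is the response I get: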

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 1635

{"took":24,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"my_index","_type":"my_type","_id":"2","_score":1.0,"_source":{
  "created_at" : "2017-07-31T20:43:17-04:00",
  "description" : null,
  "height" : 4934,
  "id" : "2",
  "tags" : [
    { "confidence" : 84.09123410403951, "tag" : "mountain" },
    { "confidence" : 56.412795342449456, "tag" : "valley" },
    { "confidence" : 48.36547551196872, "tag" : "landscape" },
    { "confidence" : 40.51100450186575, "tag" : "mountains" },
    { "confidence" : 33.14263528292239, "tag" : "sky" },
    { "confidence" : 31.064394646169404, "tag" : "peak" },
    { "confidence" : 29.372, "tag" : "natural elevation" }
  ],
  "width" : 4016,
  "color" : "#FEEBF9",
  "boost_multiplier" : 1
}
},{"_index":"my_index","_type":"my_type","_id":"1","_score":1.0,"_source":{
  "created_at" : "2017-07-31T20:30:14-04:00",
  "description" : null,
  "height" : 3213,
  "id" : "1",
  "tags" : [
    { "confidence" : 65.48948436785749, "tag" : "beach" },
    { "confidence" : 57.31950504425406, "tag" : "sea" },
    { "confidence" : 43.58207236617374, "tag" : "coast" },
    { "confidence" : 35.6857910950816, "tag" : "sand" },
    { "confidence" : 33.660057321079655, "tag" : "landscape" },
    { "confidence" : 32.53252312423727, "tag" : "sky" }
  ],
  "width" : 5712,
  "color" : "#0C0A07",
  "boost_multiplier" : 1
}
}]}}

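I found this on SO: Elasticsearch: "function_score" with "boost_mode": "replace" ignores function score

It basically says that if the function doesn't match, it returns 1. That makes sense, but I am running the query against the very same documents, which is confusing.

Finally I found the problem, and I feel stupid. ES 101: if you send a GET request to the search API, it returns all documents with a score of 1.0 :) You should send a POST request instead... Thanks a lot @Joanna, it works perfectly!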
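For anyone hitting the same thing, here is a minimal Python sketch of sending the search body with POST. Elasticsearch itself accepts a search body on GET, but some HTTP clients drop the body of a GET request, in which case the search matches everything with a score of 1.0. The URL (localhost:9200 and the index name my_index) and the helper name build_search_request are assumptions for illustration; the sketch only builds the request and does not contact a server.

```python
import json
import urllib.request


def build_search_request(query, url="http://localhost:9200/my_index/_search"):
    """Build (but do not send) a POST search request carrying the query as JSON.

    The default URL is a placeholder; adjust host, port, and index name.
    """
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # explicit, so the body is always transmitted
    )


query = {
    "query": {
        "nested": {
            "path": "tags",
            "score_mode": "sum",
            "query": {"match": {"tags.tag": "coast landscape"}},
        }
    }
}

request = build_search_request(query)
print(request.get_method(), request.full_url)

# To run the search against a live server, something like:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```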

Answer:

You can try this query. It combines scoring with both the confidence and boost_multiplier fields:

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [{
            "nested": {
              "path": "tags",
              "score_mode": "sum",
              "query": {
                "function_score": {
                  "query": {
                    "match": {
                      "tags.tag": "landscape"
                    }
                  },
                  "field_value_factor": {
                    "field": "tags.confidence",
                    "factor": 1,
                    "missing": 0
                  }
                }
              }
            }
          }]
        }
      },
      "field_value_factor": {
        "field": "boost_multiplier",
        "factor": 1,
        "missing": 0
      }
    }
  }
}

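Searching for coast:

  • only the document with id=1 has this term; it gets "_score": 100.27469

Searching for landscape:

  • the document with id=2 gets "_score": 85.83046
  • the document with id=1 gets "_score": 59.7339

The document with id=2 scores higher because it has the higher confidence value for this tag.

Searching for coast landscape:

  • the document with id=1 gets "_score": 160.00859
  • the document with id=2 gets "_score": 85.83046

Although the document with id=2 has the higher confidence value, the document with id=1 matches both words and therefore scores higher. By changing the value of the "factor": 1 parameter, you can decide how much confidence should influence the results.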
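The reported numbers can be checked with a little arithmetic. The Python sketch below uses only the scores and confidence values quoted in this post; the variable names are mine. It shows that the two landscape scores are in the same ratio as the two landscape confidences, and that with "score_mode": "sum" the two-word score for id=1 equals its single-term score (the term matched only by id=1) plus its landscape score.

```python
import math

# Confidence of the "landscape" tag in each document (from the documents above).
conf_landscape_doc1 = 33.660057321079655
conf_landscape_doc2 = 48.36547551196872

# Scores reported in the answer.
score_single_term_doc1 = 100.27469  # term matched only by id=1
score_landscape_doc1 = 59.7339
score_landscape_doc2 = 85.83046
score_two_words_doc1 = 160.00859

# The score ratio follows the confidence ratio, because each matching
# nested tag's relevance score is multiplied by its confidence.
score_ratio = score_landscape_doc2 / score_landscape_doc1
confidence_ratio = conf_landscape_doc2 / conf_landscape_doc1
print(round(score_ratio, 4), round(confidence_ratio, 4))

# "score_mode": "sum" adds up the scores of all matching nested tags.
summed = score_single_term_doc1 + score_landscape_doc1
print(round(summed, 5), score_two_words_doc1)
```

Note that the raw scores are not the confidences themselves: by default function_score multiplies the function value with the relevance score of the match query.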

The boost_multiplier field

Something more interesting happens when I index a new document. Suppose it is almost identical to the document with id=2, but I set "boost_multiplier": 4 and "id": 3:

{
  "created_at" : "2017-07-31T20:43:17-04:00",
  "description" : null,
  "height" : 4934,
  "id" : "3",
  "tags" : [
    ...
    {
      "confidence" : 48.36547551196872,
      "tag" : "landscape"
    },
    ...
  ],
  "width" : 4016,
  "color" : "#FEEBF9",
  "boost_multiplier" : 4
}

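Running the same query with the coast landscape search term now returns three documents:

  • the document with id=3 gets "_score": 360.02664
  • the document with id=1 gets "_score": 182.09859
  • the document with id=2 gets "_score": 90.00666

Although the document with id=3 matches only one word (landscape), its boost_multiplier value raises its score considerably. Here too, you can use "factor": 1 to decide how much this value should raise the score, and "missing": 0 to decide what happens when the field is not indexed.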
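As a rough model of what the outer field_value_factor does, here is a Python sketch assuming no modifier and the default behaviour of multiplying the function value with the query score. The helper names are hypothetical, and raising an error for an absent field with no "missing" value is a simplification of this sketch. Since document 3 is described as nearly identical to document 2, its score before boosting is assumed to equal document 2's reported score.

```python
import math


def field_value_factor(doc, field, factor=1.0, missing=None):
    """Model of field_value_factor without a modifier: factor * field value.

    `missing` is used when the document does not have the field.
    """
    value = doc.get(field, missing)
    if value is None:
        raise ValueError(f"field {field!r} is absent and no 'missing' value was given")
    return factor * value


def boosted_score(query_score, doc, factor=1.0, missing=0):
    """Multiply the query score by the boost_multiplier function value."""
    return query_score * field_value_factor(doc, "boost_multiplier", factor, missing)


doc2 = {"id": "2", "boost_multiplier": 1}
doc3 = {"id": "3", "boost_multiplier": 4}
no_boost_field = {"id": "hypothetical"}

# Assumption: doc 3 has the same score before boosting as doc 2 (90.00666).
pre_boost_score = 90.00666

print(boosted_score(pre_boost_score, doc2))
print(boosted_score(pre_boost_score, doc3))
print(boosted_score(pre_boost_score, no_boost_field))  # "missing": 0 zeroes the score
```

With a multiplier of 4 the modelled score comes out at four times document 2's score, matching the 360.02664 reported for id=3. It also shows that "missing": 0 removes any document without the field from meaningful ranking, so a value of 1 may be a gentler default.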

