How to configure synonyms_path in Elasticsearch

I'm new to Elasticsearch and I want to use synonyms. I added the following lines to the configuration file:

index :
    analysis :
        analyzer :
            synonym :
                type : custom
                tokenizer : whitespace
                filter : [synonym]
        filter :
            synonym :
                type : synonym
                synonyms_path: synonyms.txt

Then I created an index named test:

"mappings" : {
  "test" : {
    "properties" : {
      "text_1" : {
        "type" : "string",
        "analyzer" : "synonym"
      },
      "text_2" : {
        "search_analyzer" : "standard",
        "index_analyzer" : "synonym",
        "type" : "string"
      },
      "text_3" : {
        "type" : "string",
        "analyzer" : "synonym"
      }
    }
  }
}

and indexed a document of type test with the following data:

{
  "text_3" : "foo dog cat",
  "text_2" : "foo dog cat",
  "text_1" : "foo dog cat"
}

synonyms.txt contains "foo, bar, baz". When I search for foo, I get the result I expect, but when I search for baz or bar, I get zero results:

{
  "query": {
    "query_string": {
      "query" : "bar",
      "fields" : [ "text_1" ],
      "use_dis_max" : true,
      "boost" : 1.0
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

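Answer:

I'm not sure whether your problem comes from the way the synonyms for "bar" are defined. Since you said you are new to this, I'll give a working example similar to yours, to show how Elasticsearch handles synonyms at search time and at index time. I hope it helps.

First, create the synonyms file: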

foo => foo bar, baz
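
To make the rule format concrete, here is a minimal Python sketch of how a Solr-style synonym line can be read. It is only a toy model for illustration, not Elasticsearch's implementation: the helpers parse_rule and expand are hypothetical names, and the model ignores token positions, comments and escaping. It shows that the explicit mapping above rewrites only foo (into the phrase "foo bar" and the term "baz"), whereas a plain comma-separated list such as the question's "foo, bar, baz" treats all three terms as equivalent.

```python
def parse_rule(line):
    """Parse one Solr-style synonym line into {input term: [replacement phrases]}.

    Hypothetical helper for illustration; handles single-word inputs only.
    """
    if "=>" in line:
        # Explicit mapping: each left-hand term is replaced by all right-hand phrases.
        left, right = line.split("=>", 1)
        inputs = [t.strip() for t in left.split(",") if t.strip()]
        outputs = [p.split() for p in right.split(",") if p.strip()]
    else:
        # Equivalent synonyms: every listed term expands to the whole list.
        inputs = [t.strip() for t in line.split(",") if t.strip()]
        outputs = [[t] for t in inputs]
    return {term: outputs for term in inputs}


def expand(tokens, rules):
    """Return the flat set of terms produced for a list of whitespace tokens."""
    terms = set()
    for token in tokens:
        # Tokens without a rule pass through unchanged.
        for phrase in rules.get(token, [[token]]):
            terms.update(phrase)
    return terms


explicit = parse_rule("foo => foo bar, baz")
equivalent = parse_rule("foo, bar, baz")

print(sorted(expand(["foo"], explicit)))    # ['bar', 'baz', 'foo']
print(sorted(expand(["baz"], explicit)))    # ['baz'] (baz is not on the left-hand side)
print(sorted(expand(["baz"], equivalent)))  # ['bar', 'baz', 'foo']
```

With the explicit rule, baz is never rewritten on its own, so whether a search for baz matches depends on where the analyzer with the synonym filter is applied.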

Now I create the index with the particular settings you are trying to test:

curl -XPUT 'http://localhost:9200/test/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter" : {
          "synonym" : {
            "type" : "synonym",
            "synonyms_path" : "synonyms.txt"
          }
        }
      }
    }
  },
  "mappings": {
    "test" : {
      "properties" : {
        "text_1" : {
          "type" : "string",
          "analyzer" : "synonym"
        },
        "text_2" : {
          "search_analyzer" : "standard",
          "index_analyzer" : "standard",
          "type" : "string"
        },
        "text_3" : {
          "type" : "string",
          "search_analyzer" : "synonym",
          "index_analyzer" : "standard"
        }
      }
    }
  }
}'

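Note that synonyms.txt must be in the same directory as the configuration file, because the path is resolved relative to the config directory.

Now index a document: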

curl -XPUT 'http://localhost:9200/test/test/1' -d '{
  "text_3": "baz dog cat",
  "text_2": "foo dog cat",
  "text_1": "foo dog cat"
}'

Now search:

curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.15342641,
        "_source": {
          "text_3": "baz dog cat",
          "text_2": "foo dog cat",
          "text_1": "foo dog cat"
        }
      }
    ]
  }
}

You get the document because baz is a synonym of foo, and at index time foo was expanded with its synonyms.

curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

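I get no hits because the synonyms were not expanded at index time (text_2 uses the standard analyzer). Since I'm searching for baz and baz is not in the text, nothing matches.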

curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.15342641,
        "_source": {
          "text_3": "baz dog cat",
          "text_2": "foo dog cat",
          "text_1": "foo dog cat"
        }
      }
    ]
  }
}

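Note: text_3 is "baz dog cat".

text_3 is indexed without synonym expansion. When I search for foo, the query is expanded at search time with its synonyms, one of which is "baz", so I get the document.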
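
The three searches above can be summarised with a small Python model. This is a sketch under simplifying assumptions, not Elasticsearch code: SYNONYMS, analyze and matches are hypothetical names, both analyzers are reduced to whitespace splitting, and the rule "foo => foo bar, baz" is flattened to a set of terms.

```python
# Toy model of the three fields in the mapping above (not Elasticsearch code).
# SYNONYMS mirrors the rule "foo => foo bar, baz" for single-token lookups.
SYNONYMS = {"foo": {"foo", "bar", "baz"}}


def analyze(text, use_synonyms):
    """Whitespace-tokenize and optionally expand each token with SYNONYMS."""
    terms = set()
    for token in text.split():
        terms |= SYNONYMS.get(token, {token}) if use_synonyms else {token}
    return terms


def matches(doc_text, query, index_synonyms, search_synonyms):
    """A document matches when its indexed terms and the query terms overlap."""
    indexed = analyze(doc_text, index_synonyms)
    searched = analyze(query, search_synonyms)
    return bool(indexed & searched)


# text_1: synonym analyzer at index time and at search time
print(matches("foo dog cat", "baz", True, True))    # True
# text_2: standard analyzer on both sides, so no expansion anywhere
print(matches("foo dog cat", "baz", False, False))  # False
# text_3: expansion only at search time
print(matches("baz dog cat", "foo", False, True))   # True
```

A match only needs the expansion to happen on one side: either the indexed terms already contain the synonym (text_1), or the query is expanded to reach the term that was stored (text_3).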

If you want to debug, you can use the _analyze endpoint, for example:

curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'

Result:

{
  "tokens": [
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 1
    },
    {
      "token": "baz",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 1
    },
    {
      "token": "bar",
      "start_offset": 0,
      "end_offset": 3,
      "type": "SYNONYM",
      "position": 2
    }
  ]
}
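
If you prefer to script this check, the sketch below builds the same _analyze call from Python using only the standard library. It assumes a node at http://localhost:9200 and sends the analyzer and text as a JSON body, which is the form later Elasticsearch releases expect instead of query-string parameters. build_analyze_request and token_strings are hypothetical helper names, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build (but do not send) a request to the index's _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_strings(response_json):
    """Extract just the token text from a parsed _analyze response."""
    return [t["token"] for t in response_json["tokens"]]


# The host, index and analyzer names are taken from the example above.
request = build_analyze_request("http://localhost:9200", "test", "synonym", "foo")
print(request.full_url)  # http://localhost:9200/test/_analyze

# To send it against a running node:
# with urllib.request.urlopen(request) as resp:
#     print(token_strings(json.load(resp)))
```

Comparing the token list for the indexed text with the token list for the query text is usually the quickest way to see why a synonym search does or does not match.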
