Elasticsearch: disabling term frequency scoring

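I want to change the scoring in Elasticsearch so that multiple occurrences of a term are not counted. For example, I want:

"texas texas texas"

and

"texas"

to come out with the same score. I found this mapping, which Elasticsearch says will disable term frequency counting, but my searches still do not come out with the same score:

    {
      "mappings": {
        "business": {
          "properties": {
            "name": {
              "type": "string",
              "index_options": "docs",
              "norms": { "enabled": false }
            }
          }
        }
      }
    }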

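Any help would be greatly appreciated; I have not been able to find much information on this.

Edit:

I am adding my search code and what is returned when I use explain.

My search code: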

    // Connect to the cluster named "escluster" over the transport port.
    Settings settings = ImmutableSettings.settingsBuilder()
            .put("cluster.name", "escluster")
            .build();
    Client client = new TransportClient(settings)
            .addTransportAddress(new InetSocketTransportAddress("127.0.0.1", 9300));

    // Match "Texas" against the "name" field.
    SearchRequest request = Requests.searchRequest("businesses")
            .source(SearchSourceBuilder.searchSource()
                    .query(QueryBuilders.boolQuery()
                            .should(QueryBuilders.matchQuery("name", "Texas")
                                    .minimumShouldMatch("1"))))
            .searchType(SearchType.DFS_QUERY_THEN_FETCH);

    // This statement is cut off in the original post.
    ExplainRequest request2 = client.prepareIndex("businesses", "business")

When I run my search with explain, I get:

    {
      "took" : 14,
      "timed_out" : false,
      "_shards" : {
        "total" : 3,
        "successful" : 3,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 1.0,
        "hits" : [ {
          "_shard" : 1,
          "_node" : "BTqBPVDET5Kr83r-CYPqfA",
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9U5KBks4zEorv9YI4n",
          "_score" : 1.0,
          "_source" : {
            "name" : "texas"
          },
          "_explanation" : {
            "value" : 1.0,
            "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
            "details" : [ {
              "value" : 1.0,
              "description" : "fieldWeight in 0, product of:",
              "details" : [ {
                "value" : 1.0,
                "description" : "tf(freq=1.0), with freq of:",
                "details" : [ {
                  "value" : 1.0,
                  "description" : "termFreq=1.0"
                } ]
              }, {
                "value" : 1.0,
                "description" : "idf(docFreq=2, maxDocs=3)"
              }, {
                "value" : 1.0,
                "description" : "fieldNorm(doc=0)"
              } ]
            } ]
          }
        }, {
          "_shard" : 1,
          "_node" : "BTqBPVDET5Kr83r-CYPqfA",
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9U5K6Ks4zEorv9YI4o",
          "_score" : 0.8660254,
          "_source" : {
            "name" : "texas texas texas"
          },
          "_explanation" : {
            "value" : 0.8660254,
            "description" : "weight(_all:texas in 0) [PerFieldSimilarity], result of:",
            "details" : [ {
              "value" : 0.8660254,
              "description" : "fieldWeight in 0, product of:",
              "details" : [ {
                "value" : 1.7320508,
                "description" : "tf(freq=3.0), with freq of:",
                "details" : [ {
                  "value" : 3.0,
                  "description" : "termFreq=3.0"
                } ]
              }, {
                "value" : 1.0,
                "description" : "idf(docFreq=2, maxDocs=3)"
              }, {
                "value" : 0.5,
                "description" : "fieldNorm(doc=0)"
              } ]
            } ]
          }
        } ]
      }
    }

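It looks like it is still taking term frequency and document frequency into account. Any ideas? Sorry about the formatting; I don't know why it looks so odd.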

Edit 2:

When I search from the browser with http://localhost:9200/businesses/business/_search?pretty=true&qname=texas I get:

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 3,
        "successful" : 3,
        "failed" : 0
      },
      "hits" : {
        "total" : 4,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9YcCKjKvtg8NgyozGK",
          "_score" : 1.0,
          "_source" : {
            "business" : {
              "name" : "texas texas texas texas"
            }
          }
        }, {
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9YateBKvtg8Ngyoy-p",
          "_score" : 1.0,
          "_source" : {
            "name" : "texas"
          }
        }, {
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9YavVnKvtg8Ngyoy-4",
          "_score" : 1.0,
          "_source" : {
            "name" : "texas texas texas"
          }
        }, {
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9Yb7NgKvtg8NgyozFf",
          "_score" : 1.0,
          "_source" : {
            "business" : {
              "name" : "texas texas texas"
            }
          }
        } ]
      }
    }

It finds all 4 objects I have in there, and they all have the same score. When I run my Java API search with explain, I get:

    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 3,
        "successful" : 3,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 1.287682,
        "hits" : [ {
          "_shard" : 1,
          "_node" : "BTqBPVDET5Kr83r-CYPqfA",
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9YateBKvtg8Ngyoy-p",
          "_score" : 1.287682,
          "_source" : {
            "name" : "texas"
          },
          "_explanation" : {
            "value" : 1.287682,
            "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
            "details" : [ {
              "value" : 1.287682,
              "description" : "fieldWeight in 0, product of:",
              "details" : [ {
                "value" : 1.0,
                "description" : "tf(freq=1.0), with freq of:",
                "details" : [ {
                  "value" : 1.0,
                  "description" : "termFreq=1.0"
                } ]
              }, {
                "value" : 1.287682,
                "description" : "idf(docFreq=2, maxDocs=4)"
              }, {
                "value" : 1.0,
                "description" : "fieldNorm(doc=0)"
              } ]
            } ]
          }
        }, {
          "_shard" : 1,
          "_node" : "BTqBPVDET5Kr83r-CYPqfA",
          "_index" : "businesses",
          "_type" : "business",
          "_id" : "AU9YavVnKvtg8Ngyoy-4",
          "_score" : 1.1151654,
          "_source" : {
            "name" : "texas texas texas"
          },
          "_explanation" : {
            "value" : 1.1151654,
            "description" : "weight(name:texas in 0) [PerFieldSimilarity], result of:",
            "details" : [ {
              "value" : 1.1151654,
              "description" : "fieldWeight in 0, product of:",
              "details" : [ {
                "value" : 1.7320508,
                "description" : "tf(freq=3.0), with freq of:",
                "details" : [ {
                  "value" : 3.0,
                  "description" : "termFreq=3.0"
                } ]
              }, {
                "value" : 1.287682,
                "description" : "idf(docFreq=2, maxDocs=4)"
              }, {
                "value" : 0.5,
                "description" : "fieldNorm(doc=0)"
              } ]
            } ]
          }
        } ]
      }
    }
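
As a reading aid, the two scores in the explain output above can be reproduced by hand. The following is a minimal sketch in Java, assuming the index uses Lucene's classic TF-IDF similarity (which the tf, idf and fieldNorm factors in the output suggest). The class and method names are made up for illustration and are not part of any Elasticsearch API, and the fieldNorm values are copied from the output rather than computed.

```java
// Illustrative only: recomputes the "fieldWeight" product shown by explain,
// assuming Lucene's classic TF-IDF formulas. Names here are hypothetical.
final class ClassicScoreSketch {

    // Classic similarity: tf(freq) = sqrt(freq).
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Classic similarity: idf = 1 + ln(maxDocs / (docFreq + 1)).
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // fieldWeight = tf * idf * fieldNorm, with fieldNorm taken from the explain output.
    static double fieldWeight(double freq, long docFreq, long maxDocs, double fieldNorm) {
        return tf(freq) * idf(docFreq, maxDocs) * fieldNorm;
    }

    public static void main(String[] args) {
        // "texas": freq=1, fieldNorm=1.0, roughly 1.2877 as reported.
        System.out.println(fieldWeight(1.0, 2, 4, 1.0));
        // "texas texas texas": freq=3, fieldNorm=0.5, roughly 1.1152 as reported.
        System.out.println(fieldWeight(3.0, 2, 4, 0.5));
        // Same document if frequencies and norms were not indexed (freq=1, fieldNorm=1.0).
        System.out.println(fieldWeight(1.0, 2, 4, 1.0));
    }
}
```

If term frequencies and norms were really left out of the index, every matching document would be expected to report freq=1 and fieldNorm=1, so both documents would get the same score (the idf value alone), as the last line illustrates. The output above still shows freq=3.0 and fieldNorm=0.5, which suggests the mapping from the question is not in effect on this index.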

Answer:

It looks like index_options cannot be overridden for a field once that field has been set in the mapping.

Example:

    put test

    put test/business/_mapping
    {
      "properties": {
        "name": {
          "type": "string",
          "index_options": "freqs",
          "norms": {
            "enabled": false
          }
        }
      }
    }

    put test/business/_mapping
    {
      "properties": {
        "name": {
          "type": "string",
          "index_options": "docs",
          "norms": {
            "enabled": false
          }
        }
      }
    }

    get test/business/_mapping

    {
      "test": {
        "mappings": {
          "business": {
            "properties": {
              "name": {
                "type": "string",
                "norms": {
                  "enabled": false
                },
                "index_options": "freqs"
              }
            }
          }
        }
      }
    }

You will have to recreate the index for the new mapping to take effect.
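
The recreate step can be sketched over the REST API. This is a minimal sketch, assuming Java 11 or later (for java.net.http), a node listening on http://localhost:9200, and the index name test from the example above; the class and method names are hypothetical. Deleting an index also deletes its documents, so they have to be indexed again afterwards.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Illustrative only: drops an index and creates it again with the desired mapping.
final class RecreateIndexSketch {

    // Assumed node address; adjust for your cluster.
    static final String BASE = "http://localhost:9200";

    // Create-index body carrying the mapping from the question.
    static String createIndexBody() {
        return "{\"mappings\":{\"business\":{\"properties\":{\"name\":{"
                + "\"type\":\"string\",\"index_options\":\"docs\","
                + "\"norms\":{\"enabled\":false}}}}}}";
    }

    // Builds the two requests without sending them: DELETE the index, then PUT it back.
    static List<HttpRequest> buildRequests(String index) {
        URI uri = URI.create(BASE + "/" + index);
        HttpRequest delete = HttpRequest.newBuilder(uri).DELETE().build();
        HttpRequest create = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(createIndexBody()))
                .build();
        return List.of(delete, create);
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (HttpRequest request : buildRequests("test")) {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // Print the status of each step so a failed delete or create is visible.
            System.out.println(request.method() + " " + request.uri()
                    + " -> " + response.statusCode());
        }
    }
}
```

After the index has been created again, get test/business/_mapping should report "index_options": "docs" for the name field.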

Source: utcz.com/qa/265411.html
