如何获得Elasticsearch文档中每个单词的总数?
我搜索了这个问题,但找不到任何有用的答案。我想获取文档中每个单词的总数,例如,我的索引中有一些推文,并且有一条推文中写着这样的内容:“这里太无聊了,我想去我的家,甜蜜的家”。查询应返回如下响应:
It:1is:1
so:1
boring:1
here:1
I:1
want:1
to:2
go:1
my:1
home:2
sweet:1
有可能这样做吗?
回答:
您正在寻找term
vectors利用分析仪的。这样做时,您可以定义所需的任何分析器,即阻止分析器将单词转换为根/普通形式。查看文档以获取更多详细信息。
在:
POST so/_closePUT so/_settings
{
"settings": {
"analysis":{
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "my_stemmer"]
}
},
"filter": {
"my_stemmer": {
"type": "stemmer",
"name": "english"
}
}
}
}
}
POST so/_open
PUT so/t1/_mapping
{
"t1": {
"properties": {
"tweet": {
"type": "string",
"store": true,
"index_analyzer": "my_analyzer"
}
}
}
}
POST so/t1/1
{"tweet": "It is so boring here I want to go to my home sweet home. So I'm bored"}
出:
{ "_index": "so",
"_type": "t1",
"_id": "1",
"_version": 2,
"found": true,
"term_vectors": {
"tweet": {
"field_statistics": {
"sum_doc_freq": 13,
"doc_count": 1,
"sum_ttf": 17
},
"terms": {
"bore": {
"term_freq": 2,
...
},
"go": {
"term_freq": 1,
...
},
"here": {
"term_freq": 1,
...
},
"home": {
"term_freq": 2,
...
},
"i": {
"term_freq": 1,
...
},
"i'm": {
"term_freq": 1,
...
},
"is": {
"term_freq": 1,
...
},
"it": {
"term_freq": 1,
...
},
"my": {
"term_freq": 1,
...
},
"so": {
"term_freq": 2,
...
},
"sweet": {
"term_freq": 1,
...
},
"to": {
"term_freq": 2,
...
},
"want": {
"term_freq": 1,
...
}
}
}
}
}
以上是 如何获得Elasticsearch文档中每个单词的总数? 的全部内容, 来源链接: utcz.com/qa/414132.html