How do I get the total count of each word in an Elasticsearch document?

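I searched for this question but could not find a useful answer. I want to get the total count of each word in a document. For example, my index contains some tweets, and one of them reads: "It is so boring here I want to go to my home sweet home". The query should return a response like this: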

It:1
is:1
so:1
boring:1
here:1
I:1
want:1
to:2
go:1
my:1
home:2
sweet:1

Is it possible to do this?

Answer:

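You are looking for term vectors, which are produced by running the field through its analyzer. This lets you define whatever analyzer you need, e.g. a stemming analyzer that reduces words to their root form. Note that with a stemmer the counts are reported per stem rather than per surface word: in the example that follows, "boring" and "bored" are both counted under "bore". See the documentation for more details.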
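To make the idea of a per-term count concrete, here is a minimal Python sketch of term frequency counting outside Elasticsearch. It only approximates a standard tokenizer plus lowercase filter with a regular expression and does no stemming, so it illustrates the concept rather than reproducing Elasticsearch's analysis chain. The function name `term_frequencies` is hypothetical.

```python
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase word tokens in text (rough stand-in for an analyzer)."""
    # Keep letters, digits and inner apostrophes together, so "I'm" stays one token.
    tokens = re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())
    return Counter(tokens)


if __name__ == "__main__":
    tweet = "It is so boring here I want to go to my home sweet home"
    for term, freq in sorted(term_frequencies(tweet).items()):
        print(f"{term}:{freq}")
```

Because nothing is stemmed here, "boring" and "bored" would be counted separately, which is the behaviour you get in Elasticsearch when the analyzer has no stemmer filter.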

Input:

POST so/_close

PUT so/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}

POST so/_open

PUT so/t1/_mapping
{
  "t1": {
    "properties": {
      "tweet": {
        "type": "string",
        "store": true,
        "index_analyzer": "my_analyzer"
      }
    }
  }
}

POST so/t1/1
{"tweet": "It is so boring here I want to go to my home sweet home. So I'm bored"}
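The requests above set up the index and store the document, but the request that actually fetches the term vectors is not shown. As a hedged sketch, the Python helper below builds the URL for such a request. It assumes the Elasticsearch 1.x endpoint name `_termvector` (the mapping syntax above suggests a 1.x cluster; later versions name it `_termvectors`) and a node at `http://localhost:9200`, neither of which is stated in the answer. The helper name `termvector_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def termvector_url(base_url, index, doc_type, doc_id, fields=None):
    """Build the URL of a term vectors request for one document."""
    # "_termvector" is an assumption (Elasticsearch 1.x naming); later versions
    # use "_termvectors", so adjust this to your cluster's version.
    path = "/".join(quote(str(part), safe="") for part in (index, doc_type, doc_id))
    url = f"{base_url.rstrip('/')}/{path}/_termvector"
    if fields:
        # Restrict the response to the listed fields.
        url += "?" + urlencode({"fields": ",".join(fields)})
    return url


if __name__ == "__main__":
    # Host and port are assumptions for a local test node.
    print(termvector_url("http://localhost:9200", "so", "t1", 1, fields=["tweet"]))
```

Sending a GET request to the resulting URL with any HTTP client should return a JSON body containing a `term_vectors` object for the requested field.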

Output (the response to a term vectors request for document 1):

{
  "_index": "so",
  "_type": "t1",
  "_id": "1",
  "_version": 2,
  "found": true,
  "term_vectors": {
    "tweet": {
      "field_statistics": {
        "sum_doc_freq": 13,
        "doc_count": 1,
        "sum_ttf": 17
      },
      "terms": {
        "bore": {
          "term_freq": 2,
          ...
        },
        "go": {
          "term_freq": 1,
          ...
        },
        "here": {
          "term_freq": 1,
          ...
        },
        "home": {
          "term_freq": 2,
          ...
        },
        "i": {
          "term_freq": 1,
          ...
        },
        "i'm": {
          "term_freq": 1,
          ...
        },
        "is": {
          "term_freq": 1,
          ...
        },
        "it": {
          "term_freq": 1,
          ...
        },
        "my": {
          "term_freq": 1,
          ...
        },
        "so": {
          "term_freq": 2,
          ...
        },
        "sweet": {
          "term_freq": 1,
          ...
        },
        "to": {
          "term_freq": 2,
          ...
        },
        "want": {
          "term_freq": 1,
          ...
        }
      }
    }
  }
}
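To turn a response like this into the flat word:count list asked for in the question, the `term_freq` values can be pulled out of the `terms` object. The following is a minimal Python sketch that assumes the response has already been parsed into a dict (for example with `json.loads`). The function name `term_counts` and the abbreviated sample are illustrative only.

```python
def term_counts(response, field):
    """Return {term: term_freq} for one field of a term vectors response."""
    # Missing keys fall back to empty dicts, so an empty response yields {}.
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {term: info["term_freq"] for term, info in terms.items()}


if __name__ == "__main__":
    # Abbreviated sample: three of the terms from the response, term_freq only.
    sample = {
        "term_vectors": {
            "tweet": {
                "terms": {
                    "bore": {"term_freq": 2},
                    "go": {"term_freq": 1},
                    "home": {"term_freq": 2},
                }
            }
        }
    }
    for term, freq in sorted(term_counts(sample, "tweet").items()):
        print(f"{term}:{freq}")
```

Summing the returned values for the full response gives the number of tokens in the field for that document; here that is 17, which matches `sum_ttf` because the index holds a single document.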

Source: utcz.com/qa/414132.html
