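I searched for this but couldn't find any useful answers. I want to get the count of each word in a document. For example, I have some tweets in my index, and one of them says: "It is so boring here I want to go to my home sweet home". The query should return a response like this: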
It:1 is:1 so:1 boring:1 here:1 I:1 want:1 to:2 go:1 my:1 home:2 sweet:1
Is this possible?
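You are looking for term vectors combined with an analyzer. With this approach you can define any analyzer you need, for example a stemming analyzer that reduces words to their root or normal form. See the term vectors documentation for more details.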
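To make the idea of "analyze, then count" concrete, here is a rough client-side sketch in Python. It is only an approximation assuming a simple regex tokenizer (word characters plus apostrophes) and lowercasing; it is not Elasticsearch's standard tokenizer and applies no English stemmer, so "boring" and "bored" stay separate terms here, unlike in the Elasticsearch output. The function name `approximate_term_counts` is hypothetical.

```python
import re
from collections import Counter

# Hypothetical approximation of an analysis chain: split on word characters
# (keeping apostrophes, so "I'm" stays one token), then lowercase each token.
# This is NOT Elasticsearch's standard tokenizer and does no stemming.
TOKEN_RE = re.compile(r"[\w']+")


def approximate_term_counts(text):
    """Return a Counter mapping each lowercased token in text to its frequency."""
    return Counter(token.lower() for token in TOKEN_RE.findall(text))


counts = approximate_term_counts("It is so boring here I want to go to my home sweet home")
# Print in the "term:count" format from the question (first-seen order).
print(" ".join(f"{term}:{freq}" for term, freq in counts.items()))
```

Doing this inside Elasticsearch instead, via term vectors, has the advantage that the counts come from the same analyzer used at index time, including stemming.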
In:
POST so/_close

PUT so/_settings
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}

POST so/_open

PUT so/t1/_mapping
{
  "t1": {
    "properties": {
      "tweet": {
        "type": "string",
        "store": true,
        "index_analyzer": "my_analyzer"
      }
    }
  }
}

POST so/t1/1
{"tweet": "It is so boring here I want to go to my home sweet home. So I'm bored"}
Out:
{
  "_index": "so",
  "_type": "t1",
  "_id": "1",
  "_version": 2,
  "found": true,
  "term_vectors": {
    "tweet": {
      "field_statistics": {
        "sum_doc_freq": 13,
        "doc_count": 1,
        "sum_ttf": 17
      },
      "terms": {
        "bore": { "term_freq": 2, ... },
        "go": { "term_freq": 1, ... },
        "here": { "term_freq": 1, ... },
        "home": { "term_freq": 2, ... },
        "i": { "term_freq": 1, ... },
        "i'm": { "term_freq": 1, ... },
        "is": { "term_freq": 1, ... },
        "it": { "term_freq": 1, ... },
        "my": { "term_freq": 1, ... },
        "so": { "term_freq": 2, ... },
        "sweet": { "term_freq": 1, ... },
        "to": { "term_freq": 2, ... },
        "want": { "term_freq": 1, ... }
      }
    }
  }
}
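The per-word counts the question asks for are the `term_freq` values under `term_vectors.<field>.terms`. A minimal Python sketch for pulling them out of an already-parsed response follows; it assumes the response has the shape shown above, and `term_frequencies` is a hypothetical helper name. The `response` literal is a trimmed copy of the output above (only `term_freq` kept per term), not a live request.

```python
def term_frequencies(response, field):
    """Map each term of `field` to its term_freq in a parsed term vectors response."""
    terms = response["term_vectors"][field]["terms"]
    return {term: info["term_freq"] for term, info in terms.items()}


# Trimmed copy of the response above: only term_freq is kept for each term.
response = {
    "_index": "so",
    "_type": "t1",
    "_id": "1",
    "found": True,
    "term_vectors": {
        "tweet": {
            "field_statistics": {"sum_doc_freq": 13, "doc_count": 1, "sum_ttf": 17},
            "terms": {
                "bore": {"term_freq": 2},
                "go": {"term_freq": 1},
                "here": {"term_freq": 1},
                "home": {"term_freq": 2},
                "i": {"term_freq": 1},
                "i'm": {"term_freq": 1},
                "is": {"term_freq": 1},
                "it": {"term_freq": 1},
                "my": {"term_freq": 1},
                "so": {"term_freq": 2},
                "sweet": {"term_freq": 1},
                "to": {"term_freq": 2},
                "want": {"term_freq": 1},
            },
        }
    },
}

freqs = term_frequencies(response, "tweet")
# Print in the "term:count" format from the question, sorted by term.
print(" ".join(f"{term}:{freq}" for term, freq in sorted(freqs.items())))
```

As a sanity check, the term frequencies add up to `sum_ttf` (17 tokens in the tweet), and the number of distinct terms matches `sum_doc_freq` (13) for this single-document index.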