Can someone tell me how to write Python statements that will aggregate (sum and count) things about the contents of my documents?
Script
    from datetime import datetime
    from elasticsearch_dsl import DocType, String, Date, Integer
    from elasticsearch_dsl.connections import connections

    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search, Q

    # Define a default Elasticsearch client
    client = connections.create_connection(hosts=['http://blahblahblah:9200'])

    s = Search(using=client, index="attendance")
    s = s.execute()

    for tag in s.aggregations.per_tag.buckets:
        print (tag.key)
Output
      File "/Library/Python/2.7/site-packages/elasticsearch_dsl/utils.py", line 106, in __getattr__
        '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
    AttributeError: 'Response' object has no attribute 'aggregations'
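What is causing this? Is the "aggregations" keyword wrong? Is there some other package I need to import? And if a document in the "attendance" index has a field called emailAddress, how would I count which documents have a value for that field?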
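First of all: I notice now that what I wrote here doesn't actually define an aggregation. I don't find the documentation on how to use this very readable. I'll expand on what I wrote above, and I'm changing the index name to make a nicer example.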
    from elasticsearch_dsl import Search
    from elasticsearch_dsl.connections import connections

    # Define a default Elasticsearch client
    client = connections.create_connection(hosts=['http://blahblahblah:9200'])

    s = Search(using=client, index="airbnb", doc_type="sleep_overs")

    # Invalid! Executing here and then reading an aggregation fails,
    # because no aggregation has been defined yet.
    #s = s.execute()
    #for tag in s.aggregations.per_tag.buckets:
    #    print (tag.key)

    # Let's make an aggregation.
    # 'by_house' is a name you choose, 'terms' is a keyword for the type of aggregator.
    # 'field' is also a keyword, and 'house_number' is a field in our ES index.
    s.aggs.bucket('by_house', 'terms', field='house_number', size=0)
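Above, we create one bucket per house number, so the name (key) of each bucket will be the house number. ElasticSearch (ES) always gives a document count of the documents that fit into each bucket. size=0 means to return all results, since the ES default is to return only 10 results (or whatever your dev set it up to do).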
    # This runs the query.
    response = s.execute()

    # Let's see what's in our results.
    # The counts live on each bucket, not on by_house itself.
    print(response.hits.total)
    print(response.aggregations.by_house.buckets)

    for item in response.aggregations.by_house.buckets:
        print(item.doc_count)
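My mistake before was thinking that an Elasticsearch query has aggregations by default. You define them yourself, then execute the search. Your response can then be split up by the aggregators you defined.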
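To make the shape of that response concrete, here is a minimal sketch that works on a plain dictionary instead of a live cluster. The sample response is hand-written and its house numbers and counts are invented for illustration; `bucket_counts` is a hypothetical helper, not part of elasticsearch_dsl. It also shows why asking for an aggregation that was never defined (such as `per_tag`) cannot work.

```python
# Hand-written sample shaped like an Elasticsearch terms-aggregation response.
# The house numbers and counts are invented for illustration only.
sample_response = {
    "hits": {"total": 5, "hits": []},
    "aggregations": {
        "by_house": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": 12, "doc_count": 3},
                {"key": 7, "doc_count": 2},
            ],
        }
    },
}


def bucket_counts(response, agg_name):
    """Return {bucket key: doc_count} for one named bucket aggregation."""
    aggs = response.get("aggregations", {})
    if agg_name not in aggs:
        # Only aggregations named in the request come back in the response.
        raise KeyError("no aggregation named %r in the response" % agg_name)
    return dict((b["key"], b["doc_count"]) for b in aggs[agg_name]["buckets"])


counts = bucket_counts(sample_response, "by_house")
print(counts)
```

With the DSL, the same values are reached through attribute access, as in `response.aggregations.by_house.buckets` above.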
The CURL for the above should look like the following. Note: I use Sense, an ElasticSearch plugin/extension/add-on for Google Chrome. In Sense you can use // to comment things out.
    POST /airbnb/sleep_overs/_search
    {
        // the size 0 here actually means to not return any hits, just the aggregation part of the result
        "size": 0,
        "aggs": {
            "by_house": {
                "terms": {
                    // the size 0 here means to return all results, not just the default 10 results
                    "field": "house_number",
                    "size": 0
                }
            }
        }
    }
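Workaround. Someone on the DSL's GitHub told me to forget about translating and just use this method. It's simpler: you just write the tough stuff in CURL. That's why I call it a workaround.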
    from elasticsearch_dsl import Search
    from elasticsearch_dsl.connections import connections

    # Define a default Elasticsearch client
    client = connections.create_connection(hosts=['http://blahblahblah:9200'])

    # How simple: we just paste the CURL body here
    body = {
        "size": 0,
        "aggs": {
            "by_house": {
                "terms": {
                    "field": "house_number",
                    "size": 0
                }
            }
        }
    }

    # from_dict builds a new Search that uses the default connection created above
    s = Search.from_dict(body)
    s = s.index("airbnb")
    s = s.doc_type("sleep_overs")
    body = s.to_dict()

    t = s.execute()

    for item in t.aggregations.by_house.buckets:
        # item.key will be the house number
        print("%s %s" % (item.key, item.doc_count))
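Hope this helps. I now design everything in CURL, then use Python statements to peel away at the results and get what I want. This helps with aggregations that have multiple levels (sub-aggregations).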
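As a rough sketch of that multi-level approach, the body below nests a sub-aggregation under `by_house` that counts the documents in each bucket that have a value for `emailAddress`, which is one possible way to approach the counting part of the original question. The name `with_email`, the helper `rows_per_house`, and the sample response are all assumptions made up for this illustration; the numbers are invented and nothing here was run against a real index. The terms size is left at its default here, so only the top buckets would come back.

```python
# Request body: one bucket per house_number, and inside each bucket a
# sub-aggregation counting the documents that have a value for emailAddress.
# "with_email" is a made-up name for this sketch.
body = {
    "size": 0,  # return no hits, only the aggregation results
    "aggs": {
        "by_house": {
            "terms": {"field": "house_number"},
            "aggs": {
                "with_email": {
                    "filter": {"exists": {"field": "emailAddress"}}
                }
            },
        }
    },
}

# Hand-written sample of the response shape; the numbers are invented.
sample_response = {
    "aggregations": {
        "by_house": {
            "buckets": [
                {"key": 12, "doc_count": 3, "with_email": {"doc_count": 2}},
                {"key": 7, "doc_count": 2, "with_email": {"doc_count": 0}},
            ]
        }
    }
}


def rows_per_house(response):
    """Peel (house, total docs, docs with an email) out of each bucket."""
    rows = []
    for bucket in response["aggregations"]["by_house"]["buckets"]:
        rows.append(
            (bucket["key"], bucket["doc_count"], bucket["with_email"]["doc_count"])
        )
    return rows


rows = rows_per_house(sample_response)
for row in rows:
    print("house %s: %s documents, %s with emailAddress" % row)
```

A body like this could be handed to `Search.from_dict(body)` in the same way as the workaround above, with the loop then reading the nested counts from each bucket.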