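I have a geojson file containing a list of locations, each with a longitude, a latitude and a timestamp. Note that the longitudes and latitudes are multiplied by 10000000.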
    {
      "locations" : [ {
        "timestampMs" : "1461820561530",
        "latitudeE7" : -378107308,
        "longitudeE7" : 1449654070,
        "accuracy" : 35,
        "junk_i_want_to_save_but_ignore" : [ { .. } ]
      }, {
        "timestampMs" : "1461820455813",
        "latitudeE7" : -378107279,
        "longitudeE7" : 1449673809,
        "accuracy" : 33
      }, {
        "timestampMs" : "1461820281089",
        "latitudeE7" : -378105184,
        "longitudeE7" : 1449254023,
        "accuracy" : 35
      }, {
        "timestampMs" : "1461820155814",
        "latitudeE7" : -378177434,
        "longitudeE7" : 1429653949,
        "accuracy" : 34
      }
      ..
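Many of these locations will be the same physical location (e.g. the user's home), but obviously the longitude and latitude may not be exactly the same.
I would like to use Elastic Search and its Geo functionality to produce a ranked list of the most common locations, where locations are deemed the same if they are within, say, 100m of each other.
For each common location, I'd also like the list of all timestamps at which they were at that location, if possible!
I'd greatly appreciate a sample query to get me started!
Many thanks in advance.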
In order to make this work, you need to modify your mapping like this:
    PUT /locations
    {
      "mappings": {
        "location": {
          "properties": {
            "location": {
              "type": "geo_point"
            },
            "timestampMs": {
              "type": "long"
            },
            "accuracy": {
              "type": "long"
            }
          }
        }
      }
    }
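Then, when you index your documents, you need to divide the latitude and longitude by 10000000 and index them like this (the coordinates in this sample are illustrative; the first record in the question actually works out to lat -37.8107308, lon 144.965407):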
    PUT /locations/location/1
    {
      "timestampMs": "1461820561530",
      "location": {
        "lat": -37.8103308,
        "lon": 14.4967407
      },
      "accuracy": 35
    }
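The divide-by-10000000 step can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `to_es_doc` and the constant `E7` are hypothetical, and casting `timestampMs` to an integer is a choice made here (the sample above sends it as a string and leaves the conversion to Elasticsearch).

```python
E7 = 10_000_000  # the export stores degrees multiplied by 10000000


def to_es_doc(record):
    """Turn one raw record from the question into the document shape indexed above.

    Hypothetical helper for illustration only.
    """
    return {
        # assumption: store the timestamp as an integer to match the "long" mapping
        "timestampMs": int(record["timestampMs"]),
        "location": {
            "lat": record["latitudeE7"] / E7,
            "lon": record["longitudeE7"] / E7,
        },
        "accuracy": record["accuracy"],
    }


# first record from the question
raw = {
    "timestampMs": "1461820561530",
    "latitudeE7": -378107308,
    "longitudeE7": 1449654070,
    "accuracy": 35,
}
doc = to_es_doc(raw)
print(doc)
```

Fields such as `junk_i_want_to_save_but_ignore` are left out of this sketch; they could be copied across unchanged if they need to be kept.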
Finally, your search query looks like this…
    POST /locations/location/_search
    {
      "aggregations": {
        "zoomedInView": {
          "filter": {
            "geo_bounding_box": {
              "location": {
                "top_left": "-37, 14",
                "bottom_right": "-38, 15"
              }
            }
          },
          "aggregations": {
            "zoom1": {
              "geohash_grid": {
                "field": "location",
                "precision": 6
              },
              "aggs": {
                "ts": {
                  "date_histogram": {
                    "field": "timestampMs",
                    "interval": "15m",
                    "format": "EEE yyyy-MM-dd HH:mm"
                  }
                }
              }
            }
          }
        }
      }
    }
…and will yield the following result:
    {
      "aggregations": {
        "zoomedInView": {
          "doc_count": 1,
          "zoom1": {
            "buckets": [
              {
                "key": "k362cu",
                "doc_count": 1,
                "ts": {
                  "buckets": [
                    {
                      "key_as_string": "Thu 2016-04-28 05:15",
                      "key": 1461820500000,
                      "doc_count": 1
                    }
                  ]
                }
              }
            ]
          }
        }
      }
    }
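To see why nearby points end up in the same bucket, here is a minimal Python sketch of the standard geohash encoding that the `geohash_grid` keys are based on. It is not Elasticsearch's own code, and the function name `geohash` is just a label chosen for this example. Each extra character narrows the cell: according to the Elasticsearch documentation a precision of 6 corresponds to cells of roughly 1.2 km by 0.6 km and a precision of 7 to roughly 150 m by 150 m, so a precision of 7 is probably closer to the 100m asked for in the question. Two points that are very close together can still fall on either side of a cell border.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


# the indexed sample point and a second, made-up point a few dozen metres away
home = geohash(-37.8103308, 14.4967407, 6)
nearby = geohash(-37.8105, 14.4970, 6)
print(home, nearby)
```

Both calls return the same key as the bucket in the result above, which is what lets the aggregation count them as one location.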
UPDATE
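Following our discussion, here is a solution that could work for you. Using Logstash, you can call your API and retrieve the big JSON document (using the http_poller input), extract/transform all the locations and sink them into Elasticsearch (with the elasticsearch output) very easily.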
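Here is how each event gets formatted as described in my initial answer; the split filter breaks the locations array into one event per location.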
Logstash configuration locations.conf:
    input {
      http_poller {
        urls => {
          get_locations => {
            method => get
            url => "http://your_api.com/locations.json"
            headers => {
              Accept => "application/json"
            }
          }
        }
        request_timeout => 60
        # interval is expressed in seconds: poll once a day
        interval => 86400
        codec => "json"
      }
    }
    filter {
      # emit one event per entry of the locations array
      split {
        field => "locations"
      }
      # scale the E7 integers back to decimal degrees
      # (Logstash 2.x event API; on 5.x and later use event.get / event.set)
      ruby {
        code => "
          event['location'] = {
            'lat' => event['locations']['latitudeE7'] / 10000000.0,
            'lon' => event['locations']['longitudeE7'] / 10000000.0
          }
        "
      }
      # lift the remaining fields to the top level and drop what is no longer needed
      mutate {
        add_field => {
          "timestampMs" => "%{[locations][timestampMs]}"
          "accuracy" => "%{[locations][accuracy]}"
          "junk_i_want_to_save_but_ignore" => "%{[locations][junk_i_want_to_save_but_ignore]}"
        }
        remove_field => [
          "locations", "@timestamp", "@version"
        ]
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "locations"
        document_type => "location"
      }
    }
You can then run it with the following command:
bin/logstash -f locations.conf
Once that has run, you can launch the search query and you should get the results you expect.
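To get from the aggregation response to the ranked list asked for in the question, something along these lines could work. It is a sketch only: `rank_locations` is a hypothetical helper, the aggregation names `zoomedInView`, `zoom1` and `ts` are the ones used in the query above, and the response is assumed to have already been parsed into a Python dict.

```python
def rank_locations(response):
    """Return (geohash, hit count, time slots) tuples, most visited cell first."""
    cells = response["aggregations"]["zoomedInView"]["zoom1"]["buckets"]
    ranked = []
    for cell in cells:
        # keep only the 15-minute slots that actually contain documents
        slots = [
            slot["key_as_string"]
            for slot in cell["ts"]["buckets"]
            if slot["doc_count"] > 0
        ]
        ranked.append((cell["key"], cell["doc_count"], slots))
    # geohash_grid already orders buckets by count; sorting again makes that explicit
    ranked.sort(key=lambda entry: entry[1], reverse=True)
    return ranked


# the response shown earlier, as a Python dict
sample = {
    "aggregations": {
        "zoomedInView": {
            "doc_count": 1,
            "zoom1": {
                "buckets": [
                    {
                        "key": "k362cu",
                        "doc_count": 1,
                        "ts": {
                            "buckets": [
                                {
                                    "key_as_string": "Thu 2016-04-28 05:15",
                                    "key": 1461820500000,
                                    "doc_count": 1,
                                }
                            ]
                        },
                    }
                ]
            },
        }
    }
}

for geohash_key, count, slots in rank_locations(sample):
    print(geohash_key, count, slots)
```

Note that the time values are the starts of the 15-minute histogram slots rather than the raw timestamps; the exact timestamps would need a separate query on the documents in each cell.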