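I am trying to fetch all records from Elasticsearch using the Java API, but I get the following error:
n[[Wild Thing][localhost:9300][indices:data/read/search[phase/dfs]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10101].
My code is as follows: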
    Client client;
    try {
        client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));
        int from = 1;
        int to = 100;
        while (from <= 131881) {
            SearchResponse response = client
                    .prepareSearch("demo_risk_data")
                    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH).setFrom(from)
                    .setQuery(QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("user_agent", "")))
                    .setSize(to).setExplain(true).execute().actionGet();
            if (response.getHits().getHits().length > 0) {
                for (SearchHit searchData : response.getHits().getHits()) {
                    JSONObject value = new JSONObject(searchData.getSource());
                    System.out.println(value.toString());
                }
            }
        }
    }
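The index currently holds 131881 records, so I start with from = 1 and to = 100 and fetch 100 records at a time while from <= 131881. Is there a way to fetch records in batches of, say, 100 until there are no more records left in Elasticsearch?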
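The arithmetic behind the error can be sketched in plain Java. This is only an illustration, not Elasticsearch code: the class and method names are hypothetical, the limit of 10000 is taken from the error message, and the paging plan assumes that from is advanced by the page size on every iteration.

```java
// Sketch: checks from/size pages against a result-window limit.
// ResultWindowCheck and its methods are hypothetical helpers, not
// part of any Elasticsearch API. The limit of 10000 comes from the
// error message quoted in the question.
class ResultWindowCheck {

    // True if a page starting at `from` with `size` hits stays
    // within the window, i.e. from + size <= maxWindow.
    static boolean fitsWindow(int from, int size, int maxWindow) {
        return (long) from + size <= maxWindow;
    }

    // Walks the paging plan (assumed: from grows by `size` each step)
    // and returns the first `from` that would be rejected,
    // or -1 if every page fits.
    static int firstRejectedFrom(int start, int size, int total, int maxWindow) {
        for (int from = start; from <= total; from += size) {
            if (!fitsWindow(from, size, maxWindow)) {
                return from;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // from = 10001 and size = 100 add up to 10101, the value in the error.
        System.out.println(fitsWindow(10001, 100, 10000)); // false
        // First page of the plan that no longer fits under 10000.
        System.out.println(firstRejectedFrom(1, 100, 131881, 10000)); // 9901
    }
}
```

Under these assumptions only the first few thousand of the 131881 records are reachable with from and size, which is why the error appears long before the loop would finish.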
Yes, you can do this with the scroll API, which the Java client also supports.
You can do it like this:
    Client client;
    try {
        client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));

        QueryBuilder qb = QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("user_agent", ""));
        SearchResponse scrollResp = client.prepareSearch("demo_risk_data")
                .addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
                .setScroll(new TimeValue(60000))
                .setQuery(qb)
                .setSize(100).execute().actionGet();

        // Scroll until no hits are returned
        while (true) {
            // Break condition: no hits are returned
            if (scrollResp.getHits().getHits().length == 0) {
                break;
            }
            // Otherwise read the results
            for (SearchHit hit : scrollResp.getHits().getHits()) {
                JSONObject value = new JSONObject(hit.getSource());
                System.out.println(value.toString());
            }
            // Prepare the next query using the scroll id from the last response
            scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                    .setScroll(new TimeValue(60000)).execute().actionGet();
        }
    } catch (UnknownHostException e) {
        // InetAddress.getByName throws this if the host cannot be resolved
        e.printStackTrace();
    }
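To make the termination logic easier to see without a running cluster, here is a small stand-alone simulation of the same control flow. It makes no Elasticsearch calls: BatchCursor is a hypothetical in-memory stand-in for a scroll context, and the 250 sample records are invented for the demonstration. The point is only that the loop needs no total count and stops on the first empty batch.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone simulation of the scroll loop's control flow.
// BatchCursor is a hypothetical in-memory stand-in for a scroll
// context; nothing here talks to Elasticsearch.
class ScrollLoopSketch {

    // Hands out consecutive batches and remembers its position,
    // much as a scroll id lets the server remember where it was.
    static class BatchCursor {
        private final List<String> records;
        private final int batchSize;
        private int position = 0;

        BatchCursor(List<String> records, int batchSize) {
            this.records = records;
            this.batchSize = batchSize;
        }

        // Returns the next batch; an empty list means no data is left.
        List<String> nextBatch() {
            int end = Math.min(position + batchSize, records.size());
            List<String> batch = new ArrayList<>(records.subList(position, end));
            position = end;
            return batch;
        }
    }

    // Drains the cursor batch by batch and returns everything it saw.
    static List<String> readAll(BatchCursor cursor) {
        List<String> all = new ArrayList<>();
        List<String> batch = cursor.nextBatch();
        while (!batch.isEmpty()) {   // same break condition as the scroll loop
            all.addAll(batch);
            batch = cursor.nextBatch();
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            records.add("record-" + i);
        }
        // Batches of 100, 100 and 50 are read, then an empty batch ends the loop.
        List<String> all = readAll(new BatchCursor(records, 100));
        System.out.println(all.size()); // 250
    }
}
```

In the real client the equivalent of nextBatch() is the prepareSearchScroll call, and the hit array of each response plays the role of the batch.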