一尘不染

Fetching all records from Elasticsearch using the Java API

elasticsearch

I am trying to fetch all records from Elasticsearch using the Java API, but I get the following error:

n [[Wild Thing][localhost:9300][indices:data/read/search[phase/dfs]]];
nested: QueryPhaseExecutionException[Result window is too large,
from + size must be less than or equal to: [10000] but was [10101].

My code is as follows:

Client client;
try {
    client = TransportClient.builder().build().
            addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));
    int from = 1;
    int to = 100;
    while (from <= 131881) {
        SearchResponse response = client
                .prepareSearch("demo_risk_data")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH).setFrom(from)
                .setQuery(QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("user_agent", "")))
                .setSize(to).setExplain(true).execute().actionGet();
        if (response.getHits().getHits().length > 0) {
            for (SearchHit searchData : response.getHits().getHits()) {
                JSONObject value = new JSONObject(searchData.getSource());
                System.out.println(value.toString());
            }
        }
    }
}

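The index currently holds 131881 records, so I start with from = 1 and to = 100 and fetch 100 records at a time while from <= 131881. Is there a way to fetch records in batches of, say, 100 until there are no more records in Elasticsearch?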


2020-06-22

1 answer

一尘不染

Yes, you can do this with the scroll API, which the Java client also supports.
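For context, the error in the question is a result-window check: the message says that from + size must not exceed 10000, and the rejected request had a sum of 10101 (with a size of 100, that would mean from was 10001). Here is a minimal sketch of that arithmetic. The class `ResultWindowCheck` and the method `fitsResultWindow` are hypothetical names used only for illustration; the real check runs inside Elasticsearch, not in client code.

```java
/** Hypothetical helper that mirrors the "from + size" limit quoted in the error message. */
final class ResultWindowCheck {

    /** Returns true when a from/size page stays within the given result window. */
    static boolean fitsResultWindow(int from, int size, int maxResultWindow) {
        return from + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        // Values taken from the error message: limit 10000, from + size = 10101.
        System.out.println(fitsResultWindow(10001, 100, 10_000)); // prints "false"
        // A page that ends exactly at the limit is still allowed.
        System.out.println(fitsResultWindow(9900, 100, 10_000));  // prints "true"
    }
}
```

Since the index holds 131881 records, plain from/size paging cannot reach most of them under this limit, which is why scrolling is the better fit here.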

You can do it like this:

Client client;
try {
    client = TransportClient.builder().build().
            addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));

    QueryBuilder qb = QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("user_agent", ""));
    SearchResponse scrollResp = client.prepareSearch("demo_risk_data")
        .addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet();

    //Scroll until no hits are returned
    while (true) {
        //Break condition: No hits are returned
        if (scrollResp.getHits().getHits().length == 0) {
            break;
        }

        // otherwise read results
        for (SearchHit hit : scrollResp.getHits().getHits()) {
            JSONObject value = new JSONObject(hit.getSource());
            System.out.println(value.toString());
        }

        // prepare next query
        scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
    }
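} catch (UnknownHostException e) {
    // InetAddress.getByName throws java.net.UnknownHostException if the host cannot be resolved
    throw new IllegalStateException("Could not resolve the Elasticsearch host", e);
}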
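The control flow of the scroll loop can be tried without a cluster. The sketch below is only an illustration under stated assumptions: `FakeScroll` is a hypothetical in-memory stand-in for the scroll cursor (it is not part of the Elasticsearch API), and `drain` repeats the same pattern as the code above, which is to read a batch, stop on the first empty batch, and otherwise ask for the next one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a scroll cursor; not an Elasticsearch class. */
final class FakeScroll {
    private final List<String> docs;
    private final int batchSize;
    private int position = 0;

    FakeScroll(List<String> docs, int batchSize) {
        this.docs = docs;
        this.batchSize = batchSize;
    }

    /** Returns the next batch, or an empty list once every document has been read. */
    List<String> nextBatch() {
        int end = Math.min(position + batchSize, docs.size());
        List<String> batch = new ArrayList<>(docs.subList(position, end));
        position = end;
        return batch;
    }
}

final class ScrollLoopDemo {

    /** Reads batches until an empty one comes back and returns how many documents were seen. */
    static int drain(FakeScroll scroll) {
        int total = 0;
        List<String> batch = scroll.nextBatch();
        while (!batch.isEmpty()) {       // break condition: no hits returned
            total += batch.size();       // "read results" step
            batch = scroll.nextBatch();  // "prepare next query" step
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            docs.add("doc-" + i);
        }
        // Batches of 100, 100, 50, then an empty batch that ends the loop.
        System.out.println(drain(new FakeScroll(docs, 100))); // prints "250"
    }
}
```

The loop never needs to know the total record count in advance, so the hard-coded 131881 from the question is no longer required.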
2020-06-22