I am extracting data from Elasticsearch as follows:
    > packageVersion("elastic")
    [1] '0.7.8'

    # data extract
    body <- list(query=list(range=list(timestamp=list(gte="2016-10-13", lte="2016-10-15"))))
    b3 <- Search(index="myIndex", sort=c("timestamp:desc"),
                 fields=c('timestamp','A','B','C','D','E','F','G'),
                 body=body, size=3)
The first and second elements are extracted in full (output edited to save space):

    $hits$hits[[1]]$fields: F, E, B, G, C, A, D, timestamp
    $hits$hits[[2]]$fields: F, E, B, G, C, A, D, timestamp

The third element is not extracted completely:

    $hits$hits[[3]]$fields: C, A, B, D, timestamp
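I followed this post to convert the list to a data frame: "Convert R output of package elastic (nested list?) to data.frame or JSON". The first and second elements load perfectly. The third element does not load correctly because it was not extracted in full, which leads to the following error: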
    # (optional) verify that all hits expand to the same length
    # (should be true for data intended to be in a table format)
    stopifnot(
      sapply(
        b3$hits$hits,
        function(x) {!(length(unlist(x)) - length(unlist(b3$hits$hits[[1]])))}
      )
    )
    Error: sapply(b3$hits$hits, function(x) { .... are not all TRUE

    # load into the dataframe
    # count number of columns, use unlist() to convert
    # nested lists to a vector, use the first hit as proxy
    nColumns <- length(unlist(b3$hits$hits[[1]]))

    # fetch column names ... as above
    nNames <- names(unlist(b3$hits$hits[[1]]))

    # unlist all hits and convert to matrix with ncol Columns, don't forget byrow=TRUE!
    df.b3 <- data.frame(matrix(unlist(b3$hits$hits), ncol=nColumns, byrow=TRUE))
    Warning message:
    In matrix(unlist(b3$hits$hits), ncol = nColumns, byrow = TRUE) :
      data length [33] is not a sub-multiple or multiple of the number of columns [12]
Note: some records in variables D, E, F and G contain empty (NULL) and '-' values. I suspect this is what causes the problem with the extraction.
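A quick way to confirm which hits come back short is to count the values per hit and compare field names. The sketch below does not query Elasticsearch; `hits` is a hypothetical stand-in for `b3$hits$hits` with invented values, assuming the third hit simply lacks E, F and G.

```r
# Hypothetical stand-in for b3$hits$hits: the third hit has no E, F, G
hits <- list(
  list(fields = list(A = 1, B = 2, C = 3, D = 4, E = 5, F = 6, G = 7,
                     timestamp = "2016-10-15")),
  list(fields = list(A = 1, B = 2, C = 3, D = 4, E = 5, F = 6, G = 7,
                     timestamp = "2016-10-14")),
  list(fields = list(A = 1, B = 2, C = 3, D = 4,
                     timestamp = "2016-10-13"))
)

# Number of leaf values per hit; matrix(..., byrow = TRUE) needs these equal
n_values <- lengths(lapply(hits, unlist))
n_values

# Fields present in the first hit but absent from each hit
missing_fields <- lapply(hits, function(h) {
  setdiff(names(hits[[1]]$fields), names(h$fields))
})
missing_fields[[3]]
```

If the counts differ, filling a matrix row by row from `unlist()` will shift values into the wrong columns, which is consistent with the warning above.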
If any of you has run into a similar problem and found a solution, I would appreciate some feedback. Thank you very much.
Author of elastic here.
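We don't try to coerce the output to a data.frame, because the output is so variable that we would run into errors very often. But we do allow you to pass an option to have jsonlite coerce to a data.frame (via the asdf parameter, for "as data.frame"), as that will never fail.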
If you do work with the list output, I would use either dplyr or data.table to combine the returned list into a table.
For reproducibility:
    library(elastic)

    # load the example Shakespeare data shipped with elastic, if not yet indexed
    if (!index_exists("shakespeare")) {
      shakespeare <- system.file("examples", "shakespeare_data.json", package = "elastic")
      docs_bulk(shakespeare)
    }

    res <- Search(index="shakespeare", fields=c('play_name','speaker'))

    # one flat list of fields per hit
    out <- lapply(res$hits$hits, function(x) unlist(x$fields, FALSE))
    library(dplyr)
    bind_rows(out)
    #> # A tibble: 10 × 2
    #>    play_name       speaker
    #>        <chr>         <chr>
    #> 1   Henry IV
    #> 2   Henry IV KING HENRY IV
    #> 3   Henry IV KING HENRY IV
    #> 4   Henry IV KING HENRY IV
    #> 5   Henry IV KING HENRY IV
    #> 6   Henry IV KING HENRY IV
    #> 7   Henry IV KING HENRY IV
    #> 8   Henry IV KING HENRY IV
    #> 9   Henry IV  WESTMORELAND
    #> 10  Henry IV  WESTMORELAND
    library(data.table)
    rbindlist(out, fill = TRUE, use.names = TRUE)
    #>     play_name       speaker
    #>  1:  Henry IV
    #>  2:  Henry IV KING HENRY IV
    #>  3:  Henry IV KING HENRY IV
    #>  4:  Henry IV KING HENRY IV
    #>  5:  Henry IV KING HENRY IV
    #>  6:  Henry IV KING HENRY IV
    #>  7:  Henry IV KING HENRY IV
    #>  8:  Henry IV KING HENRY IV
    #>  9:  Henry IV  WESTMORELAND
    #> 10:  Henry IV  WESTMORELAND
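If neither package is an option, the same idea (pad each record to the union of field names, then bind rows) can be sketched in base R. This is only an illustration: `out_mock` is an invented list shaped like the per-hit records, with one field deliberately missing, and it assumes each field holds a single value.

```r
# Hypothetical per-hit records; the second one has no speaker
out_mock <- list(
  list(play_name = "Henry IV", speaker = "KING HENRY IV"),
  list(play_name = "Henry IV")
)

# Union of all field names, in order of first appearance
all_names <- unique(unlist(lapply(out_mock, names)))

# Pad every record to the full set of names, using NA for absent fields
rows <- lapply(out_mock, function(rec) {
  padded <- lapply(all_names, function(nm) {
    if (is.null(rec[[nm]])) NA else rec[[nm]]
  })
  names(padded) <- all_names
  as.data.frame(padded, stringsAsFactors = FALSE)
})

# Stack the one-row data frames
df <- do.call(rbind, rows)
df
```

Binding many one-row data frames with `rbind` is slow on large results, so this is better suited to checking a handful of hits than to bulk extraction.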
Or use the asdf parameter, which internally directs jsonlite::fromJSON to parse to a data.frame where possible.
    res <- Search(index="shakespeare", fields=c('play_name','speaker'), asdf = TRUE)
    res$hits$hits$fields
    #>    play_name       speaker
    #> 1   Henry IV
    #> 2   Henry IV KING HENRY IV
    #> 3   Henry IV KING HENRY IV
    #> 4   Henry IV KING HENRY IV
    #> 5   Henry IV KING HENRY IV
    #> 6   Henry IV KING HENRY IV
    #> 7   Henry IV KING HENRY IV
    #> 8   Henry IV KING HENRY IV
    #> 9   Henry IV  WESTMORELAND
    #> 10  Henry IV  WESTMORELAND
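To see why the jsonlite route tolerates records with missing fields, here is a small sketch that runs without Elasticsearch. The JSON string is invented and flat, so it is simpler than a real search response; it only illustrates how `jsonlite::fromJSON` simplifies an array of records.

```r
library(jsonlite)

# Invented JSON records: the second one has no "speaker" key
json <- '[{"play_name": "Henry IV", "speaker": "KING HENRY IV"},
          {"play_name": "Henry IV"}]'

# fromJSON simplifies an array of records into a data.frame by default,
# filling keys that are absent from a record with NA
df_json <- fromJSON(json)
df_json
```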
Using:
v3.3.2
v0.7.8.9000
Elasticsearch v2.3.4