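Here is a solution that lets you take advantage of prior knowledge of your data fields' names and classes. In addition, by avoiding the repeated calls to as.data.frame and the single call to plyr's rbind.fill() (both of which are time-intensive), it runs about 60 times faster on your example data.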
cols <- c("id", "ls", "ts", "l.lo","l.tz", "l.t", "l.ac", "l.la", "l.pr", "m")
numcols <- c("l.lo", "l.t", "l.ac", "l.la")
## Flatten each top-level list element, converting it to a character vector.
x <- lapply(obj$data, unlist)
## Extract fields that might be present in each record (returning NA if absent).
y <- sapply(x, function(X) X[cols])
## Convert to a data.frame with columns of desired classes.
z <- as.data.frame(t(y), stringsAsFactors=FALSE)
z[numcols] <- lapply(numcols, function(X) as.numeric(as.character(z[[X]])))
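To make the three steps concrete, here is a minimal sketch on a made-up two-record list. The toy obj, its field names, and the reduced cols/numcols are assumptions for illustration only, not the data from the question. It also names the matrix rows from cols explicitly, a small variation on the code above that does not depend on the first record containing every field.

```r
## Hypothetical stand-in for obj$data (not the question's data):
## two records, the second of which lacks the nested field l$t.
obj <- list(data = list(
  list(id = "a1", ls = "x", l = list(lo = 1.5, t = 10)),
  list(id = "b2", ls = "y", l = list(lo = 2.5))
))
cols    <- c("id", "ls", "l.lo", "l.t")
numcols <- c("l.lo", "l.t")

## unlist() flattens each record and joins nested names with ".",
## so l$lo becomes "l.lo"; mixed types are coerced to character.
x <- lapply(obj$data, unlist)

## Indexing by name returns NA for any field a record does not have.
y <- vapply(x, function(X) unname(X[cols]), character(length(cols)))
rownames(y) <- cols   # name rows from cols rather than from the first record

## One row per record, then restore the numeric columns.
z <- as.data.frame(t(y), stringsAsFactors = FALSE)
z[numcols] <- lapply(z[numcols], as.numeric)
str(z)
```

The missing l.t value in the second record ends up as NA in the final data.frame, which is the behaviour the "returning NA if absent" comment refers to.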
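Edit: To confirm that my approach gives results identical to those in the original question, I ran the following test. (Note that in both cases I set stringsAsFactors=FALSE to avoid meaningless differences in the ordering of factor levels.)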
library(plyr)  ## provides rbind.fill()
flatdata <- lapply(obj$data, as.data.frame, stringsAsFactors=FALSE)
mydf <- rbind.fill(flatdata)
identical(z, mydf)
# [1] TRUE
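identical() is strict about types and attributes, so if a check like this returns FALSE on your own data, all.equal() can help locate the difference. The sketch below uses two small made-up data frames (a and b are hypothetical, not objects from the question) that differ only in the type of one column.

```r
## Hypothetical data frames: same values, but v is character in a
## and numeric in b (as happens before the numeric columns are converted).
a <- data.frame(id = c("a1", "b2"), v = c("1", "2"), stringsAsFactors = FALSE)
b <- data.frame(id = c("a1", "b2"), v = c(1, 2),     stringsAsFactors = FALSE)

same <- identical(a, b)   # FALSE, because the column types differ
diffs <- all.equal(a, b)  # a character vector describing what differs
print(same)
print(diffs)
```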
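Further edit:
Just for the record, here is an alternative version of the above that additionally and automatically:
- finds the names of all data fields
- determines their class/type
- coerces the columns of the final data.frame to the correct class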
dat <- obj$data
## Find the names and classes of all fields
fields <- unlist(lapply(dat, function(X) rapply(X, class, how="unlist")))
fields <- fields[unique(names(fields))]
cols <- names(fields)
## Flatten each top-level list element, converting it to a character vector.
x <- lapply(dat, unlist)
## Extract fields that might be present in each record (returning NA if absent).
y <- sapply(x, function(X) X[cols])
## Convert to a data.frame with columns of desired classes.
z <- as.data.frame(t(y), stringsAsFactors=FALSE)
## Coerce columns of z (all currently character) back to their original type
z[] <- lapply(seq_along(fields), function(i) as(z[[cols[i]]], fields[i]))
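As a rough illustration of how the field discovery behaves, here is a sketch on a made-up two-record list. The toy dat and its field names are assumptions, not the question's data. It assumes every leaf is a length-one value of a basic class (character, integer, numeric, logical); classes such as factor or Date would need their own handling. As a small variation, the matrix rows are named from cols explicitly so the result does not depend on which fields the first record contains.

```r
library(methods)  # for as(); already attached in most R sessions

## Hypothetical records: the first lacks n and l$ok.
dat <- list(
  list(id = "a1", l = list(lo = 1.5)),
  list(id = "b2", n = 3L, l = list(lo = 2.5, ok = TRUE))
)

## rapply() visits every leaf and reports its class; the names carry
## the same dotted paths that unlist() produces.
fields <- unlist(lapply(dat, function(X) rapply(X, class, how = "unlist")))
fields <- fields[unique(names(fields))]  # keep the first occurrence of each name
cols <- names(fields)
print(fields)

## Flatten, extract (NA where absent), and build a character data.frame.
x <- lapply(dat, unlist)
y <- vapply(x, function(X) unname(X[cols]), character(length(cols)))
rownames(y) <- cols
z <- as.data.frame(t(y), stringsAsFactors = FALSE)

## Coerce each column back to the class recorded in fields.
z[] <- lapply(seq_along(fields), function(i) as(z[[cols[i]]], fields[[i]]))
str(z)
```

Because a field's class is taken from its first occurrence, a field that appears with different classes in different records would be coerced to whichever class is seen first.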