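I'm trying to split a data frame column into multiple columns based on a set of delimiters. I've found various answers on this site, and I'm trying to work out the different ways of doing it. I'm having trouble with ldply: the output of strsplit is a list whose elements have different lengths. Here is some sample data, the approaches that work, and what I'm trying with ldply.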
FirstName <- c("a,b", "c d", "e, f", "gh")
OtherInfo <- c(1:4)
df <- data.frame(FirstName, OtherInfo, stringsAsFactors = FALSE)
print(df)
#Solution with cSplit
library(splitstackshape)
cs <- cSplit(df, "FirstName", "[, ]+", fixed = FALSE)
#Solution with strsplit and as.data.frame
#Feels like a hack, and I have "gh" repeated
#Question: Is there a better way using a similar approach?
df2 <- t(as.data.frame(strsplit(df$FirstName, "[, ]+", fixed = FALSE)))
row.names(df2) <- NULL
#Question: Solution with strsplit and plyr
library(plyr)
list1 <- strsplit(df$FirstName, "[, ]+", fixed = FALSE)
df3 <- ldply(list1)
Error:
#Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) :
# Results do not have equal lengths
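The error matches the problem described above: the split results have different lengths. Below is a minimal base-R check, assuming the same sample values. The names pieces, piece_lengths and recycled are introduced here only for illustration. It prints how many pieces each string produced and shows why the as.data.frame approach repeats "gh": data.frame() recycles the length-1 element to match the length-2 ones.

```r
FirstName <- c("a,b", "c d", "e, f", "gh")

# strsplit() with a regex delimiter returns one character vector per input string
pieces <- strsplit(FirstName, "[, ]+")

# "gh" contains no delimiter, so its element has length 1 while the others have length 2
piece_lengths <- lengths(pieces)
print(piece_lengths)

# as.data.frame() recycles the length-1 vector, so the transposed row for "gh" repeats it
recycled <- t(as.data.frame(pieces))
print(recycled)
```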
I wrote this workaround to pad the shorter vectors with NA values, but it doesn't feel like the best approach. Is there a better way?
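# Pad every vector with NA up to the length of the longest one
MAX <- max(sapply(list1, length))
func1 <- function(x, MAX) {
  vec <- c(x, rep(NA, MAX - length(x)))
  return(vec)
}
list2 <- lapply(list1, func1, MAX = MAX)
list2
# All elements now have the same length, so ldply() can bind them into rows
df3.1 <- ldply(list2)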
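A more compact version of the same NA-padding idea, sketched in base R. This assumes R 3.2.0 or later for lengths(). The names pieces, max_len, padded, split_mat and result, and the FirstName_ column prefix, are illustrative and not from the original. In place of func1, it uses the replacement function `length<-`, which pads a vector with NA when the requested length is larger than the current one:

```r
FirstName <- c("a,b", "c d", "e, f", "gh")
OtherInfo <- 1:4
df <- data.frame(FirstName, OtherInfo, stringsAsFactors = FALSE)

pieces <- strsplit(df$FirstName, "[, ]+")
max_len <- max(lengths(pieces))

# `length<-` is the replacement form of length(); growing a vector fills the new slots with NA
padded <- lapply(pieces, `length<-`, max_len)

# Every element now has the same length, so rbind() stacks them into a character matrix
split_mat <- do.call(rbind, padded)
colnames(split_mat) <- paste0("FirstName_", seq_len(max_len))

# Reattach the other columns; stringsAsFactors = FALSE keeps the pieces as character
result <- cbind(df["OtherInfo"], split_mat, stringsAsFactors = FALSE)
print(result)
```

Because padded has equal-length elements, it could also be passed to ldply() in the same way as list2.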