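Using the `na.strings` argument in `read.table`/`read.csv`, we can convert the missing-value marker ("Na") to real `NA`s, so that the 'value' column is read as numeric. Then, with `dplyr`, we can `replace` the NAs in one or more 'value' columns with the `mean` of that column, computed within each `part` group.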
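```r
library(dplyr)  # across() requires dplyr >= 1.0.0

df1 %>%
  group_by(part) %>%
  # replace each NA with the mean of the non-missing values in its group;
  # mutate(across()) supersedes the deprecated mutate_each()/funs()
  mutate(across(starts_with('value'),
                ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE)))) %>%
  ungroup()
```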
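If you would rather avoid package dependencies, the same grouped replacement can be sketched in base R with `ave()`, which applies a function within each group and returns a vector aligned with the original rows. This is a minimal sketch, assuming the example data below; the helper name `impute_mean` is hypothetical, and `grep("^value", ...)` stands in for `starts_with('value')`.

```r
# Example data from the answer, built inline so the block runs on its own
df1 <- data.frame(
  part  = rep(c("a", "b"), each = 5),
  id    = rep(1:5, times = 2),
  value = c(23.4, 23.8, 45.6, 34.7, NA, 45.2, 34.6, NA, 30.9, 28.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace NAs in x with the mean of its non-missing values
impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Columns whose names start with "value" (base-R analogue of starts_with('value'))
nm1 <- grep("^value", names(df1))

# ave() runs impute_mean separately for each level of 'part' and keeps row order
df1[nm1] <- lapply(df1[nm1], function(x) ave(x, df1$part, FUN = impute_mean))
df1
```

The NA in group a becomes 31.875 and the NA in group b becomes 34.7, the means of the four observed values in each group.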
Or a similar option with `data.table`:
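```r
library(data.table)

# indices of the columns whose names contain 'value'
nm1 <- grep('value', names(df1))

# convert df1 to a data.table by reference, then overwrite those columns
# in place, replacing NAs with the per-'part' mean of the non-missing values
setDT(df1)[, (nm1) := lapply(.SD, function(x)
    replace(x, is.na(x), mean(x, na.rm = TRUE))), by = part, .SDcols = nm1]
```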
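To preview which values the dplyr and data.table versions will fill in, you can compute the per-group means directly with base R's `tapply()`. This is a small sketch assuming the same example data, rebuilt inline here; `grp_means` is just an illustrative name.

```r
# Example data from the answer, built inline so the block runs on its own
df1 <- data.frame(
  part  = rep(c("a", "b"), each = 5),
  id    = rep(1:5, times = 2),
  value = c(23.4, 23.8, 45.6, 34.7, NA, 45.2, 34.6, NA, 30.9, 28.1),
  stringsAsFactors = FALSE
)

# Mean of the non-missing 'value' entries within each 'part';
# na.rm = TRUE is passed through tapply() to mean()
grp_means <- tapply(df1$value, df1$part, mean, na.rm = TRUE)
grp_means
```

For this data the means are 31.875 for part a and 34.7 for part b, which are the values that end up in the NA cells.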
data
```r
# na.strings = "Na" turns the "Na" entries into real NAs, so 'value' is numeric
df1 <- read.table(text = "part id value
a 1 23.4
a 2 23.8
a 3 45.6
a 4 34.7
a 5 Na
b 1 45.2
b 2 34.6
b 3 Na
b 4 30.9
b 5 28.1", header = TRUE, na.strings = "Na", stringsAsFactors = FALSE)
```
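To see why `na.strings` matters, compare reading a shortened version of the same text with and without it. The default `na.strings` is `"NA"`, and the match is case-sensitive, so `"Na"` is otherwise kept as an ordinary string and the whole column ends up as character. The shortened `txt` below is only for illustration.

```r
# A few rows of the example data, including one "Na" entry
txt <- "part id value
a 1 23.4
a 5 Na
b 1 45.2"

# Default na.strings is "NA", so "Na" is not treated as missing
without <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
class(without$value)  # "character"

# Declaring "Na" as the missing-value marker yields a numeric column with a real NA
with_na <- read.table(text = txt, header = TRUE, na.strings = "Na",
                      stringsAsFactors = FALSE)
class(with_na$value)  # "numeric"
is.na(with_na$value)  # FALSE TRUE FALSE
```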