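I'm trying to understand dplyr. I'm splitting the values in a data frame by group, bin, and sign, and I want the mean of `value` for each group/bin/sign combination. The output should be a data frame containing the count and mean for each group/bin/sign combination, plus the total count for each group. I think I have it, but sometimes the dplyr output disagrees with what I compute in base R. Am I doing this correctly? It also feels convoluted... is there a more direct way?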
library(ggplot2)  # for cut_interval()
df <- data.frame(
id = sample(LETTERS[1:3], 1000, replace=TRUE),  # 1000 rows, same length as the other columns
tobin = rnorm(1000),
value = rnorm(1000)
)
df$tobin[sample(nrow(df), 10)] <- 0
df$bin <- cut_interval(abs(df$tobin), length=1)
df$sign <- ifelse(df$tobin==0, "NULL", ifelse(df$tobin>0, "+", "-"))  # positive -> "+", negative -> "-"
# Find mean of value by group, bin, and sign using dplyr
library(dplyr)
res <- df %>% group_by(id, bin, sign) %>%
summarise(Num = n(), value = mean(value, na.rm=TRUE))
# Total count per group (the result must be assigned to be used below)
total <- res %>% group_by(id) %>%
summarise(total = sum(Num))
res <- data.frame(res)
total <- data.frame(total)
res$total <- total[match(res$id, total$id), "total"]
res[res$id=="A" & res$bin=="[0,1]" & res$sign=="NULL",]
# Check in base R whether the mean by group, bin, and sign is correct (sometimes it isn't?)
groupA = df[df$id=="A" & df$bin=="[0,1]" & df$sign=="NULL",]
mean(groupA$value, na.rm=TRUE)
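A more direct version of the dplyr part might look like the sketch below. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and rebuilds the example data with an arbitrary seed. `n()` counts the rows in each cell, and a grouped `mutate()` attaches the per-`id` total to every row, which replaces the separate `total` data frame and the `match()` step.

```r
library(ggplot2)  # cut_interval()
library(dplyr)

# Rebuild the example data; the seed is arbitrary and only makes the run reproducible
set.seed(42)
df <- data.frame(
  id    = sample(LETTERS[1:3], 1000, replace = TRUE),
  tobin = rnorm(1000),
  value = rnorm(1000)
)
df$tobin[sample(nrow(df), 10)] <- 0
df$bin  <- cut_interval(abs(df$tobin), length = 1)
df$sign <- ifelse(df$tobin == 0, "NULL", ifelse(df$tobin > 0, "+", "-"))

res <- df %>%
  group_by(id, bin, sign) %>%
  summarise(Num   = n(),                          # rows in this id/bin/sign cell
            value = mean(value, na.rm = TRUE),    # mean of value in the cell
            .groups = "drop") %>%
  group_by(id) %>%
  mutate(total = sum(Num)) %>%                    # rows per id, repeated on each cell
  ungroup()

print(res)
```

Because `summarise()` and `mutate()` are called in one pipeline, there is no intermediate object to forget to assign.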
I'm going crazy, because this doesn't work on my real data, and this command just repeats the mean of the entire dataset:
ddply(df, .(id, bin, sign), summarize, mean = mean(value,na.rm=TRUE))
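Every `mean` in the result equals `mean(value, na.rm=TRUE)` over the whole dataset, completely ignoring the grouping... All the grouping variables are factors, and `value` is numeric...
However, this works: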
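One frequently reported cause of grouped summaries collapsing to the overall mean is plyr being attached after dplyr: a bare `summarise()` in a dplyr pipeline then resolves to `plyr::summarise()`, which ignores `group_by()`. The sketch below assumes that masking is the issue (not confirmed for this data); it shows how to check which package a bare `summarise` comes from and how namespace-qualifying the calls sidesteps the problem.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, for reproducibility only
df <- data.frame(
  id    = sample(LETTERS[1:3], 1000, replace = TRUE),
  value = rnorm(1000)
)

# Which attached package does a bare `summarise` come from?
# "package:dplyr" first is expected; "package:plyr" first means plyr masks dplyr.
print(find("summarise"))

# Namespace-qualified calls are immune to masking by later-attached packages
by_id <- df %>%
  group_by(id) %>%
  dplyr::summarise(n = dplyr::n(), mean_value = mean(value), .groups = "drop")
print(by_id)
```

If plyr functions are also needed, attaching plyr before dplyr (or calling plyr functions as `plyr::ddply()` without attaching plyr) avoids the masking.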
with(df, aggregate(df$value, by = list(id, bin, sign), FUN = function(x) c(mean(x))))
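For comparison, the same table can be built in base R with the formula interface of `aggregate()`, which avoids the `with()`/`df$` mix. This is a sketch on rebuilt example data (arbitrary seed); the column names `Num` and `total` follow the question, and `ave()` spreads the per-`id` sum back onto each row.

```r
library(ggplot2)  # cut_interval(), used only to build the bins as in the question

set.seed(42)
df <- data.frame(
  id    = sample(LETTERS[1:3], 1000, replace = TRUE),
  tobin = rnorm(1000),
  value = rnorm(1000)
)
df$tobin[sample(nrow(df), 10)] <- 0
df$bin  <- cut_interval(abs(df$tobin), length = 1)
df$sign <- ifelse(df$tobin == 0, "NULL", ifelse(df$tobin > 0, "+", "-"))

# Mean and count per id/bin/sign cell (empty combinations are dropped)
means  <- aggregate(value ~ id + bin + sign, data = df, FUN = mean)
counts <- aggregate(value ~ id + bin + sign, data = df, FUN = length)
names(counts)[names(counts) == "value"] <- "Num"

res_base <- merge(means, counts, by = c("id", "bin", "sign"))
# Total rows per id, repeated on each of that id's cells
res_base$total <- ave(res_base$Num, res_base$id, FUN = sum)
print(res_base)
```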
Please help..