I created this representative data frame, which assigns conditional categories using a for loop.
df <- data.frame(
  Date = c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011", "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
  Time = c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
  Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9)
)
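library(dplyr)  # provides %>% and mutate()

# Label Diff <= 3 as "Excellent"; everything else gets a "TBD" placeholder
df1 <- df %>%
  mutate(Accuracy = ifelse(Diff <= 3, "Excellent", "TBD"))

# Overwrite the placeholder row by row
for (i in seq_len(nrow(df1))) {
  if (df1$Diff[i] > 3 && df1$Diff[i] <= 10) df1$Accuracy[i] <- "Good"
  if (df1$Diff[i] > 10 && df1$Diff[i] <= 15) df1$Accuracy[i] <- "Fair"
  if (df1$Diff[i] > 15 && df1$Diff[i] <= 30) df1$Accuracy[i] <- "Poor"
  if (df1$Diff[i] > 30) df1$Accuracy[i] <- "Unacceptable"
}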
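My actual dataset is very large, and from what I have read, for loops are usually not the most efficient way to code in R. I believe I could accomplish the same thing by creating a logical vector for each condition, where each vector is TRUE wherever its condition is met. I could then assign values by subsetting, e.g. df1$Accuracy[Good]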
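Here are my failed attempts. They return either incorrect NAs or an incorrect logical vector. One of the many things I don't understand is how lapply knows whether to iterate over columns or rows.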
Good<-apply(df1, 1, function(x) ifelse(df1$Diff[x]>3&& df1$Diff[x]<=10, TRUE, FALSE)) #logical, TRUE where condition is true
Good<-unlist(lapply(df1$Diff, function(x) {(ifelse(df1$Diff[x]>3&& df1$Diff[x]<=10, TRUE, FALSE))}))
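For what it's worth, here is a minimal sketch of why the two attempts above likely misbehave, and of the logical-vector idea described earlier. It rebuilds only the Diff column so that it runs on its own; the names good, good_vapply and col_classes are illustrative, not part of the original code. lapply(df1$Diff, f) passes each value of Diff to f as x, so df1$Diff[x] indexes by value rather than by position (for example, df1$Diff[4.3] is the 4th element). apply(df1, 1, f) first coerces the data frame to a character matrix and passes each row as a character vector, so df1$Diff[x] is indexed with strings and yields NA. Called on a data frame itself, lapply iterates over columns, because a data frame is a list of columns.

```r
# Rebuild the example data so this block is self-contained.
df1 <- data.frame(Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))
df1$Accuracy <- ifelse(df1$Diff <= 3, "Excellent", "TBD")

# Comparison operators and the single `&` are already vectorized,
# so no loop or apply call is needed to build the logical vector.
good <- df1$Diff > 3 & df1$Diff <= 10  # TRUE where the "Good" condition holds

# Assign by logical subsetting.
df1$Accuracy[good] <- "Good"

# If an apply-family call is wanted anyway, use x (the value) directly
# and ask vapply for exactly one logical per element.
good_vapply <- vapply(df1$Diff, function(x) x > 3 && x <= 10, logical(1))

# lapply on a data frame walks over its columns, because a data frame is a list.
col_classes <- lapply(df1, class)
```

The same pattern would repeat for the other categories, with one logical vector per condition. The double && is fine inside the vapply function because x is a single value there; on whole columns the single & is the one to use.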
Update: nested ifelse statements work, but any suggestions on how to use apply are still welcome.
df1 <- df %>%
  mutate(Accuracy = ifelse(Diff <= 3, "Excellent",
                    ifelse(Diff > 3 & Diff <= 10, "Good",
                    ifelse(Diff > 10 & Diff <= 15, "Fair",
                    ifelse(Diff > 15 & Diff <= 30, "Poor",
                    ifelse(Diff > 30, "Unacceptable", "TBD"))))))
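One possible alternative to the nesting, sketched under the assumption that the breakpoints are 3, 10, 15 and 30 as in the for loop above: base R's cut() bins a numeric vector into right-closed intervals in a single vectorized call. The breaks and labels vectors below are written out from the loop's conditions and are not part of the original code.

```r
# Rebuild the example data so this block is self-contained.
df <- data.frame(Diff = c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9))

# With right = TRUE the intervals are
# (-Inf, 3], (3, 10], (10, 15], (15, 30], (30, Inf].
breaks <- c(-Inf, 3, 10, 15, 30, Inf)
labels <- c("Excellent", "Good", "Fair", "Poor", "Unacceptable")

# cut() returns a factor; as.character() converts it to plain strings.
df$Accuracy <- as.character(
  cut(df$Diff, breaks = breaks, labels = labels, right = TRUE)
)
```

Keeping the factor instead of converting to character may be preferable when the category order matters, for example for sorting or plotting.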