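I have created an RMarkdown document to check for errors; it outputs print statements that specify each error and which row numbers need to be corrected (this checks df below). I have created another data frame (df.index in the example below) to keep track of which rows need to be corrected for each column of df. Essentially, I need to add a column that stores a list of the rows that need to be corrected for each column in df. Then, as I do more error checks, I need to append to the list in a given row of df.index and add new lists to other rows of the newly created rows column in the summary data frame.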
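I have looked through dozens of SO entries on lists but could not find a good answer. Here is what I have tried, shown with a minimal example. The code does work and gives the output I want. However, it is very verbose, and others on my project team may find it difficult to read and understand.
Minimal Example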
Data
library(dplyr)
# Dataframe that contains the dataset that I'm checking for errors.
df <-
  structure(
    list(
      `1.1.` = c("Andrew", "Max", "Sylvia", NA, "1",
                 NA, NA, "Jason"),
      `1.2.` = c(1, 2, 2, NA, NA, 5, 3, NA),
      `1.3.` = c(
        "cool",
        "amazing",
        "wonderful",
        "okay",
        NA,
        "sweet",
        "chocolate",
        "fine"
      )
    ),
    class = "data.frame",
    row.names = c(NA, -8L)
  )
# Dataframe that contains the column numbers and names, which will be used to create a summary of what rows need to be changed for each column.
df.index <-
  structure(
    list(
      number = c("1.1.", "1.2.", "1.3."),
      name = c("name", "number", "category")
    ),
    class = "data.frame",
    row.names = c(NA, -3L)
  )
What I've Tried
obs <- "1.1."
na.index <- which(is.na(df$`1.1.`))
summary <- df.index %>%
  dplyr::mutate(rows = ifelse(number == obs, list(na.index), NA))
# Check to see if there are any numeric values in this character column. Adding 6 just to have a duplicate for this example.
na.index2 <-
  c(which(!is.na(as.numeric(
    as.character(df$`1.1.`)
  ))), 6)
# Append new list from na.index2 to the existing list in row 1 (or 1.1.), and keep only the unique values, excluding NAs.
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(unique(na.omit(
    unlist(append(rows, list(na.index2)))
  ))), NA))
# Column 1.2. in df.
obs <- "1.2."
na.index3 <- which(df$`1.2.` > 2)
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(na.index3), rows))
na.index4 <- which(df$`1.2.` == 2)
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(unique(na.omit(
    unlist(append(rows[2], list(na.index4)))
  ))), rows))
# Column 1.3. in df.
obs <- "1.3."
na.index5 <- which(df$`1.3.` == "okay")
summary <- summary %>%
  dplyr::mutate(rows = ifelse(number == obs, list(na.index5), rows))
Output (which is also the expected output)
summary
  number     name       rows
1   1.1.     name 4, 6, 7, 5
2   1.2.   number 6, 7, 2, 3
3   1.3. category          4
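I get all of the correct rows in the example above, but there has to be a simpler way to do this, one that does not require creating obs or specifying the row number (e.g., rows[2]) when appending lists.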
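As you can see, not every column has the same error checks. So I am hoping for an easy way to add lists to the rows column in summary as I do similar checks for each category (e.g., 1.2., 1.3., etc.), as well as to append additional lists (as shown here).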