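1) gsubfn::strapply

`strapply` can do the extraction and the translation in one step. For each component of `stmt`, `strapply` matches the pattern `pat` against it, translates every match using `L`, and returns the results. The `empty` argument defines what is returned for a component of `stmt` that has no matches. This gives a list of matches with one list component per row, and `toString` converts each component to a comma-separated string. This is the shortest of the 3 alternatives presented here.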
```r
library(gsubfn)

# keyword -> label lookup; note that HALO is reported as RFA
L <- list(APC = "APC", EMR = "EMR", HALO = "RFA", RFA = "RFA")
pat <- paste(names(L), collapse = "|")  # "APC|EMR|HALO|RFA"

transform(statement,
  out = sapply(strapply(stmt, pat, L, empty = "No Event"), toString),
  stringsAsFactors = FALSE)
```
giving:
```
                          stmt      out
1 I have performed APC and RFA APC, RFA
2              An EMR was done      EMR
3         I didn't do anything No Event
```
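To see what `strapply` does with `L` and `empty` for each component, the same extract-then-translate step can be sketched in base R with `regmatches()` and `gregexpr()`. This is a swapped-in technique for illustration only, not how gsubfn implements `strapply`; the three statements are copied from the input data.

```r
# Lookup table and pattern as in (1)
L <- list(APC = "APC", EMR = "EMR", HALO = "RFA", RFA = "RFA")
pat <- paste(names(L), collapse = "|")

stmt <- c("I have performed APC and RFA", "An EMR was done", "I didn't do anything")

# All matches of pat in each element (character(0) when an element has none)
matches <- regmatches(stmt, gregexpr(pat, stmt))

# Translate each match via L; fall back to "No Event" when there are no matches,
# which plays the role of strapply's `empty` argument
out <- vapply(matches, function(m) {
  if (length(m)) toString(unlist(L[m])) else "No Event"
}, character(1))

out  # "APC, RFA" "EMR" "No Event"
```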
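2) Base R

Using `L` and `pat` from above, create a function, `process`, that takes a character vector of words `x` and extracts the words matching `pat` into `g`. If `g` has nonzero length, translate its elements using `L` and collapse them into a single string using `toString`; otherwise, return `No Event`.

Now split each element of `stmt` into words using `strsplit` and apply `process` to each such character vector.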
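```r
# x: the words of one statement; returns their translated labels or "No Event"
process <- function(x) {
  g <- grep(pat, x, value = TRUE)  # words that match pat
  if (length(g)) toString(L[g]) else "No Event"
}
transform(statement, out = sapply(strsplit(stmt, "\\s+"), process),
  stringsAsFactors = FALSE)
```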
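To trace `process` on single statements, here is a self-contained run. The sentence "We did HALO and then EMR" is a made-up input chosen to exercise the `HALO` to `RFA` translation; it is not from the question's data.

```r
L <- list(APC = "APC", EMR = "EMR", HALO = "RFA", RFA = "RFA")
pat <- paste(names(L), collapse = "|")

process <- function(x) {
  g <- grep(pat, x, value = TRUE)            # words that match pat
  if (length(g)) toString(L[g]) else "No Event"
}

# Hypothetical sentence, split into words the same way as in (2)
words <- strsplit("We did HALO and then EMR", "\\s+")[[1]]
grep(pat, words, value = TRUE)   # "HALO" "EMR"
process(words)                   # "RFA, EMR" (HALO is reported as RFA)

# A statement with no keywords falls through to "No Event"
process(strsplit("I didn't do anything", "\\s+")[[1]])
```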
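3) dplyr/tidyr

Using `L` from (1), group by row number and `stmt`, and separate the words into separate rows. Keep only the words that are in `names(L)` and collapse each `stmt` group into a single row, translating via `L` and using `toString` to produce a comma-separated string. Drop the `n` column. At this point we have the desired result except that the `No Event` rows are still missing, so right join what we have with `statement` and replace the NAs with `No Event`.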
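```r
library(dplyr)
library(tidyr)

statement %>%
  group_by(n = 1:n(), out = stmt) %>%      # row number + copy of stmt to split
  separate_rows(out) %>%                   # one row per word
  filter(out %in% names(L)) %>%            # keep only the keywords
  summarize(stmt = stmt[1], out = toString(L[out])) %>%  # one row per statement
  ungroup %>%
  select(-n) %>%
  right_join(statement, by = "stmt") %>%   # bring back statements with no keywords
  mutate(out = if_else(is.na(out), "No Event", out))
```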
giving:
```
# A tibble: 3 x 2
  stmt                         out
  <chr>                        <chr>
1 I have performed APC and RFA APC, RFA
2 An EMR was done              EMR
3 I didn't do anything         No Event
```
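The pipeline relies on the right join to bring back statements whose rows were all filtered out. A rough base-R mirror of the same steps makes that intermediate visible: the collapsed table has no row for the third statement until the join restores it as NA. This is a substitute for the dplyr/tidyr calls, offered only to illustrate the logic; splitting on whitespace (rather than `separate_rows`' default separator) and joining on the row number `n` (rather than on `stmt`) are assumptions of this sketch.

```r
L <- list(APC = "APC", EMR = "EMR", HALO = "RFA", RFA = "RFA")
statement <- data.frame(
  stmt = c("I have performed APC and RFA", "An EMR was done", "I didn't do anything"),
  stringsAsFactors = FALSE
)
statement$n <- seq_len(nrow(statement))    # row number, like n = 1:n()

# One row per word, tagged with its row number (analogue of separate_rows)
words <- strsplit(statement$stmt, "\\s+")
long <- data.frame(n = rep(statement$n, lengths(words)),
                   out = unlist(words), stringsAsFactors = FALSE)

# Keep only keyword rows (analogue of filter(out %in% names(L)))
long <- long[long$out %in% names(L), ]

# Collapse each row number's keywords into one translated string (analogue of summarize)
collapsed <- vapply(split(long$out, long$n),
                    function(w) toString(unlist(L[w])), character(1))
summ <- data.frame(n = as.integer(names(collapsed)), out = unname(collapsed),
                   stringsAsFactors = FALSE)
summ   # only rows 1 and 2: statement 3 had no keywords left after filtering

# Keep every statement (analogue of right_join(statement)); row 3 gets out = NA
res <- merge(statement, summ, by = "n", all.x = TRUE)
res$out[is.na(res$out)] <- "No Event"
res[c("stmt", "out")]
```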
Note
We used this as the input:
```r
statement <- structure(list(stmt = c("I have performed APC and RFA",
  "An EMR was done", "I didn't do anything")),
  class = "data.frame", row.names = c(NA, -3L))
```
Updates
Revised several times after rereading the question. Also added more alternatives.