How can I remove duplicate characters from the strings in a column using R? For example, this is my column:
df<- data.frame(name = c(A="a,a,b,c,d,d,d", B="a,b,b,b,f", C="d,d,d,d", D="a,a"))
And this is the column I expect:
df<- data.frame(name = c(A="a,b,c,d", B="a,b,f", C="d", D="a"))
One option is to use map and strsplit:
library(tidyverse)
df %>%
  mutate(name = strsplit(as.character(name), ",") %>%
           map(~ toString(unique(.x))))
#         name
#1 a, b, c, d
#2    a, b, f
#3          d
#4          a
Or in base R, using a regular expression:
sub(",$", "", gsub("([a-z],)\\1+", "\\1", paste0(df$name, ",")))
#[1] "a,b,c,d" "a,b,f"   "d"       "a"
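Note that the regular expression above only collapses consecutive repeats of a single lowercase letter, and that toString() joins with ", " rather than ",". If duplicates might not be adjacent, or if the exact comma format of the expected output matters, something like the following base R sketch may be closer to what is wanted. The helper name dedupe_tokens and its sep argument are made up for illustration; they are not part of the answers above.

```r
# Hypothetical helper (an assumption, not from the original answers):
# split each string on `sep`, keep the first occurrence of every token
# in its original order, and join the tokens back with the same separator.
dedupe_tokens <- function(x, sep = ",") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(parts,
         function(p) paste(unique(p), collapse = sep),
         character(1),
         USE.NAMES = FALSE)
}

df <- data.frame(name = c(A = "a,a,b,c,d,d,d", B = "a,b,b,b,f",
                          C = "d,d,d,d", D = "a,a"))
df$name <- dedupe_tokens(df$name)
df$name
# the elements are "a,b,c,d", "a,b,f", "d" and "a"

# Non-adjacent duplicates are removed as well:
dedupe_tokens("a,b,a")
# the single element is "a,b"
```

Because it relies on unique() rather than a backreference, this should also work for tokens longer than one character, at the cost of one function call per row.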