Using:
# split each string by space into separate elements in a list
l <- strsplit(x, ' ')
# check which list parts contain 'Cxiv'
i <- sapply(l, function(v) any(v == 'Cxiv'))
# for those that contain 'Cxiv' increase the second number with 1
# and remove the 'Cxiv 1' part
l[i] <- lapply(l[i], function(v) {
  v[2] <- as.character(as.numeric(v[2]) + 1)
  v[-c(which(v == 'Cxiv') + 0:1)]
})
# check which are duplicates
as.integer(duplicated(l))
gives:
[1] 0 0 1 1 0 0 0
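If you want to meet the requirement stated in the comments (the case where the formula with Cxiv comes first), you need to change the last step to: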
as.integer((duplicated(l) | duplicated(l, fromLast = TRUE)) & grepl('Cxiv',x))
Testing this on the new example data (x2 in place of x), you get:
[1] 0 0 1 1 0 1 0 0
Used data:
x <- c("C 4 H 15 O 7","C 13 H 17 O 7","C 3 Cxiv 1 H 15 O 7","C 12 Cxiv 1 H 17 O 7",
"C 24 H 15 O 4","C 32 H 13 O 10","C 12 Cxiv 1 H 24 N 1")
New data:
x2 <- c("C 4 H 15 O 7","C 13 H 17 O 7","C 3 Cxiv 1 H 15 O 7","C 12 Cxiv 1 H 17 O 7",
"C 24 H 15 O 4","C 12 Cxiv 1 H 24 N 1","C 32 H 13 O 10","C 13 H 24 N 1")
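For reuse, the steps above could be wrapped in a single function. This is only a sketch: the name flag_cxiv_duplicates and the cxiv_only argument are made up for illustration and are not part of the original answer. Like the code above, it assumes the carbon count is always the second element of each formula.

```r
# Hypothetical helper (name and arguments are assumptions for illustration).
# Assumes each formula starts with 'C <count>', as in the example data.
flag_cxiv_duplicates <- function(x, cxiv_only = FALSE) {
  # split each string by space into separate elements in a list
  l <- strsplit(x, ' ')
  # check which list parts contain 'Cxiv'
  i <- vapply(l, function(v) any(v == 'Cxiv'), logical(1))
  # fold 'Cxiv 1' into the carbon count and drop the 'Cxiv 1' part
  l[i] <- lapply(l[i], function(v) {
    v[2] <- as.character(as.numeric(v[2]) + 1)
    v[-c(which(v == 'Cxiv') + 0:1)]
  })
  if (cxiv_only) {
    # flag only Cxiv formulas that match another formula, in either direction
    as.integer((duplicated(l) | duplicated(l, fromLast = TRUE)) &
                 grepl('Cxiv', x, fixed = TRUE))
  } else {
    # flag any formula that repeats an earlier one
    as.integer(duplicated(l))
  }
}

x <- c("C 4 H 15 O 7", "C 13 H 17 O 7", "C 3 Cxiv 1 H 15 O 7",
       "C 12 Cxiv 1 H 17 O 7", "C 24 H 15 O 4", "C 32 H 13 O 10",
       "C 12 Cxiv 1 H 24 N 1")
x2 <- c("C 4 H 15 O 7", "C 13 H 17 O 7", "C 3 Cxiv 1 H 15 O 7",
        "C 12 Cxiv 1 H 17 O 7", "C 24 H 15 O 4", "C 12 Cxiv 1 H 24 N 1",
        "C 32 H 13 O 10", "C 13 H 24 N 1")

flag_cxiv_duplicates(x)                     # 0 0 1 1 0 0 0
flag_cxiv_duplicates(x2, cxiv_only = TRUE)  # 0 0 1 1 0 1 0 0
```

With cxiv_only = FALSE on x2, the plain duplicated() check would instead flag the last element ("C 13 H 24 N 1") rather than the Cxiv formula that precedes it, which is why the modified last step is needed there.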