如何获取一个单词并根据该单词在评论中的存在创建一个指示变量?

2023-12-20

我有一个单词向量和一个评论向量:

word.list <- c("very", "experience", "glad")

comments  <- c("very good experience. first time I have been and I would definitely come back.",
               "glad I scheduled an appointment.",
               "the staff have become more cordial.",
               "the experience i had was not good at all.",
               "i am very glad")

我想创建一个看起来像的数据框

df <- data.frame(comments = c("very good experience. first time I have been and I would definitely come back.",
               "glad I scheduled an appointment.",
               "the staff have become more cordial.",
               "the experience i had was not good at all.",
               "i am very glad"),
               very = c(1,0,0,0,1),
               glad = c(0,1,0,0,1),
               experience = c(1,0,0,1,0))

我有 12,000 多条评论和 20 个单词,我想用它来做这件事。我该如何有效地做到这一点?对于循环?还有其他方法吗?


一种方法是组合stringi and gdapTools包,即

library(stringi)
library(qdapTools)

mtabulate(stri_extract_all(comments, regex = paste(word.list, collapse = '|')))
#  experience glad very
#1          1    0    1
#2          0    1    0
#3          0    0    0
#4          1    0    0
#5          0    1    1

然后你可以使用cbind or data.frame绑定,

cbind(comments, mtabulate(stri_extract_all(comments, regex = paste(word.list, collapse = '|'))))) 
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

如何获取一个单词并根据该单词在评论中的存在创建一个指示变量? 的相关文章

随机推荐