I have a set of documents:
documents = c("She had toast for breakfast",
"The coffee this morning was excellent",
"For lunch let's all have pancakes",
"Later in the day, there will be more talks",
"The talks on the first day were great",
"The second day should have good presentations too")
I want to remove the stop words from these documents. I have already converted them to lower case and removed the punctuation, using:
documents = tolower(documents) #make it lower case
documents = gsub('[[:punct:]]', '', documents) #remove punctuation
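To make the effect of these two calls concrete, here is a minimal sketch that applies them to just one of the documents above. The variable name `doc` is only for illustration and is not part of my actual script.

```r
# One document from the set above, processed with the same two base R calls.
doc <- "For lunch let's all have pancakes"
doc <- tolower(doc)                  # lower-case every character
doc <- gsub("[[:punct:]]", "", doc)  # delete punctuation, including the apostrophe
print(doc)
# [1] "for lunch lets all have pancakes"
```

Note that "let's" becomes "lets", so any stop word that is spelled with an apostrophe will no longer match after this step.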
First I convert the result to a Corpus object:
library(tm) #provides Corpus, VectorSource, tm_map, removeWords and stopwords
documents <- Corpus(VectorSource(documents))
Then I try to remove the stop words:
documents = tm_map(documents, removeWords, stopwords('english')) #remove stopwords
But that last line produces the following error:
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
This has already been asked here: https://stackoverflow.com/questions/18688599/the-process-has-forked-error-while-using-tm-package-in-r but it received no answer. What does this error mean?
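For reference, this is roughly the result I expect from the stop-word step, sketched in base R without tm. The stop-word vector below is a small hand-picked assumption for illustration, not the list returned by stopwords('english'), and the whole-word regular expression is my own stand-in rather than what removeWords necessarily does internally.

```r
# Hypothetical, hand-picked stop words; NOT the list from stopwords('english').
stop_words <- c("she", "had", "for", "the", "this", "was")

# Two of the documents, already lower-cased and stripped of punctuation.
docs <- c("she had toast for breakfast",
          "the coffee this morning was excellent")

# Word-boundary anchors restrict each match to a whole word.
pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
cleaned <- gsub(pattern, "", docs, perl = TRUE)

# Tidy up the whitespace left behind by the deleted words.
cleaned <- gsub("\\s+", " ", cleaned)
cleaned <- gsub("^\\s+|\\s+$", "", cleaned)
print(cleaned)
# [1] "toast breakfast"          "coffee morning excellent"
```

This runs in a single process, so it is only a way to show the intended output; it does not explain why the tm_map call fails.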
EDIT
Yes, I am using the tm package.
Here is the output of sessionInfo():
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)