I know how to remove punctuation on its own while keeping apostrophes:
gsub( "[^[:alnum:]']", " ", db$text )
or how to keep intra-word dashes with the tm package:
removePunctuation(db$text, preserve_intra_word_dashes = TRUE)
But I can't find a way to do both at once. For example, if my original sentence is:
"Interested in energy/the environment/etc.? Congrats to our new e-board! Ben, Nathan, Jenny, and Adam, y'all are sure to lead the club in a great direction next year! #obama #swag"
I'd like it to become:
"Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"
Of course there will be extra spaces, but I can remove those afterwards.
Any help would be appreciated.
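Use a negated character class (see http://www.regular-expressions.info/charclass.html) that keeps letters, digits, apostrophes and hyphens. Putting `-` last inside the brackets makes it a literal hyphen rather than a range: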
gsub("[^[:alnum:]'-]", " ", db$text)
## output below has runs of spaces squeezed to one space for readability
## "Interested in energy the environment etc Congrats to our new e-board Ben Nathan Jenny and Adam y'all are sure to lead the club in a great direction next year obama swag"
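The class above keeps every hyphen, including stray ones such as a standalone " - " or a trailing "y-", whereas `preserve_intra_word_dashes` only keeps dashes between word characters. A minimal sketch that comes closer to that behaviour and also squeezes the leftover spaces is below. The function name `clean_text` is hypothetical, and the "intra-word" rule used here (a hyphen must have an alphanumeric character on both sides) is an assumption, not tm's exact definition.

```r
# Hypothetical helper: strips punctuation but keeps apostrophes and
# hyphens that sit between two alphanumeric characters.
clean_text <- function(x) {
  # 1. Replace every character that is not alphanumeric, an apostrophe,
  #    or a hyphen with a space (hyphen last = literal).
  x <- gsub("[^[:alnum:]'-]", " ", x)
  # 2. Replace hyphens that are not preceded, or not followed, by an
  #    alphanumeric character. Lookarounds need PCRE, hence perl = TRUE.
  x <- gsub("(?<![[:alnum:]])-|-(?![[:alnum:]])", " ", x, perl = TRUE)
  # 3. Collapse runs of whitespace to one space and trim both ends.
  trimws(gsub("\\s+", " ", x))
}

example <- "Congrats to our new e-board! - y'all #swag"
print(clean_text(example))
# prints [1] "Congrats to our new e-board y'all swag"
```

Because `gsub` and `trimws` are vectorised, `clean_text(db$text)` should work on a whole character column in one call.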