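I have a character vector that needs cleaning. Specifically, I want to remove the number that comes before the word "Votes". Note that the number uses a comma as the thousands separator, so it is easier to treat it as a string.
I know that gsub("*.Votes","", text) removes everything, but how do I remove just the number? Also, how do I collapse repeated spaces into a single space?
Thanks for any help you can offer!
Sample data: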
text <- "STATE QUESTION NO. 1 Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee? 558,586 Votes"
You can use
text <- "STATE QUESTION NO. 1 Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee? 558,586 Votes"
trimws(gsub("(\\s){2,}|\\d[0-9,]*\\s*(Votes)", "\\1\\2", text))
# => [1] "STATE QUESTION NO. 1 Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee? Votes"
See the online R demo and the online regex demo.
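If the single combined pattern is hard to read, the same cleanup can be sketched as two separate passes. This is only an illustration, not the answer's own method: the shortened string x (with extra spaces added on purpose) and the variable names no_number and cleaned are made up for the example.

```r
# Hypothetical, shortened stand-in for the sample data, with repeated spaces added
x <- "STATE  QUESTION NO. 1   Shall Chapter 202 be amended? 558,586 Votes"

# Pass 1: remove the number (and any whitespace after it) that directly precedes "Votes".
# Other numbers such as "1" and "202" are kept because "Votes" does not follow them.
no_number <- gsub("\\d[0-9,]*\\s*(Votes)", "\\1", x)

# Pass 2: collapse each run of whitespace into one space, then trim both ends
cleaned <- trimws(gsub("\\s+", " ", no_number))

print(cleaned)
```

Two passes cost an extra gsub call but keep each pattern responsible for one job, which can make later changes easier.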
Details
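- (\\s){2,} - matches 2 or more whitespace characters, capturing the last one matched, which is reinserted with the \1 placeholder in the replacement pattern
- | - or
- \\d - a digit
- [0-9,]* - 0 or more digits or commas
- \\s* - 0 or more whitespace characters
- (Votes) - Group 2 (reinserted with the \2 placeholder): a Votes substring.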
Note that trimws removes any leading/trailing whitespace.
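To see how the two capture groups interact, each branch of the alternation can be tried on its own. The following is a small sketch with made-up input strings; the variable names are hypothetical and only serve the illustration.

```r
pattern <- "(\\s){2,}|\\d[0-9,]*\\s*(Votes)"

# Only the first branch matches: group 1 holds the last whitespace character
# of the run and group 2 is empty, so the run shrinks to a single space.
spaces_only <- gsub(pattern, "\\1\\2", "a   b")

# Only the second branch matches: group 1 is empty and group 2 is "Votes",
# so the number and the whitespace after it are dropped.
votes_only <- gsub(pattern, "\\1\\2", "b? 1,234 Votes")

# trimws strips whitespace from both ends and leaves inner spaces alone
trimmed <- trimws("  a b  ")

print(spaces_only)
print(votes_only)
print(trimmed)
```

Because an unmatched group contributes an empty string, the single replacement "\\1\\2" can serve both branches.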