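I have some data-cleaning tasks. I have a column that starts at H6 and runs downward. The column contains data that is supposed to be in Snake_case but isn't. The cell values take these forms:
- camel case: “CamelCase”
- with spaces: “Spaced Value”
- with an initial all-caps prefix: ALLCAPSPREFIX_rest
- combinations of the above
I know there is no single algorithm that will bring all of these into snake_case, but I would like to come up with code that converts at least most of the cells.
I tried VBA code that replaces spaces with underscores and gets the index of the underscore. Now I am thinking of making every character after the underscore lowercase. I am also thinking of replacing each two-character sequence in which the first character is lowercase and the second is uppercase, say lC, with l_c, because I don't want CCC converted to c_c_c but to ccc. Before going further, though, I would like to know whether there is a simpler approach.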
Here is one approach that does what you require:
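Option Explicit

' Converts values such as "CamelCase", "Spaced Value" or "ALLCAPSPREFIX_rest"
' to the form "Camel_case" (first segment in proper case, the rest lowercase).
Function Snake_case(s As String) As String
    Dim RE As Object
    ' Group 1: a run of alphanumerics that is followed by a space, "_" or a capital
    ' Group 2: the non-whitespace text after the optional separator
    Const sPat As String = "([A-Za-z0-9]+)(?=[ _A-Z])[ _]?(\S+)"
    Const sRepl As String = "$1_$2"
    Dim v As Variant
    Dim i As Long

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .IgnoreCase = False
        .Pattern = sPat
        v = Split(.Replace(s, sRepl), "_")
    End With

    ' Split returns an empty array (UBound = -1) for an empty string
    If UBound(v) < 0 Then Exit Function

    v(0) = WorksheetFunction.Proper(v(0))
    ' Lowercase every later segment; looping avoids a "subscript out of range"
    ' error when the input has no second segment
    For i = 1 To UBound(v)
        v(i) = LCase(v(i))
    Next i
    Snake_case = Join(v, "_")
End Function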
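To run the function over the column described in the question, a loop along these lines might work. It is only a sketch: the Sub name ConvertColumnH is made up for illustration, and it assumes the values sit on the active sheet in column H from H6 down to the last used row and may be overwritten in place.

```vba
' Hypothetical helper, not part of the original answer.
' Assumes Snake_case (above) is in the same VBA project.
Sub ConvertColumnH()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Dim v As Variant
    Dim result As String

    Set ws = ActiveSheet
    ' Last non-empty row in column H
    lastRow = ws.Cells(ws.Rows.Count, "H").End(xlUp).Row

    For r = 6 To lastRow
        v = ws.Cells(r, "H").Value
        ' Only touch non-empty text cells (skips numbers, blanks and error values)
        If VarType(v) = vbString Then
            If Len(v) > 0 Then
                On Error Resume Next
                result = Snake_case(CStr(v))
                ' Write back only if the conversion raised no error
                If Err.Number = 0 Then ws.Cells(r, "H").Value = result
                Err.Clear
                On Error GoTo 0
            End If
        End If
    Next r
End Sub
```

Because Snake_case is a public function, it can also be entered in a helper column as =Snake_case(H6), provided it lives in a standard module.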
Here is an explanation of the regex and the replacement string:
Snake case conversion

([A-Za-z0-9]+)(?=[ _A-Z])[ _]?(\S+)

Options: Case sensitive; ^$ match at line breaks

- Match the regex below and capture its match into backreference number 1 (https://i.stack.imgur.com/baCrn.png): ([A-Za-z0-9]+)
  - Match a single character present in the list below (http://www.regular-expressions.info/charclass.html): [A-Za-z0-9]+
    - Between one and unlimited times, as many times as possible, giving back as needed (greedy) (http://www.regular-expressions.info/repeat.html): +
    - A character in the range between “A” and “Z” (http://www.regular-expressions.info/charclass.html): A-Z
    - A character in the range between “a” and “z” (http://www.regular-expressions.info/charclass.html): a-z
    - A character in the range between “0” and “9” (http://www.regular-expressions.info/charclass.html): 0-9
- Assert that the regex below can be matched starting at this position (positive lookahead) (http://www.regular-expressions.info/lookaround.html): (?=[ _A-Z])
  - Match a single character present in the list below (http://www.regular-expressions.info/charclass.html): [ _A-Z]
    - A single character from the list “ _” (http://www.regular-expressions.info/characters.html): space or _
    - A character in the range between “A” and “Z” (http://www.regular-expressions.info/charclass.html): A-Z
- Match a single character from the list “ _” (http://www.regular-expressions.info/characters.html): [ _]?
  - Between zero and one times, as many times as possible, giving back as needed (greedy) (http://www.regular-expressions.info/optional.html): ?
- Match the regex below and capture its match into backreference number 2 (https://i.stack.imgur.com/baCrn.png): (\S+)
  - Match a single character that is NOT a “whitespace character” (http://www.regular-expressions.info/shorthand.html): \S+
    - Between one and unlimited times, as many times as possible, giving back as needed (greedy) (http://www.regular-expressions.info/repeat.html): +

$1_$2

- Insert the text that was last matched by capturing group number 1 (http://www.regular-expressions.info/replacebackref.html): $1
- Insert the character “_” literally (http://www.regular-expressions.info/characters.html): _
- Insert the text that was last matched by capturing group number 2 (http://www.regular-expressions.info/replacebackref.html): $2
Created with RegexBuddy http://www.regexbuddy.com/
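For comparison, the rule floated in the question (insert an underscore only between a lowercase letter and the uppercase letter that follows it, so that CCC stays together) could be sketched roughly as below. This is not the method of the answer above: the names ToSnakeCase and DemoToSnakeCase are hypothetical, and the sketch lowercases the whole result, whereas Snake_case above keeps the first segment in proper case.

```vba
' Hypothetical alternative based on the idea in the question.
Function ToSnakeCase(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = False

    ' Lowercase letter followed by an uppercase letter: "lC" -> "l_C"
    re.Pattern = "([a-z])([A-Z])"
    s = re.Replace(s, "$1_$2")

    ' Any run of spaces and/or underscores -> a single underscore
    re.Pattern = "[ _]+"
    s = re.Replace(Trim$(s), "_")

    ' Lowercase everything at the end, so "CCC" becomes "ccc"
    ToSnakeCase = LCase$(s)
End Function

Sub DemoToSnakeCase()
    Debug.Print ToSnakeCase("CamelCase")           ' camel_case
    Debug.Print ToSnakeCase("Spaced Value")        ' spaced_value
    Debug.Print ToSnakeCase("ALLCAPSPREFIX_rest")  ' allcapsprefix_rest
    Debug.Print ToSnakeCase("CCC")                 ' ccc
End Sub
```

Because the boundary pattern is applied globally, a value with several humps such as oneTwoThree gets an underscore at each lowercase-to-uppercase transition. An all-caps run followed by a capitalised word, as in XMLFile, is left unsplit and comes out as xmlfile.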