Intro
(You can skip to "What if..." if you get bored with the intro.)
This question isn't specific to VBScript (I just happen to use it in this case): I want to find a solution for regex usage in general, editors included.
It all started when I wanted to adapt Example 4 (https://stackoverflow.com/a/22542835/1326147), which uses 3 capture groups to split data into 3 cells in MS Excel.
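I needed to capture one whole pattern and then capture 3 other patterns inside it. However, in the same expression I also needed to capture another pattern and, again, capture 3 other patterns inside that one (yes, I know... but before pointing fingers, please read to the end).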
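My first thought was named capturing groups (http://www.regular-expressions.info/named.html), but then I realized I shouldn't «mix named and numbered capturing groups», since it is «not recommended because flavors are inconsistent in how the groups are numbered».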
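A minimal Python sketch of what "mixing" means in practice (Python is used for the added examples because the question is not VBScript-specific). Python's re module numbers named and unnamed groups together, strictly left to right; as I understand regular-expressions.info, .NET instead numbers named groups only after all unnamed ones, which is the kind of inconsistency the quote warns about.

```python
import re

# (?P<name>...) is Python's named-group syntax; the middle group is unnamed.
m = re.match(r"(?P<first>\d+);(\d+);(?P<third>\d+)", "124;12;3")

# Python numbers every group by the position of its opening parenthesis,
# so the named groups are also reachable as groups 1 and 3.
print(m.group(1), m.group(2), m.group(3))  # 124 12 3
print(m.group("first"), m.group("third"))  # 124 3
```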
Then I looked into VBScript SubMatches (http://www.regular-expressions.info/vbscript.html) and «non-capturing» groups (http://www.regular-expressions.info/brackets.html), and I arrived at a working solution for this specific case:
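' SplitCells is a hypothetical macro name; Myrange holds the cells to split.
Sub SplitCells(Myrange As Range)
    Dim regEx As Object, rgxMatches As Object, mtx As Object
    Dim C As Range
    Dim strInput As String

    ' Late binding, so no reference to the VBScript RegExp library is required
    Set regEx = CreateObject("VBScript.RegExp")
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        ' Two "parent" patterns inside one non-capturing group:
        ' groups 1-3 belong to the first, groups 4-6 to the second
        .Pattern = "(?:^([0-9]+);([0-9]+);([0-9]+)$|^.*:([0-9]+)\s.*:([0-9]+).*:([a-zA-Z0-9]+)$)"
    End With

    For Each C In Myrange
        strInput = C.Value
        Set rgxMatches = regEx.Execute(strInput)
        If rgxMatches.Count = 0 Then
            ' Checked here: inside the loop below this could never be reached
            C.Offset(0, 1) = "(Not matched)"
        Else
            For Each mtx In rgxMatches
                ' Non-participating submatches are empty, which tells us which parent matched
                If mtx.SubMatches(0) <> "" Then
                    C.Offset(0, 1) = mtx.SubMatches(0)
                    C.Offset(0, 2) = mtx.SubMatches(1)
                    C.Offset(0, 3) = mtx.SubMatches(2)
                ElseIf mtx.SubMatches(3) <> "" Then
                    C.Offset(0, 1) = mtx.SubMatches(3)
                    C.Offset(0, 2) = mtx.SubMatches(4)
                    C.Offset(0, 3) = mtx.SubMatches(5)
                End If
            Next
        End If
    Next
End Sub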
Here is a Rubular demo of the regex: http://www.rubular.com/r/8r6uUqfMSv
On these inputs:
124;12;3
My id1:213 My id2:232 my word:ins4yanrgx
:8587459 :18254182540215 :dcpt
0;1;2
it fills the first 2 cells with numbers and the 3rd with either a number or a word.
Basically, I used a non-capturing group containing 2 "parent" patterns ("parents" = broad patterns inside which I want to detect other sub-patterns). If the 1st capture group (the first sub-pattern of the 1st parent) matched, I put its value and the other groups of that parent in the 3 cells. If not, I check whether the 4th capture group (the first sub-pattern of the 2nd parent) matched, and put it and the rest of that parent's groups in the same 3 cells.
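The same branch-checking logic, sketched in Python on the sample inputs above. split_cell is a hypothetical name; note that in Python a group that did not take part in the match is None, whereas VBScript's SubMatches returns an empty value.

```python
import re

# Same pattern as the VBA code: two "parent" branches in one non-capturing group.
PATTERN = re.compile(
    r"(?:^([0-9]+);([0-9]+);([0-9]+)$"
    r"|^.*:([0-9]+)\s.*:([0-9]+).*:([a-zA-Z0-9]+)$)"
)

def split_cell(text):
    """Return the 3 values for one cell, or None if neither parent matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Group 1 only takes part when the 1st parent matched.
    if m.group(1) is not None:
        return m.group(1, 2, 3)
    return m.group(4, 5, 6)

cells = [
    "124;12;3",
    "My id1:213 My id2:232 my word:ins4yanrgx",
    ":8587459 :18254182540215 :dcpt",
    "0;1;2",
]
for cell in cells:
    print(split_cell(cell))
# ('124', '12', '3')
# ('213', '232', 'ins4yanrgx')
# ('8587459', '18254182540215', 'dcpt')
# ('0', '1', '2')
```

Each extra parent pattern would need another ElseIf/if branch with hard-coded group numbers, which is what the proposal below tries to avoid.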
What if...
Instead of something like this:
(?:^(\d+);(\d+);(\d+)$|^.*:(\d+)\s.*:(\d+).*:(\w+)$|what(ever))
something like this were possible:
(#:^(\d+);(\d+);(\d+)$)|(#:^.*:(\d+)\s.*:(\d+).*:(\w+)$)|(#:what(ever))
where (#: would create a "parent" numbered capturing group instead of a non-capturing group.
That way I could do something like Example 4 (https://stackoverflow.com/a/22542835/1326147):
C.Offset(0, 1) = regEx.Replace(strInput, "#$1")
C.Offset(0, 2) = regEx.Replace(strInput, "#$2")
C.Offset(0, 3) = regEx.Replace(strInput, "#$3")
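It would search the parent patterns until it found a match in one of their sub-patterns (the first match would be returned and, ideally, the remaining parents would not be searched).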
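Python's re module has no (#: syntax either, but its effect can be approximated with ordinary capturing groups: wrap each branch in a plain ( ) "parent", record where each parent's group numbers start, and look up children relative to whichever parent matched. This is only a sketch under that assumption; compile_parents and child are hypothetical helpers, not an existing API, and the branches are the three from the example pattern above.

```python
import re

def compile_parents(branches):
    """Wrap each branch in a capturing "parent" group and join them with |.

    Returns the compiled pattern and the group number of each parent.
    """
    parent_groups = []
    next_group = 1
    for branch in branches:
        parent_groups.append(next_group)
        # The parent takes one number, then come the branch's own groups.
        next_group += 1 + re.compile(branch).groups
    pattern = "|".join("(" + b + ")" for b in branches)
    return re.compile(pattern), parent_groups

def child(match, parent_groups, n):
    """Emulate the proposed #$n: the n-th child of whichever parent matched."""
    for p in parent_groups:
        if match.group(p) is not None:
            return match.group(p + n)
    return None

branches = [r"^(\d+);(\d+);(\d+)$", r"^.*:(\d+)\s.*:(\d+).*:(\w+)$", r"what(ever)"]
regex, parents = compile_parents(branches)
print(parents)  # [1, 5, 9]

for text in ["124;12;3", "My id1:213 My id2:232 my word:ins4yanrgx", "whatever"]:
    m = regex.search(text)
    print(child(m, parents, 1))  # 124, then 213, then ever
```

Alternation already tries branches left to right and stops at the first one that matches, so "stop at the first matching parent" comes for free; the helper only adds the relative numbering. Backreferences such as \1 inside a branch would break, because the wrapping shifts every group number.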
Does something like this already exist? Or am I completely missing something in regex that allows this?
Other possible variations:
- referencing a parent and one of its child patterns directly, e.g. #2$3 (in my example this would be equivalent to $6);
- creating as many capturing groups as necessary inside others (I guess this would be more complex, but also the most interesting part), e.g. with a regex (same syntax) like
  (#:^_(?:(#:(\d+):\w+-(\d))|(#:\w+:(\d+)-(\d+)))_$)|(#:^\w+:\s+(#:(\w+);\d-(\d+))$)
  and fetching ##$1 in inputs like:
  _123:smt-4_
  it would match: 123
  _ott:432-10_
  it would match: 432
  yant: special;3-45235
  it would match: special
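Both variations can be approximated with arithmetic on ordinary group numbers. In this Python sketch, absolute_group and first_inner_child are hypothetical helpers: the first translates #k$n into the plain group number it would have in the non-capturing version of the example pattern, and the second emulates ##$1 for the nested example by dropping the (#: parents (the pattern still matches the same strings) and checking the first child group of each innermost parent in turn.

```python
import re

# Branches of the example (?:...|...|...) pattern, in order.
BRANCHES = [r"^(\d+);(\d+);(\d+)$", r"^.*:(\d+)\s.*:(\d+).*:(\w+)$", r"what(ever)"]

def absolute_group(branches, k, n):
    """Translate the proposed #k$n (n-th child of parent k) into a plain group
    number for the pattern (?:branch1|branch2|...), which has no parent captures."""
    # All groups of the earlier branches come first in the numbering.
    before = sum(re.compile(b).groups for b in branches[:k - 1])
    return before + n

print(absolute_group(BRANCHES, 2, 3))  # 6
print(absolute_group(BRANCHES, 3, 1))  # 7

# The nested example with the (#: parents removed.
NESTED = re.compile(
    r"^_(?:(\d+):\w+-(\d)|\w+:(\d+)-(\d+))_$"
    r"|^\w+:\s+(\w+);\d-(\d+)$"
)
# Group number of the 1st child in each innermost parent, left to right.
INNER_FIRST_CHILDREN = (1, 3, 5)

def first_inner_child(text):
    """Emulate ##$1: the 1st child of whichever innermost parent matched."""
    m = NESTED.search(text)
    if m is None:
        return None
    # Exactly one innermost parent takes part in a successful match.
    return next(m.group(g) for g in INNER_FIRST_CHILDREN if m.group(g) is not None)

for s in ["_123:smt-4_", "_ott:432-10_", "yant: special;3-45235"]:
    print(first_inner_child(s))  # 123, then 432, then special
```

The hard-coded INNER_FIRST_CHILDREN tuple is exactly the bookkeeping the proposed syntax would do automatically.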
If you find any errors or flaws in this logic, please let me know and I'll edit the question as soon as possible.