I think something like this is what you want.
For alternating characters:
(?=(.)(?!\1)(.))(?:\1\2){2,}
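\0 will be the entire alternating sequence; \1 and \2 are the two (distinct) alternating characters.
For a run of N characters followed by a run of M characters, possibly separated by other characters (replace N and M with numbers here):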
(?=(.))\1{N}.*?(?=(?!\1)(.))\2{M}
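\0 will be the entire match, including the infix. \1 is the character repeated (at least) N times, and \2 is the character repeated (at least) M times.
Here is a test harness in Java.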
import java.util.regex.*;

public class Regex3 {
    // Builds the "run of N, then run of M" pattern by substituting the
    // literal placeholders N and M in the template with actual numbers.
    static String runNrunM(int N, int M) {
        return "(?=(.))\\1{N}.*?(?=(?!\\1)(.))\\2{M}"
            .replace("N", String.valueOf(N))
            .replace("M", String.valueOf(M));
    }

    // Prints every match of pattern in text, listing groups 0 through groupCount().
    static void dumpMatches(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        System.out.println(text + " <- " + pattern);
        while (m.find()) {
            System.out.println(" match");
            for (int g = 0; g <= m.groupCount(); g++) {
                System.out.format(" %d: [%s]%n", g, m.group(g));
            }
        }
    }

    public static void main(String[] args) {
        String[] tests = {
            "foobababababaf foobaafoobaaaooo",
            "xxyyyy axxayyyya zzzzzzzzzzzzzz"
        };
        // Alternating characters
        for (String test : tests) {
            dumpMatches(test, "(?=(.)(?!\\1)(.))(?:\\1\\2){2,}");
        }
        // Run of 3 followed by run of 3
        for (String test : tests) {
            dumpMatches(test, runNrunM(3, 3));
        }
        // Run of 2 followed by run of 4
        for (String test : tests) {
            dumpMatches(test, runNrunM(2, 4));
        }
    }
}
This produces the following output:
foobababababaf foobaafoobaaaooo <- (?=(.)(?!\1)(.))(?:\1\2){2,}
match
0: [bababababa]
1: [b]
2: [a]
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.)(?!\1)(.))(?:\1\2){2,}
foobababababaf foobaafoobaaaooo <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
match
0: [aaaooo]
1: [a]
2: [o]
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
match
0: [yyyy axxayyyya zzz]
1: [y]
2: [z]
foobababababaf foobaafoobaaaooo <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
match
0: [xxyyyy]
1: [x]
2: [y]
match
0: [xxayyyy]
1: [x]
2: [y]
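Explanation
- (?=(.)(?!\1)(.))(?:\1\2){2,} has two parts:
  - (?=(.)(?!\1)(.)) establishes \1 and \2 using a lookahead.
    - The nested negative lookahead ensures that \1 != \2.
    - Capturing inside a lookahead lets \0 hold the entire match (instead of just the "tail" end).
  - (?:\1\2){2,} matches the \1\2 sequence, which must repeat at least twice.
- (?=(.))\1{N}.*?(?=(?!\1)(.))\2{M} has three parts:
  - (?=(.))\1{N} captures \1 in a lookahead, and then matches it N times.
  - .*? allows an infix to separate the two runs; it is reluctant, so the infix is kept as short as possible.
  - (?=(?!\1)(.))\2{M} works like the first part: it captures \2 in a lookahead and then matches it M times, with the nested negative lookahead ensuring that \2 != \1.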
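To see what the nested negative lookahead contributes, here is a minimal sketch that compares the alternating-characters pattern with a variant that drops (?!\1). The class name LookaheadDemo, the helper firstMatch, and the "loose" variant are made up for this illustration and are not part of the harness above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical helper: returns the first match of pattern in text, or null if none.
    static String firstMatch(String pattern, String text) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Pattern from the answer: the nested (?!\1) forces \2 to differ from \1.
        String strict = "(?=(.)(?!\\1)(.))(?:\\1\\2){2,}";
        // Illustrative variant with the negative lookahead removed.
        String loose = "(?=(.)(.))(?:\\1\\2){2,}";

        System.out.println(firstMatch(strict, "aaaa")); // null
        System.out.println(firstMatch(loose, "aaaa"));  // aaaa
        System.out.println(firstMatch(strict, "abab")); // abab
    }
}
```

Without the negative lookahead, a plain run such as "aaaa" would count as an "alternating" sequence of a and a, which is presumably not what is wanted.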
Note that the run regex will also match longer runs, e.g. runNrunM(2, 2) matches "xxxyyy":
xxxyyy <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{2}
match
0: [xxxyy]
1: [x]
2: [y]
Also, it does not allow overlapping matches. That is, there is only one runNrunM(2, 3) match in "xx11yyy222":
xx11yyy222 <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{3}
match
0: [xx11yyy]
1: [x]
2: [y]
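If overlapping matches are wanted, one possible approach (a sketch, not something the harness above does) is to restart the search one character after the start of each match using Matcher.find(int). The class name OverlapDemo and the helper overlappingMatches are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapDemo {
    // Hypothetical helper: collects matches, restarting the search one character
    // after the start of each match so that later matches may overlap earlier ones.
    static List<String> overlappingMatches(String pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(text);
        int from = 0;
        // find(int) resets the matcher and searches from the given index.
        while (from <= text.length() && m.find(from)) {
            found.add(m.group());
            from = m.start() + 1;
        }
        return found;
    }

    public static void main(String[] args) {
        // Same pattern as runNrunM(2, 3) in the harness above.
        String run23 = "(?=(.))\\1{2}.*?(?=(?!\\1)(.))\\2{3}";
        System.out.println(overlappingMatches(run23, "xx11yyy222"));
        // prints [xx11yyy, 11yyy, yyy222, yy222]
    }
}
```

Because .*? is reluctant, each restart still picks the nearest qualifying second run, so the run of 1s pairs with "yyy" rather than with "222".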