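Both languages claim to use Perl-style regular expressions. If I test a regular expression for validity in one language, will it be valid in the other? How does the regex syntax differ?
The use case here is a C# (.NET) UI talking to an eventual Java back-end implementation, which will use the regex to match data.
Note that I only need to worry about matching, not about extracting parts of the matched data.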
There are quite a few differences (a lot, in fact).
Character classes
- Character class subtraction: [abc-[cde]]
  - .NET YES (2.0)
  - Java: emulated via character class intersection and negation: [abc&&[^cde]]
- Character class intersection: [abc&&[cde]]
  - .NET: emulated via character class subtraction and negation: [abc-[^cde]]
  - Java YES
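A minimal Java-side sketch of the two emulations above, assuming Java 5 or later. The class and method names are made up for illustration, and only the Java forms are executed; the .NET spellings appear in comments. Intersecting with a negated class gives subtraction, and a plain && gives intersection:

```java
import java.util.regex.Pattern;

public class CharClassSetOps {
    // .NET [abc-[cde]] (subtraction) written in Java: [abc] AND NOT [cde].
    static final Pattern SUBTRACTION = Pattern.compile("[abc&&[^cde]]");

    // Java [abc&&[cde]] (intersection); .NET would write [abc-[^cde]].
    static final Pattern INTERSECTION = Pattern.compile("[abc&&[cde]]");

    static boolean inSubtraction(String s) {
        return SUBTRACTION.matcher(s).matches();
    }

    static boolean inIntersection(String s) {
        return INTERSECTION.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Expect a and b for the subtraction, only c for the intersection.
        for (String s : new String[] {"a", "b", "c", "d", "e"}) {
            System.out.println(s + ": subtraction=" + inSubtraction(s)
                    + ", intersection=" + inIntersection(s));
        }
    }
}
```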
- \p{Alpha} POSIX character class
  - .NET NO
  - Java YES (US-ASCII)
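To see why \p{Alpha} is not a drop-in Unicode letter class, this sketch (hypothetical class name; Java 7+ for UNICODE_CHARACTER_CLASS) compares Java's default US-ASCII behaviour with the Unicode mode. Since .NET has no \p{Alpha}, [a-zA-Z] or \p{L} are the spellings both flavours accept, depending on whether non-ASCII letters should match:

```java
import java.util.regex.Pattern;

public class PosixAlphaDemo {
    // Default: \p{Alpha} is the POSIX class [a-zA-Z] (US-ASCII only).
    static final Pattern ASCII_ALPHA = Pattern.compile("\\p{Alpha}+");

    // With UNICODE_CHARACTER_CLASS (Java 7+), \p{Alpha} follows the
    // Unicode Alphabetic property instead.
    static final Pattern UNICODE_ALPHA =
            Pattern.compile("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS);

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "café"
        System.out.println(ASCII_ALPHA.matcher(word).matches());   // false
        System.out.println(UNICODE_ALPHA.matcher(word).matches()); // true
    }
}
```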
- Under (?x) mode, i.e. COMMENTS http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#COMMENTS in Java and IgnorePatternWhitespace http://msdn.microsoft.com/en-us/library/yd1hzczs%28v=vs.110%29.aspx in .NET, a space (U+0020) inside a character class is significant:
  - .NET YES
  - Java NO
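A small Java check of the free-spacing difference, assuming Java 5+ and a hypothetical class name. Under (?x), Java drops the space from [a b], whereas .NET keeps it; writing the space as \x20 should avoid the ambiguity in both flavours:

```java
import java.util.regex.Pattern;

public class CommentsModeClassDemo {
    // In COMMENTS mode Java ignores whitespace even inside [...],
    // so this class only contains 'a' and 'b'.
    static final Pattern IGNORED_SPACE = Pattern.compile("(?x)[a b]");

    // Escaping the space as \x20 makes it a literal in both flavours.
    static final Pattern ESCAPED_SPACE = Pattern.compile("(?x)[a\\x20b]");

    public static void main(String[] args) {
        System.out.println(IGNORED_SPACE.matcher(" ").matches()); // false in Java
        System.out.println(ESCAPED_SPACE.matcher(" ").matches()); // true
    }
}
```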
- Unicode Category http://en.wikipedia.org/wiki/Unicode_character_property#General_Category (L, M, N, P, S, Z, C)
  - .NET YES: \p{L} form only
  - Java YES:
    - From Java 5: \pL, \p{L}, \p{IsL}
    - From Java 7: \p{general_category=L}, \p{gc=L}
- Unicode Category http://en.wikipedia.org/wiki/Unicode_character_property#General_Category (Lu, Ll, Lt, ...)
  - .NET YES: \p{Lu} form only
  - Java YES:
    - From Java 5: \p{Lu}, \p{IsLu}
    - From Java 7: \p{general_category=Lu}, \p{gc=Lu}
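This sketch, assuming Java 7+ (needed for the general_category= and gc= forms) and a hypothetical class name, checks that Java's alternative spellings of the same category agree on a few sample characters. Because .NET accepts only the \p{L} / \p{Lu} form, that is the one to use in patterns shared with the back end:

```java
import java.util.regex.Pattern;

public class UnicodeCategoryDemo {
    // Java spellings of the "letter" category; .NET only accepts \p{L}.
    static final String[] LETTER_SPELLINGS = {
        "\\pL", "\\p{L}", "\\p{IsL}", "\\p{general_category=L}", "\\p{gc=L}"
    };

    // Java spellings of "uppercase letter"; .NET only accepts \p{Lu}.
    static final String[] UPPER_SPELLINGS = {
        "\\p{Lu}", "\\p{IsLu}", "\\p{general_category=Lu}", "\\p{gc=Lu}"
    };

    // True if every spelling gives the same answer for the input.
    static boolean allAgree(String[] spellings, String input) {
        boolean expected = Pattern.matches(spellings[0], input);
        for (String regex : spellings) {
            if (Pattern.matches(regex, input) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A, e-acute, E-acute, digit one
        for (String s : new String[] {"A", "\u00e9", "\u00c9", "1"}) {
            System.out.println(s + ": L=" + Pattern.matches("\\p{L}", s)
                    + ", Lu=" + Pattern.matches("\\p{Lu}", s)
                    + ", spellings agree=" + (allAgree(LETTER_SPELLINGS, s)
                            && allAgree(UPPER_SPELLINGS, s)));
        }
    }
}
```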
- Unicode Block http://en.wikipedia.org/wiki/Unicode_block
  - .NET YES: \p{IsBasicLatin} only (supported named blocks: http://msdn.microsoft.com/en-us/library/20bw873z%28v=vs.110%29.aspx#SupportedNamedBlocks)
  - Java YES (block names are case-insensitive):
    - From Java 5: \p{InBasicLatin}
    - From Java 7: \p{block=BasicLatin}, \p{blk=BasicLatin}
- Spaces and underscores are allowed in all long block names (e.g. BasicLatin can be written as Basic_Latin or Basic Latin)
  - .NET NO
  - Java YES
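For Unicode blocks there appears to be no spelling that both flavours accept: .NET requires the Is prefix, while Java reads \p{Is...} as a category, script or binary property. This Java 7+ sketch (hypothetical class name) exercises the Java spellings listed above and checks that the .NET spelling fails to compile in Java. An explicit range such as [\x00-\x7F] is assumed here to be a portable substitute for Basic Latin; only the Java side is actually run:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class UnicodeBlockDemo {
    // Java spellings of the Basic Latin block; block= and blk= need Java 7+.
    // Block names are case-insensitive and may use spaces or underscores.
    static final String[] JAVA_SPELLINGS = {
        "\\p{InBasicLatin}", "\\p{InBasic_Latin}", "\\p{InBasic Latin}",
        "\\p{Inbasiclatin}", "\\p{block=BasicLatin}", "\\p{blk=BasicLatin}"
    };

    // Assumed portable alternative: an explicit code point range.
    static final String RANGE_SPELLING = "[\\x00-\\x7F]";

    // True if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String regex : JAVA_SPELLINGS) {
            System.out.println(regex + ": 'A'=" + Pattern.matches(regex, "A")
                    + ", e-acute=" + Pattern.matches(regex, "\u00e9"));
        }
        System.out.println(RANGE_SPELLING + ": 'A'=" + Pattern.matches(RANGE_SPELLING, "A"));
        // The .NET spelling: Java treats "Is" as a category/script prefix.
        System.out.println("\\p{IsBasicLatin} compiles: " + compiles("\\p{IsBasicLatin}"));
    }
}
```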
Quantifiers
- ?+, *+, ++ and {m,n}+ (possessive quantifiers)
  - .NET NO
  - Java YES
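Java's possessive quantifiers behave like an atomic group: X*+ acts as (?>X*). .NET does support atomic groups (?>...), so rewriting the possessive form that way should keep a pattern portable. A minimal Java sketch with a hypothetical class name:

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Greedy: a+ can give back one 'a' so the trailing 'a' matches.
    static final Pattern GREEDY = Pattern.compile("a+a");
    // Possessive (Java only): a++ never gives anything back.
    static final Pattern POSSESSIVE = Pattern.compile("a++a");
    // Atomic group: same behaviour as a++, but also valid in .NET.
    static final Pattern ATOMIC = Pattern.compile("(?>a+)a");

    public static void main(String[] args) {
        String input = "aaa";
        System.out.println(GREEDY.matcher(input).matches());     // true
        System.out.println(POSSESSIVE.matcher(input).matches()); // false
        System.out.println(ATOMIC.matcher(input).matches());     // false
    }
}
```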
Quotation
- \Q...\E escapes a string of metacharacters
  - .NET NO
  - Java YES
- \Q...\E escapes a string of character class metacharacters (in character sets)
  - .NET NO
  - Java YES
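Since .NET has no \Q...\E, Java's Pattern.quote is not a portable way to embed literal text, because it returns a \Q...\E-wrapped string. The sketch below instead backslash-escapes each metacharacter, relying on both flavours treating a backslash before a non-alphanumeric character as that literal character. escapeLiteral is a hypothetical helper, and it assumes the result is used outside character classes and without (?x), where spaces and # would also need escaping:

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Characters treated as metacharacters outside a character class.
    private static final String META = "\\^$.|?*+()[]{}";

    // Hypothetical helper: backslash-escapes each metacharacter so the
    // result does not depend on \Q...\E, which .NET lacks. Assumes the
    // output is used outside [...] and without (?x).
    static String escapeLiteral(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 2);
        for (char c : text.toCharArray()) {
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String literal = "1+1=2 (really?)";
        // Java-only: \Q...\E, which is also what Pattern.quote produces.
        System.out.println(Pattern.matches("\\Q" + literal + "\\E", literal)); // true
        System.out.println(Pattern.quote(literal)); // \Q1+1=2 (really?)\E
        // Portable form.
        String escaped = escapeLiteral(literal);
        System.out.println(escaped);                           // 1\+1=2 \(really\?\)
        System.out.println(Pattern.matches(escaped, literal)); // true
    }
}
```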
Matching constructs
- Conditional matching: (?(?=regex)then|else), (?(regex)then|else), (?(1)then|else) or (?(group)then|else)
  - .NET YES
  - Java NO
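A common workaround for Java's missing conditionals is a lookahead-guarded alternation. Below is a sketch for the .NET pattern (?(?=\d)\d{3}|\w{3}), with a hypothetical class name. The negative lookahead on the second branch matters: without it the else branch is still tried after the then branch fails, which a real conditional never does:

```java
import java.util.regex.Pattern;

public class ConditionalRewrite {
    // .NET original: (?(?=\d)\d{3}|\w{3})
    // "If the text starts with a digit, require three digits; otherwise
    // require three word characters."
    // Each branch repeats the condition, so only one branch can apply.
    // Java has no conditionals; .NET accepts this form too.
    static final Pattern GUARDED = Pattern.compile("(?:(?=\\d)\\d{3}|(?!\\d)\\w{3})");

    // Without the (?!\d) guard, "1bc" falls through to \w{3} and matches,
    // which the .NET conditional would not allow.
    static final Pattern UNGUARDED = Pattern.compile("(?:(?=\\d)\\d{3}|\\w{3})");

    public static void main(String[] args) {
        for (String s : new String[] {"123", "abc", "ab3", "1bc"}) {
            System.out.println(s + ": guarded=" + GUARDED.matcher(s).matches()
                    + ", unguarded=" + UNGUARDED.matcher(s).matches());
        }
    }
}
```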
- Named capturing group and named backreference
  - .NET YES:
    - Capturing group: (?<name>regex) or (?'name'regex)
    - Backreference: \k<name> or \k'name'
  - Java YES (Java 7 http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#group(java.lang.String)):
    - Capturing group: (?<name>regex)
    - Backreference: \k<name>
- Multiple capturing groups can have the same name
  - .NET YES
  - Java NO
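The named-group syntax both flavours share is (?<name>...) with \k<name>, using names made of ASCII letters and digits that start with a letter (Java's rule; .NET is more permissive and also allows, for example, underscores). A Java 7+ sketch with a hypothetical class name, showing a named backreference plus a few .NET-valid forms that Java rejects:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class NamedGroupDemo {
    // Shared syntax: (?<name>...) plus \k<name> (Java 7+).
    static final Pattern REPEATED = Pattern.compile("(?<word>\\w+) \\k<word>");

    // True if java.util.regex accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Matcher m = REPEATED.matcher("hello hello");
        if (m.matches()) {
            System.out.println(m.group("word")); // hello
        }
        // Accepted by .NET, rejected by Java:
        System.out.println(compiles("(?'word'\\w+)"));       // false: quote-style name
        System.out.println(compiles("(?<x>a)|(?<x>b)"));     // false: duplicate name
        System.out.println(compiles("(?<first_name>\\w+)")); // false: underscore in name
    }
}
```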
- Balancing group definition: (?<name1-name2>regex) or (?'name1-name2'subexpression)
  - .NET YES
  - Java NO
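Several .NET-only constructs in this list are plain syntax errors in Java. For the C# UI and Java back-end setup in the question, one pragmatic option is to validate user patterns with the Java parser itself, for example behind a back-end endpoint. This sketch uses a hypothetical helper around Pattern.compile. It only catches syntax differences, not patterns that compile in both flavours but match differently, such as [a b] under (?x):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class JavaSyntaxCheck {
    // Hypothetical helper: returns null if java.util.regex accepts the
    // pattern, otherwise the parser's description of the problem.
    static String javaSyntaxError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        String[] dotNetOnly = {
            "(?<open>\\()[^()]*(?<close-open>\\))", // balancing group
            "(a)?(?(1)b|c)",                        // conditional
            "(?n)(a)",                              // ExplicitCapture flag
            "a(?#comment)b",                        // inline comment
        };
        for (String regex : dotNetOnly) {
            System.out.println(regex + " -> " + javaSyntaxError(regex));
        }
        // Compiles in Java, yet .NET keeps the space in the class.
        System.out.println("(?x)[a b] -> " + javaSyntaxError("(?x)[a b]"));
    }
}
```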
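Assertions
- (?<=text) (positive lookbehind)
  - .NET: variable width
  - Java: only patterns with an obvious (bounded) maximum length
- (?<!text) (negative lookbehind)
  - .NET: variable width
  - Java: only patterns with an obvious (bounded) maximum length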
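A Java sketch of lookbehind that stays within the bounded-length rule, assuming Java 7+ and a hypothetical class name. Alternatives of different lengths and an optional space are fine because the maximum length is finite, and .NET accepts the same pattern; unbounded forms such as (?<=a+) are best kept out of shared patterns:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Bounded lookbehind: "USD" or "$", optionally followed by one space.
    // Its maximum length (4 chars) is finite, so Java accepts it.
    static final Pattern AMOUNT = Pattern.compile("(?<=(?:USD|\\$) ?)\\d+");

    // Collects every digit run preceded by a currency marker.
    static List<String> amounts(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(amounts("USD 42 and $7, but not 99")); // [42, 7]
    }
}
```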
Mode options/flags
- ExplicitCapture http://msdn.microsoft.com/en-us/library/yd1hzczs%28v=vs.110%29.aspx option (?n)
  - .NET YES
  - Java NO
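Without (?n), the portable way to get the same effect is to write every unnamed group as non-capturing (?:...). A Java 7+ sketch of that rewrite, with a hypothetical class name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExplicitCaptureDemo {
    // .NET: (?n)(ab)+(?<last>c) captures only the named group.
    // Java has no (?n), so the unnamed group is written as (?:...);
    // this form behaves the same way in .NET without the flag.
    static final Pattern P = Pattern.compile("(?:ab)+(?<last>c)");

    public static void main(String[] args) {
        Matcher m = P.matcher("ababc");
        System.out.println(m.matches());     // true
        System.out.println(m.groupCount());  // 1
        System.out.println(m.group("last")); // c
    }
}
```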
Miscellaneous
- (?#comment) inline comments
  - .NET YES
  - Java NO
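Java rejects (?#...), but both flavours accept end-of-line # comments in free-spacing mode, so (?x) is a portable way to annotate a pattern as long as spaces and # stay out of character classes (see the (?x) item above). A sketch with a hypothetical class name:

```java
import java.util.regex.Pattern;

public class FreeSpacingCommentDemo {
    // Java rejects (?#...) inline comments, but both flavours accept
    // end-of-line # comments after (?x). No spaces or '#' appear inside
    // [...] here, since the flavours disagree there.
    static final Pattern PHONE = Pattern.compile(
            "(?x)        # free-spacing mode\n"
          + "\\d{3}      # three digits\n"
          + "-           # literal dash\n"
          + "\\d{4}      # four digits\n");

    public static void main(String[] args) {
        System.out.println(PHONE.matcher("555-1234").matches()); // true
    }
}
```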
References
- Regular-Expressions.info - Regular Expression Flavor Comparison http://www.regular-expressions.info/refflavors.html
- MSDN Library Reference - .NET Framework 4.5 - Regular Expression Language http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
- Pattern (Java Platform SE 7) http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html