You could do it like this:
lexer grammar TLexer;
REGEX
: REGEX_DELIMITER ( {getText().charAt(0) != _input.LA(1)}? REGEX_ATOM )+ {getText().charAt(0) == _input.LA(1)}? .
| '{' REGEX_ATOM+ '}'
| '(' REGEX_ATOM+ ')'
;
ANY
: .
;
fragment REGEX_DELIMITER
: [/~@#]
;
fragment REGEX_ATOM
: '\\' .
| ~[\\]
;
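If you run the following class:
import org.antlr.v4.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        // Input after Java unescaping: /foo/ /bar\ ~\~~ {mu} (bla(
        TLexer lexer = new TLexer(new ANTLRInputStream("/foo/ /bar\\ ~\\~~ {mu} (bla("));
        // Print each token's symbolic name next to its text
        for (Token t : lexer.getAllTokens()) {
            System.out.printf("%-20s %s\n", TLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText().replace("\n", "\\n"));
        }
    }
}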
you will see the following output:
REGEX /foo/
ANY
ANY /
ANY b
ANY a
ANY r
ANY \
ANY
REGEX ~\~~
ANY
REGEX {mu}
ANY
ANY (
ANY b
ANY l
ANY a
ANY (
The {...}? is called a predicate:
- Syntax of semantic predicates in Antlr4 https://stackoverflow.com/questions/12749230/syntax-of-semantic-predicates-in-antlr4
- Semantic predicates in ANTLR4? https://stackoverflow.com/questions/13661754/semantic-predicates-in-antlr4
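The ( {getText().charAt(0) != _input.LA(1)}? REGEX_ATOM )+ part tells the lexer to keep matching characters as long as the character that REGEX_DELIMITER matched (the first character of the token) is not next in the character stream. And {getText().charAt(0) == _input.LA(1)}? . makes sure there actually is a closing delimiter equal to that first character (which is a REGEX_DELIMITER, of course).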
Tested with ANTLR 4.5.3
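To make the effect of the two predicates easier to follow, here is a minimal plain-Java sketch of the same scanning logic, without the ANTLR runtime. It is only an illustration of the first REGEX alternative, not how ANTLR implements predicates internally; the class name DelimiterPredicateSketch and the method matchRegex are made up for this example.

```java
final class DelimiterPredicateSketch {
    // Hypothetical helper: returns the index just past a delimited regex that
    // starts at 'start', or -1 if no token of that shape can be matched there.
    static int matchRegex(String input, int start) {
        if (start >= input.length() || "/~@#".indexOf(input.charAt(start)) < 0) {
            return -1; // REGEX_DELIMITER : [/~@#]
        }
        char first = input.charAt(start); // plays the role of getText().charAt(0)
        int pos = start + 1;
        int atoms = 0;
        // ( {first != LA(1)}? REGEX_ATOM )+
        while (pos < input.length() && input.charAt(pos) != first) {
            if (input.charAt(pos) == '\\') {
                if (pos + 1 >= input.length()) {
                    return -1; // '\\' . needs a character after the backslash
                }
                pos += 2; // escaped character, may even be the delimiter itself
            } else {
                pos++; // ~[\\]
            }
            atoms++;
        }
        // {first == LA(1)}? .  requires at least one atom and a closing delimiter
        if (atoms == 0 || pos >= input.length()) {
            return -1;
        }
        return pos + 1;
    }

    public static void main(String[] args) {
        String input = "/foo/ /bar\\ ~\\~~ {mu} (bla(";
        System.out.println(matchRegex(input, 0));  // 5, i.e. /foo/
        System.out.println(matchRegex(input, 6));  // -1, /bar\ ... never closes
        System.out.println(matchRegex(input, 12)); // 16, i.e. ~\~~
    }
}
```

The -1 for the second call corresponds to the lexer falling back to ANY for the lone / in the output above: the backslash escapes the following space, and no unescaped / follows.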
EDIT
And to get a delimiter preceded by m + some optional white spaces to work, you could try something like this (untested!):
lexer grammar TLexer;
@lexer::members {
boolean delimiterAhead(String start) {
return start.replaceAll("^m[ \t]*", "").charAt(0) == _input.LA(1);
}
}
REGEX
: '/' ( '\\' . | ~[/\\] )+ '/'
| 'm' SPACES? REGEX_DELIMITER ( {!delimiterAhead(getText())}? ( '\\' . | ~[\\] ) )+ {delimiterAhead(getText())}? .
| 'm' SPACES? '{' ( '\\' . | ~'}' )+ '}'
| 'm' SPACES? '(' ( '\\' . | ~')' )+ ')'
;
ANY
: .
;
fragment REGEX_DELIMITER
: [~@#]
;
fragment SPACES
: [ \t]+
;