Why does ANTLR4 not match the word "of" and the punctuation mark ","?

2024-01-09

I have a grammar file, Hello.g4, with the following grammar definition:

definition : wordsWithPunctuation ;
words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )*  ;
NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

Now, if I try to build a parse tree from the following input:

a b c d of at of abc bcd of
a b c d at abc, bcd
a b c d of at of abc, bcd of

it returns errors such as:

Hello::definition:1:31: extraneous input 'of' expecting {<EOF>, '(', '"', WORD, PUNCTUATION}

whereas this input:

a b c d  at:  abc bcd!

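works fine.

What is wrong with the grammar, the input, or the interpreter?

If I modify the wordsWithPunctuation rule by adding (... | 'of' | ',' word | ...), it matches the input completely, but that looks suspicious to me: how is the word of different from the words a or abc? And why is , different from the other punctuation characters (that is, why does the rule match : or ! but not ,)?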

Update1:

I am using the ANTLR4 plugin for Eclipse, so the project build produces the following output:

ANTLR Tool v4.2.2 (/var/folders/.../antlr-4.2.2-complete.jar)
Hello.g4 -o /Users/.../eclipse_workspace/antlr_test_project/target/generated-sources/antlr4 -listener -no-visitor -encoding UTF-8

Update2:

The grammar given above is only part of the following grammar:

grammar Hello;

text : (entry)+ ;

entry : blub 'abrr' '-' ('1')? '.' ('(' NUMBER ')')? sims '-' '(' definitionAndExamples ')' 'Hello' 'all' 'the' 'people' 'of' 'the' 'world';

blub : WORD ;

sims : sim (',' sim)* ;
sim : words ;

definitionAndExamples : definitions (';' examples)? ;

definitions : definition (';' definition )* ;
definition : wordsWithPunctuation ;

examples : example (';' example )* ;
example : '"' wordsWithPunctuation '"' ;

words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )*  ;

NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

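It now looks to me as if the words from the entry rule somehow break the other rules used within the entry rule. But why? Is this some kind of anti-pattern in grammars?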

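By including 'of' in a parser rule, you cause ANTLR to create an implicit anonymous token to represent that input. The word of will always have that special token type, so it will never have the type WORD. The only place it can appear in your parse tree is where 'of' appears in a parser rule. The same applies to ',': it appears as a literal in the sims rule, so a comma never has the type PUNCTUATION, while : and ! are not used as literals anywhere and are still lexed as PUNCTUATION.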
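
The effect can be pictured with a rough sketch of the lexer that the combined grammar behaves like. The T__N rule names and their numbering below are illustrative assumptions (the real names are generated by the tool); the point is only that literals from parser rules become tokens of their own, and that they take precedence over explicit lexer rules matching the same text.

```antlr
// Illustrative only: roughly what the combined grammar behaves like
// once implicit tokens have been created for the parser-rule literals.
T__0 : 'of' ;    // literal from the entry rule
T__1 : ',' ;     // literal from the sims rule
// ... one implicit token per remaining literal ('the', 'all', '-', ...)

// The explicit lexer rules still exist, but for the inputs "of" and ","
// the implicit tokens above win the tie.
WORD        : [A-Za-z-]+ ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
```

Only an exact match is affected: the lexer prefers the longest match, so an input such as often is still a WORD, while of on its own is not.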

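You can prevent ANTLR from creating these anonymous token types by splitting your grammar into a separate lexer grammar HelloLexer in HelloLexer.g4 and a parser grammar HelloParser in HelloParser.g4. I highly recommend that you always use this form, for the following reasons:

Lexer modes only work if you do this.
Implicitly defined tokens are one of the most common sources of bugs in a grammar, and separating the grammars prevents them.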
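
A minimal sketch of such a split, reduced to the definition part of the grammar, might look like the following. The token names LPAREN, RPAREN, and QUOTE are assumptions introduced for illustration; they are not part of the original grammar.

```antlr
// HelloLexer.g4
lexer grammar HelloLexer;

LPAREN      : '(' ;
RPAREN      : ')' ;
QUOTE       : '"' ;
NUMBER      : [0-9]+ ;
WORD        : [A-Za-z-]+ ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS          : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
```

```antlr
// HelloParser.g4
parser grammar HelloParser;

// Take the token types from the lexer grammar above.
options { tokenVocab = HelloLexer; }

definition : wordsWithPunctuation ;

wordsWithPunctuation
  : word ( word
         | punctuation word
         | word punctuation
         | LPAREN wordsWithPunctuation RPAREN
         | QUOTE wordsWithPunctuation QUOTE
         )*
  ;

word        : WORD ;
punctuation : PUNCTUATION ;
```

In this reduced sketch no lexer rule competes with WORD or PUNCTUATION for the inputs of and ",", so both are lexed the same way as any other word or punctuation mark. A literal in the parser grammar that no lexer rule defines should be rejected by the ANTLR tool rather than silently becoming a new token.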

Once you have separated the grammars, you can update your word parser rule so that the special token of is also treated as a word.

word
  : WORD
  | 'of'
  // | ... other keywords which are also "words"
  ;
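
For the 'of' alternative to resolve in a separate parser grammar, the lexer grammar needs a rule that matches exactly that literal, placed before WORD. A sketch of the two pieces, where the token name OF is an assumption:

```antlr
// In HelloLexer.g4: keyword rules go before WORD, so they win when
// both rules match the same text.
OF   : 'of' ;
WORD : [A-Za-z-]+ ;

// In HelloParser.g4: refer to the keyword by its token name
// (the literal 'of' would resolve to the same token).
word
  : WORD
  | OF
  ;
```

Any further keyword that should also count as a word would need its own lexer rule and its own alternative in word.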
