Boost::Spirit expression parser

2024-01-04

I have another problem with my boost::spirit parser.

template<typename Iterator>
struct expression: qi::grammar<Iterator, ast::expression(), ascii::space_type> {
    expression() :
        expression::base_type(expr) {
        number %= lexeme[double_];
        varname %= lexeme[alpha >> *(alnum | '_')];

        binop = (expr >> '+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1,_2)]
              | (expr >> '-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1,_2)]
              | (expr >> '*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1,_2)]
              | (expr >> '/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1,_2)] ;

        expr %= number | varname | binop;
    }

    qi::rule<Iterator, ast::expression(), ascii::space_type> expr;
    qi::rule<Iterator, ast::expression(), ascii::space_type> binop;
    qi::rule<Iterator, std::string(), ascii::space_type> varname;
    qi::rule<Iterator, double(), ascii::space_type> number;
};

This is my parser. It parses "3.1415" and "var" just fine, but when I try to parse "1+2" it tells me the parse failed. I then tried changing the binop rule to

    binop = expr >>
           (('+' >> expr)[_val = construct<ast::binary_op<ast::add>>(_1, _2)]
          | ('-' >> expr)[_val = construct<ast::binary_op<ast::sub>>(_1, _2)]
          | ('*' >> expr)[_val = construct<ast::binary_op<ast::mul>>(_1, _2)]
          | ('/' >> expr)[_val = construct<ast::binary_op<ast::div>>(_1, _2)]);

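But now it of course cannot build the AST, because _1 and _2 are set differently. I have only seen something like _r1 mentioned, but as a boost newbie I don't quite understand how boost::phoenix and boost::spirit interact.

How can I solve this?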

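It isn't entirely clear to me what you are trying to achieve. Most importantly, aren't you worried about operator associativity? I'll just show simple answers based on right recursion, which causes the operators to be parsed as right-associative and without any precedence, as the grouping in the debug output below shows.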

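The straight answer to your visible question is to juggle a fusion::vector2<char, ast::expression>, which isn't really any fun, especially in Phoenix lambda semantic actions. (I'll show below what that looks like.)

Meanwhile, I think you should read up on the Spirit docs:

here http://boost-spirit.com/old_docs/v1_6/doc/faq.html#left_recursion in the old Spirit docs (eliminating left recursion); though the syntax no longer applies, Spirit still generates LL recursive-descent parsers, so the concept behind left recursion still applies. The code below shows this applied to Spirit Qi.
here http://www.boost.org/doc/libs/1_48_0/libs/spirit/example/qi/calc_utree_naive.cpp: the Qi examples contain three calculator samples, which should give you a hint on why operator associativity matters and how to express a grammar that captures the associativity of binary operators. Obviously, it also shows how to support parenthesized expressions that override the default evaluation order.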
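To make the left-recursion point concrete, here is a minimal hand-written recursive-descent sketch in plain C++. It is not Spirit code, and the names Parser, parse_simple, parse_expr and group are made up for illustration. A rule shaped like expr = expr '+' expr would call itself before consuming any input and never terminate. The right-recursive shape below consumes an operand first, so every recursive call starts further along in the input.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical hand-written parser. It only mirrors the shape
//   simple = digits
//   expr   = simple, optionally followed by (operator, expr)   // right recursion
// and returns a fully parenthesized string so the grouping is visible.
struct Parser {
    std::string input;
    std::size_t pos = 0;

    // simple: one or more digits (kept trivial on purpose).
    std::string parse_simple() {
        const std::size_t start = pos;
        while (pos < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[pos])))
            ++pos;
        if (start == pos)
            throw std::runtime_error("expected a number");
        return input.substr(start, pos - start);
    }

    // expr: an operand is consumed *before* recursing, so the recursion
    // always makes progress and ends at the end of the input.
    std::string parse_expr() {
        std::string left = parse_simple();
        if (pos < input.size() &&
            std::string("+-*/").find(input[pos]) != std::string::npos) {
            const char op = input[pos++];
            std::string right = parse_expr();
            return "(" + left + op + right + ")";
        }
        return left;
    }
};

// Parses the whole text and rejects trailing input.
std::string group(const std::string& text) {
    Parser p;
    p.input = text;
    std::string result = p.parse_expr();
    if (p.pos != text.size())
        throw std::runtime_error("trailing input");
    return result;
}
```

Calling group("1/2+3-4*5") yields a string nested entirely to the right, which is the same grouping the right-recursive Qi rules produce.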

Code:

I have three versions of the code that work, parsing input like:

std::string input("1/2+3-4*5");

into an ast::expression grouped like this (using BOOST_SPIRIT_DEBUG):

<expr>
  ....
  <success></success>
  <attributes>[[1, [2, [3, [4, 5]]]]]</attributes>
</expr>

Find the links to the code here:

step_#1_reduce_semantic_actions.cpp https://gist.github.com/267acc43ac7ee276e889#file_step_1_reduce_semantic_actions.cpp

step_#2_drop_rule.cpp https://gist.github.com/267acc43ac7ee276e889#file_step_2_drop_rule.cpp

step_#0_vector2.cpp https://gist.github.com/267acc43ac7ee276e889#file_step_0_vector2.cpp

Step 1: Reduce semantic actions https://gist.github.com/267acc43ac7ee276e889#file_step_1_reduce_semantic_actions.cpp

First thing, I'd get rid of the alternative parse expressions per operator; this leads to excessive backtracking¹. Also, as you've found out, it makes the grammar hard to maintain. So, here is a simpler variation that uses a function for the semantic action:

¹ Check that using BOOST_SPIRIT_DEBUG!

static ast::expression make_binop(char discriminant, 
     const ast::expression& left, const ast::expression& right)
{
    switch(discriminant)
    {
        case '+': return ast::binary_op<ast::add>(left, right);
        case '-': return ast::binary_op<ast::sub>(left, right);
        case '/': return ast::binary_op<ast::div>(left, right);
        case '*': return ast::binary_op<ast::mul>(left, right);
    }
    throw std::runtime_error("unreachable in make_binop");
}

// rules:
number %= lexeme[double_];
varname %= lexeme[alpha >> *(alnum | '_')];

simple = varname | number;
binop = (simple >> char_("-+*/") >> expr) 
    [ _val = phx::bind(make_binop, qi::_2, qi::_1, qi::_3) ]; 

expr = binop | simple;

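Step 2: Drop the redundant rule, use _val https://gist.github.com/267acc43ac7ee276e889#file_step_2_drop_rule.cpp

As you can see, this has the potential to reduce complexity. It is now only a small step to remove the binop intermediate rule (which has become quite redundant):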

number %= lexeme[double_];
varname %= lexeme[alpha >> *(alnum | '_')];

simple = varname | number;
expr = simple [ _val = _1 ] 
    > *(char_("-+*/") > expr) 
            [ _val = phx::bind(make_binop, qi::_1, _val, qi::_2) ]
    > eoi;

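As you can see:

Within the expr rule, the _val lazy placeholder is used as a pseudo-local variable that accumulates the binops. Across rules, you would have to use qi::locals<ast::expression> for such an approach. (This was your question regarding _r1.)
There are now explicit expectation points, making the grammar more robust.
The expr rule no longer needs to be an auto-rule (expr = instead of expr %=).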
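The accumulator idea can be sketched without Spirit as an ordinary loop, where a local variable plays the role of _val. This is a hand-written analogue with made-up names (read_operand, fold_left), not the code from the gist. It also deliberately differs from the rule above in one respect: the loop reads a simple operand on the right-hand side instead of recursing into expr, which is an assumption of this sketch and makes the result nest to the left.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads one run of digits starting at pos.
std::string read_operand(const std::string& in, std::size_t& pos) {
    const std::size_t start = pos;
    while (pos < in.size() &&
           std::isdigit(static_cast<unsigned char>(in[pos])))
        ++pos;
    if (start == pos)
        throw std::runtime_error("expected a number");
    return in.substr(start, pos - start);
}

// Rough shape being imitated (illustrative notation only):
//   first operand           -> acc = operand
//   each (operator operand) -> acc = combine(operator, acc, operand)
// 'acc' plays the role of _val; the result is fully parenthesized.
std::string fold_left(const std::string& in) {
    std::size_t pos = 0;
    std::string acc = read_operand(in, pos);
    while (pos < in.size() &&
           std::string("+-*/").find(in[pos]) != std::string::npos) {
        const char op = in[pos++];
        const std::string rhs = read_operand(in, pos);
        acc = "(" + acc + op + rhs + ")";   // fold the new operand into the accumulator
    }
    if (pos != in.size())
        throw std::runtime_error("trailing input");
    return acc;
}
```

Because each iteration wraps the accumulated left side, fold_left("1/2+3-4*5") nests to the left. It still ignores precedence, so it is only a step toward a real calculator grammar.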

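Step 0: Wrestle the fusion types directly https://gist.github.com/267acc43ac7ee276e889#file_step_0_vector2.cpp

Finally, for fun and gore, let me show you how you could have handled your suggested code, along with the shifting bindings of _1, _2 and so on: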

static ast::expression make_binop(
        const ast::expression& left, 
        const boost::fusion::vector2<char, ast::expression>& op_right)
{
    switch(boost::fusion::get<0>(op_right))
    {
        case '+': return ast::binary_op<ast::add>(left, boost::fusion::get<1>(op_right));
        case '-': return ast::binary_op<ast::sub>(left, boost::fusion::get<1>(op_right));
        case '/': return ast::binary_op<ast::div>(left, boost::fusion::get<1>(op_right));
        case '*': return ast::binary_op<ast::mul>(left, boost::fusion::get<1>(op_right));
    }
    throw std::runtime_error("unreachable in make_binop");
}

// rules:
expression::base_type(expr) {
number %= lexeme[double_];
varname %= lexeme[alpha >> *(alnum | '_')];

simple = varname | number;
binop %= (simple >> (char_("-+*/") > expr)) 
    [ _val = phx::bind(make_binop, qi::_1, qi::_2) ]; // note _2!!!

expr %= binop | simple;

As you can see, writing make_binop that way is not nearly as much fun!
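The reason _2 changes meaning is that the nested sequence hands the operator and the right operand to the action as one bundled value. A rough analogue in plain C++, using std::pair in place of fusion::vector2 and strings in place of ast::expression (both are substitutions made for this sketch), looks like this:

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical analogue of the Step 0 make_binop: the operator and the
// right operand arrive bundled in one argument and must be unpacked by hand.
std::string make_binop(const std::string& left,
                       const std::pair<char, std::string>& op_right) {
    const char op = op_right.first;               // like fusion::get<0>
    const std::string& right = op_right.second;   // like fusion::get<1>
    switch (op) {
        case '+':
        case '-':
        case '*':
        case '/':
            return "(" + left + op + right + ")";
    }
    throw std::runtime_error("unreachable in make_binop");
}
```

Compared with the three-argument version in Step 1, the unpacking is pure boilerplate, which is why flattening the sequence is usually the more pleasant option.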
