Spirit-Qi：如何编写非终结符解析器？

2024-02-22

我想写一个可以使用的解析器（作为 qi 扩展）通过my_parser(p1, p2, ...) where p1, p2, ...是 qi 解析器表达式。

其实我想实现一个best_match解析器的工作方式类似于 qi 替代方案，但不选择第一个匹配规则，而是选择“解释”大部分输入的规则。

给定两个规则simple_id = +(qi::alpha) and complex_id = simple_id >> *(qi::string("::") > simple_id)它会选择complex_id在输入上willy::anton。并且这样做时不会产生中间属性。这些好处是通过运行时获得的，因为需要先行解析。

在我看来，这样的解析器构造有几个用例。benchmark(p1, ...)作为优化的支持解析器，仅举一个例子。一旦我知道如何提供，我就会提供。

这个解析器将是一个非终端。我（努力）尝试过，但我无法在 qi 中找到问题的答案。我的猜测是，C++ 机制与 c qi 紧密集成，以至于没有可理解的入口点，至少对我来说是这样。

实现我想要引入运算符的内容非常容易。我在下面附加了当前的解决方案。它按预期编译，但不完整。

qi 运算符获取准备好的 fusion::vector/sequence（无论什么）并对其进行操作。似乎没有图书馆提供可以解决我的问题。甚至make_nary_composite已经期望 args 被编译为Elements.

我尝试了很多，但都失败了，所以我不想让你感到厌烦。

我可以想象的一个解决方法是提供一个有状态的运算符,，这将使p1, ...到合法的气表达。然后实现一个unary_parserbest_match或处理该表达式的指令。这,将获得两种模式：获取当前（成功）解析器消耗的输入长度，并实际解析第 1 阶段中选定的输入。包装器首先在模式 1 中运行逗号解析器，然后在模式 2 中运行。它可能很难看，但是可以工作。

运营商实施p1 |= p2 |= ...对于 n > 2 的运行时间来说，是最昂贵的。我很乐意解决这个问题。

最小（但仍然合理）的开销将有best_match(p1, ...)。这可能吗？

因为我对 boost::fusion 不太有经验，所以我也在代码中添加了一些问题（有关融合的方法）。

将实际上是非终端解析器的东西压入一元解析器方案感觉是错误的。但由于我的理解很差，这似乎是完成它的唯一方法。我非常感谢任何帮助我摆脱困境的帮助。

我的环境：Boost 1.61，MSVC 2015 Upd 2，目标 win32console。

进步：

我想，我正在缓慢但坚定地得到它。我很高兴看到error_invalid_expression。由于以下原始表达式，编译失败：

boost::proto::exprns_::expr<
    boost::proto::tagns_::tag::function,boost::proto::argsns_::list5<
        const boost::proto::exprns_::expr<
            boost::proto::tagns_::tag::terminal,boost::proto::argsns_::term<
                mxc::qitoo::tag::best_match
            >
        ,0> &
        ,const boost::spirit::qi::rule<
            iterator_type,std::string (void),
            boost::spirit::standard::space_type,
            boost::spirit::unused_type,boost::spirit::unused_type> &
        ,const boost::spirit::qi::rule<
            iterator_type,std::string (void),
            boost::spirit::standard::space_type,
            boost::spirit::unused_type,
            boost::spirit::unused_type> &
        , const boost::spirit::qi::rule<
            iterator_type,std::string (void),
            boost::spirit::standard::space_type,
            boost::spirit::unused_type,
            boost::spirit::unused_type> &
        ,const boost::spirit::qi::rule<
            iterator_type,std::string (void),
            boost::spirit::standard::space_type,
            boost::spirit::unused_type,
            boost::spirit::unused_type> &
        >
    ,5>

这实际上是描述我的 best_match 解析器的函数风格用法的表达式。我的测试规则看起来像

qitoo::best_match(id, qualified_id, id, qualified_id)

where id and qualified_id是上面提到的规则。这一切都是我想要的。发生错误是因为该表达式没有成员parse。好的。深入挖掘后，我发现 qi 的元编译器确实支持一元、二进制和 nary（这意味着可变参数）解析器。但它不支持函数风格的语法。 Qi 似乎仅使用一元、二元和 nary 运算符。尽管支持 nary 类型，但它或多或少已经过时了，因为 qi 运算符是二进制 max。

结束进度编辑

#include <boost/spirit/home/qi.hpp>

namespace boost {
    namespace spirit
    {
        ///////////////////////////////////////////////////////////////////////////
        // Enablers
        ///////////////////////////////////////////////////////////////////////////
        template <>
        struct use_operator<qi::domain, proto::tag::bitwise_or_assign> // enables |=
            : mpl::true_ {};

        template <>
        struct flatten_tree<qi::domain, proto::tag::bitwise_or_assign> // flattens |=
            : mpl::true_ {};
    }
}

namespace mxc {
    namespace qitoo {

        namespace spirit = boost::spirit;
        namespace qi = spirit::qi;
        namespace fusion = boost::fusion;

        template <typename Elements>
        struct best_match_parser : qi::nary_parser<mxc::qitoo::best_match_parser<Elements>>
        {
            // This one does a lookahead parse of each qi expression to find out which rule matches 
            // most of the input 
            template <typename Iterator, typename Context, typename Skipper>
            struct best_function
            {
                best_function(Iterator& first, Iterator const& last, Context& context,
                    Skipper const& skipper, int& best, int& idx, int& size)
                    : first(first), last(last), context(context), skipper(skipper), best(best),
                    idx(idx), size(size) {};

                template <typename Component>
                void operator()(Component& component) const
                {
                    Iterator f = first;
                    if (component.parse(f, last, context, skipper, qi::unused)) {
                        // please have a look on how I transport best, idx and size
                        // into the parser context
                        //
                        // I guess, this is a nasty hack and could be done better
                        // Any advice would be highliy appreciated

                        int l = f - first;
                        if (l > size) {
                            size = l;
                            best = idx++;
                        }
                        idx++;
                    }
                }

            private:
                int& best;
                int& idx;
                int& size;

                Iterator&       first;
                Iterator const& last;
                Context&        context;
                Skipper const&  skipper;
            };

            template <typename Context, typename Iterator>
            struct attribute
            {
                // Put all the element attributes in a tuple
                typedef typename spirit::traits::build_attribute_sequence <
                    Elements, Context, spirit::traits::alternative_attribute_transform
                    , Iterator, qi::domain
                >::type all_attributes;

                // Ok, now make a variant over the attribute sequence. Note that
                // build_variant makes sure that 1) all attributes in the variant
                // are unique 2) puts the unused attribute, if there is any, to
                // the front and 3) collapses single element variants, variant<T>
                // to T.
                typedef typename
                    spirit::traits::build_variant<all_attributes>::type
                    type;
            };

            best_match_parser(Elements const& elements_) : elements(elements_), size(0), idx(0), best(-1) {}


            template <typename Iterator, typename Context, typename Skipper, typename Attribute>
            bool parse(Iterator& first, Iterator const& last
                , Context& context, Skipper const& skipper
                , Attribute& attr_) const
            {
                best_function<Iterator, Context, Skipper>  f(first, last, context, skipper, best, idx, size);

                // find out which parser matches most of the input
                fusion::for_each(elements, f);

                // best >= 0 if at least one parser was successful
                if (best >= 0) {
                    // now that I have the best parser identified, how do I access it?
                    // I understand that the next line can not work, but I'm looking for something like that

                    // --> auto v = fusion::at<boost::mpl::int_<best>>(elements);
                };

                return false;
            }

            template <typename Context>
            qi::info what(Context& context) const
            {
                qi::info result("best_match");
                fusion::for_each(elements,
                    spirit::detail::what_function<Context>(result, context));

                return result;
            }

            Elements elements;
            mutable int  best;
            mutable int  idx;
            mutable int  size;

        };
    }
}

namespace boost {
    namespace spirit {
        namespace qi {

            ///////////////////////////////////////////////////////////////////////////
            // Parser generators: make_xxx function (objects)
            ///////////////////////////////////////////////////////////////////////////
            template <typename Elements, typename Modifiers>
            struct make_composite<proto::tag::bitwise_or_assign, Elements, Modifiers>
                : make_nary_composite < Elements, mxc::qitoo::best_match_parser >
            {};
        }

        namespace traits {
            ///////////////////////////////////////////////////////////////////////////
            template <typename Elements>
            struct has_semantic_action<mxc::qitoo::best_match_parser<Elements> >
                : nary_has_semantic_action<Elements> {};

            ///////////////////////////////////////////////////////////////////////////
            template <typename Elements, typename Attribute, typename Context
                , typename Iterator>
                struct handles_container<mxc::qitoo::best_match_parser<Elements>, Attribute, Context
                , Iterator>
                : nary_handles_container<Elements, Attribute, Context, Iterator> {};
        }
    }
}
namespace qi = boost::spirit::qi;
namespace qitoo = mxc::qitoo;

using iterator_type = std::string::const_iterator;
using result_type = std::string;

template<typename Parser>
void parse(const std::string message, const std::string& input, const std::string& rule, const Parser& parser) {
    iterator_type iter = input.begin(), end = input.end();

    std::vector<result_type> parsed_result;

    std::cout << "-------------------------\n";
    std::cout << message << "\n";
    std::cout << "Rule: " << rule << std::endl;
    std::cout << "Parsing: \"" << input << "\"\n";

    bool result = qi::phrase_parse(iter, end, parser, qi::space, parsed_result);
    if (result)
    {
        std::cout << "Parser succeeded.\n";
        std::cout << "Parsed " << parsed_result.size() << " elements:";
        for (const auto& str : parsed_result)
            std::cout << "[" << str << "]";
        std::cout << std::endl;
    }
    else
    {
        std::cout << "Parser failed" << std::endl;
    }
    if (iter == end) {
        std::cout << "EOI reached." << std::endl;
    }
    else {
        std::cout << "EOI not reached. Unparsed: \"" << std::string(iter, end) << "\"" << std::endl;
    }
    std::cout << "-------------------------\n";

}


int main()
{
    namespace qi = boost::spirit::qi;

    qi::rule < iterator_type, std::string(), qi::space_type>
        id = (qi::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_'));

    qi::rule < iterator_type, std::string(), qi::space_type>
        qualified_id = id >> *(qi::string("::") > id);


    namespace qitoo = mxc::qitoo;
    namespace qi = boost::spirit::qi;

    parse("best match operator, select second rule"
        , "willy::anton::lutz"
        , "id |= qualified_id"
        , id |= qualified_id);
}

你的例子

我不明白你的样品如何需要这些。只需重新排序你的分支，然后意识到短分支只是 n=1 的合格情况的一个特例：住在科里鲁 http://coliru.stacked-crooked.com/a/1191dadc88b9cea1¹ (or 使用X3版本 http://coliru.stacked-crooked.com/a/0f80074001bf8c39如果你更喜欢）。

一般情况

现在，说到 x3，它有能力让您的生活变得更加轻松！

在一般情况下，我认为您想要的是：

namespace parser {

    template <typename... Parsers>
    struct longest_parser : x3::parser_base {
        longest_parser(Parsers... sub) : _alternatives {sub...} { }

        template <typename It, typename Ctx, typename Other, typename Attr>
        bool parse(It& f, It l, Ctx& ctx, Other const& other, Attr& attr) const {
            auto const saved = f;

            //// To exclude pre-skip from length comparisons, do pre-skip here:
            // x3::skip_over(f, l, ctx);
            auto seq = std::make_index_sequence<sizeof...(Parsers)>();

            auto best = select_best(f, l, ctx, seq);
            //std::cout << "Longest match at index #" << best << "\n";

            bool ok = dispatch(f, l, ctx, other, attr, best, seq);

            if (!ok)
                f = saved;

            return ok;
        }

      private:
        template <typename It, typename Ctx, typename P>
        size_t length_of(It f, It l, Ctx ctx, P const& p) const {
            boost::iterator_range<It> matched;
            return x3::raw[p].parse(f, l, ctx, x3::unused, matched)? boost::size(matched) : 0;
        }

        template <typename It, typename Ctx, size_t... I>
            size_t select_best(It f, It l, Ctx& ctx, std::index_sequence<I...>) const {
                std::array<size_t, sizeof...(I)> lengths { length_of(f, l, ctx, std::get<I>(_alternatives))... };
                return std::distance(lengths.begin(), std::max_element(lengths.begin(), lengths.end()));
            }

        template <typename It, typename Ctx, typename Other, typename Attr, size_t... I>
        bool dispatch(It& f, It l, Ctx& ctx, Other const& other, Attr& attr, size_t targetIdx, std::index_sequence<I...>) const {
            //return (real_parse<I>(f, l, ctx, other, attr, targetIdx) || ...);
            std::array<bool, sizeof...(I)> b = { real_parse<I>(f, l, ctx, other, attr, targetIdx)... };

            return std::accumulate(b.begin(), b.end(), false, std::logical_or<bool>());
        }

        template <size_t Idx, typename It, typename Ctx, typename Other, typename Attr>
        bool real_parse(It& f, It l, Ctx& ctx, Other const& other, Attr& attr, size_t targetIdx) const {
            if (targetIdx != Idx)
                return false;

            return std::get<Idx>(_alternatives).parse(f, l, ctx, other, attr);
        }

        std::tuple<Parsers...> _alternatives;
    };

    template <typename... Ps>
        longest_parser<Ps...> longest(Ps... p) { return {x3::as_parser(p)...}; }
}

请注意您可以使用的折叠表达式dispatch如果你的编译器支持它（Coliru 支持，编辑它来查看 http://coliru.stacked-crooked.com/a/917588032326aa18!).

还要注意关于可跳过的微妙选择（可能是空格）；如果它对于长度比较不重要，请取消注释预跳过。

现场演示

Live On Coliru http://coliru.stacked-crooked.com/a/cb13c1d357cdc464

#include <boost/spirit/home/x3.hpp>
#include <type_traits>
#include <iostream>
#include <numeric>

namespace x3 = boost::spirit::x3;

namespace std {
    template <typename T> // just for easy debug printing; hack
    static std::ostream& operator<<(std::ostream& os, std::vector<T> const& v) {
        for (auto& el : v) std::cout << '[' << el << ']';
        return os;
    }
}

using string_vec  = std::vector<std::string>;
using result_type = boost::variant<std::string, double, string_vec>;

template <typename Parser>
void parse(const std::string message, const std::string &input, const std::string &rule, const Parser &parser) {
    auto iter = input.begin(), end = input.end();

    std::cout << "-------------------------\n";
    std::cout << message << "\n";
    std::cout << "Rule:     " << rule  << "\n";
    std::cout << "Parsing: '" << input << "'\n";

    result_type parsed_result;
    bool result = phrase_parse(iter, end, parser, x3::space, parsed_result);

    if (result) {
        std::cout << "Parsed " << parsed_result << "\n";
    } else {
        std::cout << "Parser failed\n";
    }
    if (iter != end)
        std::cout << "EOI not reached. Unparsed: '" << std::string(iter, end) << "'\n";
}

namespace parser {

    template <typename... Parsers>
    struct longest_parser : x3::parser_base {
        longest_parser(Parsers... sub) : _alternatives {sub...} { }

        template <typename It, typename Ctx, typename Other, typename Attr>
        bool parse(It& f, It l, Ctx& ctx, Other const& other, Attr& attr) const {
            auto const saved = f;

            //// To exclude pre-skip from length comparisons, do pre-skip here:
            // x3::skip_over(f, l, ctx);
            auto seq = std::make_index_sequence<sizeof...(Parsers)>();

            auto best = select_best(f, l, ctx, seq);
            //std::cout << "Longest match at index #" << best << "\n";

            bool ok = dispatch(f, l, ctx, other, attr, best, seq);

            if (!ok)
                f = saved;

            return ok;
        }

      private:
        template <typename It, typename Ctx, typename P>
        size_t length_of(It f, It l, Ctx ctx, P const& p) const {
            boost::iterator_range<It> matched;
            return x3::raw[p].parse(f, l, ctx, x3::unused, matched)? boost::size(matched) : 0;
        }

        template <typename It, typename Ctx, size_t... I>
            size_t select_best(It f, It l, Ctx& ctx, std::index_sequence<I...>) const {
                std::array<size_t, sizeof...(I)> lengths { length_of(f, l, ctx, std::get<I>(_alternatives))... };
                return std::distance(lengths.begin(), std::max_element(lengths.begin(), lengths.end()));
            }

        template <typename It, typename Ctx, typename Other, typename Attr, size_t... I>
        bool dispatch(It& f, It l, Ctx& ctx, Other const& other, Attr& attr, size_t targetIdx, std::index_sequence<I...>) const {
            //return (real_parse<I>(f, l, ctx, other, attr, targetIdx) || ...);
            std::array<bool, sizeof...(I)> b = { real_parse<I>(f, l, ctx, other, attr, targetIdx)... };

            return std::accumulate(b.begin(), b.end(), false, std::logical_or<bool>());
        }

        template <size_t Idx, typename It, typename Ctx, typename Other, typename Attr>
        bool real_parse(It& f, It l, Ctx& ctx, Other const& other, Attr& attr, size_t targetIdx) const {
            if (targetIdx != Idx)
                return false;

            return std::get<Idx>(_alternatives).parse(f, l, ctx, other, attr);
        }

        std::tuple<Parsers...> _alternatives;
    };

    template <typename... Ps>
        longest_parser<Ps...> longest(Ps... p) { return {x3::as_parser(p)...}; }
}

int main() {
    auto id        = x3::rule<void, std::string> {} = x3::lexeme [ x3::char_("a-zA-Z_") >> *x3::char_("a-zA-Z0-9_") ];
    auto qualified = x3::rule<void, string_vec>  {} = id % "::";

#define TEST_CASE(label, input, rule) parse(label, input, #rule, rule)
    TEST_CASE("unqualified"                , "willy"                , parser::longest(id, x3::int_, x3::double_));
    TEST_CASE("unqualified with whitespace", " willy \t"            , parser::longest(id, x3::int_, x3::double_));
    TEST_CASE("integral or number"         , "123.78::anton::lutz"  , parser::longest(id, x3::int_, x3::double_));
    TEST_CASE("qualified"                  , "willy anton::lutz"    , parser::longest(id, x3::int_, x3::double_));
    TEST_CASE("qualified with whitespace"  , "willy \tanton::lutz"  , parser::longest(id, x3::int_, x3::double_));

    TEST_CASE("unqualified"                , "willy"                , parser::longest(id, x3::int_, x3::double_, qualified));
    TEST_CASE("unqualified with whitespace", " willy \t"            , parser::longest(id, x3::int_, x3::double_, qualified));
    TEST_CASE("integral or number"         , "123.78::anton::lutz"  , parser::longest(id, x3::int_, x3::double_, qualified));
    TEST_CASE("qualified"                  , "willy::anton::lutz"   , parser::longest(id, x3::int_, x3::double_, qualified));
    TEST_CASE("qualified with whitespace"  , "willy ::\tanton::lutz", parser::longest(id, x3::int_, x3::double_, qualified));

    TEST_CASE("unqualified"                , "willy"                , parser::longest(x3::int_, x3::double_, qualified));
    TEST_CASE("unqualified with whitespace", " willy \t"            , parser::longest(x3::int_, x3::double_, qualified));
    TEST_CASE("integral or number"         , "123.78::anton::lutz"  , parser::longest(x3::int_, x3::double_, qualified));
    TEST_CASE("qualified"                  , "willy::anton::lutz"   , parser::longest(x3::int_, x3::double_, qualified));
    TEST_CASE("qualified with whitespace"  , "willy ::\tanton::lutz", parser::longest(x3::int_, x3::double_, qualified));
}

Prints

-------------------------
unqualified
Rule:     parser::longest(id, x3::int_, x3::double_)
Parsing: 'willy'
Parsed willy
-------------------------
unqualified with whitespace
Rule:     parser::longest(id, x3::int_, x3::double_)
Parsing: ' willy    '
Parsed willy
-------------------------
integral or number
Rule:     parser::longest(id, x3::int_, x3::double_)
Parsing: '123.78::anton::lutz'
Parsed 123.78
EOI not reached. Unparsed: '::anton::lutz'
-------------------------
qualified
Rule:     parser::longest(id, x3::int_, x3::double_)
Parsing: 'willy anton::lutz'
Parsed willy
EOI not reached. Unparsed: 'anton::lutz'
-------------------------
qualified with whitespace
Rule:     parser::longest(id, x3::int_, x3::double_)
Parsing: 'willy     anton::lutz'
Parsed willy
EOI not reached. Unparsed: 'anton::lutz'
-------------------------
unqualified
Rule:     parser::longest(id, x3::int_, x3::double_, qualified)
Parsing: 'willy'
Parsed willy
-------------------------
unqualified with whitespace
Rule:     parser::longest(id, x3::int_, x3::double_, qualified)
Parsing: ' willy    '
Parsed willy
-------------------------
integral or number
Rule:     parser::longest(id, x3::int_, x3::double_, qualified)
Parsing: '123.78::anton::lutz'
Parsed 123.78
EOI not reached. Unparsed: '::anton::lutz'
-------------------------
qualified
Rule:     parser::longest(id, x3::int_, x3::double_, qualified)
Parsing: 'willy::anton::lutz'
Parsed [willy][anton][lutz]
-------------------------
qualified with whitespace
Rule:     parser::longest(id, x3::int_, x3::double_, qualified)
Parsing: 'willy ::  anton::lutz'
Parsed [willy][anton][lutz]
-------------------------
unqualified
Rule:     parser::longest(x3::int_, x3::double_, qualified)
Parsing: 'willy'
Parsed [willy]
-------------------------
unqualified with whitespace
Rule:     parser::longest(x3::int_, x3::double_, qualified)
Parsing: ' willy    '
Parsed [willy]
-------------------------
integral or number
Rule:     parser::longest(x3::int_, x3::double_, qualified)
Parsing: '123.78::anton::lutz'
Parsed 123.78
EOI not reached. Unparsed: '::anton::lutz'
-------------------------
qualified
Rule:     parser::longest(x3::int_, x3::double_, qualified)
Parsing: 'willy::anton::lutz'
Parsed [willy][anton][lutz]
-------------------------
qualified with whitespace
Rule:     parser::longest(x3::int_, x3::double_, qualified)
Parsing: 'willy ::  anton::lutz'
Parsed [willy][anton][lutz]

请注意不同的结果取决于替代方案中的解析器表达式。

本文内容由网友自发贡献，版权归原作者所有，本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容，请联系:hwhale#tublm.com(使用前将#替换为@)