How do I interface C++ Flex with C++ Bison?

2024-05-03

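I am trying to interface a C++ Flex scanner with a C++ Bison parser and I am stumped. The Bison 3.8.1 manual has an example of C++ Bison with C Flex. The Flex 2.6.4 manual has no example. The problem I am trying to solve is how to give C++ (or C) Bison a pointer to an instance of a C++ Flex object. My best idea is to use YY_DECL to define the Flex scanner to use, # define YY_DECL bison::symbol_type flx->yylex(), and to pass flx into Bison through the parser calling sequence, redefining "parse". Is this right? Is there a better way?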

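Switching Flex and Bison to C++ is as easy as adding the flags %option c++ (https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_chapter/flex_19.html#SEC19) and %language "c++" (https://www.gnu.org/software/bison/manual/html_node/A-Simple-C_002b_002b-Example.html), respectively. In both cases, however, this makes the generated code re-entrant (https://stackoverflow.com/questions/2441351/what-is-a-re-entrant-parser), which, as you have noticed, interferes with the interoperability of the two.

By default, in C, both Flex and Bison keep their state in global variables. In C++ they are object-oriented instead: Flex has the class yyFlexLexer and Bison has the class yy::parser. This is the more natural approach in this language, and it also lets you run the parser multiple times by creating new objects of these classes. You can even run several parsers at once in a multi-threaded program.

There is a catch, however. Although the lexer and the parser are now both C++ and re-entrant, each still assumes that its counterpart is the default non-reentrant code. Because of that, they try to access global state variables that no longer exist. Fixing this requires some tinkering.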
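To illustrate why per-object state matters, here is a minimal sketch that is independent of Flex and Bison. WordScanner is a hypothetical hand-written class, not something either tool generates. It only shows that two scanners whose state lives in their own members can be interleaved without disturbing each other.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a re-entrant scanner: all state lives in the object,
// so several instances can run side by side without sharing anything.
class WordScanner
{
    std::string input;
    std::size_t pos = 0;

public:
    explicit WordScanner(std::string text) : input(std::move(text)) {}

    // Returns the next space-separated word, or an empty string at end of input.
    std::string next()
    {
        while (pos < input.size() && input[pos] == ' ')
            ++pos;
        const std::size_t start = pos;
        while (pos < input.size() && input[pos] != ' ')
            ++pos;
        return input.substr(start, pos - start);
    }
};
```

Alternating calls to next() on two WordScanner objects built from different strings yields each object's own words in order. A scanner that kept pos in a global variable could not do that.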

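A minimal example

A complete example that can be copy-pasted as the base of a new program will be more useful than an explanation alone.

Let's start with a minimal example that only shows how to make C++ Flex and Bison communicate. We will write a short Flex-Bison program that expects input of the form Hello X! and prints back Goodbye X!.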

fooLexer.ll:

%{
    #include "FooLexer.hh"
    #include "fooParser.tab.hh"
    
    #undef  YY_DECL
    #define YY_DECL int FooLexer::yylex(std::string *const yylval)
%}

%option c++ noyywrap

%option yyclass="FooLexer"

%%

[[:space:]] ;
Hello { return yy::parser::token::HELLO; }
[[:alpha:]]+ { *yylval = std::string(yytext, yytext + yyleng); return yy::parser::token::WORLD; }
. { return yytext[0]; }

FooLexer.hh:

#pragma once

#include <string>
#if ! defined(yyFlexLexerOnce)
#include <FlexLexer.h>
#endif

class FooLexer : public yyFlexLexer
{
public:
    int yylex(std::string *const yylval);
};

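These two files are our lexer. Instead of using the default lexer class, we define our own class that inherits from it. We do this because the default implementation of yylex takes no arguments, and we need one to pass yylval in.

Let's break down the most interesting lines:

#undef YY_DECL - C++ Flex still makes heavy use of macros. YY_DECL (https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_chapter/flex_10.html#SEC10) stores the declaration of the function yylex that Flex will generate. We remove the default value, which was int FooLexer::yylex().
#define YY_DECL int FooLexer::yylex(std::string *const yylval) - Now we replace the removed value with the function declaration we need.
%option c++ (https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_chapter/flex_19.html#SEC19) - We switch the output language to C++.
%option yyclass="FooLexer" - Finally, we tell Flex which class the lexer should use instead of yyFlexLexer. Flex will create the method yylex in this class.
#include <FlexLexer.h> - Unlike the C code, the C++ code generated by Flex requires an external header, FlexLexer.h (https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_chapter/flex_19.html#SEC19). It should be installed on your system along with Flex.
#if ! defined(yyFlexLexerOnce) & #endif - We use the Flex mechanism for ensuring that the header <FlexLexer.h> is only added once. (This is a slightly non-standard solution, but it allows us to include the header multiple times if there is a need for that.)
int yylex(std::string *const yylval); - We declare the function ourselves, but the definition is provided by Flex.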
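The YY_DECL trick can be sketched without Flex. In the block below, DemoLexer and DEMO_DECL are made-up names standing in for FooLexer and YY_DECL, and the function body stands in for the scanner that Flex would emit. The point is only that the class declares the method by hand while a macro supplies the matching definition header.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the class we declare by hand (FooLexer in the answer).
class DemoLexer
{
    std::istringstream in;

public:
    explicit DemoLexer(const std::string &text) : in(text) {}

    // Declared here, defined further down through the macro.
    int yylex(std::string *const yylval);
};

// Plays the role of YY_DECL: the generator pastes this in front of the body it emits.
#define DEMO_DECL int DemoLexer::yylex(std::string *const yylval)

// Plays the role of the generated scanner body.
DEMO_DECL
{
    // Return 1 for every word read and 0 at end of input
    // (0 is the token value Bison treats as end of file).
    return (in >> *yylval) ? 1 : 0;
}
```

If the macro and the declaration in the class disagree, the compiler rejects the definition, which is a useful safety net when the signature changes later.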

fooParser.yy:

%require "3.2"
%language "c++"

%code requires {
    #include <string>
    #include "FooLexer.hh"
}

%define api.value.type {std::string}

%parse-param {FooLexer &lexer}

%header

%code {
    #define yylex lexer.yylex
}

%token HELLO
%token WORLD

%%

hello_world: HELLO WORLD '!' { std::cout << "Goodbye " << $WORLD << '!' << std::endl; }

%%

void yy::parser::error(const std::string &message)
{
    std::cerr << "Error: " << message << std::endl;
}

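For the parser, we don't create our own class. Bison is a little smarter here, which makes adjusting the code much simpler. For example, it correctly guesses that yylex should take yylval as an argument, so we don't need to worry about that.

Still, there are a few notable changes:

%require "3.2" (https://www.gnu.org/software/bison/manual/html_node/Require-Decl.html) - This directive not only makes sure that the installed version of Bison supports C++. It also prevents the creation of a redundant output file, stack.hh.
%language "c++" (https://www.gnu.org/software/bison/manual/html_node/A-Simple-C_002b_002b-Example.html) - We switch the output language to C++.
%parse-param {FooLexer &lexer} - This directive adds an additional argument to the constructor of the parser class. We use it to pass the lexer to the parser.
#define yylex lexer.yylex - The parser still assumes that yylex is a global function. We use the preprocessor to turn that call into a call to the method of the lexer we passed to the constructor.
void yy::parser::error(const std::string &message) - We no longer need to declare the error handler at the beginning of the file. However, we still need to define it. The definition now names the namespace yy and the class parser, which is the default location of the parser class.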
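The effect of the yylex macro can be seen in isolation. The following sketch uses hypothetical DemoLexer and DemoParser classes in place of the generated ones. It assumes, as %parse-param arranges in the real parser, that the lexer reference is stored as a member named lexer.

```cpp
#include <cassert>
#include <string>

// Hypothetical lexer with a member function, like FooLexer::yylex.
struct DemoLexer
{
    int calls = 0;

    int yylex(std::string *const yylval)
    {
        *yylval = "token";
        return ++calls;
    }
};

// Hypothetical parser that stores the lexer passed to its constructor.
class DemoParser
{
    DemoLexer &lexer;

public:
    explicit DemoParser(DemoLexer &lexer_) : lexer(lexer_) {}
    int parseOne(std::string &value);
};

// The parser body calls yylex as if it were a free function...
#define yylex lexer.yylex

int DemoParser::parseOne(std::string &value)
{
    // ...so after macro expansion this line reads: return lexer.yylex(&value);
    return yylex(&value);
}

// Limit the macro to the parser body so it cannot rewrite later code.
#undef yylex
```

The macro has to be defined after the lexer class has been declared. Otherwise it would also rewrite the declaration of the method itself, which is why the answer places it in a %code block rather than in %code requires.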

main.cc:

#include "FooLexer.hh"
#include "fooParser.tab.hh"

int main()
{
    FooLexer lexer;
    yy::parser parser(lexer);
    return parser();
}

Now we only need to create objects of the lexer and parser classes and we are ready. The parser class is a functor (https://stackoverflow.com/questions/356950/what-are-c-functors-and-their-uses), so we can simply call it.
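For readers unfamiliar with functors, here is a small sketch of what calling the parser object means. DemoParser is a hypothetical class, not Bison's output. It assumes only that the call operator forwards to parse(), and that 0 signals success and 1 a syntax error.

```cpp
#include <cassert>

// Hypothetical stand-in for a generated parser class.
class DemoParser
{
    bool inputIsValid;

public:
    explicit DemoParser(const bool valid) : inputIsValid(valid) {}

    // Returns 0 on success and 1 on a syntax error.
    int parse() { return inputIsValid ? 0 : 1; }

    // The call operator forwards to parse(), which is what makes the object a functor.
    int operator()() { return parse(); }
};
```

With such a class, return parser(); and return parser.parse(); in main are interchangeable, and the result doubles as the exit status of the program.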

Bonus - makefile:

.RECIPEPREFIX = >

prog: main.o fooParser.tab.o lex.yy.o
> g++ $^ -o $@

main.o: main.cc FooLexer.hh fooParser.tab.hh
> g++ -c $< -o $@

lex.yy.o: lex.yy.cc FooLexer.hh fooParser.tab.hh
> g++ -c $< -o $@

fooParser.tab.o: fooParser.tab.cc FooLexer.hh
> g++ -c $< -o $@

lex.yy.cc: fooLexer.ll
> flex $<

fooParser.tab.hh fooParser.tab.cc fooParser.output: fooParser.yy
> bison $<

.PHONY: clean
clean:
> rm -f prog main.o lex.* fooParser.tab.* stack.hh

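An extended example

Let's expand this example to see how to add or modify various aspects of a C++ parser, and to turn it into code that is ready for use in a real application.

Currently, the lexer and the parser are in different namespaces, so we will put them both in the same one (foo). We will also change their names to ones of our choice. (This includes the name of the original lexer class too, for technical reasons that are explained later.)

We will modify the constructor of the lexer so that we can pass a file to it instead of reading standard input.

We will add locations to our parser, to track input line numbers and give more meaningful error messages.

We will also give the program the ability to print debug logs, to aid in writing complex parsers.

Finally, we will enable a few useful miscellaneous options and add some helper functions.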

location_t.hh:

#pragma once

#include <cstddef>
#include <ostream>
#include <utility>

namespace foo
{
    using position_t = std::size_t;
    using location_t = std::pair<std::size_t, std::size_t>;
}

inline std::ostream& operator<<(std::ostream& os, const foo::location_t& loc)
{
    return os << "[" << loc.first << "-" << loc.second << "]";
}

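To track token locations in Bison, we can either use the default implementation of the location class or create our own. I find the default implementation a little lacking, so we've taken the second option.

Bison names the location-related types as follows:

"position" - a specific point in a file (default Bison implementation https://www.gnu.org/software/bison/manual/html_node/C_002b_002b-position.html),
"location" - the location of a token, defined by its starting and ending positions (default Bison implementation https://www.gnu.org/software/bison/manual/html_node/C_002b_002b-location.html).

For consistency, we've used the same convention in our implementation.

This is a very simple implementation in which a position is a single integer that stores a line number. In a real program, I recommend tracking at least the line number and the column, and maybe even the absolute position in the file.

We've also added operator<< for our location. It is useful in general, but in our case it is strictly required because Bison uses it in the debug logs (which we will enable).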
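As a quick check of what the operator prints, the sketch below repeats the relevant definitions from location_t.hh so that it compiles on its own, and adds a hypothetical helper, describe, that formats a location the same way the error handler later in this answer does.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

namespace foo
{
    using location_t = std::pair<std::size_t, std::size_t>;
}

// Same operator as in location_t.hh, repeated so this sketch is self-contained.
inline std::ostream& operator<<(std::ostream& os, const foo::location_t& loc)
{
    return os << "[" << loc.first << "-" << loc.second << "]";
}

// Hypothetical helper: builds the prefix of an error message for a location.
inline std::string describe(const foo::location_t& loc)
{
    std::ostringstream out;
    out << "Error at lines " << loc;
    return out.str();
}
```

For a token that spans lines 3 to 5, describe(foo::location_t(3, 5)) produces the text Error at lines [3-5].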

fooLexer.ll:

%{
    #include "FooLexer.hh"
    #include "fooParser.tab.hh"
    
    using namespace foo;
    
    #undef  YY_DECL
    #define YY_DECL int FooLexer::yylex(std::string *const lval, location_t *const lloc)
    
    #define YY_USER_INIT yylval = lval; yylloc = lloc;
    
    #define YY_USER_ACTION copyLocation();
%}

%option c++ noyywrap debug

%option yyclass="FooLexer"
%option prefix="yy_foo_"

%%

%{
    using Token = FooBisonParser::token;
%}

\n { ++currentLine; }
[[:space:]] ;
Hello { return Token::HELLO; }
[[:alpha:]]+ { copyValue(); return Token::WORLD; }
. { return yytext[0]; }

FooLexer.hh:

#pragma once

#include <string>
#if ! defined(yyFlexLexerOnce)
#define yyFlexLexer yy_foo_FlexLexer
#include <FlexLexer.h>
#undef yyFlexLexer
#endif
#include "location_t.hh"

namespace foo
{
    class FooLexer : public yy_foo_FlexLexer
    {
        std::size_t currentLine = 1;
        
        std::string *yylval = nullptr;
        location_t *yylloc = nullptr;
        
        void copyValue(const std::size_t leftTrim = 0, const std::size_t rightTrim = 0, const bool trimCr = false);
        void copyLocation() { *yylloc = location_t(currentLine, currentLine); }
        
    public:
        FooLexer(std::istream &in, const bool debug) : yy_foo_FlexLexer(&in) { yy_foo_FlexLexer::set_debug(debug); }
        
        int yylex(std::string *const lval, location_t *const lloc);
    };
    
    inline void FooLexer::copyValue(const std::size_t leftTrim, const std::size_t rightTrim, const bool trimCr)
    {
        std::size_t endPos = yyleng - rightTrim;
        if (trimCr && endPos != 0 && yytext[endPos - 1] == '\r')
            --endPos;
        *yylval = std::string(yytext + leftTrim, yytext + endPos);
    }
}

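Our lexer has changed a lot. Most of the changes enable locations, a few adjust namespaces and names, and the rest are there for our future convenience:

using namespace foo; - We cannot put the whole code of the lexer into a namespace, so this is the next best option. (This is considered a bad practice https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice but I think it is harmless in this particular case.)
#define YY_DECL int FooLexer::yylex(std::string *const lval, location_t *const lloc) - We've added an argument lloc to yylex, which is the location passed by the parser. (YY_DECL https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_chapter/flex_10.html#SEC10)
#define YY_USER_INIT yylval = lval; yylloc = lloc; - We cannot write our own implementation of yylex, but YY_USER_INIT (https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_chapter/flex_14.html#SEC14) lets us insert some extra code at the beginning of the default implementation. Flex runs this code once, before the first scan. We use it to save the function arguments into fields of the object, which gives the other methods easy access to them. Saving them once is enough as long as the parser passes the same pointers on every call.
#define YY_USER_ACTION copyLocation(); - YY_USER_ACTION (https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_chapter/flex_14.html#SEC14) is inserted in front of every action in the lexer. We use it to copy the location of each token into yylloc.
%option prefix="yy_foo_" - We've changed the default prefix (https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_chapter/flex_17.html#SEC17) used by Flex from yy to yy_foo_. Effectively, this changes the name of the internal lexer class (the one we inherit from) to yy_foo_FlexLexer. This is necessary if we need more than one lexer in our program. In that case, each lexer needs a different prefix to avoid name collisions. The prefix also changes the name of the generated file to lex.yy_foo_.cc, which the makefile below takes into account.
using Token = FooBisonParser::token; - This just lets us write Token in the actions instead of the full FooBisonParser::token (https://www.gnu.org/software/bison/manual/html_node/C_002b_002b-Parser-Interface.html#index-token-1).
\n { ++currentLine; } - We still don't emit a token for any whitespace, but we need to increase our internal line counter every time we encounter a line break.
#define yyFlexLexer yy_foo_FlexLexer & #undef yyFlexLexer - Not all of the lexer's code is generated. We also include the header FlexLexer.h, which has no idea that we've changed the lexer prefix. This trick (https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_chapter/flex_19.html#SEC19) fixes that problem. (If you have multiple lexers, you need to include this header multiple times, with different #defines.)
std::size_t currentLine = 1; - Our internal field, which we use to keep track of the current line number for yylloc.
std::string *yylval = nullptr; & location_t *yylloc = nullptr; - Fields holding copies of the pointers that the parser passes to yylex. They are here to make these pointers easier to access in the other methods of the class.
void copyValue(const std::size_t leftTrim = 0, const std::size_t rightTrim = 0, const bool trimCr = false); - A convenience method that lets us easily copy the current contents of yytext into yylval. We can use it in actions. I've found the option to cut a few characters off the beginning and end of the string very useful, for example when we have matched a string literal and only want to copy its contents, without the surrounding ". The option to remove a trailing '\r' also has its uses.
void copyLocation() - A convenience method that saves the location of the current token into yylloc. It will become more complicated if the grammar contains multi-line tokens.
FooLexer(std::istream &in, const bool debug) : yy_foo_FlexLexer(&in) { yy_foo_FlexLexer::set_debug(debug); } - We've added arguments to the constructor, which let us choose the input source and turn on debug logs in the lexer.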
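The trimming logic of copyValue is easier to try out as a free function. The sketch below is a hypothetical variant, trimmedCopy, that takes the matched text as a parameter instead of reading yytext and yyleng. Unlike the method above, it also refuses to strip a carriage return that lies inside the part already removed by leftTrim. That guard is an addition of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical free-function version of FooLexer::copyValue.
inline std::string trimmedCopy(const std::string &text,
                               const std::size_t leftTrim = 0,
                               const std::size_t rightTrim = 0,
                               const bool trimCr = false)
{
    // The caller must not trim more characters than were matched.
    assert(leftTrim + rightTrim <= text.size());

    std::size_t endPos = text.size() - rightTrim;
    // Drop one trailing '\r', but never move the end in front of the start.
    if (trimCr && endPos > leftTrim && text[endPos - 1] == '\r')
        --endPos;
    return text.substr(leftTrim, endPos - leftTrim);
}
```

For a matched string literal such as "abc" including its quotes, trimmedCopy(text, 1, 1) keeps only abc. For a line that ends in a carriage return, trimmedCopy(text, 0, 0, true) drops that final character.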

fooParser.yy:

%require "3.2"
%language "c++"

%code requires {
    #include <string>
    #include "location_t.hh"
    #include "FooLexer.hh"
}

%define api.namespace {foo}
%define api.parser.class {FooBisonParser}
%define api.value.type {std::string}
%define api.location.type {location_t}

%locations
%define parse.error detailed
%define parse.trace

%header
%verbose

%parse-param {FooLexer &lexer}
%parse-param {const bool debug}

%initial-action
{
    #if YYDEBUG != 0
        set_debug_level(debug);
    #endif
};

%code {
    namespace foo
    {
        template<typename RHS>
        void calcLocation(location_t &current, const RHS &rhs, const std::size_t n);
    }
    
    #define YYLLOC_DEFAULT(Cur, Rhs, N) calcLocation(Cur, Rhs, N)
    #define yylex lexer.yylex
}

%token HELLO
%token WORLD

%expect 0

%%

hello_world: HELLO WORLD '!' { std::cout << "Goodbye " << $WORLD << '!' << std::endl; }

%%

namespace foo
{
    template<typename RHS>
    inline void calcLocation(location_t &current, const RHS &rhs, const std::size_t n)
    {
        current = location_t(YYRHSLOC(rhs, 1).first, YYRHSLOC(rhs, n).second);
    }
    
    void FooBisonParser::error(const location_t &location, const std::string &message)
    {
        std::cerr << "Error at lines " << location << ": " << message << std::endl;
    }
}

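The Bison interface is a little more user-friendly than the Flex one when it comes to the changes we are making, but adding custom locations still requires a significant amount of code.

%define api.namespace {foo} (https://www.gnu.org/software/bison/manual/html_node/C_002b_002b-Bison-Interface.html) - We've instructed Bison to put all of its code into the namespace foo instead of the default yy.
%define api.parser.class {FooBisonParser} (https://www.gnu.org/software/bison/manual/html_node/_0025define-Summary.html#index-_0025define-api_002eparser_002eclass) - We've instructed Bison to name its parser class FooBisonParser instead of the default parser.
%define api.location.type {location_t} (https://www.gnu.org/software/bison/manual/html_node/_0025define-Summary.html#index-_0025define-api_002elocation_002etype) - We've instructed Bison to use our location type instead of the default one. (See also https://www.gnu.org/software/bison/manual/html_node/User-Defined-Location-Type.html)
%locations (https://www.gnu.org/software/bison/manual/html_node/Decl-Summary.html#index-_0025locations) - We've instructed Bison to generate the code required to handle locations. This causes the declarations of a few methods to gain an additional parameter, the location. (This includes yylex.) We also need to write a new function that computes the location of a construct composed of several smaller tokens.
%define parse.error detailed (https://www.gnu.org/software/bison/manual/html_node/_0025define-Summary.html#index-_0025define-parse_002eerror) - We've instructed Bison to generate more detailed error messages than just "syntax error".
%define parse.trace (https://www.gnu.org/software/bison/manual/html_node/Enabling-Traces.html#index-_0025define-parse_002etrace-1) - We've instructed Bison to generate code that can print debug logs during execution.
%verbose (https://www.gnu.org/software/bison/manual/html_node/Decl-Summary.html#index-_0025verbose) - We've instructed Bison to generate an additional output file, fooParser.output, which contains a human-readable description of the generated state machine. It is very useful as a reference when interpreting debug logs.
%parse-param {const bool debug} (https://www.gnu.org/software/bison/manual/html_node/Parser-Function.html#index-_0025parse_002dparam) - We've added an additional parameter to the parser's constructor.
set_debug_level(debug); (https://www.gnu.org/software/bison/manual/html_node/C_002b_002b-Parser-Interface.html#index-set_005fdebug_005flevel-on-parser) - We use the value of the new constructor parameter to decide whether to print debug logs. (%initial-action https://www.gnu.org/software/bison/manual/html_node/Initial-Action-Decl.html#index-_0025initial_002daction-1)
#if YYDEBUG != 0 & #endif - This is an additional fail-safe that lets the code compile when %define parse.trace (https://www.gnu.org/software/bison/manual/html_node/Enabling-Traces.html#index-_0025define-parse_002etrace-1) is absent. (YYDEBUG https://www.gnu.org/software/bison/manual/html_node/Enabling-Traces.html#index-YYDEBUG)
void calcLocation(location_t &current, const RHS &rhs, const std::size_t n); - This function takes the locations of all sub-tokens of a bigger construct and computes the location of the whole. In our case, we just take the start position of the first token and the end position of the last token. Note that it assumes n is at least 1, which holds here because the grammar has no empty rules.
#define YYLLOC_DEFAULT(Cur, Rhs, N) calcLocation(Cur, Rhs, N) (https://www.gnu.org/software/bison/manual/html_node/Location-Default-Action.html) - We've instructed Bison to use our function to compute locations.
%expect 0 (https://www.gnu.org/software/bison/manual/html_node/Expect-Decl.html) - This line makes sure that there are no conflicts in the grammar. More generally, %expect is useful for recording how many conflicts we already know about and accept.
void FooBisonParser::error(const location_t &location, const std::string &message) - The function that prints error messages now also receives the location of the error.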
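The merging rule behind calcLocation can be sketched with plain containers. RhsLocations and mergeLocations below are hypothetical names. The vector mimics the indexing that YYRHSLOC exposes, where index 0 is the symbol just before the rule and indices 1 to n are the symbols on its right-hand side. The branch for n == 0 is not part of the answer's code. It follows the behaviour that the Bison manual documents for the default YYLLOC_DEFAULT and is included as one possible way to support empty rules.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using location_t = std::pair<std::size_t, std::size_t>;

// Hypothetical stand-in for the slice of the location stack that Bison passes in.
using RhsLocations = std::vector<location_t>;

inline location_t mergeLocations(const RhsLocations &rhs, const std::size_t n)
{
    // Index n must exist in the slice.
    assert(n < rhs.size());

    if (n == 0)
        // Empty rule: collapse to the end of the preceding symbol.
        return location_t(rhs[0].second, rhs[0].second);

    // Non-empty rule: from the start of the first symbol to the end of the last.
    return location_t(rhs[1].first, rhs[n].second);
}
```

For a rule with three symbols located on lines 2, 2 to 3, and 5, the merged location spans lines 2 to 5.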

main.cc:

#include <cstring>
#include <iostream>
#include "FooLexer.hh"
#include "fooParser.tab.hh"

int main(int argc, char* argv[])
{
    const bool debug = argc > 1 && std::strcmp(argv[1], "--debug") == 0;
    foo::FooLexer lexer(std::cin, debug);
    foo::FooBisonParser parser(lexer, debug);
    return parser();
}

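The main change in the main function is that it checks whether the program was called with the flag --debug and passes this information to the lexer and the parser.

We also explicitly pass std::cin (https://en.cppreference.com/w/cpp/io/cin) as the lexer's input. This changes nothing compared to the previous example, but we can now easily replace it with a std::istream (https://en.cppreference.com/w/cpp/io/basic_istream) that reads from a file, or even with some internal stream in the program.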
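Because the lexer only depends on std::istream, swapping the input source does not touch the lexer itself. The sketch below shows this with a hypothetical LineCounter class that stands in for FooLexer and is fed from an in-memory stream.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for FooLexer: it only needs a std::istream&, so it does
// not care whether the data comes from std::cin, a file or memory.
class LineCounter
{
    std::istream &in;

public:
    explicit LineCounter(std::istream &input) : in(input) {}

    // Reads the stream to the end and returns the number of lines seen.
    std::size_t countLines()
    {
        std::size_t lines = 0;
        std::string line;
        while (std::getline(in, line))
            ++lines;
        return lines;
    }
};
```

With the real classes, the same idea would read std::ifstream file("input.txt"); foo::FooLexer lexer(file, debug); where input.txt is a made-up file name. The stream has to outlive the lexer, because only a reference or pointer to it is stored.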

Bonus - makefile:

.RECIPEPREFIX = >

prog: main.o fooParser.tab.o lex.yy_foo_.o
> g++ $^ -o $@

main.o: main.cc FooLexer.hh fooParser.tab.hh location_t.hh
> g++ -c $< -o $@

lex.yy_foo_.o: lex.yy_foo_.cc FooLexer.hh fooParser.tab.hh location_t.hh
> g++ -c $< -o $@

fooParser.tab.o: fooParser.tab.cc FooLexer.hh location_t.hh
> g++ -c $< -o $@

lex.yy_foo_.cc: fooLexer.ll
> flex $<

fooParser.tab.hh fooParser.tab.cc fooParser.output: fooParser.yy
> bison $<

.PHONY: clean
clean:
> rm -f prog main.o lex.* fooParser.tab.* fooParser.output
