FParsec:如何组合解析器以便它们以任意顺序匹配

2024-04-30

任务是找到特定的键值对并解析它们。这些对可以按任何顺序出现。我的部分工作尝试:

open FParsec

type Parser<'a> = Parser<'a, unit>
type Status = Running | Done

type Job = 
    { Id: int
      Status: Status
      Count: int }

let ws = spaces

let jobId: Parser<int> = ws >>. skipStringCI "Job id" >>. ws >>. skipChar '=' >>. ws >>. pint32

let status: Parser<Status> = 
    ws >>. skipStringCI "Status" >>. ws >>. skipChar '=' >>. ws >>. (
        (skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done))

let count: Parser<int> = ws >>. skipStringCI "Count" >>. ws >>. skipChar '=' >>. ws >>. pint32

let parse: Parser<Job> = parse {
    do! skipCharsTillStringCI "Job id" false 1000
    let! id = jobId
    do! skipCharsTillStringCI "Status" false 1000
    let! status = status
    do! skipCharsTillStringCI "Count" false 1000
    let! count = count
    return { Id = id; Status = status; Count = count }}

[<EntryPoint>]
let main argv = 
    let sample = """
Some irrelevant text.
Job id = 33
Some other text.
Status = Done
And another text.
Count = 10
Trailing text.
"""
    printfn "%A" (run parse sample)
    0
(* 
result:
 Success: {Id = 33;
 Status = Done;
 Count = 10;} 
*)

所以,它可以工作,但有两个问题:明显的重复(jobId函数中的“Job id”和顶级解析器中的“Job id”等),并且它需要“Job id”,“Status”和“Count” “按照这个特定的顺序进行排序,这根据要求是错误的。

我有一种强烈的感觉,有一个优雅的解决方案可以解决这个问题。

Thanks!


第一个问题(重复)可以通过少量重构来解决。基本思想是将每个解析器包装到一个可以执行跳过的包装器中。
Note由于这段代码还远未达到完美,我只是尝试使重构尽可能小。

let jobId: Parser<int> = pint32

let status: Parser<Status> = 
    (skipStringCI "Running" >>% Running) <|> (skipStringCI "Done" >>% Done)

let count: Parser<int> = pint32

let skipAndParse prefix parser =
    skipCharsTillStringCI prefix false 1000
    >>. ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser

let parse: Parser<Job> = parse {
    let! id = skipAndParse "Job id" jobId
    let! status = skipAndParse "Status"  status
    let! count = skipAndParse "Count" count
    return { Id = id; Status = status; Count = count }}

第二个问题更复杂。如果您希望数据线以自由顺序出现,则必须考虑以下情况:

  • not all存在数据线;
  • 出现某条数据线twice或者更多;

为了缓解这种情况,您需要生成一个list找到的数据行,分析所需的一切是否都存在,并决定如何处理任何可能的重复项。

请注意,每个数据行不能再有“跳过”部分,因为它可能会在实际解析器之前跳过信息行。

let skipAndParse2 prefix parser =
    ws >>. skipStringCI prefix >>. ws >>. skipChar '=' >>. ws >>. parser

// Here, you create a DU that will say which data line was found
type Result =
    | Id of int
    | Status of Status
    | Count of int
    | Irrelevant of string

// here's a combinator parser
let parse2 =
    // list of possible data line parsers
    // Note they are intentionally reordered
    [
    skipAndParse2 "Count" count |>> Count
    skipAndParse2 "Status"  status |>> Status
    skipAndParse2 "Job id" jobId |>> Id
    // the trailing one would skip a line in case if it has not
    // been parsed by any of prior parsers
    // a guard rule is needed because of specifics of
    // restOfLine behavior at the end of input: namely, it would
    // succeed without consuming an input, which leads
    // to an infinite loop. Actually FParsec handles this and
    // raises an exception
    restOfLine true .>> notFollowedByEof |>> Irrelevant
    ]
    |> List.map attempt // each parser is optional
    |> choice // on each iteration, one of the parsers must succeed
    |> many // a loop

运行代码:

let sample = "
Some irrelevant text.\n\
Job id = 33\n\
Some other text.\n\
Status = Done\n\
And another text.\n\
Count = 10\n\
Trailing text.\n\
"

sample |> run parse2 |> printfn "%A "

将产生以下输出:

Success: [Irrelevant ""; Irrelevant "Some irrelevant text."; Id 33;
Irrelevant ""; Irrelevant "Some other text."; Status Done; Irrelevant "";
Irrelevant "And another text."; Count 10; Irrelevant ""]

进一步处理需要过滤Irrelevant元素,检查重复或缺失的项目,并形成Job记录或提出错误。


更新:进一步处理隐藏的简单示例Result并返回Job option反而:

// naive implementation of the record maker
// return Job option
// ignores duplicate fields (uses the first one)
// returns None if any field is missing
let MakeJob arguments =
    let a' =
        arguments
        |> List.filter (function |Irrelevant _ -> false | _ -> true)

    try
        let theId     = a' |> List.pick (function |Id x -> Some x | _ -> None)
        let theStatus = a' |> List.pick (function |Status x -> Some x | _ -> None)
        let theCount  = a' |> List.pick (function |Count x -> Some x | _ -> None)
        Some { Id=theId; Status = theStatus; Count = theCount }
    with
        | :?System.Collections.Generic.KeyNotFoundException -> None

要使用它,只需将以下行添加到代码中parse2:

|>> MakeJob
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

FParsec:如何组合解析器以便它们以任意顺序匹配 的相关文章

随机推荐