Using pyparsing to parse multiple items and group them together

2024-04-22

This builds on the question "Build a simple parser that is able to parse different date formats using PyParse": https://stackoverflow.com/questions/28113532/build-a-simple-parser-that-is-able-to-parse-different-date-formats-using-pyparse

I have a parser that should group one or more users into a list. So a.parser('show abc, xyz commits from "Jan 10,2015" to "27/1/2015"') should group the two user names into a single list: [abc, xyz].
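For reference, a minimal sketch of just the grouping described here, assuming pyparsing is installed and simplifying a user name to letters and dots. delimitedList matches the comma-separated names and discards the commas, and Group keeps them together as one sub-list under the results name "users". It parses only the user-list fragment, not the whole command.

```python
from pyparsing import Group, Word, alphas, delimitedList

# Simplified user name for this sketch: letters and dots only
name = Word(alphas + ".")

# delimitedList matches "abc, xyz" and drops the commas;
# Group keeps the matched names together as one sub-list named "users"
users = Group(delimitedList(name))("users")

result = users.parseString("abc, xyz")
print(result["users"].asList())  # ['abc', 'xyz']
```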

For the users, I have:

keywords = ["select", "show", "team", "from", "to", "commits", "and", "or"]
[select, show, team, _from, _to,  commits, _and, _or] = [ CaselessKeyword(word) for word in keywords ]

user = Word(alphas+"."+alphas)
user2 = Combine(user + "'s")
users = OneOrMore((user|user2))
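Two details of these definitions are worth checking. Word takes a string of allowed characters, so Word(alphas+"."+alphas) allows the same set as Word(alphas + "."). And | builds a MatchFirst, which stops at the first alternative that succeeds, so in user|user2 the possessive form can never win. A small sketch, assuming pyparsing is installed:

```python
from pyparsing import Combine, Word, alphas

user = Word(alphas + ".")         # same character set as Word(alphas+"."+alphas)
user2 = Combine(user + "'s")      # possessive form, e.g. "xyz's"

# | builds a MatchFirst: the first alternative that matches wins,
# so user consumes "xyz" and user2 is never tried
first_wins = user | user2
print(first_wins.parseString("xyz's").asList())     # ['xyz']

# Listing the longer possessive form first lets it match when present
longest_first = user2 | user
print(longest_first.parseString("xyz's").asList())  # ["xyz's"]
print(longest_first.parseString("abc").asList())    # ['abc']

# ^ builds an Or, which tries every alternative and keeps the longest match
longest = user ^ user2
print(longest.parseString("xyz's").asList())        # ["xyz's"]
```

Using ^ also works, but it evaluates every alternative on each attempt, so simply ordering the MatchFirst alternatives longest-first is usually the cheaper fix.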

The grammar is:

bnf = (show|select)+Group(users).setResultsName("users")+Optional(team)+(commits).setResultsName("stats")\
    +Optional(_from + quotedString.setParseAction(removeQuotes)('from') +\
    _to + quotedString.setParseAction(removeQuotes)('to'))

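This is wrong. Can anyone point me in the right direction? Also, is there a way in pyparsing to selectively decide which group a word belongs to? What I mean is that "xyz" on its own should go under my users list, but "xyz team" should go under a teams list. If the optional keyword team is supplied, pyparsing should group the name differently.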
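One likely reason the grammar above fails, sketched here assuming pyparsing is installed: a Word of letters also matches keywords, and OneOrMore never gives back what it has consumed, so "commits" is swallowed as a user name and the required commits keyword is then missing. (Commas are a second problem, since OneOrMore over user names has no way to skip them.) A negative lookahead with ~ on a MatchFirst of the keywords keeps keywords out of the user list:

```python
from pyparsing import CaselessKeyword, MatchFirst, OneOrMore, ParseException, Word, alphas

keywords = ["select", "show", "team", "from", "to", "commits", "and", "or"]
keyword = MatchFirst([CaselessKeyword(k) for k in keywords])
commits = CaselessKeyword("commits")
user = Word(alphas + ".")

# Without a guard, OneOrMore(user) also consumes "commits" as a user name and
# never gives it back, so the required commits keyword is then missing
greedy = OneOrMore(user) + commits
try:
    greedy.parseString("abc xyz commits")
    greedy_ok = True
except ParseException:
    greedy_ok = False
print(greedy_ok)  # False

# ~keyword is a negative lookahead: a keyword is never accepted as a user name
guarded = OneOrMore(~keyword + user) + commits
print(guarded.parseString("abc xyz commits").asList())  # ['abc', 'xyz', 'commits']
```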

I couldn't find what I was looking for online. Or maybe I'm just not phrasing my question correctly on Google?


You're on the right track; see the comments embedded in this update to your parser:

from pyparsing import *

keywords = ["select", "show", "team", "from", "to", "commits", "and", "or"]
[select, show, team, _from, _to,  commits, _and, _or] = [ CaselessKeyword(word) for word in keywords ]

# define an expression to prevent matching keywords as user names - used below in users expression
keyword = MatchFirst(map(CaselessKeyword, keywords))

user = Word(alphas+"."+alphas)  # ??? what are you trying to define here?
user2 = Combine(user + "'s")
# must not confuse keywords like commit with usernames - and use ungroup to 
# unpack single-element token lists
# try the possessive form first; with (user|user2), "xyz's" would match user as just "xyz"
users = ungroup(~keyword + (user2|user))

#~ bnf = (show|select)+Group(users).setResultsName("users")+Optional(team)+(commits).setResultsName("stats") \
    #~ + Optional(_from + quotedString.setParseAction(removeQuotes)('from') +
                    #~ _to + quotedString.setParseAction(removeQuotes)('to'))

def convertToDatetime(tokens):
    # change this code to do your additional parsing/conversion to a Python datetime
    return tokens[0] 
timestamp = quotedString.setParseAction(removeQuotes, convertToDatetime)

# similar to your expression
# - use delimitedList instead of OneOrMore to handle comma-separated list of items
# - add distinction of "xxx team" vs "xxx"
# - dropped expr.setResultsName("name") in favor of short notation expr("name")
# - results names with trailing '*' will accumulate like elements into a single
#   named result (short notation for setResultsName(name, listAllValues=True) )
# - dropped setResultsName("stats") on keyword "commits", no point to this, commits must always be present
#
bnf = ((show|select)("command") + delimitedList(users("team*") + team | users("user*")) + commits + 
            Optional(_from + timestamp('from') + _to + timestamp('to')))

test = 'show abc, def team, xyz commits from "Jan 10,2015" to "27/1/2015"'

print(bnf.parseString(test).dump())

Prints:

['show', 'abc', 'def', 'team', 'xyz', 'commits', 'from', 'Jan 10,2015', 'to', '27/1/2015']
- command: show
- from: Jan 10,2015
- team: ['def']
- to: 27/1/2015
- user: ['abc', 'xyz']
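The answer leaves convertToDatetime as a placeholder. A minimal standard-library sketch of one way to fill it in, assuming only the two layouts from the example query ("Jan 10,2015" and day-first "27/1/2015"); DATE_FORMATS is a hypothetical name, and any other layouts would need to be added to it:

```python
from datetime import datetime

# Hypothetical list of accepted layouts, taken from the example query:
# "Jan 10,2015" (abbreviated month name) and day-first "27/1/2015".
# %b matches English month abbreviations in the default C locale.
DATE_FORMATS = ["%b %d,%Y", "%d/%m/%Y"]

def convertToDatetime(tokens):
    """Parse action sketch: convert the unquoted date in tokens[0] to a datetime."""
    text = tokens[0]
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout did not match; try the next one
    raise ValueError("unrecognized date format: %r" % text)

print(convertToDatetime(["Jan 10,2015"]))  # 2015-01-10 00:00:00
print(convertToDatetime(["27/1/2015"]))    # 2015-01-27 00:00:00
```

It can be wired in as in the answer, with quotedString.setParseAction(removeQuotes, convertToDatetime), since pyparsing accepts a parse action that takes only the tokens. If an unrecognized date should instead be reported as an ordinary parse failure, the parse action could raise pyparsing's ParseException.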