XPath predicate with a sub-path in lxml?

2024-02-20

I'm trying to understand an XPath that was sent to me for use with ACORD XML forms (a common format in insurance). The XPath they sent me is (truncated for brevity):

./PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo

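Where I run into trouble is that Python's lxml library (http://lxml.de/) tells me that [InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"] is an invalid predicate. I can't find anything in the XPath specification's section on predicates (http://www.w3.org/TR/xpath/#NT-Predicate) that identifies this syntax, so I don't know how to modify the predicate to make it work.

Is there any documentation on what exactly this predicate selects? Also, is this even a valid predicate, or has something been mangled somewhere?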

Possibly related:

I believe the company I'm working with is an MS shop, so this XPath may be valid in C# or some other language in that stack? I'm not sure.

Updates:

As requested in the comments, here is some additional information.

XML sample:

<ACORD>
  <InsuranceSvcRq>
    <HomePolicyQuoteInqRq>
      <PersPolicy>
        <PersApplicationInfo>
            <InsuredOrPrincipal>
                <InsuredOrPrincipalInfo>
                    <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
                </InsuredOrPrincipalInfo>
                <GeneralPartyInfo>
                    <Addr>
                        <Addr1></Addr1>
                    </Addr>
                </GeneralPartyInfo>
            </InsuredOrPrincipal>
        </PersApplicationInfo>
      </PersPolicy>
    </HomePolicyQuoteInqRq>
  </InsuranceSvcRq>
</ACORD>

Code sample (using the full XPath rather than the snippet):

>>> from lxml import etree
>>> tree = etree.fromstring(raw)
>>> tree.find('./InsuranceSvcRq/HomePolicyQuoteInqRq/PersPolicy/PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo/Addr/Addr1')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lxml.etree.pyx", line 1409, in lxml.etree._Element.find (src/lxml/lxml.etree.c:39972)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 271, in find
    it = iterfind(elem, path, namespaces)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 261, in iterfind
    selector = _build_path_iterator(path, namespaces)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 245, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 207, in prepare_predicate
    raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate

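Change tree.find to tree.xpath. find and findall exist in lxml to provide compatibility with other implementations of ElementTree. These methods do not implement the entire XPath language (see http://lxml.de/FAQ.html#what-are-the-findall-and-xpath-methods-on-element-tree). To use XPath expressions containing more advanced features, use the xpath method, the XPath class, or XPathEvaluator.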
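To make the three options concrete, here is a minimal sketch that evaluates the same expression through each of them. The XML is a trimmed-down stand-in for the document in the question, and the names XML, PATH, find_party and evaluate are mine, chosen only for illustration.

```python
from lxml import etree

# Trimmed-down stand-in for the ACORD document in the question.
XML = b"""<ACORD>
  <PersApplicationInfo>
    <InsuredOrPrincipal>
      <InsuredOrPrincipalInfo>
        <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
      </InsuredOrPrincipalInfo>
      <GeneralPartyInfo/>
    </InsuredOrPrincipal>
  </PersApplicationInfo>
</ACORD>"""

PATH = ('./PersApplicationInfo/InsuredOrPrincipal'
        '[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]'
        '/GeneralPartyInfo')

root = etree.fromstring(XML)

# 1. The xpath() method: evaluate the expression against this element.
via_method = root.xpath(PATH)

# 2. The XPath class: compile the expression once, then call it on an element.
find_party = etree.XPath(PATH)
via_class = find_party(root)

# 3. XPathEvaluator: bind an evaluator to an element, then pass expressions to it.
evaluate = etree.XPathEvaluator(root)
via_evaluator = evaluate(PATH)

print([el.tag for el in via_method])  # ['GeneralPartyInfo']
```

All three go through the full XPath engine, so the sub-path predicate is accepted in each case. Compiling with the XPath class is mostly worthwhile when the same expression is applied to many documents.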

For example, using the XML from the question:

import io
import lxml.etree as ET

content='''\
<ACORD>
  <InsuranceSvcRq>
    <HomePolicyQuoteInqRq>
      <PersPolicy>
        <PersApplicationInfo>
            <InsuredOrPrincipal>
                <InsuredOrPrincipalInfo>
                    <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
                </InsuredOrPrincipalInfo>
                <GeneralPartyInfo>
                    <Addr>
                        <Addr1></Addr1>
                    </Addr>
                </GeneralPartyInfo>
            </InsuredOrPrincipal>
        </PersApplicationInfo>
      </PersPolicy>
    </HomePolicyQuoteInqRq>
  </InsuranceSvcRq>
</ACORD>
'''
# io.BytesIO needs bytes, so encode the str first (required on Python 3).
tree = ET.parse(io.BytesIO(content.encode('utf-8')))
path = '//PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo'
result = tree.xpath(path)
print(result)

yields

[<Element GeneralPartyInfo at b75a8194>]

while tree.find yields

SyntaxError: invalid node predicate
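
As for what the predicate selects: it is valid XPath 1.0. A predicate may contain a relative location path compared against a string, and the step keeps only those InsuredOrPrincipal elements for which some InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd descendant has the string value "AN". The sketch below illustrates this on a made-up fragment with two parties. The role code "XX" and the address values are invented for the example and are not taken from ACORD. It also shows a rough equivalent that stays within find/findall, in case switching to xpath is not an option.

```python
from lxml import etree

# Hypothetical fragment: two parties, only the first has role code "AN".
# "XX" and the street names are made up for illustration.
XML = b"""<PersApplicationInfo>
  <InsuredOrPrincipal>
    <InsuredOrPrincipalInfo>
      <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
    </InsuredOrPrincipalInfo>
    <GeneralPartyInfo><Addr><Addr1>1 Example St</Addr1></Addr></GeneralPartyInfo>
  </InsuredOrPrincipal>
  <InsuredOrPrincipal>
    <InsuredOrPrincipalInfo>
      <InsuredOrPrincipalRoleCd>XX</InsuredOrPrincipalRoleCd>
    </InsuredOrPrincipalInfo>
    <GeneralPartyInfo><Addr><Addr1>2 Example St</Addr1></Addr></GeneralPartyInfo>
  </InsuredOrPrincipal>
</PersApplicationInfo>"""

root = etree.fromstring(XML)

# Full XPath: the predicate filters InsuredOrPrincipal by the nested role code.
selected = root.xpath(
    './InsuredOrPrincipal'
    '[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]'
    '/GeneralPartyInfo/Addr/Addr1')
xpath_addresses = [el.text for el in selected]

# Rough equivalent using only find-style methods: walk the candidates and
# apply the filter in Python instead of in the path expression.
manual_addresses = []
for party in root.findall('./InsuredOrPrincipal'):
    role = party.findtext('InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd')
    if role == 'AN':
        manual_addresses.extend(
            el.text for el in party.findall('GeneralPartyInfo/Addr/Addr1'))

print(xpath_addresses)   # ['1 Example St']
print(manual_addresses)  # ['1 Example St']
```

One caveat on the manual version: findtext only looks at the first matching InsuredOrPrincipalRoleCd, whereas the XPath predicate is true if any matching node equals "AN". The two agree as long as each party carries a single role code.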