How do I parse XML from a local file or a URL using lxml?



ValueError: invalid \x escape


from lxml import etree

I'm new to lxml. Please help me solve this problem. Here is my XML content:

<?xml version="1.0"?>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <description>An in-depth look at creating applications 
      with XML.</description>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description>
   <book id="bk106">
      <author>Randall, Cynthia</author>
      <title>Lover Birds</title>
      <description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description>
   <book id="bk107">
      <author>Thurman, Paula</author>
      <title>Splish Splash</title>
      <description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description>
   <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
   <book id="bk109">
      <author>Kress, Peter</author>
      <title>Paradox Lost</title>
      <genre>Science Fiction</genre>
      <description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description>
   <book id="bk110">
      <author>O'Brien, Tim</author>
      <title>Microsoft .NET: The Programming Bible</title>
      <description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description>
   <book id="bk111">
      <author>O'Brien, Tim</author>
      <title>MSXML3: A Comprehensive Guide</title>
      <description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 

Also, can we use lxml to parse XML from a URL?


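There are two problems here. First, you are using etree.fromstring() to try to load XML from a file. That function loads XML directly from a string, so when you pass it a path, it tries to parse the path itself as XML.

Second, the invalid \x escape error comes from the path string, which contains \ characters. In a normal string literal, \ starts an escape sequence, and the \x in \xmltest.xml must be followed by two hexadecimal digits, which "ml" are not. (By contrast, \n is a valid escape: a newline.)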
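To make the difference between the two functions concrete, here is a minimal sketch, assuming Python 3. The sample XML and the temporary file are made up for illustration, and the fallback import is only there so the snippet also runs where lxml is not installed.

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical sample data, not the asker's file
xml_text = '<catalog><book id="bk102"><title>Midnight Rain</title></book></catalog>'

# fromstring() takes the XML text itself and returns the root element
root_from_string = etree.fromstring(xml_text)

# parse() takes a file name (or file-like object) and returns an ElementTree
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False,
                                 encoding='utf-8') as f:
    f.write(xml_text)
    tmp_path = f.name
try:
    tree = etree.parse(tmp_path)
    root_from_file = tree.getroot()
finally:
    os.remove(tmp_path)  # clean up the temporary file

print(root_from_string.tag, root_from_file.tag)
```

Both calls end up with the same root element; the only difference is whether the argument is the XML text or the location of a file containing it.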

To load XML from a file, use the etree.parse() function instead, as follows:

from lxml import etree

# etree.parse() reads the file and returns an ElementTree
tree = etree.parse(r'C:\Users\hptphuong\Desktop\xmltest.xml')
# Print the loaded XML
print(etree.tostring(tree, encoding='unicode'))

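When passing a file path to a Python function, you should usually prefix the string with r to tell Python not to treat the \ characters in your path as escapes. For example, 'c:\temp' would actually be passed as c:<tab character>emp, because \t is converted to a tab character. Adding the r prefix prevents this.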
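A short sketch of that behaviour, assuming Python 3; the variable names are made up for illustration:

```python
plain = 'c:\temp'      # \t is interpreted as a single tab character
raw = r'c:\temp'       # r prefix: the backslash is kept literally
escaped = 'c:\\temp'   # doubled backslash: same result as the raw string

print(repr(plain))     # shows the tab as \t inside the repr
print(len(plain), len(raw))
print(raw == escaped)
```

The plain string is one character shorter than the raw one, because the two characters \ and t have collapsed into one tab.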


Alternatively, you can escape each backslash by doubling it:

path = "c:\\folder1\\folder2\\myfile.xml"
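The question also asks about loading XML from a URL. Here is a minimal sketch, assuming Python 3: it downloads the document with the standard library and hands the file-like response to etree.parse(). The helper name parse_xml_from_url and the URL in the comment are hypothetical, and the fallback import is only there so the snippet runs where lxml is not installed. The lxml documentation also lists HTTP and FTP URLs as accepted sources for etree.parse() itself; going through urlopen is a choice made here so that other schemes such as HTTPS can work as well.

```python
from urllib.request import urlopen

try:
    from lxml import etree
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree


def parse_xml_from_url(url):
    """Download the document at `url` and return its root element."""
    # urlopen() returns a file-like object, which etree.parse() accepts
    with urlopen(url) as response:
        tree = etree.parse(response)
    return tree.getroot()


# Example call (placeholder URL, not a real resource):
# root = parse_xml_from_url('https://example.com/books.xml')
```

Error handling (timeouts, HTTP errors, malformed XML) is left out to keep the sketch short.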

