How to output the XML declaration in Python/ElementTree

2024-02-29

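I am trying to create an XML file in the format of a Word bibliography sources file. When I write the file with `xml_declaration=True`, I only get `<?xml version='1.0' encoding='us-ascii'?>`, but I want the declaration in the form `<?xml version="1.0"?>`.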

from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import Element
import xml.etree.ElementTree as ET
import uuid

root=Element('b:sources')
root.set('SelectedStyle','')
root.set('xmlns:b','http://schemas.openxmlformats.org/officeDocument/2006/bibliography')
root.set('xmlns','http://schemas.openxmlformats.org/officeDocument/2006/bibliography')
#root.attrib=('SelectedStyle'='', 'xmlns:b'='"http://schemas.openxmlformats.org/officeDocument/2006/bibliography"', 'xmlns:b'='"http://schemas.openxmlformats.org/officeDocument/2006/bibliography"','xmlns'='"http://schemas.openxmlformats.org/officeDocument/2006/bibliography"')


source=ET.SubElement(root, 'b:source')
ET.SubElement(source,'b:Tag')
ET.SubElement(source,'b:SourceType').text='Misc'
ET.SubElement(source,'b:guid').text=str(uuid.uuid1())

Author=ET.SubElement(source,'b:Author')
Author2=ET.SubElement(Author,'b:Author')
ET.SubElement(Author2,'b:Corporate').text='Norsk olje og gass'

ET.SubElement(source, 'b:Title').text='R-002'
ET.SubElement(source, 'b:Year').text='2019'
ET.SubElement(source, 'b:Month').text='10'
ET.SubElement(source, 'b:Day').text='27'


tree=ElementTree(root)

tree.write('Sources.xml', xml_declaration=True, method='xml')

Answer:

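When using `xml.etree.ElementTree`, there is no way to avoid including the encoding attribute in the declaration it generates. If you do not want the encoding attribute in the XML declaration at all, you need to use `xml.dom.minidom`, not `xml.etree.ElementTree`.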

Here is a snippet that sets up the example:

import xml.etree.ElementTree
a = xml.etree.ElementTree.Element('a')
tree = xml.etree.ElementTree.ElementTree(element=a)
root = tree.getroot()

Omitting the encoding (the `xml_declaration` argument of `tostring` requires Python 3.8 or later):

out = xml.etree.ElementTree.tostring(root, xml_declaration=True)

b"<?xml version='1.0' encoding='us-ascii'?>\n<a />"

Encoding `us-ascii`:

out = xml.etree.ElementTree.tostring(root, encoding='us-ascii', xml_declaration=True)

b"<?xml version='1.0' encoding='us-ascii'?>\n<a />"

Encoding `unicode`:

out = xml.etree.ElementTree.tostring(root, encoding='unicode', xml_declaration=True)

"<?xml version='1.0' encoding='UTF-8'?>\n<a />"

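As a quick cross-check of the three cases above, here is a minimal sketch that loops over several encodings and collects the declaration line each one produces. The helper name `declaration_for` is made up for this illustration and is not part of the library. The exact spelling of the encoding name reported for `unicode` may differ between Python versions, so the sketch only looks at whether the attribute is present.

```python
import xml.etree.ElementTree as ET

def declaration_for(element, encoding=None):
    """Return the declaration line emitted by tostring() (hypothetical helper)."""
    out = ET.tostring(element, encoding=encoding, xml_declaration=True)
    if isinstance(out, bytes):
        # Bytes are returned for every encoding except 'unicode'.
        out = out.decode(encoding or 'us-ascii')
    # The declaration is the first line of the serialized document.
    return out.split('\n', 1)[0]

a = ET.Element('a')
declarations = {
    enc: declaration_for(a, enc)
    for enc in (None, 'us-ascii', 'utf-8', 'unicode')
}
for enc, decl in declarations.items():
    print(enc, '->', decl)
```

In every case the declaration carries an `encoding` attribute, which is why the approach below switches to a different serializer.
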
Using `minidom`:

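Let's take the first example above, where the encoding is omitted, and pass the variable `out` as input to `xml.dom.minidom`. You will see the output you are looking for.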

import xml.dom.minidom
dom = xml.dom.minidom.parseString(out)
dom.toxml()

'<?xml version="1.0" ?><a/>'

There is also a pretty-print option:

dom.toprettyxml()

'<?xml version="1.0" ?>\n<a/>\n'
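
To apply this to a tree built with `ElementTree` and written to a file, as in the question, something like the following sketch should work. The function name `write_with_minidom` and the temporary output path are assumptions chosen for the example; the element content is a stand-in for the real sources tree.

```python
import os
import tempfile
import xml.dom.minidom
import xml.etree.ElementTree as ET

def write_with_minidom(root, path):
    """Write an ElementTree element via minidom (hypothetical helper)."""
    # With encoding='utf-8' and no xml_declaration argument, tostring()
    # returns UTF-8 bytes without any declaration.
    raw = ET.tostring(root, encoding='utf-8')
    dom = xml.dom.minidom.parseString(raw)
    # toxml() without an encoding argument returns a str whose
    # declaration has no encoding attribute.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(dom.toxml())

# Stand-in tree; the question's root element would be passed the same way.
root = ET.Element('a')
ET.SubElement(root, 'b').text = 'R-002'

path = os.path.join(tempfile.mkdtemp(), 'Sources.xml')
write_with_minidom(root, path)

with open(path, encoding='utf-8') as f:
    written = f.read()
print(written)
```

The file is opened as UTF-8 because that is the encoding an XML parser assumes when the declaration does not name one.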

Note

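Looking at the source code, you can see that the `encoding` attribute is hard-coded into the declaration that gets written: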

        with _get_writer(file_or_filename, encoding) as (write, declared_encoding):
            if method == "xml" and (xml_declaration or
                    (xml_declaration is None and
                     declared_encoding.lower() not in ("utf-8", "us-ascii"))):
                write("<?xml version='1.0' encoding='%s'?>\n" % (
                    declared_encoding,))

https://github.com/python/cpython/blob/550c44b89513ea96d209e2ff761302238715f082/Lib/xml/etree/ElementTree.py#L731-L736
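
Since the template is fixed, another possible workaround, if you would rather stay with `ElementTree`, is to switch its declaration off and write your own line first. This is only a sketch under that assumption: the function name `write_with_manual_declaration` and the temporary path are invented for the example, and the document is written as UTF-8 so that it matches what a parser assumes when no encoding is declared.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def write_with_manual_declaration(tree, path):
    """Write a hand-made declaration, then the tree (hypothetical helper)."""
    with open(path, 'wb') as f:
        # The declaration is written by hand, so it can take any form.
        f.write(b'<?xml version="1.0"?>\n')
        # xml_declaration=False stops ElementTree from adding its own.
        tree.write(f, encoding='utf-8', xml_declaration=False)

# Stand-in tree; the question's tree would be passed the same way.
demo_root = ET.Element('a')
ET.SubElement(demo_root, 'b').text = 'R-002'
demo_tree = ET.ElementTree(demo_root)

demo_path = os.path.join(tempfile.mkdtemp(), 'Sources.xml')
write_with_manual_declaration(demo_tree, demo_path)

with open(demo_path, 'rb') as f:
    data = f.read()
print(data)
```

This keeps the element serialization exactly as `ElementTree` produces it, whereas the `minidom` route re-parses and re-serializes the whole document.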
