一尘不染

Prevent lxml from creating self-closing tags

python

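I have an (old) tool that does not understand self-closing tags such as <STATUS/>. So we need to serialize our XML files with explicit opening and closing tags, like this: <STATUS></STATUS>

Currently I have: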

>>> from lxml import etree

>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS/>.</ERROR>'

How can I serialize with opening and closing tags instead?

<ERROR>The status is <STATUS></STATUS>.</ERROR>

The solution, as given by wildwilhelm:

>>> from lxml import etree

>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> for status_elem in tree.xpath("//STATUS[string() = '']"):
...     status_elem.text = ""
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS></STATUS>.</ERROR>'

2021-01-20

1 answer

一尘不染

It seems the <STATUS> element ends up with its text attribute set to None:

>>> tree[0]
<Element STATUS at 0x11708d4d0>
>>> tree[0].text
>>> tree[0].text is None
True

If you set the text attribute of the <STATUS> element to an empty string, you should get the output you are looking for:

>>> tree[0].text = ''
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS></STATUS>.</ERROR>'

With that in mind, you can probably walk the tree and fix up the text attributes before writing out your XML. Something like this:

# prevent creation of self-closing tags
for node in tree.iter():
    if node.text is None:
        node.text = ''
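
For reference, the loop above can be wrapped into a small self-contained script. This is only a sketch: the helper name force_open_close_tags is made up for illustration and is not part of lxml. It restricts the walk to real elements (skipping comments and processing instructions) and only touches elements that have no children. Note also that on Python 3, etree.tostring returns bytes unless you pass encoding="unicode", so the REPL output shown earlier would appear with a b'' prefix.

```python
from lxml import etree


def force_open_close_tags(root):
    """Hypothetical helper: give every empty element an empty-string text.

    lxml serializes an element whose text is None and that has no
    children as <TAG/>; setting text to '' makes it emit <TAG></TAG>.
    """
    # tag=etree.Element limits iteration to elements only
    # (comments and processing instructions are skipped).
    for node in root.iter(tag=etree.Element):
        if len(node) == 0 and node.text is None:
            node.text = ""
    return root


para = "<ERROR>The status is <STATUS/>.</ERROR>"
tree = force_open_close_tags(etree.XML(para))

# encoding="unicode" returns a str instead of bytes on Python 3
result = etree.tostring(tree, encoding="unicode")
print(result)
```

The helper modifies the tree in place and returns it only for convenience, so it should be called right before serialization.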
2021-01-20