Node.getTextContent(): is there a way to get the text content of the current node only, not the text of its descendants?

2023-12-23

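Node.getTextContent() returns the text content of the current node and all of its descendants.

Is there a way to get only the text content of the current node itself, without the descendants' text?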

Example

<paragraph>
    <link>XML</link>
    is a 
    <strong>browser based XML editor</strong>
    editor allows users to edit XML data in an intuitive word processor.
</paragraph>

Expected output

paragraph = is a editor allows users to edit XML data in an intuitive word processor.
link = XML
strong = browser based XML editor

I tried the following code

String str =            "<paragraph>"+
                            "<link>XML</link>"+
                            " is a "+ 
                            "<strong>browser based XML editor</strong>"+
                            "editor allows users to edit XML data in an intuitive word processor."+
                        "</paragraph>";

        org.w3c.dom.Document domDoc = null;
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder;

        try {
            docBuilder = docFactory.newDocumentBuilder();
            ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes());
            domDoc = docBuilder.parse(bis);         
        } catch (ParserConfigurationException e1) {         
            e1.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }       

        DocumentTraversal traversal = (DocumentTraversal) domDoc;
        NodeIterator iterator = traversal.createNodeIterator(
                domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);

        for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {           
            String tagname = ((Element) n).getTagName();
            System.out.println(tagname + "=" + ((Element)n).getTextContent());
        }

But it produces this output

paragraph=XML is a browser based XML editoreditor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor

Note that the output for the paragraph element also includes the text of the link and strong tags, which I don't want. Any ideas?


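What you want is to filter the child nodes of the <paragraph> node and keep only those whose node type is Node.TEXT_NODE.

Here is an example method that returns what you need: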

public static String getFirstLevelTextContent(Node node) {
    NodeList list = node.getChildNodes();
    StringBuilder textContent = new StringBuilder();
    for (int i = 0; i < list.getLength(); ++i) {
        Node child = list.item(i);
        if (child.getNodeType() == Node.TEXT_NODE)
            textContent.append(child.getTextContent());
    }
    return textContent.toString();
}

Applied to your example, this gives:

String str = "<paragraph>" + //
        "<link>XML</link>" + //
        " is a " + //
        "<strong>browser based XML editor</strong>" + //
        "editor allows users to edit XML data in an intuitive word processor." + //
        "</paragraph>";
Document domDoc = null;
try {
    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
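    // Encode explicitly (needs java.nio.charset.StandardCharsets): XML without a declaration is read as UTF-8
    ByteArrayInputStream bis = new ByteArrayInputStream(str.getBytes(StandardCharsets.UTF_8));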
    domDoc = docBuilder.parse(bis);
} catch (Exception e) {
    e.printStackTrace();
}
DocumentTraversal traversal = (DocumentTraversal) domDoc;
NodeIterator iterator = traversal.createNodeIterator(domDoc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);
for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
    String tagname = ((Element) n).getTagName();
    System.out.println(tagname + "=" + getFirstLevelTextContent(n));
}

Output:

paragraph= is a editor allows users to edit XML data in an intuitive word processor.
link=XML
strong=browser based XML editor

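The method iterates over all direct children of the node, keeps only the text nodes (thereby excluding comments, elements, and so on), and concatenates their text content.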
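
One case the filter above may handle differently than expected is CDATA: with a default (non-coalescing) DocumentBuilderFactory, a CDATA section is a separate child of type Node.CDATA_SECTION_NODE, so a check against Node.TEXT_NODE alone skips it. The following is a minimal sketch of that behaviour, not part of the original answer; the class name DirectTextDemo, the helper names, and the includeCdata flag are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DirectTextDemo {

    // Parses an XML string; checked exceptions are wrapped to keep the demo short.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Concatenates the direct text children of a node.
    // includeCdata is a hypothetical flag: when true, CDATA sections count as text too.
    static String directText(Node node, boolean includeCdata) {
        StringBuilder sb = new StringBuilder();
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || (includeCdata && type == Node.CDATA_SECTION_NODE)) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Children of <p>: text "a", a comment, element <b>, text "b", CDATA "<c>"
        Node p = parse("<p>a<!-- note --><b>bold</b>b<![CDATA[<c>]]></p>").getDocumentElement();
        System.out.println("[" + directText(p, false) + "]"); // prints [ab]
        System.out.println("[" + directText(p, true) + "]");  // prints [ab<c>]
    }
}
```

Whether CDATA should count as "text of the current node" depends on the data; calling setCoalescing(true) on the factory is another option, since the parser then merges CDATA sections into adjacent text nodes.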

There is no direct method on Node or Element that returns only the first-level text content.
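
As an alternative to looping over getChildNodes(), the standard javax.xml.xpath API can select the direct text children with the expression text(). This is a sketch of a different technique than the one in the answer above, offered under the assumption that the JDK's built-in XPath implementation is acceptable; the class name XPathDirectText and its helpers are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDirectText {

    // Selects the text nodes that are direct children of the given node
    // and concatenates their values in document order.
    static String directText(Node node) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList texts = (NodeList) xpath.evaluate("text()", node, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < texts.getLength(); i++) {
                sb.append(texts.item(i).getNodeValue());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Parses an XML string and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node paragraph = parseRoot("<paragraph><link>XML</link> is a "
                + "<strong>browser based XML editor</strong>"
                + "editor allows users to edit XML data in an intuitive word processor."
                + "</paragraph>");
        System.out.println("paragraph=" + directText(paragraph));
    }
}
```

For a single element the plain loop is likely simpler and cheaper; XPath may be more convenient when the selection has to be combined with other conditions.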
