Deleting specific XML nodes with Clojure

2023-12-05

I have the following XML structure:

(def xmlstr
"<ROOT>
  <Items>
    <Item><Type>A</Type><Note>AA</Note></Item>
    <Item><Type>B</Type><Note>BB</Note></Item>
    <Item><Type>C</Type><Note>CC</Note></Item>
    <Item><Type>A</Type><Note>AA</Note></Item>
  </Items>
</ROOT>")

I want to delete every Item whose Type is B or C. The result should look like this:

<ROOT>
  <Items>
    <Item><Type>A</Type><Note>AA</Note></Item>
    <Item><Type>A</Type><Note>AA</Note></Item>
  </Items>
</ROOT>

I found that querying a structure like this is quite simple with data.xml and data.zip, for example:

;; lein try org.clojure/data.xml org.clojure/data.zip
(require 'clojure.data.xml 'clojure.zip 'clojure.data.zip.xml)
(def xmldoc (clojure.data.xml/parse-str xmlstr))
(def zipxml (clojure.zip/xml-zip xmldoc))

(clojure.data.zip.xml/xml-> zipxml :Items :Item [:Type "A"] :Note clojure.data.zip.xml/text)
;; => ("AA" "AA")

But I have not found a comparable declarative facility for deleting or editing child nodes.


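The Tupelo library can solve this problem easily using tupelo.forest. The API docs are available on the project's GitHub pages. Below is a test case that uses your example.

Here we load your XML data, converting it first to Enlive format and then to the native tree structure used by tupelo.forest: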

(ns tst.tupelo.forest-examples
  (:use tupelo.forest tupelo.test )
  (:require
    [clojure.data.xml :as dx]
    [clojure.java.io :as io]
    [clojure.set :as cs]
    [net.cgrand.enlive-html :as en-html]
    [schema.core :as s]
    [tupelo.core :as t]
    [tupelo.string :as ts]))
(t/refer-tupelo)

; Discard any xml nodes of Type="B" or Type="C" (plus blank string nodes)
(dotest
  (with-forest (new-forest)
    (let [xml-str         "<ROOT>
                            <Items>
                              <Item><Type>A</Type><Note>AA1</Note></Item>
                              <Item><Type>B</Type><Note>BB1</Note></Item>
                              <Item><Type>C</Type><Note>CC1</Note></Item>
                              <Item><Type>A</Type><Note>AA2</Note></Item>
                            </Items>
                          </ROOT>"
          enlive-tree     (->> xml-str
                            java.io.StringReader.
                            en-html/xml-resource
                            first)
          root-hid        (add-tree-enlive enlive-tree)
          tree-1          (hid->tree root-hid)

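The hid suffix stands for "Hex ID", a unique hexadecimal value that acts like a pointer to a node or leaf in the tree. At this stage we have only loaded the data into the forest data structure, creating tree-1, which looks like this: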

 (is= tree-1
   {:attrs {:tag :ROOT},
    :kids  [{:attrs {:tag :tupelo.forest/raw},
             :value "\n                            "}
            {:attrs {:tag :Items},
             :kids  [{:attrs {:tag :tupelo.forest/raw},
                      :value "\n                              "}
                     {:attrs {:tag :Item},
                      :kids  [{:attrs {:tag :Type}, :value "A"}
                              {:attrs {:tag :Note}, :value "AA1"}]}
                     {:attrs {:tag :tupelo.forest/raw},
                      :value "\n                              "}
                     {:attrs {:tag :Item},
                      :kids  [{:attrs {:tag :Type}, :value "B"}
                              {:attrs {:tag :Note}, :value "BB1"}]}
                     {:attrs {:tag :tupelo.forest/raw},
                      :value "\n                              "}
                     {:attrs {:tag :Item},
                      :kids  [{:attrs {:tag :Type}, :value "C"}
                              {:attrs {:tag :Note}, :value "CC1"}]}
                     {:attrs {:tag :tupelo.forest/raw},
                      :value "\n                              "}
                     {:attrs {:tag :Item},
                      :kids  [{:attrs {:tag :Type}, :value "A"}
                              {:attrs {:tag :Note}, :value "AA2"}]}
                     {:attrs {:tag :tupelo.forest/raw},
                      :value "\n                            "}]}
            {:attrs {:tag :tupelo.forest/raw},
             :value "\n                          "}]})
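To make the "pointer" idea behind a hid more concrete, here is a minimal sketch in plain Clojure. It is not how tupelo.forest is implemented; the flat `forest` map, the `:h1`..`:h5` ids, and the helpers `id->tree` and `remove-id` are all hypothetical names made up for illustration. The point is only that nodes live in a flat map keyed by id, parents refer to children by id, and a nested tree is rebuilt on demand.

```clojure
;; Hypothetical toy forest: a flat map from id to node.
;; Parents store the ids of their children, not the children themselves.
(def forest
  {:h1 {:tag :Items :kids [:h2 :h3]}
   :h2 {:tag :Item  :kids [:h4]}
   :h3 {:tag :Item  :kids [:h5]}
   :h4 {:tag :Type  :value "A"}
   :h5 {:tag :Type  :value "B"}})

(defn id->tree
  "Rebuild a nested tree from the flat map, starting at id."
  [forest id]
  (let [node (get forest id)]
    (if (contains? node :kids)
      (assoc node :kids (mapv #(id->tree forest %) (:kids node)))
      node)))

(defn remove-id
  "Drop the node stored under id and unlink it from every parent's :kids.
  Descendants of the removed node stay in the map but become unreachable."
  [forest id]
  (reduce-kv
    (fn [acc k node]
      (assoc acc k (if (contains? node :kids)
                     (update node :kids #(vec (remove #{id} %)))
                     node)))
    {}
    (dissoc forest id)))
```

With this toy model, `(id->tree (remove-id forest :h3) :h1)` rebuilds the Items subtree without the second Item, which mirrors the remove-then-rebuild pattern used in the rest of this answer.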

Next, we remove all blank strings with the following code:

blank-leaf-hid? (fn [hid] (and (leaf-hid? hid) ; ensure it is a leaf node
                            (let [value (hid->value hid)]
                              (and (string? value)
                                (or (zero? (count value)) ; empty string
                                  (ts/whitespace? value)))))) ; all whitespace string

blank-leaf-hids (keep-if blank-leaf-hid? (all-hids))
>>              (apply remove-hid blank-leaf-hids)
tree-2          (hid->tree root-hid)

This yields tree-2, which looks much tidier:

(is= tree-2
  {:attrs {:tag :ROOT},
   :kids  [{:attrs {:tag :Items},
            :kids  [{:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "A"}
                             {:attrs {:tag :Note}, :value "AA1"}]}
                    {:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "B"}
                             {:attrs {:tag :Note}, :value "BB1"}]}
                    {:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "C"}
                             {:attrs {:tag :Note}, :value "CC1"}]}
                    {:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "A"}
                             {:attrs {:tag :Note}, :value "AA2"}]}]}]})
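The same blank-string cleanup can be sketched without the forest, directly on an Enlive-style tree (maps with `:tag` and `:content`, where text appears as plain strings inside `:content`). This is only an illustrative alternative, and `strip-blank-strings` is a hypothetical helper name, not part of Tupelo or Enlive.

```clojure
(require '[clojure.string :as str])

(defn strip-blank-strings
  "Return node with whitespace-only string children removed, recursively.
  Nodes without children and plain strings are returned unchanged."
  [node]
  (if (and (map? node) (seq (:content node)))
    (assoc node :content
           (->> (:content node)
                ;; drop children that are empty or all-whitespace strings
                (remove #(and (string? %) (str/blank? %)))
                (mapv strip-blank-strings)))
    node))
```

Because it returns a new tree instead of mutating shared state, this version composes with ordinary functions such as `->` and `update`, at the cost of losing the id-based addressing that the forest provides.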

The final code fragment removes the Item nodes that have Type="B" or Type="C":

type-bc-hid?    (fn [hid] (pos? (count (glue
                            (find-leaf-hids hid [:** :Type] "B")
                            (find-leaf-hids hid [:** :Type] "C")))))

type-bc-hids    (find-hids-with root-hid [:** :Item] type-bc-hid?)
>>              (apply remove-hid type-bc-hids)
tree-3          (hid->tree root-hid)
tree-3-hiccup   (hid->hiccup root-hid) ]

This produces the final result, shown here in both tree format and hiccup format:

(is= tree-3
  {:attrs {:tag :ROOT},
   :kids
          [{:attrs {:tag :Items},
            :kids  [{:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "A"}
                             {:attrs {:tag :Note}, :value "AA1"}]}
                    {:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "A"}
                             {:attrs {:tag :Note}, :value "AA2"}]}]}]})
(is= tree-3-hiccup
  [:ROOT
   [:Items
    [:Item [:Type "A"] [:Note "AA1"]]
    [:Item [:Type "A"] [:Note "AA2"]]]]))))

The full example can be found in the forest-examples unit test.

Update

Here is the most compact version, with the extra features removed:

(dotest
  (with-forest (new-forest)
    (let [xml-str         "<ROOT>
                            <Items>
                              <Item><Type>A</Type><Note>AA1</Note></Item>
                              <Item><Type>B</Type><Note>BB1</Note></Item>
                              <Item><Type>C</Type><Note>CC1</Note></Item>
                              <Item><Type>A</Type><Note>AA2</Note></Item>
                            </Items>
                          </ROOT>"
          enlive-tree     (->> xml-str
                            java.io.StringReader.
                            en-html/xml-resource
                            first)
          root-hid        (add-tree-enlive enlive-tree)
          blank-leaf-hid? (fn [hid] (ts/whitespace? (hid->value hid)))
          has-bc-leaf?    (fn [hid] (or (has-child-leaf? hid [:** :Type] "B")
                                        (has-child-leaf? hid [:** :Type] "C")))
          blank-leaf-hids (keep-if blank-leaf-hid? (all-leaf-hids))
          >>              (apply remove-hid blank-leaf-hids)
          bc-item-hids    (find-hids-with root-hid [:** :Item] has-bc-leaf?)]
      (apply remove-hid bc-item-hids)
      (is= (hid->hiccup root-hid)
        [:ROOT
         [:Items
          [:Item [:Type "A"] [:Note "AA1"]]
          [:Item [:Type "A"] [:Note "AA2"]]]]))))
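For comparison, here is a rough sketch that needs only namespaces shipped with Clojure (clojure.xml and clojure.zip) instead of Tupelo. It walks the document with a zipper and calls `zip/remove` on every Item whose Type is B or C. The helpers `parse-xml-str`, `child-elements`, `child-text`, `bc-item?`, and `remove-nodes` are hypothetical names introduced for this example, and it is not as declarative as the forest version above.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])

(def xml-str
  "<ROOT>
     <Items>
       <Item><Type>A</Type><Note>AA1</Note></Item>
       <Item><Type>B</Type><Note>BB1</Note></Item>
       <Item><Type>C</Type><Note>CC1</Note></Item>
       <Item><Type>A</Type><Note>AA2</Note></Item>
     </Items>
   </ROOT>")

(defn parse-xml-str
  "Parse an XML string into clojure.xml's {:tag :attrs :content} maps."
  [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn child-elements
  "Child elements of node, skipping any text children."
  [node]
  (filter map? (:content node)))

(defn child-text
  "Concatenated text of the first child element with the given tag, or nil."
  [node tag]
  (some (fn [kid]
          (when (= tag (:tag kid))
            (apply str (filter string? (:content kid)))))
        (child-elements node)))

(defn bc-item?
  "True for an Item element whose Type child holds B or C."
  [node]
  (and (map? node)
       (= :Item (:tag node))
       (contains? #{"B" "C"} (child-text node :Type))))

(defn remove-nodes
  "Depth-first walk that removes every node matching pred.
  The root itself must not match, since zip/remove throws at the top."
  [root pred]
  (loop [loc (zip/xml-zip root)]
    (cond
      (zip/end? loc)        (zip/root loc)
      (pred (zip/node loc)) (recur (zip/next (zip/remove loc)))
      :else                 (recur (zip/next loc)))))

(def cleaned (remove-nodes (parse-xml-str xml-str) bc-item?))
```

Here `cleaned` is the parsed document with only the two Type A items left under Items. If string output is needed, `(with-out-str (xml/emit-element cleaned))` prints the tree back as XML text.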