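The Tupelo library can solve this problem easily using tupelo.forest. You can find the API docs on GitHub Pages. Below is a test case using your example.
Here we load your XML data and convert it first to enlive format, then to the native tree structure used by tupelo.forest: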
(ns tst.tupelo.forest-examples
  (:use tupelo.forest tupelo.test)
  (:require
    [clojure.data.xml :as dx]
    [clojure.java.io :as io]
    [clojure.set :as cs]
    [net.cgrand.enlive-html :as en-html]
    [schema.core :as s]
    [tupelo.core :as t]
    [tupelo.string :as ts]))
(t/refer-tupelo)
; Discard any xml nodes of Type="B" or Type="C" (plus blank string nodes)
(dotest
  (with-forest (new-forest)
    (let [xml-str     "<ROOT>
                         <Items>
                           <Item><Type>A</Type><Note>AA1</Note></Item>
                           <Item><Type>B</Type><Note>BB1</Note></Item>
                           <Item><Type>C</Type><Note>CC1</Note></Item>
                           <Item><Type>A</Type><Note>AA2</Note></Item>
                         </Items>
                       </ROOT>"
          ; parse as XML (not HTML) so tag names such as :ROOT keep their case
          enlive-tree (->> xml-str
                           java.io.StringReader.
                           en-html/xml-resource
                           first)
          root-hid    (add-tree-enlive enlive-tree)
          tree-1      (hid->tree root-hid)
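The hid suffix stands for "Hex ID", a unique hexadecimal value that acts like a pointer to a node or leaf in the tree. At this stage we have only loaded the data into the forest data structure, creating tree-1, which looks like: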
      (is= tree-1
        {:attrs {:tag :ROOT},
         :kids [{:attrs {:tag :tupelo.forest/raw},
                 :value "\n "}
                {:attrs {:tag :Items},
                 :kids [{:attrs {:tag :tupelo.forest/raw},
                         :value "\n "}
                        {:attrs {:tag :Item},
                         :kids [{:attrs {:tag :Type}, :value "A"}
                                {:attrs {:tag :Note}, :value "AA1"}]}
                        {:attrs {:tag :tupelo.forest/raw},
                         :value "\n "}
                        {:attrs {:tag :Item},
                         :kids [{:attrs {:tag :Type}, :value "B"}
                                {:attrs {:tag :Note}, :value "BB1"}]}
                        {:attrs {:tag :tupelo.forest/raw},
                         :value "\n "}
                        {:attrs {:tag :Item},
                         :kids [{:attrs {:tag :Type}, :value "C"}
                                {:attrs {:tag :Note}, :value "CC1"}]}
                        {:attrs {:tag :tupelo.forest/raw},
                         :value "\n "}
                        {:attrs {:tag :Item},
                         :kids [{:attrs {:tag :Type}, :value "A"}
                                {:attrs {:tag :Note}, :value "AA2"}]}
                        {:attrs {:tag :tupelo.forest/raw},
                         :value "\n "}]}
                {:attrs {:tag :tupelo.forest/raw},
                 :value "\n "}]})
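To make the "hid as pointer" idea concrete, here is a minimal plain-Clojure sketch. It is not tupelo.forest's implementation: the flat map `db`, the keyword ids, and the helpers `id->tree` and `remove-id` are all hypothetical names invented for illustration. It only assumes that nodes live in a flat store and that a parent's `:kids` holds ids rather than nested maps.

```clojure
;; Hypothetical flat store: each key plays the role of a hid.
;; A parent's :kids vector holds ids (pointers), not nested nodes.
(def db
  {:a1f {:attrs {:tag :ROOT}  :kids [:b2e]}
   :b2e {:attrs {:tag :Items} :kids [:c3d]}
   :c3d {:attrs {:tag :Item}  :kids [:d4c :e5b]}
   :d4c {:attrs {:tag :Type}  :value "A"}
   :e5b {:attrs {:tag :Note}  :value "AA1"}})

(defn id->tree
  "Rebuilds the nested tree rooted at id by following the ids in :kids."
  [db id]
  (let [node (get db id)]
    (if (contains? node :kids)
      (update node :kids (fn [kid-ids] (mapv #(id->tree db %) kid-ids)))
      node)))

(defn remove-id
  "Drops the node stored under id and removes every reference to it
   from other nodes' :kids vectors. Descendants of the removed node
   are not cleaned up in this sketch."
  [db id]
  (reduce-kv (fn [acc k node]
               (assoc acc k (if (contains? node :kids)
                              (update node :kids #(vec (remove #{id} %)))
                              node)))
             {}
             (dissoc db id)))

(id->tree db :c3d)
;; => {:attrs {:tag :Item},
;;     :kids [{:attrs {:tag :Type}, :value "A"}
;;            {:attrs {:tag :Note}, :value "AA1"}]}
```

Because every node is reachable through its id, removing a node is a local update to the store, and the nested view can be rebuilt from any id afterwards.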
Next, we remove all blank strings with the following code:
          blank-leaf-hid? (fn [hid] (and (leaf-hid? hid) ; ensure it is a leaf node
                                         (let [value (hid->value hid)]
                                           (and (string? value)
                                                (or (zero? (count value)) ; empty string
                                                    (ts/whitespace? value)))))) ; all whitespace string
          blank-leaf-hids (keep-if blank-leaf-hid? (all-hids))
          >>              (apply remove-hid blank-leaf-hids)
          tree-2          (hid->tree root-hid)
yielding tree-2, which looks much cleaner:
      (is= tree-2
        {:attrs {:tag :ROOT},
         :kids [{:attrs {:tag :Items},
                 :kids [{:attrs {:tag :Item},
                         :kids [{:attrs {:tag :Type}, :value "A"}
                                {:attrs {:tag :Note}, :value "AA1"}]}
                        {:attrs {:tag :Item},
                         :kids [{:attrs {:tag :Type}, :value "B"}
                                {:attrs {:tag :Note}, :value "BB1"}]}
                        {:attrs {:tag :Item},
                         :kids [{:attrs {:tag :Type}, :value "C"}
                                {:attrs {:tag :Note}, :value "CC1"}]}
                        {:attrs {:tag :Item},
                         :kids [{:attrs {:tag :Type}, :value "A"}
                                {:attrs {:tag :Note}, :value "AA2"}]}]}]})
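For comparison, the same blank-leaf pruning can be sketched on the nested tree format with nothing but clojure.core and clojure.string. This is an illustrative alternative, not part of tupelo.forest; `blank-leaf?`, `prune-blank-leaves`, and `sample` are hypothetical names, and the sketch assumes a leaf is any node without a `:kids` key.

```clojure
(require '[clojure.string :as str])

(defn blank-leaf?
  "True for a leaf node whose :value is an empty or all-whitespace string."
  [node]
  (let [value (:value node)]
    (and (not (contains? node :kids))
         (string? value)
         (str/blank? value))))

(defn prune-blank-leaves
  "Returns node with every blank string leaf removed, at any depth."
  [node]
  (if (contains? node :kids)
    (update node :kids
            (fn [kids]
              (->> kids
                   (remove blank-leaf?)
                   (mapv prune-blank-leaves))))
    node))

;; A small sample in the same shape as the trees shown above.
(def sample
  {:attrs {:tag :Items}
   :kids  [{:attrs {:tag :tupelo.forest/raw} :value "\n  "}
           {:attrs {:tag :Item}
            :kids  [{:attrs {:tag :Type} :value "A"}
                    {:attrs {:tag :Note} :value "AA1"}]}
           {:attrs {:tag :tupelo.forest/raw} :value "\n"}]})

(prune-blank-leaves sample)
;; => {:attrs {:tag :Items},
;;     :kids [{:attrs {:tag :Item},
;;             :kids [{:attrs {:tag :Type}, :value "A"}
;;                    {:attrs {:tag :Note}, :value "AA1"}]}]}
```

The forest version mutates the shared store by hid, whereas this sketch returns a new immutable tree; which is preferable depends on how many passes you need to make over the data.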
The final code fragment removes the nodes with Type="B" or Type="C":
          type-bc-hid?  (fn [hid] (pos? (count (glue
                                                 (find-leaf-hids hid [:** :Type] "B")
                                                 (find-leaf-hids hid [:** :Type] "C")))))
          type-bc-hids  (find-hids-with root-hid [:** :Item] type-bc-hid?)
          >>            (apply remove-hid type-bc-hids)
          tree-3        (hid->tree root-hid)
          tree-3-hiccup (hid->hiccup root-hid) ]
producing the final result, shown here in both tree format and hiccup format:
      (is= tree-3
        {:attrs {:tag :ROOT},
         :kids
         [{:attrs {:tag :Items},
           :kids [{:attrs {:tag :Item},
                   :kids [{:attrs {:tag :Type}, :value "A"}
                          {:attrs {:tag :Note}, :value "AA1"}]}
                  {:attrs {:tag :Item},
                   :kids [{:attrs {:tag :Type}, :value "A"}
                          {:attrs {:tag :Note}, :value "AA2"}]}]}]})
      (is= tree-3-hiccup
        [:ROOT
         [:Items
          [:Item [:Type "A"] [:Note "AA1"]]
          [:Item [:Type "A"] [:Note "AA2"]]]]))))
The complete example can be found in the forest-examples unit test.
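If you want to see the filtering and hiccup steps without the forest machinery, the following plain-Clojure sketch performs them on the nested tree format. The names `has-type?`, `drop-items`, `tree->hiccup`, and `cleaned-tree` are hypothetical. The sketch also makes two simplifying assumptions that the forest code does not: it only checks an Item's direct children (the `[:** :Type]` path above matches at any depth), and it ignores any attributes other than `:tag` when building hiccup.

```clojure
(defn has-type?
  "True when a direct child of item is a :Type leaf whose value is in types."
  [item types]
  (boolean (some (fn [kid]
                   (and (= :Type (get-in kid [:attrs :tag]))
                        (contains? types (:value kid))))
                 (:kids item))))

(defn drop-items
  "Returns node with every :Item subtree removed whose Type is in types."
  [node types]
  (if (contains? node :kids)
    (update node :kids
            (fn [kids]
              (->> kids
                   (remove #(and (= :Item (get-in % [:attrs :tag]))
                                 (has-type? % types)))
                   (mapv #(drop-items % types)))))
    node))

(defn tree->hiccup
  "Converts the nested tree format to hiccup vectors (tag only, no attrs)."
  [node]
  (let [tag (get-in node [:attrs :tag])]
    (if (contains? node :kids)
      (into [tag] (map tree->hiccup (:kids node)))
      [tag (:value node)])))

;; Same data as the whitespace-free tree above, built programmatically.
(def cleaned-tree
  {:attrs {:tag :ROOT}
   :kids  [{:attrs {:tag :Items}
            :kids  (mapv (fn [[t n]]
                           {:attrs {:tag :Item}
                            :kids  [{:attrs {:tag :Type} :value t}
                                    {:attrs {:tag :Note} :value n}]})
                         [["A" "AA1"] ["B" "BB1"] ["C" "CC1"] ["A" "AA2"]])}]})

(tree->hiccup (drop-items cleaned-tree #{"B" "C"}))
;; => [:ROOT
;;     [:Items
;;      [:Item [:Type "A"] [:Note "AA1"]]
;;      [:Item [:Type "A"] [:Note "AA2"]]]]
```

Passing the unwanted types as a set keeps the predicate to a single `contains?` check, so adding a third type to discard is a one-character change at the call site.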
Update
Here is the most compact version, with the extra steps removed:
(dotest
  (with-forest (new-forest)
    (let [xml-str         "<ROOT>
                             <Items>
                               <Item><Type>A</Type><Note>AA1</Note></Item>
                               <Item><Type>B</Type><Note>BB1</Note></Item>
                               <Item><Type>C</Type><Note>CC1</Note></Item>
                               <Item><Type>A</Type><Note>AA2</Note></Item>
                             </Items>
                           </ROOT>"
          enlive-tree     (->> xml-str
                               java.io.StringReader.
                               en-html/xml-resource
                               first)
          root-hid        (add-tree-enlive enlive-tree)
          blank-leaf-hid? (fn [hid] (ts/whitespace? (hid->value hid)))
          has-bc-leaf?    (fn [hid] (or (has-child-leaf? hid [:** :Type] "B")
                                        (has-child-leaf? hid [:** :Type] "C")))
          blank-leaf-hids (keep-if blank-leaf-hid? (all-leaf-hids))
          >>              (apply remove-hid blank-leaf-hids)
          bc-item-hids    (find-hids-with root-hid [:** :Item] has-bc-leaf?)]
      (apply remove-hid bc-item-hids)
      (is= (hid->hiccup root-hid)
        [:ROOT
         [:Items
          [:Item [:Type "A"] [:Note "AA1"]]
          [:Item [:Type "A"] [:Note "AA2"]]]]))))