Excel's FILTERXML
uses XPATH 1.0
不幸的是,这意味着它并不像我们希望的那样多样化。另外,Excel 似乎not允许返回修改后的节点值,并且专门允许您按出现顺序选择节点。然而,我们仍然可以使用相当一部分功能。可以找到更多相关信息here https://www.w3.org/TR/1999/REC-xpath-19991116/.
该函数有两个参数:=FILTERXML(<A string in valid XML format>,<A string in valid XPATH format>)
比如说细胞A1
保存字符串:ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123
。为了创建有效的 XML 字符串,我们使用SUBSTITUTE
将分隔符更改为有效的结束标记和开始标记结构。因此,要为给定示例获取有效的 XML 构造,我们可以执行以下操作:
"<t><s>"&SUBSTITUTE(A1,"|","</s><s>")&"</s></t>"
出于可读性原因,我将使用单词来引用上述结构<XML>
as a 占位符。下面你会发现不同的有用XPATH
有效构造中的函数用于过滤节点:
1) All Elements:
=FILTERXML(<XML>,"//s")
Returns: ABC
, 123
, DEF
, 456
, XY-1A
, ZY-2F
, XY-3F
, XY-4f
, xyz
and 123
(all nodes)
2) Elements by position:
=FILTERXML(<XML>,"//s[position()=4]")
Or:
=FILTERXML(<XML>,"//s[4]")
Returns: 456
(node on index 4)†
=FILTERXML(<XML>,"//s[position()<4]")
Returns: ABC
, 123
and DEF
(nodes on index < 4)
=FILTERXML(<XML>,"//s[position()=2 or position()>5]")
Returns: 123
, ZY-2F
, XY-3F
, XY-4f
, xyz
and 123
(nodes on index 2 or > 5)
=FILTERXML(<XML>,"//s[last()]")
Returns: 123
(node on last index)
=FILTERXML(<XML>,"//s[position() mod 2 = 1]")
Returns: ABC
, DEF
, XY-1A
, XY-3F
and xyz
(odd nodes)
=FILTERXML(<XML>,"//s[position() mod 2 = 0]")
Returns: 123
, 456
, ZF-2F
, XY-4f
and 123
(even nodes)
3) (Non) numeric elements:
=FILTERXML(<XML>,"//s[number()=.]")
Or:
=FILTERXML(<XML>,"//s[.*0=0]")
Returns: 123
, 456
, and 123
(numeric nodes)
=FILTERXML(<XML>,"//s[not(number()=.)]")
Or:
=FILTERXML(<XML>,"//s[.*0!=0)]")
Returns: ABC
, DEF
, XY-1A
, ZY-2F
, XY-3F
, XY-4f
and xyz
(non-numeric nodes)
4) Elements that (not) contain:
=FILTERXML(<XML>,"//s[contains(., 'Y')]")
Returns: XY-1A
, ZY-2F
, XY-3F
and XY-4f
(containing 'Y', notice XPATH
is case sensitive, exclusing xyz
)
=FILTERXML(<XML>,"//s[not(contains(., 'Y'))]")
Returns: ABC
, 123
, DEF
, 456
, xyz
and 123
(not containing 'Y', notice XPATH
is case sensitive, including xyz
)
5) Elements that (not) start or/and end with:
=FILTERXML(<XML>,"//s[starts-with(., 'XY')]")
Returns: XY-1A
, XY-3F
and XY-4f
(starting with 'XY')
=FILTERXML(<XML>,"//s[not(starts-with(., 'XY'))]")
Returns: ABC
, 123
, DEF
, 456
, ZY-2F
, xyz
and 123
(don't start with 'XY')
=FILTERXML(<XML>,"//s[substring(., string-length(.) - string-length('F') +1) = 'F']")
Returns: DEF
, ZY-2F
and XY-3F
(end with 'F', notice XPATH 1.0
does not support ends-with
)
=FILTERXML(<XML>,"//s[not(substring(., string-length(.) - string-length('F') +1) = 'F')]")
Returns: ABC
, 123
, 456
, XY-1A
, XY-4f
, xyz
and 123
(don't end with 'F')
=FILTERXML(<XML>,"//s[starts-with(., 'X') and substring(., string-length(.) - string-length('A') +1) = 'A']")
Returns: XY-1A
(start with 'X' and end with 'A')
6) Elements that are upper- or lowercase:
=FILTERXML(<XML>,"//s[translate(.,'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')=.]")
Returns: ABC
, 123
, DEF
, 456
, XY-1A
, ZY-2F
, XY-3F
and 123
(uppercase nodes)
=FILTERXML(<XML>,"//s[translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')=.]")
Returns: 123
, 456
, xyz
and 123
(lowercase nodes)
NOTE: Unfortunately XPATH 1.0
does not support upper-case()
nor lower-case()
so the above is a workaround. Add special characters if need be.
7) Elements that (not) contain any number:
=FILTERXML(<XML>,"//s[translate(.,'1234567890','')!=.]")
Returns: 123
, 456
, XY-1A
, ZY-2F
, XY-3F
, XY-4f
and 123
(contain any digit)
=FILTERXML(<XML>,"//s[translate(.,'1234567890','')=.]")
Returns: ABC
, DEF
and xyz
(don't contain any digit)
=FILTERXML(<XML>,"//s[translate(.,'1234567890','')!=. and .*0!=0]")
Returns: XY-1A
, ZY-2F
, XY-3F
and XY-4f
(holding digits but not a a number on it's own)
8) Unique elements or duplicates:
=FILTERXML(<XML>,"//s[preceding::*=.]")
Returns: 123
(duplicate nodes)
=FILTERXML(<XML>,"//s[not(preceding::*=.)]")
Returns: ABC
, 123
, DEF
, 456
, XY-1A
, ZY-2F
, XY-3F
, XY-4f
and xyz
(unique nodes)
=FILTERXML(<XML>,"//s[not(following::*=. or preceding::*=.)]")
Returns: ABC
, DEF
, 456
, XY-1A
, ZY-2F
, XY-3F
and XY-4f
(nodes that have no similar sibling)
9) Elements of certain length:
=FILTERXML(<XML>,"//s[string-length()=5]")
Returns: XY-1A
, ZY-2F
, XY-3F
and XY-4f
(5 characters long)
=FILTERXML(<XML>,"//s[string-length()<4]")
Returns: ABC
, 123
, DEF
, 456
, xyz
and 123
(shorter than 4 characters)
10) Elements based on preceding/following:
=FILTERXML(<XML>,"//s[preceding::*[1]='456']")
Returns: XY-1A
(previous node equals '456')
=FILTERXML(<XML>,"//s[starts-with(preceding::*[1],'XY')]")
Returns: ZY-2F
, XY-4f
, and xyz
(previous node starts with 'XY')
=FILTERXML(<XML>,"//s[following::*[1]='123']")
Returns: ABC
, and xyz
(following node equals '123')
=FILTERXML(<XML>,"//s[contains(following::*[1],'1')]")
Returns: ABC
, 456
, and xyz
(following node contains '1')
=FILTERXML(<XML>,"//s[preceding::*='ABC' and following::*='XY-3F']")
Or:
=FILTERXML(<XML>,"//s[.='ABC']/following::s[following::s='XY-3F']")
Returns: 123
, DEF
, 456
, XY-1A
and ZY-2F
(everything between 'ABC' and 'XY-3F')
11) Elements based on sub-strings:
=FILTERXML(<XML>,"//s[substring-after(., '-') = '3F']")
Returns: XY-3F
(nodes ending with '3F' after hyphen)
=FILTERXML(<XML>,"//s[contains(substring-after(., '-') , 'F')]")
Returns: ZY-2F
and XY-3F
(nodes containing 'F' after hyphen)
=FILTERXML(<XML>,"//s[substring-before(., '-') = 'ZY']")
Returns: ZY-2F
(nodes starting with 'ZY' before hyphen)
=FILTERXML(<XML>,"//s[contains(substring-before(., '-'), 'Y')]")
Returns: XY-1A
, ZY-2F
, XY-3F
and XY-4f
(nodes containing 'Y' before hyphen)
12) Elements based on concatenation:
=FILTERXML(<XML>,"//s[concat(., '|', following::*[1])='ZY-2F|XY-3F']")
Returns: ZY-2F
(nodes when concatenated with '|' and following sibling equals 'ZY-2F|XY-3F')
=FILTERXML(<XML>,"//s[contains(concat(., preceding::*[2]), 'FA')]")
Returns: DEF
(nodes when concatenated with sibling two indices to the left contains 'FA')
13) Empty vs. Non-empty:
=FILTERXML(<XML>,"//s[count(node())>0]")
Or:
=FILTERXML(<XML>,"//s[node()]")
Returns: ABC
, 123
, DEF
, 456
, XY-1A
, ZY-2F
, XY-3F
, XY-4f
, xyz
and 123
(all nodes that are not empty)
=FILTERXML(<XML>,"//s[count(node())=0]")
Or:
=FILTERXML(<XML>,"//s[not(node())]")
Returns: None (all nodes that are empty)
14) Preceding or Following:
=FILTERXML(<XML>,"//s[substring(., string-length(.) - string-length('F') +1) = 'F'][last()]/following::*")
Returns: XY-4f
, xyz
and 123
(all nodes to the right of the last node that ends with an uppercase 'F')
=FILTERXML(<XML>,"//s[substring(., string-length(.) - string-length('F') +1) = 'F'][1]/preceding::*")
Returns: ABC
and 123
(all nodes to the left of the first node that ends with an uppercase 'F')
15) (Preceding or Following) and self:
=FILTERXML(<XML>,"(//s[.*0!=0][last()]|//s[.*0!=0][last()]/preceding::*)")
Returns: ABC
, 123
, DEF
, 456
, XY-1A
, ZY-2F
, XY-3F
, XY-4f
and xyz
(trim all numeric nodes from the right)††
=FILTERXML(<XML>,"(//s[.*0=0][1]|//s[.*0=0][1]/following::*)")
Returns: 123
, DEF
, 456
, XY-1A
, ZY-2F
, XY-3F
, XY-4f
, xyz
and 123
(trim all non-numeric nodes from the left)
16) Maximum or Minimum:
=FILTERXML(<XML>,"(//s[.*0=0][not(.<//s[.*0=0])])[1]")
Returns: 456
(The maximum value looking at numeric nodes)
=FILTERXML(<XML>,"(//s[.*0=0][not(.>//s[.*0=0])])[1]")
Returns: 123
(The minimum value looking at numeric nodes)
NOTE: This is the equivalent to returning all numeric nodes as per #3 and post-process the array using Excel's MIN()
and MAX()
functions.
现在显然上面是可能性的演示XPATH 1.0
功能,您可以获得上述及更多功能的一系列组合!我试图涵盖最常用的字符串函数。如果您有遗漏,请随时评论。
尽管问题本身相当广泛,但我希望就如何使用提供一些总体指导FILTERXML
对于手头的查询。该公式返回要以任何其他方式使用的节点数组。很多时候我会用它TEXTJOIN()
or INDEX()
。但我想其他选项将是新的 DA 功能来泄露结果。
请注意,在解析字符串时FILTERXML()
、与号字符 (&) 和左尖括号 (not以字面形式出现。它们分别需要替换为&
or <
。另一种选择是使用 ISO/IEC 10646 数字字符code https://m.tonysweb.be/ISO_IEC%2010646-12000(E).html being &
or <
分别。解析后,该函数将以文字形式将这些字符返回给您。不用说,用分号分割字符串就变得很棘手。
‡ Each predicate, the structure between the opening and closing square brackets, is a filter of a given nodelist. To write multiple of these structures is in fact anding https://stackoverflow.com/questions/62969654/xpath-what-do-nested-square-brackets-mean/62970207#62970207 such predicates.
‡‡ There isn't really an easy, following/preceding siblings and self, construct. Therefor I used the Union https://examples.javacodegeeks.com/core-java/xml/xpath/xpath-union-operator-example/ operator. This, however, requires multiple expressions to be inside paranthesis https://stackoverflow.com/a/26598269/9758194. Much like alternations within a capture group if one would think about regular expressions.