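I am parsing an XML file with getNodeSet(). Suppose I have an XML file from a bookstore that lists 4 books, but one of the books is missing the "authors" tag.
If I extract the "authors" tags with
data.nodes.2 <- getNodeSet(data,'//*/authors')
R returns a list of 3 elements.
However, this is not quite what I want. How can I make getNodeSet() return a list of 4 elements instead of 3, with a missing value for the book that has no "authors" tag?
I would appreciate any help.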
library(XML)
file <- "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\r\n<!-- Edited by XMLSpy® -->\r\n<bookstore>\r\n<book category=\"cooking\">\r\n<title lang=\"en\">Everyday Italian</title>\r\n<authors>\r\n<author>Giada De Laurentiis</author>\r\n</authors>\r\n<year>2005</year>\r\n<price>30.00</price>\r\n</book>\r\n<book category=\"children\">\r\n<title lang=\"en\">Harry Potter</title>\r\n<authors>\r\n<author>J K. Rowling</author>\r\n</authors>\r\n<year>2005</year>\r\n<price>29.99</price>\r\n</book>\r\n<book category=\"web\">\r\n<title lang=\"en\">XQuery Kick Start</title>\r\n<authors>\r\n<author>James McGovern</author>\r\n<author>Per Bothner</author>\r\n<author>Kurt Cagle</author>\r\n<author>James Linn</author>\r\n<author>Vaidyanathan Nagarajan</author>\r\n</authors>\r\n<year>2003</year>\r\n<price>49.99</price>\r\n</book>\r\n<book category=\"web\" cover=\"paperback\">\r\n<title lang=\"en\">Learning XML</title>\r\n\r\n<year>2003</year>\r\n<price>39.95</price>\r\n</book>\r\n</bookstore>"
data <- xmlParse(file)
data.nodes.1 <- getNodeSet(data,'//*/book')
data.nodes.2 <- getNodeSet(data,'//*/authors')
# Data
# <?xml version="1.0" encoding="ISO-8859-1"?>
# <!-- Edited by XMLSpy® -->
# <bookstore>
# <book category="cooking">
# <title lang="en">Everyday Italian</title>
# <authors>
# <author>Giada De Laurentiis</author>
# </authors>
# <year>2005</year>
# <price>30.00</price>
# </book>
# <book category="children">
# <title lang="en">Harry Potter</title>
# <authors>
# <author>J K. Rowling</author>
# </authors>
# <year>2005</year>
# <price>29.99</price>
# </book>
# <book category="web">
# <title lang="en">XQuery Kick Start</title>
# <authors>
# <author>James McGovern</author>
# <author>Per Bothner</author>
# <author>Kurt Cagle</author>
# <author>James Linn</author>
# <author>Vaidyanathan Nagarajan</author>
# </authors>
# <year>2003</year>
# <price>49.99</price>
# </book>
# <book category="web" cover="paperback">
# <title lang="en">Learning XML</title>
# <year>2003</year>
# <price>39.95</price>
# </book>
# </bookstore>
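One option is to use R's list processing to extract the authors from each book node
books <- getNodeSet(data, "//book")
authors <- lapply(books, xpathSApply, ".//author", xmlValue)
# xpathSApply() returns an empty list when a book has no authors; mark those as NA
authors[sapply(authors, is.list)] <- NA
and to combine that with book-level information
title <- sapply(books, xpathSApply, "string(.//title/text())")
giving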
> data.frame(Title=rep(title, sapply(authors, length)),
+            Author=unlist(authors))
              Title                 Author
1  Everyday Italian    Giada De Laurentiis
2      Harry Potter           J K. Rowling
3 XQuery Kick Start         James McGovern
4 XQuery Kick Start            Per Bothner
5 XQuery Kick Start             Kurt Cagle
6 XQuery Kick Start             James Linn
7 XQuery Kick Start Vaidyanathan Nagarajan
8      Learning XML                   <NA>
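If what you need is literally a list with one element per book (4 here), a possible variation is to query each book node for its own "authors" child and fall back to NA when there is none. The following is only a sketch under assumptions: xml_text is a trimmed stand-in for the bookstore file (the third book keeps just two of its authors), and the names authors_nodes and is_missing are made up for illustration.

```r
library(XML)

# Trimmed stand-in for the bookstore document (assumption, not the full file):
# the last book has no <authors> element.
xml_text <- '<bookstore>
  <book><title>Everyday Italian</title><authors><author>Giada De Laurentiis</author></authors></book>
  <book><title>Harry Potter</title><authors><author>J K. Rowling</author></authors></book>
  <book><title>XQuery Kick Start</title><authors><author>James McGovern</author><author>Per Bothner</author></authors></book>
  <book><title>Learning XML</title></book>
</bookstore>'
doc <- xmlParse(xml_text, asText = TRUE)

books <- getNodeSet(doc, "//book")

# One entry per book: the <authors> node if the book has one, otherwise NA.
authors_nodes <- lapply(books, function(book) {
  hit <- getNodeSet(book, "./authors")  # XPath relative to this book only
  if (length(hit) > 0) hit[[1]] else NA
})

# TRUE for books whose entry is the NA placeholder rather than a node.
is_missing <- !sapply(authors_nodes, inherits, "XMLInternalNode")

length(authors_nodes)
is_missing
```

Because the list lines up with books position by position, it can be indexed alongside title without the rep() step, at the cost of running one small XPath query per book.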