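I am trying to run multiple searches by first name and last name on the website
(https://npiregistry.cms.hhs.gov/registry/) and then build a data frame from the output.
I found that this resembles a problem described elsewhere, but for some reason I get the error
"Error: failed to load external entity"
Below is the code I am using to extract the records: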
fn = rep(c('HARVEY','HARVEY'));
ln = rep(c('BIDWELL','ADELSON'));
mydf = data.frame(fn,ln);
get_data = function(df){
  library(XML);
  root = 'http://npiregistry.cms.hhs.gov/'
  u = paste(root,'registry/search-results-table?','first_name=', df$fn, '&last_name=',
            df$ln, sep = "");
  # encode url correctly
  url = URLencode(u);
  # extract data from the right table
  data = readHTMLTable(url);
}
library(plyr)
mydata = adply(mydf, 1, get_data);
Thanks for your help.
The call needs https: rather than http:. I also dropped the plyr library and used base R instead:
library(rvest)

fn = c('HARVEY', 'HARVEY')
ln = c('BIDWELL', 'ADELSON')
mydf = data.frame(fn, ln)

get_data = function(df) {
  # df is one row of mydf as a character vector:
  # df[1] is the first name, df[2] is the last name
  root = 'https://npiregistry.cms.hhs.gov/'
  u = paste(root, 'registry/search-results-table?', 'first_name=', df[1],
            '&last_name=', df[2], sep = "")
  # encode url correctly
  url = URLencode(u)
  # print(url)
  # extract data from the first table on the page
  data = read_html(url)
  newresult <- html_nodes(data, "table")[1] %>% html_table()
  # convert result into a data frame and return it
  as.data.frame(newresult)
}

# apply() passes each row of mydf to get_data() as a character vector
mydata = apply(mydf, 1, function(x) { get_data(x) })
# mydata is a list of data frames, do.call creates a single data.frame
finalanswer <- do.call(rbind, mydata)
# finalanswer needs some clean up.
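
As a rough idea of what that clean up could look like, here is a minimal sketch in base R. The helper name `collect_results` and the added columns `search_fn` and `search_ln` are my own assumptions, not part of the code above. It assumes the fetching function takes a character vector `c(first, last)` and returns a data frame, as `get_data` does. Failed or empty searches are skipped, and every remaining row is tagged with the name that was searched.

```r
# Hypothetical helper (not from the original answer): call `fetch` once per row
# of `queries` and stack the results into one data frame.
# `queries` needs columns fn and ln; `fetch` stands in for get_data().
collect_results <- function(queries, fetch) {
  pieces <- lapply(seq_len(nrow(queries)), function(i) {
    q <- c(as.character(queries$fn[i]), as.character(queries$ln[i]))
    # A failed request becomes NULL instead of stopping the whole loop
    res <- tryCatch(fetch(q), error = function(e) NULL)
    # Skip failed requests and empty result tables
    if (is.null(res) || nrow(res) == 0) return(NULL)
    # Record which search produced these rows
    data.frame(search_fn = q[1], search_ln = q[2], res,
               stringsAsFactors = FALSE, check.names = FALSE)
  })
  # Drop the skipped searches before binding
  pieces <- Filter(Negate(is.null), pieces)
  if (length(pieces) == 0) return(data.frame())
  combined <- do.call(rbind, pieces)
  rownames(combined) <- NULL
  combined
}

# Possible usage with the objects defined above (needs network access):
# finalanswer <- collect_results(mydf, get_data)
```

Note that `rbind` still requires every result table to have the same columns, so this only helps if the site returns a consistent table layout for each search.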