I'm trying to scrape some tables from ESPN.com using the xml2 package. For example, I want to pull the Week 7 fantasy quarterback rankings into R, at this URL:
http://www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-quarterback-rankings
I tried the `read_html()` function, since it's the one I'm most familiar with. Here is my call and the error it produces:
> wk.7.qb.rk = read_html("www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks", which = 1)
Error: 'www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks' does not exist in current working directory ('C:/Users/Brandon/Documents/Fantasy/Football/Daily').
I also tried `read_xml()`, but got the same error:
> wk.7.qb.rk = read_xml("www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks", which = 1)
Error: 'www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks' does not exist in current working directory ('C:/Users/Brandon/Documents/Fantasy/Football/Daily').
Why is R looking for this URL in my working directory? I've used this function with other URLs and had some success. What is it about this particular URL that makes R treat it as a local path, unlike the others? And how do I change that?
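The likely cause is that the string passed to `read_html()` has no `http://` scheme: `read_html()` only fetches over the network when the string looks like a URL, and otherwise falls back to treating it as a file path, which produces exactly this "does not exist in current working directory" error. A minimal sketch of the fix (note that `read_html()` has no `which` argument; table selection happens afterwards, e.g. with `rvest::html_table()`):

```r
library(xml2)
library(rvest)

# Prepend the scheme so read_html() fetches over HTTP
# instead of looking for a local file
url <- "http://www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-quarterback-rankings"
wk.7.qb.rk <- read_html(url)

# read_html() has no `which` argument; extract all tables first,
# then index into the list to pick the one you want
tables <- html_table(wk.7.qb.rk, fill = TRUE)
first.table <- tables[[1]]
```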
I got this error when running `read_html` in a loop over 20 pages. After page 20 the loop kept running even though there were no more URLs, so the remaining iterations called `read_html` with `NA`. Hope this helps!
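A guard like the following avoids that failure mode; the `urls` vector here is hypothetical, standing in for however the page list is built:

```r
library(xml2)

# Hypothetical page list: entries past the last real page come back as NA
urls <- c(
  "http://www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-quarterback-rankings",
  NA
)

pages <- vector("list", length(urls))
for (i in seq_along(urls)) {
  # Skip missing entries so read_html() is never called with NA
  if (is.na(urls[i])) next
  pages[[i]] <- read_html(urls[i])
}
```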