I want to get the number inside the nested tag. How can I do that?
My code outputs this, but I want just #40, not the whole two lines:
<span class="rankings-score">
<span>#40</span>
Here is my code:
from bs4 import BeautifulSoup
import requests
import csv

site = "http://www.usnews.com/education/best-high-schools/national-rankings/page+2"
fields = ['national_rank','school','address','school_page','medal','ratio','size_desc','students','teachers']

r = requests.get(site)
html_source = r.text
soup = BeautifulSoup(html_source)

table = soup.find('table')
rows_list = []

for row in table.find_all('tr'):
    d = dict()
    d['national_rank'] = row.find("span", 'rankings-score')
    print d['national_rank']
I get this error:
AttributeError: 'NoneType' object has no attribute 'span'
when I try this:
d['national_rank'] = row.find("span", 'rankings-score').span.text
Access the text of the nested span:
score_span = row.find("span", 'rankings-score')
if score_span is not None:
    print score_span.span.text
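You need to make sure that row.find("span", 'rankings-score') actually found something; above I test that a <span> was indeed found.

The .find() method returns None if no matching object was found. So, generally speaking, whenever you get an AttributeError: 'NoneType' object has no attribute ... exception involving an object you tried to load with Element.find(), you need to test for None before trying to access anything further on it.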
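
As a minimal sketch of that check inside a row loop: the markup below is made up for illustration (the real page's HTML is not shown in the question), and it assumes the None comes from a row without a ranking span, such as a header row. The name ranks is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page (an assumption):
# the first row is a header row and has no rankings-score span.
html = """
<table>
  <tr><th>Rank</th><th>School</th></tr>
  <tr><td><span class="rankings-score"><span>#40</span></span></td><td>Example High</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

ranks = []
for row in soup.find("table").find_all("tr"):
    score_span = row.find("span", "rankings-score")
    if score_span is None:
        # No ranking span in this row: skip it instead of
        # dereferencing None and raising AttributeError.
        continue
    # .span is the nested <span>; .text is its text content.
    ranks.append(score_span.span.text)

print(ranks)
```

Skipping rows with continue is only one option; storing None for the rank would work too if every row must produce a record.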
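The same care applies to object.find, object.find_all, object[...] tag attribute access, object.<tagname>, object.select, and so on. Note that the failure looks different in each case: object.find and object.<tagname> return None, object.find_all and object.select return an empty list, and object[...] raises a KeyError when the attribute is missing.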
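
A small sketch of how each of these lookups behaves when nothing matches, again on made-up markup; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Made-up markup: a <div> with one link and no <span> at all.
soup = BeautifulSoup('<div><a href="/x">link</a></div>', "html.parser")
div = soup.div
link = div.a

missing_found = div.find("span")      # find() gives None when nothing matches
missing_child = div.span              # tag-name access also gives None
no_matches = div.find_all("span")     # find_all() gives an empty list-like result
no_selected = div.select("span.foo")  # select() also gives an empty result

# Subscript access on a missing attribute raises KeyError ...
try:
    link["title"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# ... while .get() returns None instead.
title = link.get("title")

print(missing_found, missing_child, len(no_matches), len(no_selected))
print(raised_key_error, title)
```

So a None test covers find() and tag-name access, an emptiness test covers find_all() and select(), and .get() is the usual way to avoid the KeyError.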