I am new to web scraping and am using Python to scrape data.
Can someone help me extract the data from the following element:
<div class="dept"><strong>LENGTH:</strong> 15 credits</div>
My output should be LENGTH: 15 credits
Here is my code:
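from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the page and parse it into a BeautifulSoup object
html = urlopen("http://www.mastersindatascience.org/specialties/business-analytics/")
bsObj = BeautifulSoup(html, "html.parser")
length = bsObj.findAll("strong")
for leng in length:
    print(leng.text, leng.next_sibling)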
Output:
DELIVERY: Campus
LENGTH: 2 years
OFFERED BY: Olin Business School
But I only want the length.
Website: http://www.mastersindatascience.org/specialties/business-analytics/
You should refine your code slightly so that it finds the strong element by its text:
soup.find("strong", text="LENGTH:").next_sibling
Or, for multiple lengths:
for length in soup.find_all("strong", text="LENGTH:"):
    print(length.next_sibling.strip())
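To produce the exact output asked for in the question, the label and its sibling text can be joined. The following is a minimal offline sketch that parses the single div from the question instead of fetching the page. The helper name extract_length is hypothetical, and it assumes a Beautiful Soup version that accepts string= (the newer spelling of the text= argument used above).

```python
from bs4 import BeautifulSoup

# The element quoted in the question, used here instead of a live request.
html = '<div class="dept"><strong>LENGTH:</strong> 15 credits</div>'
soup = BeautifulSoup(html, "html.parser")


def extract_length(doc):
    """Return 'LENGTH: <value>' or None (hypothetical helper, not part of bs4)."""
    # Find the <strong> tag whose text is exactly "LENGTH:".
    label = doc.find("strong", string="LENGTH:")
    if label is None or label.next_sibling is None:
        return None
    # The value is the text node that directly follows the <strong> tag.
    value = str(label.next_sibling).strip()
    return f"{label.get_text()} {value}"


print(extract_length(soup))  # LENGTH: 15 credits
```

Returning None when the label or its sibling is missing avoids an AttributeError on pages where a block has no LENGTH entry.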
Demo:
>>> import requests
>>> from bs4 import BeautifulSoup
>>>
>>> url = "http://www.mastersindatascience.org/specialties/business-analytics/"
>>> response = requests.get(url)
>>> soup = BeautifulSoup(response.content, "html.parser")
>>> for length in soup.find_all("strong", text="LENGTH:"):
...     print(length.next_sibling.strip())
...
33 credit hours
15 months
48 Credits
...
12 months
1 year
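The exact match on "LENGTH:" only works when every label is written identically. If some labels differ in case or carry surrounding whitespace, a compiled regular expression can be passed instead of a plain string. The sketch below runs on made-up sample markup (an assumption, not the real page) and again assumes the string= argument is available.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup with inconsistent label formatting.
html = """
<div class="dept"><strong>DELIVERY:</strong> Campus</div>
<div class="dept"><strong> Length: </strong> 2 years</div>
<div class="dept"><strong>LENGTH:</strong> 15 credits</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Match "length:" in any case, allowing whitespace on either side.
pattern = re.compile(r"^\s*length:\s*$", re.IGNORECASE)

lengths = [
    str(tag.next_sibling).strip()
    for tag in soup.find_all("strong", string=pattern)
    if tag.next_sibling is not None  # skip labels with nothing after them
]
print(lengths)  # ['2 years', '15 credits']
```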