Consider:
<div class="someClass">
<a href="href">
<img alt="some" src="some"/>
</a>
</div>
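I want to extract the source (i.e. src) attribute from an image (i.e. img) tag using Beautiful Soup. I am using Beautiful Soup 4, but I cannot get src with a.attrs['src'], although I can get href. What should I do?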
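You can use Beautiful Soup to extract the src attribute of an HTML img tag. In my example, htmlText contains the img tags themselves, but the same approach also works for a URL, together with urllib2.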
For URLs (Python 2, BeautifulSoup 3)
from BeautifulSoup import BeautifulSoup as BSHTML
import urllib2
page = urllib2.urlopen('http://www.youtube.com/')
soup = BSHTML(page)
images = soup.findAll('img')
for image in images:
    # Print image source (None if the tag has no src attribute)
    print(image.get('src'))
    # Print alternate text (None if the tag has no alt attribute)
    print(image.get('alt'))
For text containing img tags
from BeautifulSoup import BeautifulSoup as BSHTML
# Each img tag is closed so that both tags parse as separate elements
htmlText = """<img src="https://src1.com/" /> <img src="https://src2.com/" /> """
soup = BSHTML(htmlText)
images = soup.findAll('img')
for image in images:
    print(image['src'])
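Since the question mentions Beautiful Soup 4, here is a minimal sketch of the same text-based extraction with the bs4 package. The sample string html_text and the choice of the built-in "html.parser" are assumptions made for illustration; passing src=True to find_all keeps only the img tags that actually carry a src attribute, so no KeyError can occur.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the middle tag deliberately has no src attribute
html_text = (
    '<img src="https://src1.com/" /> '
    '<img alt="no source" /> '
    '<img src="https://src2.com/" />'
)

soup = BeautifulSoup(html_text, "html.parser")

# src=True matches only tags that define the attribute at all
sources = [img["src"] for img in soup.find_all("img", src=True)]
print(sources)
```

If tags without a src should be reported rather than skipped, iterating over soup.find_all("img") and calling img.get("src") returns None for them instead.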
Python 3:
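from bs4 import BeautifulSoup as BSHTML
import urllib.request  # "import urllib" alone does not load the request submodule
page = urllib.request.urlopen('https://github.com/abushoeb/emotag')
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BSHTML(page, 'html.parser')
images = soup.find_all('img')
for image in images:
    # Print image source (None if the tag has no src attribute)
    print(image.get('src'))
    # Print alternate text (None if the tag has no alt attribute)
    print(image.get('alt'))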
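Applied to the exact markup from the question, a.attrs['src'] fails because src is an attribute of the nested img tag, not of the a tag; the a tag only carries href. The sketch below, which assumes Beautiful Soup 4 and the built-in "html.parser", navigates from the link to its child image first. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<div class="someClass">
    <a href="href">
        <img alt="some" src="some"/>
    </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the link inside the div from the question
link = soup.find("div", class_="someClass").a

# href lives on the <a> tag itself
href = link["href"]

# src and alt live on the nested <img> tag, so look that tag up first
img = link.find("img")
src = img.get("src") if img is not None else None
alt = img.get("alt") if img is not None else None

# The <a> tag has no src attribute, which is why a.attrs['src'] raises KeyError
has_src_on_a = "src" in link.attrs

print(href, src, alt, has_src_on_a)
```

The shorthand link.img["src"] does the same thing when the image is known to exist; the explicit None check is only there for links that contain no img tag.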
If you need to install the module
# Python 3 (urllib is part of the standard library and needs no installation)
pip install beautifulsoup4