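You can use find_all() to search for every <div> element that has foo as its class attribute, and then call find() on each of them to get the one with bar as its class attribute, for example: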
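from bs4 import BeautifulSoup
import sys

# Parse the HTML file given as the first command-line argument.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

# For each <div class="foo">, print the text of its first <div class="bar">.
for foo in soup.find_all('div', attrs={'class': 'foo'}):
    bar = foo.find('div', attrs={'class': 'bar'})
    print(bar.text)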
Run it like this:
python3 script.py htmlfile
That yields:
I want this
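One caveat with the script above: find() returns None when nothing matches, so bar.text raises an AttributeError for any foo block that has no bar child. A minimal sketch of a guard follows. It assumes an inline HTML string (made up for illustration) instead of a file, and the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second foo block has no bar child.
html = """
<div class="foo"><div class="bar">I want this</div></div>
<div class="foo"><div class="unwanted">Not this</div></div>
"""

soup = BeautifulSoup(html, 'html.parser')

found = []
for foo in soup.find_all('div', attrs={'class': 'foo'}):
    bar = foo.find('div', attrs={'class': 'bar'})
    if bar is not None:  # find() returns None when nothing matches
        found.append(bar.text)

print(found)
```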
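UPDATE: If there can be several <div> elements with the bar class inside one foo block, the previous script won't work, because find() only returns the first match. But you can get the descendants of each foo element and iterate over them, for example: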
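from bs4 import BeautifulSoup
import sys

# Parse the HTML file given as the first command-line argument.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

for foo in soup.find_all('div', attrs={'class': 'foo'}):
    # .descendants yields every nested node, including text nodes,
    # whose .name is None and which are skipped by the first test.
    for d in foo.descendants:
        if d.name == 'div' and d.get('class', '') == ['bar']:
            print(d.text)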
With input like this:
<div class="foo">
<div class="bar">I want this</div>
<div class="unwanted">Not this</div>
<div class="bar">Also want this</div>
</div>
It will yield:
I want this
Also want this
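As a possible alternative to walking the descendants by hand, find_all() can also be called on each foo element, since it searches all descendants by default. The sketch below assumes the same sample input, inlined as a string rather than read from a file. Note that class_='bar' matches any element whose class list contains bar, which is slightly looser than the == ['bar'] comparison above:

```python
from bs4 import BeautifulSoup

# Same sample input as above, inlined for illustration.
html = """
<div class="foo">
<div class="bar">I want this</div>
<div class="unwanted">Not this</div>
<div class="bar">Also want this</div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

texts = []
for foo in soup.find_all('div', class_='foo'):
    # find_all() on a tag searches all of that tag's descendants.
    for bar in foo.find_all('div', class_='bar'):
        texts.append(bar.text)
        print(bar.text)
```

This keeps the filtering inside BeautifulSoup instead of an explicit if, at the cost of also matching elements such as class="bar extra".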