Try this:
from bs4 import BeautifulSoup as bs
html='''<div class="legend-block legend-block--pageviews">
<h5>Pageviews</h5><hr>
<div class="legend-block--body">
<div class="linear-legend--counts">
Pageviews:
<span class="pull-right">101,172
</span>
</div>
<div class="linear-legend--counts">
Daily average:
<span class="pull-right">
4,818
</span>
</div></div></div>'''
soup = bs(html, 'html.parser')
div = soup.find("div", {"class": "linear-legend--counts"})
span = div.find('span')
text = span.get_text()
print(text)
output:
101,172
Or, more concisely:
soup = bs(html, 'html.parser')
result = soup.find("div", {"class": "linear-legend--counts"}).find('span').get_text()
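The snippets above return only the first count, as a string that still carries whitespace and a thousands separator. As a minimal sketch, assuming the markup looks exactly like the sample in this answer, the following collects every label/value pair into a dict of integers. The `counts` name and the way the label is read are illustrative choices, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = '''<div class="legend-block legend-block--pageviews">
<h5>Pageviews</h5><hr>
<div class="legend-block--body">
<div class="linear-legend--counts">
Pageviews:
<span class="pull-right">101,172
</span>
</div>
<div class="linear-legend--counts">
Daily average:
<span class="pull-right">
4,818
</span>
</div></div></div>'''

soup = BeautifulSoup(html, "html.parser")

counts = {}
for div in soup.find_all("div", class_="linear-legend--counts"):
    # The first non-empty string inside the div is the label, e.g. "Pageviews:".
    label = next(div.stripped_strings).rstrip(":")
    # The span holds the number; strip whitespace and the thousands separator.
    value_text = div.find("span", class_="pull-right").get_text(strip=True)
    counts[label] = int(value_text.replace(",", ""))

print(counts)
```

Note that removing "," assumes English-style number formatting; a page rendered in another locale may need different handling.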
EDIT:
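The OP posted another question that is a possible duplicate of this one and found an answer there. For anyone looking for a solution to a similar problem, I am posting that question's accepted answer. It can be found here: https://stackoverflow.com/a/51985365/5430055
If you retrieve the page with requests.get, the JavaScript code will not be executed, so you should use Selenium instead. It mimics a user opening the page in a browser, so the JS code is executed.
To get started with Selenium, install it with pip install selenium. Then, to retrieve your item, use the following code: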
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
# List of (page URL, CSS selector of the element to retrieve) pairs.
wiki_pages = [("https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=user&range=latest-20&pages=Star_Wars:_The_Last_Jedi",
               ".summary-column--container .legend-block--pageviews .linear-legend--counts:first-child span.pull-right"),]
for url, selector in wiki_pages:
    # Load the page by its URL, not the whole (url, selector) tuple.
    browser.get(url)
    # find_element_by_css_selector was removed in Selenium 4; use find_element with By.
    page_views_count = browser.find_element(By.CSS_SELECTOR, selector)
    print(page_views_count.text)
browser.quit()
NOTE: If you need to run a headless browser, consider using PyVirtualDisplay https://pypi.org/project/PyVirtualDisplay/ (a wrapper for Xvfb https://en.wikipedia.org/wiki/Xvfb) to run headless WebDriver tests. See 'How do I run Selenium in Xvfb? https://stackoverflow.com/questions/6183276/how-do-i-run-selenium-in-xvfb' for more information.