beautifulsoup

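Python GET request returns different HTML than View Source

I'm trying to extract fanfiction from our own URL archive so that I can run some linguistic analysis on it with the NLTK library. However, every attempt to scrape the HTML from the URL returns everything except the fanfiction and the comment form (which I don't need anyway). First I tried using the built-in urllib library…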

python selenium web-scraping beautifulsoup PhantomJS
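One common reason for the mismatch is that the server sends different markup to clients that don't identify as a browser. Another is that the story text is inserted by JavaScript, which no plain HTTP client runs. The sketch below rules out the first cause using only the standard library. The URL and the User-Agent string are placeholders, not values from the question.

```python
import urllib.request

# Placeholder URL; substitute the archive page you are actually scraping.
URL = "https://example.com/works/12345"

# Illustrative browser-like User-Agent string (an assumption, not a value
# taken from the question); any current desktop browser string will do.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies itself as a desktop browser."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def fetch_html(url: str) -> str:
    """Download the page and decode it with the charset the server declares."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Saving `fetch_html(URL)` to a file and searching it for a sentence visible in the browser is a quick test: if the sentence is absent, the text most likely arrives through a separate request (visible in the browser's Network tab), and that request is the one to fetch.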

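Scraping a large number of Google Scholar pages by URL

I'm trying to use BeautifulSoup to get the full author list of every publication by an author on Google Scholar. Because the author's home page only shows a shortened author list for each paper, I have to open each paper's link to get the full list. As a result, I run into a CAPTCHA every few attempts. Is there a way to avoid the CAPTCHA?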

web-scraping beautifulsoup captcha google-scholar
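Pacing requests is one way to reduce how often the CAPTCHA appears. It lowers the request rate Google Scholar sees, but it does not guarantee the CAPTCHA never shows up. A minimal sketch of a throttle with a minimum delay plus random jitter; the class and its parameters are illustrative, and suitable delay values are an assumption you would have to tune.

```python
import random
import time


class Throttle:
    """Enforce a minimum (optionally jittered) delay between consecutive requests.

    Slowing down lowers the request rate a site sees; it does not guarantee
    that no CAPTCHA will be served.
    """

    def __init__(self, min_delay: float, jitter: float = 0.0) -> None:
        self.min_delay = min_delay
        self.jitter = jitter
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> float:
        """Sleep just long enough to respect the delay; return seconds slept."""
        target = self.min_delay + random.uniform(0, self.jitter)
        slept = 0.0
        if self._last is not None:
            elapsed = time.monotonic() - self._last
            if elapsed < target:
                slept = target - elapsed
                time.sleep(slept)
        self._last = time.monotonic()
        return slept
```

Call `throttle.wait()` right before each request in the loop over paper links, for example with `Throttle(min_delay=10, jitter=5)`.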

Word count in Python

I want to count the words in text fetched from a website. I'm trying the code below: `import requests`, `from bs4 import BeautifulSoup`, `from urllib.request import urlopen`, `def get tex…`

python URL beautifulsoup html-parsing word-count
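A sketch of counting words once the HTML has been fetched (for example with `requests.get(url).text`): drop `<script>` and `<style>` contents, extract the text, then count with a regular expression and `collections.Counter`. What counts as a "word" here (a run of word characters or apostrophes) is an assumption to adjust as needed.

```python
import re
from collections import Counter

from bs4 import BeautifulSoup


def visible_text(html: str) -> str:
    """Return the page text with <script> and <style> contents removed."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator=" ")


def word_counts(text: str) -> Counter:
    """Count words case-insensitively; a word is a run of \\w characters or apostrophes."""
    return Counter(re.findall(r"[\w']+", text.lower()))


html = ("<html><body><h1>Hello world</h1><script>var x = 1;</script>"
        "<p>Hello again, world!</p></body></html>")
counts = word_counts(visible_text(html))
print(sum(counts.values()), counts.most_common(2))
```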

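Requests.content doesn't match Chrome's Inspect Element

I'm using BeautifulSoup and Requests to scrape all the recipe user data. When I inspected the HTML, I found that the data I want is contained in…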

python html beautifulsoup python-requests
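Chrome's Inspect Element shows the DOM after JavaScript has run, while `requests` only receives the raw HTML, so the two can legitimately differ. One case worth checking for is data shipped as JSON inside a `<script>` tag and rendered client-side. The sketch assumes that case; the script id, its type, and the JSON keys are invented for illustration.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical raw HTML as requests would receive it: the visible list is
# empty, and the data lives in a JSON <script> block instead.
RAW_HTML = """
<html><body>
  <div id="reviews"></div>
  <script type="application/json" id="initial-data">
    {"user": {"name": "cook42", "reviews": [{"recipe": "Pancakes", "rating": 5}]}}
  </script>
</body></html>
"""


def embedded_json(html: str, script_id: str) -> dict:
    """Parse the JSON payload of the <script> element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", id=script_id)
    if script is None or script.string is None:
        raise ValueError(f"no script with id={script_id!r} found")
    return json.loads(script.string)


data = embedded_json(RAW_HTML, "initial-data")
for review in data["user"]["reviews"]:
    print(review["recipe"], review["rating"])
```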

Problem scraping … tags and data lists containing links

This is an example of the HTML I'm scraping with Python BeautifulSoup: `<dl> <dd> <strong> <a href="http://www.eslcafe.com/jobs/china/index.cgi?read=45790">Monthly 1…`

python beautifulsoup
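A sketch of walking a `<dl>` list like the one in the question and collecting each link's text and `href`, assuming every entry is a `<dd>` whose link sits in `<strong><a href=...>`. The sample entries and URLs are made up.

```python
from bs4 import BeautifulSoup

# Illustrative markup with the same <dl>/<dd>/<strong>/<a> nesting as the
# question; the URLs and link texts are invented.
HTML = """
<dl>
  <dd><strong><a href="http://example.com/job/1">Job post A</a></strong> - School A</dd>
  <dd><strong><a href="http://example.com/job/2">Job post B</a></strong> - School B</dd>
  <dd>An entry without a link</dd>
</dl>
"""

soup = BeautifulSoup(HTML, "html.parser")

rows = []
for dd in soup.select("dl > dd"):
    link = dd.select_one("strong > a[href]")
    if link is None:
        continue  # skip <dd> entries that carry no link
    rows.append({"title": link.get_text(strip=True), "url": link["href"]})

for row in rows:
    print(row["title"], row["url"])
```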

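Finding a specific element inside another element

I'm trying to extract the price (2 890 000K) and the address from this HTML. There are 12 identical `<div class="list items content list items content 1">` blocks: `<div class="list items conten…`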

python beautifulsoup
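The usual approach is to loop over the repeated containers and search inside each one, so the price and address stay paired. The class names below (`listing`, `price`, `address`) are hypothetical stand-ins for the page's real ones.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup; the class names are placeholders.
HTML = """
<div class="listing"><div class="price">2 890 000K</div><div class="address">Street 1</div></div>
<div class="listing"><div class="price">1 500 000K</div><div class="address">Street 2</div></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

listings = []
# Iterate over each repeated container, then search *inside* it, so the
# price and address always come from the same listing.
for card in soup.find_all("div", class_="listing"):
    price = card.find("div", class_="price")
    address = card.find("div", class_="address")
    listings.append({
        "price": price.get_text(strip=True) if price else None,
        "address": address.get_text(strip=True) if address else None,
    })

print(listings)
```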

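BeautifulSoup getting href [duplicate]

This question already has an answer here. I have the following soup: `<a href="some_url">next</a> <span class="class">...</span>`. From this I want to extract the href, "some_url". I can do it if I only have one tag, but here there are two tags. I can also get…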

python tags beautifulsoup
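A sketch using the snippet from the question: `find_all("a", href=True)` returns only `<a>` tags that actually have an `href`, so the neighboring `<span>` does not get in the way.

```python
from bs4 import BeautifulSoup

HTML = '<a href="some_url">next</a> <span class="class">...</span>'
soup = BeautifulSoup(HTML, "html.parser")

# href=True restricts the search to <a> tags that carry an href attribute,
# so the <span> (or an <a> without href) is ignored.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]
print(hrefs)

# For a single link, .get() returns None instead of raising KeyError when
# the attribute is missing.
first = soup.find("a")
print(first.get("href"))
```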

Parsing specific values out of a JSON object in BeautifulSoup

`import urllib`, `from urllib import request`, `from bs4 import BeautifulSoup`, `url = 'http://mygene.info/v3/query?q=symbol:CDK2&speci…`

json python-3.x Parsing beautifulsoup
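The mygene.info query URL returns JSON rather than HTML, so the standard `json` module is a more natural fit than BeautifulSoup. In the sketch, the payload is a hand-written stand-in, not real API output, and its shape (a `hits` list of records with `_id` and `symbol`) is an assumption about the response.

```python
import json

# Hand-written stand-in for the response body (not real API output); in
# practice it would come from urllib.request.urlopen(url).read().decode("utf-8").
BODY = """
{"total": 2,
 "hits": [
   {"_id": "id-1", "symbol": "CDK2", "name": "example record 1"},
   {"_id": "id-2", "symbol": "Cdk2", "name": "example record 2"}
 ]}
"""

data = json.loads(BODY)

# Pull one field out of every hit; .get() tolerates hits that lack the key.
hits = data.get("hits", [])
symbols = [hit.get("symbol") for hit in hits]
ids = [hit.get("_id") for hit in hits]
print(symbols, ids)
```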

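BeautifulSoup not extracting all of the HTML

We're trying to get the product URLs from this page of the Forever 21 website. For some reason, BeautifulSoup isn't picking up the elements with the class item pic, even though they are in the site's HTML. We've tried using requests, mechanize, sele…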

python beautifulsoup mechanize urllib
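Before blaming the parser, it helps to know whether the class name is present in the HTML that was actually downloaded. A hypothetical helper that distinguishes three cases; `item_pic` in the test is a placeholder class name, and `html.parser` is just one parser choice (lxml and html5lib may repair broken markup differently).

```python
from bs4 import BeautifulSoup


def diagnose_missing_class(html: str, class_name: str) -> str:
    """Tell apart 'absent from the downloaded HTML' from 'present but not matched'."""
    in_raw = class_name in html
    found = BeautifulSoup(html, "html.parser").find_all(class_=class_name)
    if found:
        return f"parser found {len(found)} element(s)"
    if in_raw:
        # The text occurs somewhere (e.g. inside a script) but no tag has the class.
        return "string is in the raw HTML but no element matched (check parser or markup)"
    return "not in the downloaded HTML (likely added later by JavaScript)"
```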

Perl HTML parsing libraries/tools

Is there a powerful Perl tool or library comparable to BeautifulSoup for Python? Thanks. HTML::TreeBuilder::XPath http://p3rl.org/HTML%3a%3aTreeBuilder%3a%3aXPat…

perl beautifulsoup

Is there an equivalent of InnerText in BeautifulSoup?

With the code below: `soup = BeautifulSoup(page.read(), fromEncoding="utf-8")`, `result = soup.find('div', {'class': 'flagPageTitle'})`, I get the following HTML: `<div class="fl…`

python beautifulsoup
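BeautifulSoup 4's closest counterpart is `get_text()` (or the `.text` property). A sketch; the markup inside the `div` is invented, and only the class name comes from the question.

```python
from bs4 import BeautifulSoup

# Illustrative markup; only the class name comes from the question.
HTML = '<div class="flagPageTitle"> <span>Hello</span> <b>world</b> </div>'

# BeautifulSoup 4 renamed BS3's fromEncoding keyword to from_encoding; it is
# only relevant for bytes input, so it is omitted for this str input.
soup = BeautifulSoup(HTML, "html.parser")
result = soup.find("div", class_="flagPageTitle")

# get_text() (also available as .text) joins every text node under the tag,
# roughly what textContent/innerText return in the browser.
raw_text = result.get_text()
# With a separator and strip=True, each piece is trimmed, empty pieces are
# dropped, and the rest are joined with the separator.
clean_text = result.get_text(" ", strip=True)
print(repr(raw_text), repr(clean_text))
```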

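UTF-8 character encoding problem

I get links from a web page using the Beautiful Soup library (`a.get('href')`). A link contains a special character, but when I retrieve it, the character comes out garbled. How can I encode it correctly? I've already added a `coding: utf-8` declaration at the top of the file. `r = requests.get(url)`, `soup = Bea…`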

python UTF-8 beautifulsoup python-requests mojibake
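The `coding` declaration only tells Python how to read the script's own source; it has no effect on how downloaded bytes are decoded. A self-contained sketch of the usual failure and two fixes, using an illustrative link instead of the real page:

```python
from bs4 import BeautifulSoup

# Bytes as a UTF-8 server would send them (the link itself is illustrative).
raw = '<a href="/caf\u00e9">menu</a>'.encode("utf-8")

# The failure mode: UTF-8 bytes decoded as ISO-8859-1, which is what
# requests typically falls back to for a text/html response that declares
# no charset, so r.text comes out garbled.
wrong = raw.decode("latin-1")
wrong_href = BeautifulSoup(wrong, "html.parser").a["href"]  # '/cafÃ©'

# Fix 1: give BeautifulSoup the raw bytes (r.content rather than r.text)
# and state the encoding explicitly.
right_href = BeautifulSoup(raw, "html.parser", from_encoding="utf-8").a["href"]

# Fix 2: repair a string that has already been mis-decoded.
repaired = wrong_href.encode("latin-1").decode("utf-8")

print(wrong_href, right_href, repaired)
```

With requests, the first fix amounts to `BeautifulSoup(r.content, "html.parser")`, or setting `r.encoding = "utf-8"` before reading `r.text` when the page is known to be UTF-8.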

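Need to scrape information from a web page with a "Show more" button. Any suggestions?

I'm currently developing a crawler for educational purposes. Everything works fine: I can extract URLs and information and save them in a JSON file. The only problem is that the page has a "Load more" button that I need to interact with so the crawler can keep finding more URLs. This is where I could use the help of you wonderful people.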

python Web web-scraping beautifulsoup screen-scraping
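A "Load more" button often just requests the next batch of results from the server, which shows up in the browser's Network tab; when that is the case, calling the same endpoint in a loop can replace clicking. A sketch under that assumption: `fetch_page` is a hypothetical function (for example built on requests) that returns the URLs for page *n* and an empty list when there are no more. If no such endpoint can be found, driving the button with Selenium is the fallback.

```python
from typing import Callable, Iterator, List


def crawl_all(fetch_page: Callable[[int], List[str]], start: int = 1,
              max_pages: int = 100) -> Iterator[str]:
    """Yield URLs page by page until a page comes back empty.

    fetch_page(n) is assumed to return the items the n-th "Load more" click
    would add; max_pages guards against endpoints that never run dry.
    """
    seen = set()
    for page in range(start, start + max_pages):
        items = fetch_page(page)
        if not items:
            break  # nothing more to load
        for url in items:
            if url not in seen:  # batches may overlap; skip duplicates
                seen.add(url)
                yield url


# Fake endpoint for demonstration: three pages of results, then nothing.
FAKE_PAGES = {1: ["/a", "/b"], 2: ["/b", "/c"], 3: ["/d"]}
urls = list(crawl_all(lambda n: FAKE_PAGES.get(n, [])))
print(urls)
```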

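BeautifulSoup - how to find only a specific class name

How do I find li tags that have a specific class name and no other class names? For example: `<li>not wanted</li> <li class="a">not this one</li> <li class="a z">neither this one</li> <li class="b z">neithe…`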

python beautifulsoup
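Because BeautifulSoup treats `class` as a multi-valued attribute, `class_="z"` also matches `class="a z"`. Comparing the whole class list gives an exact match. The question's example is cut off before the wanted tag, so the target class `z` below is an assumption.

```python
from bs4 import BeautifulSoup

HTML = """
<li>not wanted</li>
<li class="a">not this one</li>
<li class="a z">neither this one</li>
<li class="b z">neither this one</li>
<li class="z">only this one</li>
"""
soup = BeautifulSoup(HTML, "html.parser")

# class_="z" would also match "a z" and "b z". Comparing the parsed class
# list keeps only tags whose class attribute is exactly "z".
exact = soup.find_all(lambda tag: tag.name == "li" and tag.get("class") == ["z"])
print([li.get_text() for li in exact])
```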

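Python web scraping (Beautiful Soup, Selenium and PhantomJS): only part of the page gets scraped

Hi, I'm having trouble scraping data for modeling from a website (fantsylabs dotcom). I'm just a hacker, so please forgive my ignorance of computer terminology. What I'm trying to do is use Selenium to log in to the site and navigate to the page that has the data. `# Initialize and load…`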

python-2.7 selenium web-scraping beautifulsoup PhantomJS

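Python web scraping blocked

I want to scrape pages from the German real estate site immobilienscout24.de. I want to download the HTML for a given URL and then work with that HTML offline. It's not for commercial use or publication, and I don't intend to spam the site; it's just coding practice. I want to write a Python…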

python web-scraping beautifulsoup proxy
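Since the goal is to download each page once and then work offline, caching every response on disk keeps the number of requests to the site to a minimum; it does not by itself get around a block that is already in place. A sketch with a hypothetical `cached_html` helper. `fetch` stands in for whatever download function is used (for example a wrapper around `requests.get`), and the demo uses a fake fetcher so no network is touched.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_html(url: str, cache_dir: Path, fetch: Callable[[str], str]) -> str:
    """Return the HTML for url, downloading it at most once.

    Later calls read the copy saved under cache_dir, so offline work sends
    no further requests to the site.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL to get a safe, fixed-length file name.
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html


# Demo with a fake fetcher and a throwaway directory (no network access).
calls = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return "<html>listing</html>"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_html("https://example.com/listing/1", Path(tmp), fake_fetch)
    second = cached_html("https://example.com/listing/1", Path(tmp), fake_fetch)
print(len(calls))
```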

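How do I find comment tags with BeautifulSoup?

I tried `soup.find`, but it doesn't seem to work. Thanks in advance. Edit: Thanks for the tips on how to find all comments. I have a follow-up question: how exactly do I search for a specific comment? For example, I have the following comment tags… I really only want this part: `<i>Wednesday 110518</i>` 110518…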

python html tags beautifulsoup
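BeautifulSoup stores comments as `Comment` strings, so they are searched with the `string=` argument rather than as tags. A sketch that finds all comments, picks one by its text, and parses the markup inside it; the sample HTML is invented around the `<i>Wednesday 110518</i>` fragment from the question.

```python
from bs4 import BeautifulSoup, Comment

HTML = """
<div>
  <!-- navigation starts here -->
  <p>Visible text</p>
  <!-- <i>Wednesday 110518</i> -->
</div>
"""
soup = BeautifulSoup(HTML, "html.parser")

# Comments are NavigableString subclasses, so search strings, not tags.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))

# A specific comment: filter on its text, then parse that text as HTML
# to get at the markup inside it.
target = next(c for c in comments if "Wednesday" in c)
inner = BeautifulSoup(str(target), "html.parser")
print(inner.i.get_text())
```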

Scraping XML data with BS4 "lxml"

I'm trying to solve a problem very similar to this one: Scraping XML element attributes with beautifulsoup https://stackoverflow.com/questions/37968565/scraping-xml-element-attributes-wi…

python python-3.x beautifulsoup lxml elementtree
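BeautifulSoup's XML mode (`features="xml"`) depends on lxml. Since the question is also tagged elementtree, here is an alternative sketch that reads element attributes with the standard library's `xml.etree.ElementTree` instead; the XML document is made up.

```python
import xml.etree.ElementTree as ET

XML = """
<catalog>
  <book id="bk101" lang="en"><title>First</title></book>
  <book id="bk102" lang="de"><title>Zweite</title></book>
</catalog>
"""

root = ET.fromstring(XML)

# .get() reads an attribute (None if missing); findtext() returns the text
# of the first matching child element.
books = [
    {"id": book.get("id"), "lang": book.get("lang"), "title": book.findtext("title")}
    for book in root.iter("book")
]
print(books)
```

With lxml installed, the BeautifulSoup version of the same loop is `BeautifulSoup(XML, "xml").find_all("book")`, reading attributes with `tag["id"]` or `tag.get("id")`.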

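Unable to save images from the web with urllib2

I want to save some images from a website using Python and urllib2, but when I run the code, it saves something else. Here is my code: `user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'`, `headers = {'User…`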

python python-2.7 beautifulsoup urllib2
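When the saved "image" turns out to be something else, it is often an HTML page (an error, login, or anti-hotlinking response) written to disk under an image name. A Python 3 sketch (where urllib2's functionality lives in `urllib.request`) that checks the first bytes against a few common image signatures before writing in binary mode. Only PNG, JPEG, and GIF are recognized, and the helper names are illustrative.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes of a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image(data: bytes) -> Optional[str]:
    """Return the image format implied by the first bytes, or None."""
    for magic, fmt in SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    return None


def save_image(data: bytes, path: Path) -> str:
    """Write data to path in binary mode, refusing anything that isn't an image."""
    fmt = sniff_image(data)
    if fmt is None:
        preview = data[:60].decode("utf-8", errors="replace")
        raise ValueError(f"not an image, starts with: {preview!r}")
    path.write_bytes(data)  # write_bytes opens the file in binary mode
    return fmt
```

The bytes would come from `urllib.request.urlopen(req).read()`; if `save_image` raises, the preview usually shows what the server actually sent.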

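Python - problems with accented characters when scraping data from a website

I'm Nicola, a new Python user with no real background in computer programming, so I really need some help with a problem I'm having. I wrote some code to scrape data from this web page. Basically, the goal of my code is to scrape the data from all the tables on the page and write it to a .txt file. Here I…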

python Unicode beautifulsoup web-scraping diacritics
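A sketch of pulling every table row into a list and writing it to a UTF-8 text file with tab-separated columns, assuming Python 3. The table is invented, with Italian place names standing in for the page's accented text.

```python
import csv
import os
import tempfile

from bs4 import BeautifulSoup

# Illustrative table with accented text; the question's page is not reproduced.
HTML = """
<table>
  <tr><th>Città</th><th>Regione</th></tr>
  <tr><td>Forlì</td><td>Emilia-Romagna</td></tr>
  <tr><td>Cantù</td><td>Lombardia</td></tr>
</table>
"""


def table_rows(html: str) -> list:
    """Return the cell texts of every row of every table on the page."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for table in soup.find_all("table")
        for tr in table.find_all("tr")
    ]


rows = table_rows(HTML)
out_path = os.path.join(tempfile.mkdtemp(), "tables.txt")

# Opening the file with an explicit encoding is the important part: the
# platform default (e.g. cp1252 on many Windows setups) may not be what
# other tools expect, and it cannot encode every character.
with open(out_path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f, delimiter="\t", lineterminator="\n").writerows(rows)

with open(out_path, encoding="utf-8") as f:
    text = f.read()
print(text)
```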