beautifulsoup

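BeautifulSoup find_all() returns no data

I am very new to Python. My latest project is scraping data from a betting website. What I want to scrape is the odds information on the page. Here is my code: from urllib request import urlopen as uReq from bs4 import Beautif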

python html webscraping beautifulsoup screenscraping

Handling an infinite-scroll UI in BeautifulSoup

I am working out how to scrape the LinkedIn page source at https www linkedin com mynetwork invite connect connections https www linkedin com mynetwork invi

python beautifulsoup

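How to extract the strong element inside a div tag

I am new to web scraping and am using Python to scrape data. Can someone help me extract data from the following: div class dept strong LENGTH strong 15 credits div My output should be LENGTH 15 cred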

python webscraping beautifulsoup
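One possible approach to the question above, as a minimal sketch. The markup is reconstructed from the excerpt (whose punctuation was lost), so the exact attribute quoting and spacing are assumptions; the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup rebuilt from the excerpt; not the asker's actual page.
html = '<div class="dept"><strong>LENGTH</strong> 15 credits</div>'

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="dept")

# Text of the <strong> child only.
label = div.strong.get_text(strip=True)

# All text in the div, with each piece stripped and joined by a space.
combined = div.get_text(" ", strip=True)

print(label)
print(combined)
```

Passing a separator to `get_text` avoids the label and the value running together when the whitespace between them is irregular.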

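Finding specific link text with bs4

I am trying to scrape a website and find all the titles of a feed. I am having trouble getting the text of the a tag I need. Here is a sample of the html: td class m a href QSYcfT target blank TF4 Oreos a a href font class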

python html webscraping beautifulsoup

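Trouble understanding BeautifulSoup filtering

Can someone explain how filtering works in Beautiful Soup? I have the HTML below and am trying to filter specific data out of it, but I cannot seem to access it. I have tried various approaches, from collecting everything with class g to grabbing only the items of interest in that particular div, but I only get None back or nothing printed for each page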

python html beautifulsoup

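UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I am having problems dealing with unicode characters in text fetched from different web pages on different sites. I am using Beautiful Soup. The problem is that the error is not always reproducible: it sometimes works with some pages, and sometimes it fails by throwing a UnicodeEncodeError. I have tried just about everything I can think of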

python Unicode beautifulsoup python2x pythonunicode
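The question above concerns Python 2, where implicit conversions to ASCII trigger this error. The sketch below uses Python 3 and only illustrates the underlying mechanism: ASCII cannot represent the non-breaking space `\xa0`, while UTF-8 can. The sample string is invented for illustration.

```python
# \xa0 is a non-breaking space, common in text scraped from HTML (&nbsp;).
text = "price:\xa010"

try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # ASCII only covers code points 0..127, and \xa0 is 160.
    ascii_ok = False

# UTF-8 can encode any Unicode code point.
utf8_bytes = text.encode("utf-8")

# Alternatively, normalise the character to a plain space before output.
cleaned = text.replace("\xa0", " ")

print(ascii_ok, utf8_bytes, cleaned)
```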

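Using/importing Beautiful Soup 4 without installing it

As the Beautiful Soup documentation says: if all else fails, the license for Beautiful Soup allows you to package the entire library with your application. You can download the tarball, copy its bs4 directory into your application's codebase, and use Beautiful Soup without installing it at all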

python beautifulsoup

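How to scrape Google News article content from the Google News RSS?

In the future (possibly still far off, since I am still a novice) I want to do data analysis based on the news content I get from the Google News RSS, but for that I need access to that content, and that is my problem. Using the URL https news google cl news rss https news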

python webscraping beautifulsoup RSS

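Extracting text from Javascript with Python

I have been looking at examples of how to do this but cannot quite figure it out. I am using beautifulsoup to scrape some data, and I can use it to find the data I want, but it is contained in the following block of code. I am trying to extract the timestamp information from it. I have a feeling regular expressions would work here, but I cannot seem to work it out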

javascript python beautifulsoup
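The excerpt above does not include the JavaScript block itself, so the following is only a sketch of the general idea the asker hints at: take the script element's text with BeautifulSoup, then apply a regular expression. The script content, the `timestamp` key, and its value are all invented for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page: the variable name and value are assumptions.
html = '<script>var data = {"timestamp": 1357000000, "id": 7};</script>'

soup = BeautifulSoup(html, "html.parser")

# The JavaScript source is available as the script tag's string.
script_text = soup.find("script").string

# Capture the digits that follow the "timestamp" key.
match = re.search(r'"timestamp":\s*(\d+)', script_text)
timestamp = int(match.group(1)) if match else None

print(timestamp)
```

If the embedded data is valid JSON, extracting the object literal and passing it to `json.loads` is usually more robust than matching individual fields.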

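beautifulsoup: find the th containing the text 'price', then get the price from the next th

My html looks like: td table tr th price th th 99 99 th tr table td So, being in the current table cell, how would I get the 99 99 value? So far I have td 3 findChild th but I need to do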

python beautifulsoup
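A minimal sketch of one way to approach the question above. The markup is rebuilt from the excerpt, and the value `99.99` is an assumption (the excerpt shows only `99 99`).

```python
from bs4 import BeautifulSoup

# Hypothetical markup rebuilt from the excerpt; "99.99" is assumed.
html = "<td><table><tr><th>price</th><th>99.99</th></tr></table></td>"

soup = BeautifulSoup(html, "html.parser")

# Locate the header cell whose text is exactly "price".
label = soup.find("th", string="price")

# The value sits in the next <th> sibling within the same row.
value = label.find_next_sibling("th").get_text(strip=True)

print(value)
```

`find_next_sibling` stays within the same parent row, so it will not pick up a `th` from a later row by accident.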

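How to scrape a page with BeautifulSoup? Page source does not match Inspect Element

I am trying to scrape some things from this fantasy basketball page http fantasy espn com basketball league scoreboard leagueId 633975 I am using BeautifulSoup in Python 3 5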

python webscraping beautifulsoup

How to get the html text inside a tag with BeautifulSoup

How do I extract data from the sample HTML with beautifulsoup

python html python3x beautifulsoup
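The sample HTML is not shown in the excerpt above, so this sketch uses an invented fragment to contrast two readings of "text inside a tag": the inner markup and the plain text.

```python
from bs4 import BeautifulSoup

# Invented fragment for illustration only.
html = "<p>Hello <b>world</b>!</p>"

soup = BeautifulSoup(html, "html.parser")
p = soup.p

# Inner HTML: the tag's children rendered as markup, without the <p> itself.
inner_html = p.decode_contents()

# Plain text: all descendant strings concatenated.
text = p.get_text()

print(inner_html)
print(text)
```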

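How to convert a BeautifulSoup.ResultSet to a string

So I parsed an html page with findAll (BeautifulSoup) into a variable named result. If I type result in the Python shell and press Enter, I see normal text as expected, but when I want to post-process this result as a string object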

python Unicode beautifulsoup
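A `ResultSet` behaves like a list of tags, so one way to get a single string is to convert each element and join them. This is a sketch with invented markup; whether the markup or only the text is wanted depends on the post-processing.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration only.
html = "<ul><li>one</li><li>two</li></ul>"

soup = BeautifulSoup(html, "html.parser")
result = soup.find_all("li")   # a ResultSet, which is list-like

# Join the markup of every matched tag into one string.
as_markup = "".join(str(tag) for tag in result)

# Or join only the text content, one match per line.
as_text = "\n".join(tag.get_text() for tag in result)

print(as_markup)
print(as_text)
```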

AttributeError: 'NoneType' object has no attribute 'parent'

from urllib request import urlopen from bs4 import BeautifulSoup html urlopen http www pythonscraping com pages page3 ht

python webscraping beautifulsoup urllib AttributeError
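This error typically means that `find` matched nothing and returned `None`, and an attribute was then read from that `None`. The sketch below reproduces the situation on invented markup rather than the page in the excerpt, and guards against it.

```python
from bs4 import BeautifulSoup

# Invented markup; the page from the excerpt is not fetched here.
html = '<table id="giftList"><tr><td>item</td></tr></table>'

soup = BeautifulSoup(html, "html.parser")

# No <img> with this src exists, so find() returns None.
missing = soup.find("img", src="missing.jpg")

# Check for None before touching .parent to avoid the AttributeError.
parent_name = missing.parent.name if missing is not None else None

# For an element that does exist, .parent works as expected.
cell = soup.find("td")
cell_parent = cell.parent.name

print(parent_name, cell_parent)
```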

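Removing lines left empty after BeautifulSoup decompose

I am trying to remove certain HTML tags and their content from a file with BeautifulSoup. How can I remove lines that become empty after applying decompose? In this example, I want the line between a and 3 to disappear, since that is where the span span block was, but not the one at the end from bs4 impo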

python beautifulsoup
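A simplified sketch for the question above, on invented input. Note the assumption: it drops every blank line in the output, not only those created by `decompose`, which is coarser than what the asker describes.

```python
from bs4 import BeautifulSoup

# Invented input: a <span> block alone on the line between "a" and "3".
html = "a\n<span>remove me</span>\n3\n"

soup = BeautifulSoup(html, "html.parser")

# Remove each span together with its content.
for span in soup.find_all("span"):
    span.decompose()

# The line that held the span is now empty; filter out blank lines.
lines = str(soup).splitlines()
cleaned = "\n".join(line for line in lines if line.strip())

print(cleaned)
```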

Python BeautifulSoup: giving multiple tags to findAll

I am looking for a way to use findAll to get two tags, in the order they appear on the page. Currently I have import requests import BeautifulSoup def get soup url request requests g

python beautifulsoup
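In Beautiful Soup 4, `find_all` accepts a list of tag names and returns matches in document order. A minimal sketch on invented markup (the excerpt uses the older `BeautifulSoup` import, so this assumes the `bs4` package instead):

```python
from bs4 import BeautifulSoup

# Invented markup for illustration only.
html = "<h2>A</h2><p>one</p><h2>B</h2><p>two</p>"

soup = BeautifulSoup(html, "html.parser")

# A list of names matches any of them, preserving page order.
tags = soup.find_all(["h2", "p"])
names = [tag.name for tag in tags]
texts = [tag.get_text() for tag in tags]

print(names)
print(texts)
```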

Matching partial ids in BeautifulSoup

I am using Beautiful Soup. I have to find any reference to div tags with an id like post. For example div div div div I have tried html div div div div soupHandler BeautifulSoup html print

python beautifulsoup
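One way to match ids by prefix is to pass a compiled regular expression as the attribute filter. The id values below are assumptions, since the excerpt lost the original attributes.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical ids; the excerpt does not show the real ones.
html = '<div id="post-1"></div><div id="post-2"></div><div id="sidebar"></div>'

soup = BeautifulSoup(html, "html.parser")

# A compiled pattern is searched against each div's id attribute.
matches = soup.find_all("div", id=re.compile(r"^post-"))
ids = [div["id"] for div in matches]

print(ids)
```

The `^` anchor matters: without it, the pattern would also match an id that merely contains `post-` somewhere in the middle.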

Extracting sibling nodes between two nodes with BeautifulSoup

我有一个这样的文档 p class top I don t want this p p I want this p table table img p and all that stuff too p p class p

python beautifulsoup
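A sketch of one approach: start at the first marker and walk `next_siblings` until the closing marker is reached. The excerpt is cut off before the class of the final paragraph, so the class name `end` and its text are hypothetical.

```python
from bs4 import BeautifulSoup, Tag

# Rebuilt from the excerpt; the closing class "end" is an assumption.
html = (
    '<p class="top">I don\'t want this</p>'
    "<p>I want this</p><table></table><img/>"
    "<p>and all that stuff too</p>"
    '<p class="end">nor this</p>'
)

soup = BeautifulSoup(html, "html.parser")
start = soup.find("p", class_="top")

between = []
for node in start.next_siblings:
    # Stop at the closing marker; everything before it is kept.
    if isinstance(node, Tag) and "end" in node.get("class", []):
        break
    between.append(node)

names = [node.name for node in between]
print(names)
```

On real pages the siblings usually include whitespace text nodes as well, which can be skipped with the same `isinstance(node, Tag)` check.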

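Scraping a response table from a website whose url does not change

I want to scrape the price history from this website. After clicking the price history button, the table is loaded, but the url stays the same. I want to scrape the table that is loaded. import requests from bs4 import BeautifulSoup rr requests get url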

python webscraping beautifulsoup request

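BeautifulSoup - findAll not within a certain tag

So I am trying to find a way to find all items in a BeautifulSoup object that have a certain tag but are not inside a certain other tag. For example td class disabled first div class dayContainer p class day 29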

python beautifulsoup
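One possible approach is to collect every candidate and then discard those that have the unwanted ancestor. The first cell below is rebuilt from the excerpt; the second, non-disabled cell is invented so that there is something left to keep.

```python
from bs4 import BeautifulSoup

# First cell rebuilt from the excerpt; the second cell is hypothetical.
html = (
    "<table><tr>"
    '<td class="disabled first"><div class="dayContainer">'
    '<p class="day">29</p></div></td>'
    '<td><div class="dayContainer"><p class="day">1</p></div></td>'
    "</tr></table>"
)

soup = BeautifulSoup(html, "html.parser")

# Keep only the p.day elements with no td.disabled ancestor.
days = [
    p.get_text()
    for p in soup.find_all("p", class_="day")
    if p.find_parent("td", class_="disabled") is None
]

print(days)
```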