beautifulsoup

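Scraping multiple pages in Python with BeautifulSoup

I have managed to write code that scrapes data from the first page. Now I need to add a loop to this code to scrape the next n pages. The code is below. I would appreciate it if someone could guide or help me in writing code that scrapes data from the remaining pages. Thanks. from bs4 import Beauti...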

python html webscraping beautifulsoup
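
The loop over the next n pages could look roughly like the sketch below. It is not the asker's code: the URL pattern `BASE_URL`, the `h2.title` markup, and the helper names `parse_titles` and `scrape_pages` are all assumptions made for illustration. The fetch step is passed in as a function so the loop can be tried without network access.

```python
from typing import Callable, Iterator, List

from bs4 import BeautifulSoup

# Hypothetical URL pattern; the real site's paging scheme is not shown in the question.
BASE_URL = "https://example.com/listing?page={page}"


def parse_titles(html: str) -> List[str]:
    """Return the text of every <h2 class="title"> element (assumed markup)."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_="title")]


def scrape_pages(fetch: Callable[[str], str], n_pages: int) -> Iterator[str]:
    """Fetch pages 1..n_pages with `fetch` and yield the titles found on each."""
    for page in range(1, n_pages + 1):
        html = fetch(BASE_URL.format(page=page))
        yield from parse_titles(html)


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP call; builds a tiny page from the page number."""
    page = url.rsplit("=", 1)[1]
    return f'<h2 class="title">Item on page {page}</h2>'


titles = list(scrape_pages(fake_fetch, 3))
print(titles)
```

For real use, `fake_fetch` would be swapped for something like `lambda url: requests.get(url, timeout=10).text`, assuming the `requests` library is available.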

What is the return type of the find_all method in Beautiful Soup?

from bs4 import BeautifulSoup, SoupStrainer from urllib.request import urlopen import pandas as pd import numpy as np imp...

python regex webscraping beautifulsoup
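
A quick way to check the return type is to inspect it directly. The sketch below uses a made-up HTML snippet. In bs4, `find_all` returns a `ResultSet`, which is a subclass of `list`, so it can be indexed, iterated, and measured with `len` like any list.

```python
from bs4 import BeautifulSoup

# Made-up markup, used only to have something to search.
html = "<ul><li>a</li><li>b</li></ul>"
soup = BeautifulSoup(html, "html.parser")

items = soup.find_all("li")
print(type(items).__name__)             # name of the returned class
print(isinstance(items, list))          # ResultSet subclasses list
print([li.get_text() for li in items])  # each entry is a Tag

# When nothing matches, the result is an empty ResultSet, not None.
missing = soup.find_all("table")
print(len(missing))
```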

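NoneType error / no elements printed when using BeautifulSoup with Python

So I am trying to compare 2 lists in Python. One contains 1000 links that I fetched from a website. The other contains a few words that might appear in the links of the first list. If that is the case, I want to get an output. I printed the first list and it does work. For example, if the link is ht...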

python selenium seleniumwebdriver beautifulsoup

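How can I get Python bs4 to work properly on XML?

I am trying to use Python and BeautifulSoup 4 (bs4) to convert an Inkscape SVG into an XML-like format for some proprietary software. I cannot seem to get bs4 to parse a minimal example correctly. I need the parser to respect self-closing tags, handle unicode...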

python xml Unicode beautifulsoup

Import error caused by bs4 vs. BeautifulSoup

I am trying to use the BeautifulSoup-compatible lxml and it gives me an error: from lxml.html.soupparser import fromstring Traceback (most recent call last): File...

python lxml beautifulsoup

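Using BeautifulSoup where authentication is required

I am using BeautifulSoup4 and Python requests to scrape LAN data for a company project. Because the site has a login interface, I do not have access to the data. The login interface is a popup that does not let me view the page source or inspect page elements without logging in. The error I get is this: Access err...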

python webscraping beautifulsoup lan intranet

BeautifulSoup: AttributeError: 'NavigableString' object has no attribute 'name'

Do you know why the first example in the BeautifulSoup tutorial http://www.crummy.com/software/BeautifulSoup/documentation.html#QuickStart http://www.crummy...

python beautifulsoup

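Navigating with Selenium and scraping with BeautifulSoup in Python

OK, here is what I want to achieve: call a URL with a dynamically filtered list of search results, click on the first search results (5/page), scrape the headlines, paragraphs and images, and store them as JSON objects in separate files, e.g. Title: the title element of the individual entry, Content: the DOM of the individual entry...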

python selenium Dynamic beautifulsoup pagination

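Beautiful Soup cannot find the part of the HTML I want

I have been using BeautifulSoup for web scraping for a while and this is the first time I have run into a problem like this. I am trying to select the number 101 172 in the code, but even when I use find or select, the output is always just the tags and not the number. I have worked on similar data collection before...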

python html webscraping beautifulsoup pythonbeautifultable

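Beautiful Soup cannot "get" the full web page

I am using BeautifulSoup to parse a bunch of links from [...], but it is not extracting all the links I want. To try to figure out why, I downloaded the html to web page html and ran soup BeautifulSoup open web page ht...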

python html webscraping beautifulsoup

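Scraping link titles and URLs from a web page with BeautifulSoup

I have a web page of popular articles and I want to scrape the hyperlink of each referenced page and the title of the article it displays. The desired output of my script is a CSV file that lists each title and article content on one line. So if there are 50 articles on the page, I want a file with 50 rows and 100 data points...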

python html Text webscraping beautifulsoup

How to add an element to a Beautiful Soup element

If I have a bs4 element like this, called tab window uls 1: ul li b Cut b Sits low on the waist li li b Fit b Skinny through the leg li li b Leg...

python beautifulsoup

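HTTP Error 999: Request denied

I am trying to scrape some web pages from LinkedIn using BeautifulSoup, but I keep getting the error "HTTP Error 999: Request denied". Is there a way to avoid this error? If you look at my code, I have tried Mechanize and URLLIB2 and both gave me the sa...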

python webscraping beautifulsoup linkedinapi mechanize

Python requests: requests.exceptions.TooManyRedirects: Exceeded 30 redirects

I am trying to scrape this page using the python requests library: import requests from lxml import etree html url http www amazon in b ref sa menu mobile ele...

python python27 beautifulsoup pythonrequests

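Unable to parse an HTML table with Beautiful Soup

I am very new to Beautiful Soup. I am trying to import data from the url below as a pandas data frame, but the final result has the correct column names and no row numbers. What should I do? Here is my code: from bs4 import BeautifulSo...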

python html pandas Parsing beautifulsoup

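Can I accept or ignore the Google privacy notice when web scraping with BeautifulSoup?

When running the following code from the console, I cannot see the HTML of the Google News page. The HTML I see is that of the Google privacy notice, starting with "Before you continue". from bs4 import BeautifulSoup import...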

python webscraping beautifulsoup

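AttributeError: 'NoneType' object has no attribute 'text', and I do not understand how to fix it

I am trying to read a file with python and pass each line as an argument to a function. I get the error AttributeError: 'NoneType' object has no attribute 'text' and I do not understand how to fix it. from bs4 import Be...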

python Parsing beautifulsoup pythonrequestshtml
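
This error typically appears because `find` returns `None` when no element matches, so the following `.text` access fails. A minimal sketch of a guard is shown below. The helper name `text_or_none` and the sample markup are assumptions for illustration, not taken from the question's code.

```python
from typing import Optional

from bs4 import BeautifulSoup


def text_or_none(soup: BeautifulSoup, name: str, attrs: Optional[dict] = None) -> Optional[str]:
    """Return the stripped text of the first matching tag, or None if there is no match."""
    tag = soup.find(name, attrs=attrs or {})
    # find() returns None when nothing matches, so check before touching the tag.
    return tag.get_text(strip=True) if tag is not None else None


# Made-up markup: one span that exists, and a lookup for one that does not.
soup = BeautifulSoup('<div><span class="price"> 42 </span></div>', "html.parser")

found = text_or_none(soup, "span", {"class": "price"})
absent = text_or_none(soup, "span", {"class": "discount"})
print(found, absent)
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing element is an error or simply a line to skip.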

Python Beautiful Soup arguments

I have this code that uses BeautifulSoup to get some text from a page: soup BeautifulSoup html body soup find div id body print body. I want to make it a reusable function that accepts some ht...

python beautifulsoup

Loading a huge XML file and handling MemoryError

I have a very large XML file, 20GB to be exact (yes, I need all of it). When I try to load the file I get this error: Python 23358 malloc mmap size 140736680968192 failed error code 12...

python xml beautifulsoup mediawiki
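
One common way around a MemoryError on a file of this size is to stream it instead of building the whole tree. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse` rather than BeautifulSoup, which is a deliberate swap of technique. The element name `page` and the tiny in-memory sample are assumptions. A real MediaWiki dump would likely use namespaced tag names.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag: str = "page") -> int:
    """Stream through `source` and count elements named `tag` without keeping them."""
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # Drop the finished children of the root so memory use stays bounded.
            root.clear()
    return count


# Tiny stand-in for the 20GB file; a path or an open file object works the same way.
sample = io.BytesIO(
    b"<wiki><page><title>A</title></page><page><title>B</title></page></wiki>"
)
total = count_elements(sample)
print(total)
```

Any per-element processing would go where the counter is incremented, before the root is cleared.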

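Scraping and parsing a multi-page (aspx) table

I am trying to scrape information about greyhound races. For example, I want to scrape http://www.gbgb.org.uk/RaceCard.aspx?dogName=Hardwick%20Serena http://www.gbgb.org.uk/RaceCard.aspx?d...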

python webscraping beautifulsoup