beautifulsoup

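Getting data from a non-class section with Beautiful Soup

I'm still a beginner learning Python and Beautiful Soup, and I'm stuck on how to get the text out of an HTML fragment that has no class. This is the HTML fragment I'm working with: section class userbody section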

python Parsing python27 htmlparsing beautifulsoup

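Python - Requests/RoboBrowser - ASPX POST JavaScript

I'm porting a bash script that uses curl to POST a payload from the code to a URL, and it works fine. The basic problem is that with robobrowser I'm having trouble posting through the page's form. Stepping through the site: log in SubLogin aspx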

javascript python aspnet beautifulsoup robobrowser

NameError: 'html' is not defined when using beautifulsoup4

My Python 3.4.4 code is: import urllib.request from bs4 import BeautifulSoup from html.parser import HTMLParser urls file C U

python html python3x beautifulsoup

Python - BeautifulSoup, applying it to every text file in a folder and generating new text files

I'm using the following Python BeautifulSoup code to remove HTML elements from a text file: from bs4 import BeautifulSoup with open textFileWithHtml txt as markup s

python html tags beautifulsoup
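
One way the per-folder processing asked about above could look is sketched below. The function name `strip_html_in_folder` and both folder arguments are hypothetical, and the sketch assumes UTF-8 files with a `.txt` extension; it is an illustration, not the asker's code.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def strip_html_in_folder(src_dir, dst_dir):
    """Write a tag-free copy of every .txt file in src_dir into dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.txt")):
        markup = path.read_text(encoding="utf-8")
        # get_text() drops the tags and keeps only the text nodes.
        text = BeautifulSoup(markup, "html.parser").get_text()
        out_path = dst / path.name
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Writing to a separate output folder keeps the originals untouched, which makes it easy to rerun the script after changing the cleaning step.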

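Calculating the average height and average width of div tags

I need to get the average div height and width of an HTML document. I've tried this solution, but it doesn't work: import numpy as np average width np mean div attrs width for div in my do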

python html beautifulsoup
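
A minimal sketch of the averaging described above, assuming the sizes are stored as plain numeric `width` and `height` attributes on the `div` tags. That is an assumption: many pages set sizes through CSS, which the parser never sees. The name `average_div_size` is hypothetical, and the standard library's `statistics.mean` is swapped in for NumPy to keep the example dependency-light.

```python
from statistics import mean

from bs4 import BeautifulSoup


def average_div_size(html):
    """Return (mean width, mean height) over divs that carry both attributes."""
    soup = BeautifulSoup(html, "html.parser")
    widths, heights = [], []
    for div in soup.find_all("div"):
        try:
            # Attribute values arrive as strings, so convert them first.
            width, height = float(div["width"]), float(div["height"])
        except (KeyError, ValueError):
            continue  # skip divs without usable numeric attributes
        widths.append(width)
        heights.append(height)
    if not widths:
        return None
    return mean(widths), mean(heights)
```

Collecting the converted values into lists before averaging avoids handing the mean function raw attribute strings.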

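A faster way than 'try' and 'except'? - Python

I often write code like the following: try self title item title content string except AttributeError e self title None. Is there a faster way to handle this, in one line? Which exceptions have you run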

python beautifulsoup
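
A one-line alternative to the try/except pattern quoted above can be built on `getattr` with a default. The helper name `dig` is hypothetical, and this sketch makes no claim about speed; it only shortens the call site.

```python
def dig(obj, *names, default=None):
    """Follow a chain of attribute names; return default if any link is missing or None."""
    for name in names:
        obj = getattr(obj, name, None)
        if obj is None:
            return default
    return obj
```

With this helper, the assignment in the excerpt could be written as `self.title = dig(item, "title", "content", "string")`.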

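'NoneType' object has no attribute 'text' in BeautifulSoup

I'm trying to scrape the Google results when I search for "What is 2 2", but it returns the following: NoneType object has no attribute text. Please help me achieve the desired goal. text What is 2 2 search text replace l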

python webscraping beautifulsoup
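
This error appears because `find()` returns `None` when nothing matches, and `None` has no `.text`. A minimal guard is sketched below; the helper name `text_or_none` and the sample markup are hypothetical and unrelated to the real Google results page.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, name, attrs=None):
    """Return the stripped text of the first match, or None when nothing matches."""
    tag = soup.find(name, attrs=attrs or {})
    return tag.get_text(strip=True) if tag is not None else None


html = '<div><span class="answer">4</span></div>'
soup = BeautifulSoup(html, "html.parser")

print(text_or_none(soup, "span", {"class": "answer"}))
print(text_or_none(soup, "span", {"class": "missing"}))
```

When the lookup keeps returning `None`, it is worth printing the fetched HTML to check that the expected element is present in the response at all.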

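Trouble logging in to a site using requests

I'm trying to create a Python script using the requests module to log in to this site. I'm using my credentials, but I can't find any way to do it, because I can't see the parameters that need to be sent with the request in Chrome dev tools. username SIMMTH i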

python python3x webscraping beautifulsoup pythonrequests

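How to extract img src from a web page via lxml in BeautifulSoup using Python?

I'm new to Python and working on a web scraping project for Amazon. I'm having trouble extracting the product img src from the product page using BeautifulSoup with lxml. I tried the following code to extract it, but it doesn't show the img URL. Here is my code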

python3x webscraping beautifulsoup lxml
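
A generic sketch of pulling `src` values out of `img` tags follows. The function name `image_sources` is hypothetical, and nothing here is specific to Amazon's markup. The built-in `"html.parser"` is used so the example runs without extra packages; passing `parser="lxml"` should work the same way when lxml is installed.

```python
from bs4 import BeautifulSoup


def image_sources(html, parser="html.parser"):
    """Return the src value of every img tag that has one."""
    soup = BeautifulSoup(html, parser)
    # src=True keeps only tags where the attribute is present.
    return [img["src"] for img in soup.find_all("img", src=True)]
```

If the list comes back empty for a real page, the images may be injected by JavaScript after load, in which case they are not in the HTML that the parser receives.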

Headless doesn't work with Playwright and BeautifulSoup 4

This code is working: from playwright.sync_api import sync_playwright from bs4 import BeautifulSoup from datetime import datetime imp

python beautifulsoup Headless Playwright playwrightpython

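Extracting links from only a specific section of Blogspot using BeautifulSoup

I'm trying to extract links from only certain sections of Blogspot, but the output shows the code extracting every link on the page. Here is the code: import urlparse import urllib from bs4 import BeautifulSoup url http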

python beautifulsoup webcrawler
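
Restricting a link search to one part of the page usually means finding the container first and then searching inside it. In the sketch below, the function name `links_in_section` and the class name `post-body` are assumptions made for illustration; the actual container on a given Blogspot page has to be found by inspecting its HTML.

```python
from bs4 import BeautifulSoup


def links_in_section(html, section_class):
    """Return hrefs of links found inside the first div with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("div", class_=section_class)
    if section is None:
        return []
    # Calling find_all on the container searches only its descendants.
    return [a["href"] for a in section.find_all("a", href=True)]


page = (
    '<div class="post-body"><a href="/post-1">Post</a></div>'
    '<div class="sidebar"><a href="/about">About</a></div>'
)
print(links_in_section(page, "post-body"))
```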

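How to fix AttributeError: 'NoneType' object has no attribute 'text'... when looping

I'm a beginner, and the answers on this forum have been invaluable. I'm using Python 3 and Beautiful Soup to scrape non-tabular data from multiple pages of the same website by looping over page numbers. It works, but I keep getting AttributeError NoneType object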

python3x pandas webscraping beautifulsoup

Creating content snippets with a Jinja filter

I want to create content snippets for my home page. A sample post looks like: p Your favorite Harry Potter characters enter the Game of Thrones universe and you ll never guess

python Flask beautifulsoup Jinja2
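
A snippet filter of the kind described above can be an ordinary function that strips the tags and truncates the text. The name `snippet` and the 80-character default are assumptions; in a Flask app, such a function could be registered with `app.jinja_env.filters["snippet"] = snippet`, which is left out here so the sketch runs without Flask.

```python
from bs4 import BeautifulSoup


def snippet(html, max_chars=80):
    """Return plain text from html, cut at a word boundary near max_chars."""
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    if len(text) <= max_chars:
        return text
    # Cut at the last space before the limit so no word is split.
    cut = text.rfind(" ", 0, max_chars)
    end = cut if cut > 0 else max_chars
    return text[:end].rstrip() + "..."
```

Stripping tags before truncating avoids cutting in the middle of an element and leaving unbalanced markup in the home page.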

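Python: converting HTML to text and mimicking the formatting

I'm learning BeautifulSoup and have found many html2text solutions, but the one I'm looking for should mimic the formatting: ul li One li li Two li ul would become One Two and Some text blockquo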

python html beautifulsoup

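Python Beautiful Soup: how to decode JSON into a 'dict'?

I'm new to BeautifulSoup in Python, and I'm trying to extract a dict from Beautiful Soup. I used BeautifulSoup to extract JSON and got a beautifulsoup beautifulsoup variable soup. I'm trying to get values from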

python webscraping beautifulsoup
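
Beautiful Soup hands back the JSON as text; turning it into a `dict` is a job for the standard `json` module. The sketch below assumes the JSON sits inside a `script` tag with an `id`, which may not match the asker's page; the function name `extract_json` and the sample markup are hypothetical.

```python
import json

from bs4 import BeautifulSoup


def extract_json(html, script_id):
    """Parse the JSON text inside the script tag with the given id into Python objects."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", id=script_id)
    if script is None or script.string is None:
        return None
    # script.string is the raw text between the tags; json.loads decodes it.
    return json.loads(script.string)


page = '<script type="application/json" id="data">{"price": 12.5, "tags": ["a", "b"]}</script>'
data = extract_json(page, "data")
print(data["price"])
```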

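Where to put BeautifulSoup code in an asyncio web scraping application

I need to scrape and get the raw text of the body paragraphs of many (5-10k per day) news articles. I've written some threaded code, but given the highly I/O-bound nature of this project, I'm dabbling in asyncio. The snippet below is no faster than a 1-thread version and far worse than my threaded version. Can anyone tell me what I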

python Asynchronous beautifulsoup pythonasyncio aiohttp
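
Parsing is CPU work, so running it directly inside a coroutine blocks the event loop while it runs. One possible arrangement, sketched below, is to await the download and hand the parsing to an executor. The `fetch` coroutine is a stand-in that returns canned HTML instead of making a real aiohttp request, and every function name here is hypothetical. For heavy parsing loads a process pool may serve better than the default thread pool, so it is worth measuring both.

```python
import asyncio

from bs4 import BeautifulSoup


def extract_paragraphs(html):
    """Blocking, CPU-bound step: parse the page and return its paragraph texts."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p")]


async def fetch(url):
    """Stand-in for a real HTTP request; returns canned HTML for the example."""
    await asyncio.sleep(0)
    return f"<html><body><p>Body of {url}</p></body></html>"


async def scrape(url):
    html = await fetch(url)
    loop = asyncio.get_running_loop()
    # Run the parser in the default executor so other downloads keep progressing.
    return await loop.run_in_executor(None, extract_paragraphs, html)


async def scrape_all(urls):
    """Scrape all URLs concurrently; results keep the order of the input list."""
    return await asyncio.gather(*(scrape(url) for url in urls))
```

A call such as `asyncio.run(scrape_all(urls))` drives the whole batch from synchronous code.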

How to get a favicon using Beautiful Soup and Python

I wrote some silly code just for learning, but it doesn't work for any website. Here is the code: import urllib2 re from BeautifulSoup import BeautifulSoup as Soup class Founder def Fin

python beautifulsoup favicon
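
A favicon is normally declared with a `link` tag whose `rel` contains `icon`. The sketch below uses the current `bs4` package rather than the old `BeautifulSoup` import from the excerpt; the name `find_favicon` is hypothetical, and the `/favicon.ico` fallback is a common convention assumed here, not something the page guarantees.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_favicon(html, base_url):
    """Return an absolute favicon URL, guessing /favicon.ico if none is declared."""
    soup = BeautifulSoup(html, "html.parser")
    for link in soup.find_all("link", href=True):
        rel = link.get("rel") or []
        if isinstance(rel, str):
            rel = rel.split()  # some parser setups return rel as a plain string
        if "icon" in (value.lower() for value in rel):
            return urljoin(base_url, link["href"])
    # Assumed fallback: many sites also serve an icon at this conventional path.
    return urljoin(base_url, "/favicon.ico")
```

Resolving the `href` against the page URL matters because favicon links are often relative.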

Changing the inner text of a tag using BeautifulSoup in Python

I want to change the inner text of a tag in HTML obtained with BeautifulSoup. Example: a href index html Foo a becomes a href index html Bar a. I've managed to get the tag by its id

python beautifulsoup
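
Once the tag is in hand, assigning to its `.string` attribute replaces the text it contains. The sketch below reuses the Foo/Bar example from the excerpt; the `id` value `home` is a hypothetical addition so the tag can be looked up by id.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a id="home" href="index.html">Foo</a>', "html.parser")

link = soup.find(id="home")
# Assigning to .string swaps the tag's contents for the new text.
link.string = "Bar"

print(link.get_text())
```

The change is made in the parsed tree, so converting `soup` back to a string afterwards yields the updated markup.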

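Does BeautifulSoup find_all() preserve tag order?

I want to use BeautifulSoup to parse some HTML. I have a table with several rows, and I'm trying to find the rows that satisfy certain conditions (certain attribute values) and use the index of such a row later in my code. The question is: does find_all() preserve the row order in the result set it returns? I haven't found this in the do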

python python27 beautifulsoup
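
`find_all()` walks the tree in document order, so the results come back in the order the tags appear in the markup. The sketch below shows how a row index can be derived from that; the `data-status` attribute and the table contents are made up for the example.

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr data-status="ok"><td>first</td></tr>
  <tr data-status="error"><td>second</td></tr>
  <tr data-status="ok"><td>third</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")

# Indices refer to positions in document order, so they can be reused later.
matching = [i for i, row in enumerate(rows) if row.get("data-status") == "ok"]
print(matching)  # prints [0, 2]
```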

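Beautiful Soup 4 find_all doesn't find links that Beautiful Soup 3 finds

I've noticed a really annoying bug: the BeautifulSoup4 package (bs4) often finds fewer tags than the previous version (package BeautifulSoup). Here is a reproducible instance of the problem: import requests import bs4 import Beau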

python Web webscraping beautifulsoup