Printing the first paragraph in Python

2024-01-07

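I have a text file of a book, and I need to print the first paragraph of each section. I thought that if I found the text between \n\n and \n, I would have my answer. Here is my code, but it doesn't work. Can you tell me where I went wrong?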

lines = [line.rstrip('\n') for line in open('G:\\aa.txt')]

check = -1
first = 0
last = 0

for i in range(len(lines)):
    if lines[i] == "": 
            if lines[i+1]=="":
                check = 1
                first = i +2
    if i+2< len(lines):
        if lines[i+2] == "" and check == 1:
            last = i+2
while (first < last):
    print(lines[first])
    first = first + 1

I also found some code on Stack Overflow and tried it, but it just printed an empty array.

f = open("G:\\aa.txt").readlines()
flag=False
for line in f:
        if line.startswith('\n\n'):
            flag=False
        if flag:
            print(line)
        elif line.strip().endswith('\n'):
            flag=True

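I have shared a sample section of the book below.

THE LAY OF THE LAND

There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.

Of all the kinds of interest attaching to the study of the world's wild animals, there are none that surpass the study of their minds, their morals, and the acts that they perform as the results of their mental processes.

WILD ANIMAL TEMPERAMENT & INDIVIDUALITY

What I am trying to do here is find the uppercase lines and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of the elements of the array I created.

The output should look like this:

There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.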

If you want to group the sections, you can use itertools.groupby with empty lines as the delimiter:

from itertools import groupby
with open("in.txt") as f:
    for k, sec in groupby(f,key=lambda x: bool(x.strip())):
        if k:
            print(list(sec))
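To make the grouping easier to see, here is a minimal sketch that runs the same groupby key on an in-memory list instead of a file. The sample list and the helper name non_blank_runs are made up for illustration and are not part of the original answer.

```python
from itertools import groupby

# Hypothetical sample lines, standing in for the contents of in.txt.
sample = [
    "TITLE ONE\n",
    "\n",
    "\n",
    "First paragraph.\n",
    "\n",
    "Second paragraph.\n",
]


def non_blank_runs(lines):
    """Return each run of consecutive non-blank lines as a list."""
    runs = []
    # The key is True for lines with text and False for blank lines,
    # so groupby alternates between text runs and blank runs.
    for has_text, run in groupby(lines, key=lambda x: bool(x.strip())):
        if has_text:
            runs.append(list(run))
    return runs


blocks = non_blank_runs(sample)
print(blocks)
```

Every title and every paragraph comes back as its own group, which is why this version alone cannot tell a title from a first paragraph.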

With a bit more itertools foo, we can get the sections using the uppercase titles as the delimiter:

from itertools import groupby, takewhile

with open("in.txt") as f:
    grps = groupby(f,key=lambda x: x.isupper())
    for k, sec in grps:
        # if we hit a title line
        if k: 
            # pull all paragraphs
            v = next(grps)[1]
            # skip two empty lines after title
            next(v,""), next(v,"")

            # take all lines up to next empty line/second paragraph
            print(list(takewhile(lambda x: bool(x.strip()), v)))

Which would give you:

['There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.\n']
['What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.']

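Each section starts with an all-uppercase title, so once we hit that title we know two empty lines follow, then the first paragraph, and the pattern repeats.

To break it down using loops: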
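The pattern described above (title, blank lines, first paragraph) can also be sketched without itertools. This is only an illustration under stated assumptions: the function name first_paragraphs and the shortened sample lines are hypothetical, titles are taken to be any line for which str.isupper() is true, and every title is assumed to be followed by at least one paragraph.

```python
def first_paragraphs(lines):
    """Map each all-uppercase title to the paragraph that follows it."""
    result = {}
    it = iter(lines)
    for line in it:
        if not line.isupper():
            continue  # not a title line, keep scanning
        title = line.strip()
        paragraph = []
        # Keep consuming from the same iterator: skip the blank lines
        # after the title, collect text, stop at the next blank line.
        for nxt in it:
            if nxt.strip():
                paragraph.append(nxt.strip())
            elif paragraph:
                break
        result[title] = " ".join(paragraph)
    return result


# Hypothetical, shortened stand-in for the book text.
book = [
    "THE LAY OF THE LAND\n", "\n", "\n",
    "There is a vast field.\n", "\n",
    "Of all the kinds of interest.\n", "\n", "\n",
    "WILD ANIMAL TEMPERAMENT & INDIVIDUALITY\n", "\n", "\n",
    "What I am trying to do.\n",
]
print(first_paragraphs(book))
```

Returning a dictionary rather than printing makes the result easier to reuse, at the cost of assuming that section titles are unique.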

from itertools import groupby
def parse_sec(bk):
    with open(bk) as f:
        grps = groupby(f, key=lambda x: bool(x.isupper()))
        for k, sec in grps:
            if k:
                print("First paragraph from section titled :{}".format(next(sec).rstrip()))
                v = next(grps)[1]
                next(v, ""),next(v,"")
                for line in v:
                    if not line.strip():
                        break
                    print(line)

For your text:

In [11]: cat -E in.txt

THE LAY OF THE LAND$
$
$
There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.$
$
Of all the kinds of interest attaching to the study of the world's wild animals, there are none that surpass the study of their minds, their morals, and the acts that they perform as the results of their mental processes.$
$
$
WILD ANIMAL TEMPERAMENT & INDIVIDUALITY$
$
$
What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.

The dollar signs mark the line endings, and the output is:

In [12]: parse_sec("in.txt")
First paragraph from section titled :THE LAY OF THE LAND
There is a vast field of fascinating human interest, lying only just outside our doors, which as yet has been but little explored. It is the Field of Animal Intelligence.

First paragraph from section titled :WILD ANIMAL TEMPERAMENT & INDIVIDUALITY
What I am trying to do here is, find the uppercase lines, and put them all in an array. Then, using the index method, I will find the first and last paragraphs of each section by comparing the indexes of these elements of this array I created.
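The index-based idea from the question (collect the uppercase lines, then work from their positions) can be sketched as follows. This is a rough illustration, not the approach used in the answer above: the function name first_paragraphs_by_index and the sample lines are hypothetical, and the lines are assumed to have had their trailing newlines stripped, as in the question's list comprehension.

```python
def first_paragraphs_by_index(lines):
    """Use the indexes of uppercase title lines to find each first paragraph."""
    # Positions of all title lines; an empty string is never uppercase.
    title_idx = [i for i, line in enumerate(lines) if line.isupper()]
    # Each section ends where the next title starts (or at end of file).
    bounds = title_idx[1:] + [len(lines)]
    found = []
    for start, end in zip(title_idx, bounds):
        body = [text.strip() for text in lines[start + 1:end]]
        # Drop the blank lines that directly follow the title.
        while body and not body[0]:
            body.pop(0)
        paragraph = []
        for text in body:
            if not text:
                break  # a blank line ends the first paragraph
            paragraph.append(text)
        found.append((lines[start].strip(), " ".join(paragraph)))
    return found


# Hypothetical sample, already stripped of trailing newlines.
lines = [
    "THE LAY OF THE LAND", "", "",
    "First paragraph of section one.", "",
    "Second paragraph of section one.", "", "",
    "WILD ANIMAL TEMPERAMENT & INDIVIDUALITY", "", "",
    "First paragraph of section two.",
]
for title, paragraph in first_paragraphs_by_index(lines):
    print(title, "->", paragraph)
```

Unlike the groupby version, this reads the whole file into memory first, which is fine for a single book but worth keeping in mind for very large inputs.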
