A short summary
As of CY2023, the technique described in this answer is quite dated. These days, you can use pebble, mpire, or concurrent.futures.ProcessPoolExecutor()...

Regardless of which python concurrency tool you use, the answer to the OP's question below is still valid. A ProcessPoolExecutor() does not need a Pipe() or Queue() to communicate tasks / results.
Original answer

- A Pipe() can only have two endpoints.
- A Queue() can have multiple producers and consumers.

When to use them

- If you need more than two points to communicate, use a Queue().
- If you need absolute performance, a Pipe() is much faster because Queue() is built on top of Pipe().
Performance benchmarking

Let's assume you want to spawn two processes and send messages between them as quickly as possible. These are the timing results of a drag race between similar tests using Pipe() and Queue()... FYI, I threw in results for SimpleQueue() and JoinableQueue() as a bonus.

- JoinableQueue() accounts for tasks when queue.task_done() is called (it doesn't even know about the specific task, it just counts unfinished tasks in the queue), so that queue.join() knows the work is finished.

The code for each is at the bottom of this answer...
# This is on a Thinkpad T430, VMWare running Debian 11 VM, and Python 3.9.2
$ python multi_pipe.py
Sending 10000 numbers to Pipe() took 0.14316844940185547 seconds
Sending 100000 numbers to Pipe() took 1.3749017715454102 seconds
Sending 1000000 numbers to Pipe() took 14.252539157867432 seconds
$ python multi_queue.py
Sending 10000 numbers to Queue() took 0.17014789581298828 seconds
Sending 100000 numbers to Queue() took 1.7723784446716309 seconds
Sending 1000000 numbers to Queue() took 17.758610725402832 seconds
$ python multi_simplequeue.py
Sending 10000 numbers to SimpleQueue() took 0.14937686920166016 seconds
Sending 100000 numbers to SimpleQueue() took 1.5389132499694824 seconds
Sending 1000000 numbers to SimpleQueue() took 16.871352910995483 seconds
$ python multi_joinablequeue.py
Sending 10000 numbers to JoinableQueue() took 0.15144729614257812 seconds
Sending 100000 numbers to JoinableQueue() took 1.567549228668213 seconds
Sending 1000000 numbers to JoinableQueue() took 16.237736225128174 seconds
# This is on a Thinkpad T430, VMWare running Debian 11 VM, and Python 3.7.0
(py37_test) [mpenning@mudslide ~]$ python multi_pipe.py
Sending 10000 numbers to Pipe() took 0.13469791412353516 seconds
Sending 100000 numbers to Pipe() took 1.5587594509124756 seconds
Sending 1000000 numbers to Pipe() took 14.467186689376831 seconds
(py37_test) [mpenning@mudslide ~]$ python multi_queue.py
Sending 10000 numbers to Queue() took 0.1897726058959961 seconds
Sending 100000 numbers to Queue() took 1.7622203826904297 seconds
Sending 1000000 numbers to Queue() took 16.89015531539917 seconds
(py37_test) [mpenning@mudslide ~]$ python multi_joinablequeue.py
Sending 10000 numbers to JoinableQueue() took 0.2238149642944336 seconds
Sending 100000 numbers to JoinableQueue() took 1.4744081497192383 seconds
Sending 1000000 numbers to JoinableQueue() took 15.264554023742676 seconds
# This is on a ThinkpadT61 running Ubuntu 11.10, and Python 2.7.2
mpenning@mpenning-T61:~$ python multi_pipe.py
Sending 10000 numbers to Pipe() took 0.0369849205017 seconds
Sending 100000 numbers to Pipe() took 0.328398942947 seconds
Sending 1000000 numbers to Pipe() took 3.17266988754 seconds
mpenning@mpenning-T61:~$ python multi_queue.py
Sending 10000 numbers to Queue() took 0.105256080627 seconds
Sending 100000 numbers to Queue() took 0.980564117432 seconds
Sending 1000000 numbers to Queue() took 10.1611330509 seconds
mpenning@mpenning-T61:~$ python multi_joinablequeue.py
Sending 10000 numbers to JoinableQueue() took 0.172781944275 seconds
Sending 100000 numbers to JoinableQueue() took 1.5714070797 seconds
Sending 1000000 numbers to JoinableQueue() took 15.8527247906 seconds
mpenning@mpenning-T61:~$
In summary:

- Under Python 2.7, a Pipe() is about 300% faster than a Queue(). Don't even think about JoinableQueue() unless you really must have the benefits.
- Under python 3.x, Pipe() still has a (roughly 20%) edge over the Queue()s, but the performance gap between Pipe() and Queue() is not as dramatic as it was in Python 2.7. The various Queue() implementations are within roughly 15% of one another. Also, my tests use integer data. Some people commented that they found performance differences depending on the datatypes they used with multiprocessing.

Bottom line for python 3.x: YMMV... consider running your own tests with your own datatypes (i.e. integers / strings / objects) to draw conclusions about the platforms and use cases you care about.

I should also mention that my python3.x performance tests were inconsistent and varied somewhat. I ran multiple tests over several minutes to get the best result for each case. I suspect these variations are related to running my python3 tests under VMWare / virtualization; however, the virtualization diagnosis is speculation.
*** A response to a comment about test techniques ***

In the comments, @JJC said:

> A fairer comparison would be running N workers, each communicating with the main thread via a point-to-point pipe, compared against the performance of running N workers all pulling from a single point-to-multipoint queue.

Originally, this answer only considered the performance of one worker and one producer; that is the baseline use case for Pipe(). Your comment requires adding different tests for multiple worker processes. While that is a valid observation for common Queue() use cases, it could easily explode the test matrix along a completely new axis (i.e. adding tests with varying numbers of worker processes).
Bonus material 2

Multiprocessing introduces subtle changes in information flow that make debugging hard unless you know some shortcuts. For instance, you might have a script that works fine when indexing through a dictionary under many conditions, but infrequently fails with certain inputs.

Normally we get clues about the failure when the entire python process crashes; however, you don't get unsolicited crash tracebacks printed to the console when the multiprocessing function crashes. Tracking down unknown multiprocessing crashes is hard without a clue to what crashed the process.

The simplest way I have found to track down multiprocessing crash information is to wrap the entire multiprocessing function in a try / except and use traceback.print_exc():
import traceback

def run(self, args):
    try:
        # Insert stuff to be multiprocessed here
        return args[0]['that']
    except:
        print("FATAL: reader({0}) exited while multiprocessing".format(args))
        traceback.print_exc()
Now, when you find a crash, you see something like:
FATAL: reader([{'crash': 'this'}]) exited while multiprocessing
Traceback (most recent call last):
File "foo.py", line 19, in __init__
self.run(args)
File "foo.py", line 46, in run
KeyError: 'that'
Source code:
"""
multi_pipe.py
"""
from multiprocessing import Process, Pipe
import time
def reader_proc(pipe):
## Read from the pipe; this will be spawned as a separate Process
p_output, p_input = pipe
p_input.close() # We are only reading
while True:
msg = p_output.recv() # Read from the output pipe and do nothing
if msg=='DONE':
break
def writer(count, p_input):
for ii in range(0, count):
p_input.send(ii) # Write 'count' numbers into the input pipe
p_input.send('DONE')
if __name__=='__main__':
for count in [10**4, 10**5, 10**6]:
# Pipes are unidirectional with two endpoints: p_input ------> p_output
p_output, p_input = Pipe() # writer() writes to p_input from _this_ process
reader_p = Process(target=reader_proc, args=((p_output, p_input),))
reader_p.daemon = True
reader_p.start() # Launch the reader process
p_output.close() # We no longer need this part of the Pipe()
_start = time.time()
writer(count, p_input) # Send a lot of stuff to reader_proc()
p_input.close()
reader_p.join()
print("Sending {0} numbers to Pipe() took {1} seconds".format(count,
(time.time() - _start)))
"""
multi_queue.py
"""
from multiprocessing import Process, Queue
import time
import sys
def reader_proc(queue):
## Read from the queue; this will be spawned as a separate Process
while True:
msg = queue.get() # Read from the queue and do nothing
if (msg == 'DONE'):
break
def writer(count, queue):
## Write to the queue
for ii in range(0, count):
queue.put(ii) # Write 'count' numbers into the queue
queue.put('DONE')
if __name__=='__main__':
pqueue = Queue() # writer() writes to pqueue from _this_ process
for count in [10**4, 10**5, 10**6]:
### reader_proc() reads from pqueue as a separate process
reader_p = Process(target=reader_proc, args=((pqueue),))
reader_p.daemon = True
reader_p.start() # Launch reader_proc() as a separate python process
_start = time.time()
writer(count, pqueue) # Send a lot of stuff to reader()
reader_p.join() # Wait for the reader to finish
print("Sending {0} numbers to Queue() took {1} seconds".format(count,
(time.time() - _start)))
"""
multi_simplequeue.py
"""
from multiprocessing import Process, SimpleQueue
import time
import sys
def reader_proc(queue):
## Read from the queue; this will be spawned as a separate Process
while True:
msg = queue.get() # Read from the queue and do nothing
if (msg == 'DONE'):
break
def writer(count, queue):
## Write to the queue
for ii in range(0, count):
queue.put(ii) # Write 'count' numbers into the queue
queue.put('DONE')
if __name__=='__main__':
pqueue = SimpleQueue() # writer() writes to pqueue from _this_ process
for count in [10**4, 10**5, 10**6]:
### reader_proc() reads from pqueue as a separate process
reader_p = Process(target=reader_proc, args=((pqueue),))
reader_p.daemon = True
reader_p.start() # Launch reader_proc() as a separate python process
_start = time.time()
writer(count, pqueue) # Send a lot of stuff to reader()
reader_p.join() # Wait for the reader to finish
print("Sending {0} numbers to SimpleQueue() took {1} seconds".format(count,
(time.time() - _start)))
"""
multi_joinablequeue.py
"""
from multiprocessing import Process, JoinableQueue
import time
def reader_proc(queue):
## Read from the queue; this will be spawned as a separate Process
while True:
msg = queue.get() # Read from the queue and do nothing
queue.task_done()
def writer(count, queue):
for ii in range(0, count):
queue.put(ii) # Write 'count' numbers into the queue
if __name__=='__main__':
for count in [10**4, 10**5, 10**6]:
jqueue = JoinableQueue() # writer() writes to jqueue from _this_ process
# reader_proc() reads from jqueue as a different process...
reader_p = Process(target=reader_proc, args=((jqueue),))
reader_p.daemon = True
reader_p.start() # Launch the reader process
_start = time.time()
writer(count, jqueue) # Send a lot of stuff to reader_proc() (in different process)
jqueue.join() # Wait for the reader to finish
print("Sending {0} numbers to JoinableQueue() took {1} seconds".format(count,
(time.time() - _start)))