The joblib.Parallel Function in Python

2023-05-16

Joblib is a package that turns ordinary Python code into parallel computations, greatly simplifying the work of writing parallel code. By calling the functions it provides, we can run target code in parallel and improve its runtime efficiency. A simple example illustrates this:

1. First, we define a simple function single(a) that sleeps for 1 s and then prints the value of a:

from joblib import Parallel, delayed
import time
def single(a):
    """ A simple function. """
    time.sleep(1)  # sleep for 1 s
    print(a)       # print a

2. Next, we call single() ten times in a for loop and record the elapsed time. As the output shows, this takes roughly 10 s.

start = time.time()  # record the start time
for i in range(10):  # call single() 10 times
    single(i)
Time = time.time() - start  # compute the elapsed time
print(str(Time)+'s')

#  Output  #
0
1
2
3
4
5
6
7
8
9
10.0172278881073s

3. Now we use the Parallel and delayed functions from joblib to parallelize the ten calls to single(). Parallel creates a pool of workers so that each list item can be executed in a separate process; here we set n_jobs=3, i.e. three worker processes. delayed is a simple trick for building a tuple (function, args, kwargs); in this code it creates ten workers, one call to single() for each argument 0 through 9. The code and results are below. The runtime is much shorter than sequential execution, but note that with 10 one-second tasks spread over 3 workers, one worker must run 4 tasks, so the ideal lower bound is about 4 s; the overhead of starting processes and switching between them pushes the measured time above even that.

start = time.time()  # record the start time
Parallel(n_jobs=3)(delayed(single)(i) for i in range(10))   # parallelized execution
Time = time.time() - start  # compute the elapsed time
print(str(Time)+'s')

#  Output  #
0
1
2
3
4
5
6
7
8
9
4.833665370941162s

Incidentally, when n_jobs is 1, this is equivalent to the sequential for loop, and the result is again about 10 s; feel free to verify this yourself. You can also try different n_jobs values and compare the resulting runtimes, as in the sketch below.
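A minimal sketch of that experiment (reusing the single() function defined above; the exact timings will vary by machine):

for n in [1, 2, 3, 5]:
    start = time.time()
    Parallel(n_jobs=n)(delayed(single)(i) for i in range(10))
    print('n_jobs=' + str(n) + ': ' + str(time.time() - start) + 's')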

4. Parallel takes many parameters, but in practice you will mostly only need n_jobs and backend. The full definition and usage of Parallel are explained below:

  • Definition of the Parallel class:
class joblib.Parallel(n_jobs=None, backend=None, verbose=0, timeout=None, pre_dispatch='2 * n_jobs',
                      batch_size='auto', temp_folder=None, max_nbytes='1M', mmap_mode='r', prefer=None, require=None)

Parameter reference (see: https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html#joblib.Parallel):

  • n_jobs: int, default: None. The maximum number of concurrently running jobs.

This is the number of Python worker processes when backend="multiprocessing", or the size of the thread pool when backend="threading". If -1, all CPUs are used. If 1, no parallel computing code is used at all, which is equivalent to sequential execution and is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) workers are used; for example, n_jobs=-2 uses all CPUs but one, where n_cpus is the number of CPU cores. None is a marker for 'unset' that is interpreted as n_jobs=1 (sequential execution), unless the call is performed under a parallel_backend context manager that sets another value for n_jobs.
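To make the negative-n_jobs arithmetic concrete, here is a small sketch using joblib's cpu_count() helper (the worker counts printed depend on your machine):

from joblib import Parallel, delayed, cpu_count

n_cpus = cpu_count()
for n_jobs in [-1, -2]:
    # For negative values, joblib uses (n_cpus + 1 + n_jobs) workers:
    # -1 means all cores, -2 means all cores but one.
    print('n_jobs=' + str(n_jobs) + ' -> ' + str(n_cpus + 1 + n_jobs) + ' workers')

results = Parallel(n_jobs=-1)(delayed(pow)(i, 2) for i in range(10))  # use every core
print(results)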

  • backend: str, default: 'loky'. Specifies the parallelization backend implementation.

- "loky", the default, can induce some communication and memory overhead when exchanging input and output data with the worker Python processes.

- "multiprocessing" is the previous process-based backend, built on multiprocessing.Pool. It is less robust than "loky".

- "threading" is a very low-overhead backend, but it suffers from the Python Global Interpreter Lock (GIL) if the called function relies a lot on Python objects. "threading" is mostly useful when the execution bottleneck is a compiled extension that explicitly releases the GIL (for instance a Cython loop wrapped in a "with nogil" block, or an expensive call to a library such as NumPy).

- Finally, you can register your own backend by calling register_parallel_backend, which lets you implement a backend of your liking.

It is not recommended to hard-code the backend name in a call to Parallel inside a library. Instead, set soft hints (prefer) or hard constraints (require) so that library users can change the backend from the outside with the parallel_backend context manager, as sketched below.
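For example, a minimal sketch of leaving the backend unset in library code and letting the caller choose it via the parallel_backend context manager (the function name compute() is just illustrative):

from joblib import Parallel, delayed, parallel_backend
import math

def compute(data):
    # Deliberately leaves n_jobs and backend unset.
    return Parallel()(delayed(math.sqrt)(x) for x in data)

# The caller decides from the outside: 4 threads instead of the default backend.
with parallel_backend('threading', n_jobs=4):
    print(compute(range(8)))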

  • verbose: int, optional. Verbosity level during execution.

If non-zero, progress messages are printed. Above 50, the output is sent to stdout. The frequency of the messages increases with the verbosity level; above 10, all iterations are reported.
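A quick sketch of the effect (the exact wording of the progress lines below is illustrative):

from joblib import Parallel, delayed
import time

# verbose=10 reports every completed task, with progress lines roughly like:
#   [Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    1.0s
Parallel(n_jobs=2, verbose=10)(delayed(time.sleep)(1) for _ in range(4))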

  • timeout: float, optional. Time limit for each task.

Only applied when n_jobs != 1. Limits the time each task may take to complete; if any task takes longer, a TimeoutError is raised.
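A hedged sketch of the behavior; the docs only promise "a TimeoutError", so the except clause below is kept deliberately broad:

from joblib import Parallel, delayed
import time

def slow(t):
    time.sleep(t)
    return t

try:
    # The 5-second task exceeds the 2-second per-task limit.
    Parallel(n_jobs=2, timeout=2)(delayed(slow)(t) for t in [1, 5])
except Exception as e:
    print(type(e).__name__, e)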

  • pre_dispatch: {'all', integer, or expression, as in '3*n_jobs'}. The number of batches (of tasks) to pre-dispatch.

The default is '2*n_jobs'. With batch_size="auto" this is a reasonable default and the workers should never starve.
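Pre-dispatch matters mostly when the input is a generator whose items are expensive to keep in memory; a minimal sketch (the chunk sizes are arbitrary):

from joblib import Parallel, delayed
import numpy as np

def total(chunk):
    return chunk.sum()

# Each yielded chunk is large, so queue only 1*n_jobs of them at a time
# instead of the default 2*n_jobs.
chunks = (np.ones(1000000) for _ in range(8))
sums = Parallel(n_jobs=2, pre_dispatch='1*n_jobs')(delayed(total)(c) for c in chunks)
print(sums)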

  • batch_size: int or 'auto', default: 'auto'. The number of atomic tasks to dispatch at once to each worker.

When individual evaluations are very fast, dispatching calls to workers can be slower than sequential computation because of the overhead; batching fast computations together mitigates this. The 'auto' strategy keeps track of the time it takes for a batch to complete and dynamically adjusts the batch size to keep that time on the order of half a second, using a heuristic; the initial batch size is 1. batch_size="auto" with backend="threading" dispatches batches of a single task at a time, since the threading backend has very little overhead and larger batch sizes have not been shown to bring any gain in that case.
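For very cheap tasks, a small sketch comparing the default auto-batching with an explicit batch size (the numbers are illustrative; measure on your own machine):

from joblib import Parallel, delayed
import time

for bs in ['auto', 100]:
    start = time.time()
    # Group bs trivially cheap calls per dispatch to amortize the overhead.
    Parallel(n_jobs=2, batch_size=bs)(delayed(abs)(i) for i in range(10000))
    print('batch_size=' + str(bs) + ': ' + str(time.time() - start) + 's')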

  • temp_folder: str, optional. Folder used by the pool for memmapping large arrays, to share memory with worker processes.

If None, the following are tried in order:

- a folder pointed to by the JOBLIB_TEMP_FOLDER environment variable;

- /dev/shm, if the folder exists and is writable: this is a RAM-disk filesystem available by default on modern Linux distributions;

- the default system temporary folder, which can be overridden with the TMP, TMPDIR, or TEMP environment variables, typically /tmp on Unix operating systems.

Only active when backend="loky" or "multiprocessing".

  • max_nbytes: int, str, or None, optional, default: '1M'. Threshold on the size of arrays passed to the workers that triggers automated memory mapping in temp_folder.

Can be an int in bytes or a human-readable string, e.g. '1M' for 1 megabyte. Use None to disable memmapping of large arrays. Only active when backend="loky" or "multiprocessing".

  • mmap_mode: {None, 'r+', 'r', 'w+', 'c'}, default: 'r'. Memmapping mode for NumPy arrays passed to workers.

See the max_nbytes parameter documentation for more details.
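A sketch that ties temp_folder, max_nbytes, and mmap_mode together (the /tmp path and array sizes are just examples):

from joblib import Parallel, delayed
import numpy as np

def row_mean(arr, i):
    return arr[i].mean()

big = np.random.rand(2000, 1000)  # about 16 MB, well above the 1M threshold

# Arrays above max_nbytes are dumped once into temp_folder and handed to the
# workers as read-only memmaps ('r') instead of being pickled for every task.
means = Parallel(n_jobs=2, max_nbytes='1M', temp_folder='/tmp', mmap_mode='r')(
    delayed(row_mean)(big, i) for i in range(5))
print(means)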

  • prefer: str in {'processes', 'threads'} or None, default: None. Soft hint used to choose the default backend.

Applies when no specific backend was selected with the parallel_backend context manager. The default process-based backend is 'loky' and the default thread-based backend is 'threading'. Ignored if the backend parameter is specified.
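For instance, a minimal sketch of the soft hint: 'threads' is a reasonable choice here because NumPy's compiled kernels release the GIL:

from joblib import Parallel, delayed
import numpy as np

def dot_trace(m):
    # NumPy's matmul runs in compiled code that releases the GIL,
    # so threads can genuinely execute it in parallel.
    return np.trace(m @ m)

mats = [np.random.rand(200, 200) for _ in range(8)]
# A soft hint only: a caller can still override it via parallel_backend.
traces = Parallel(n_jobs=4, prefer='threads')(delayed(dot_trace)(m) for m in mats)
print(traces)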

  • require: 'sharedmem' or None, default: None. Hard constraint used to select the backend.

If set to 'sharedmem', the selected backend will be single-host and thread-based, even if the user asked for a non-thread-based backend with parallel_backend.
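A minimal sketch of why you would want this: mutating a shared Python object from the workers only makes sense when they all live in one process:

from joblib import Parallel, delayed

collected = []

def record(i):
    # Appending to the outer list works only with a thread-based backend;
    # separate processes would each mutate their own copy of `collected`.
    collected.append(i * i)

Parallel(n_jobs=2, require='sharedmem')(delayed(record)(i) for i in range(5))
print(sorted(collected))  # [0, 1, 4, 9, 16]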
