Python multiprocessing

2023-12-14

I have a large list of binary-encoded strings that I used to process in a single function, like this:

""" just included this to demonstrate the 'data' structure """
data=np.zeros(250,dtype='float32, (250000,2)float32')

def numpy_array(data, peaks):
    rt_counter=0
    for x in peaks:
        if rt_counter %(len(peaks)/20) == 0:
            update_progress()
        peak_counter=0
        data_buff=base64.b64decode(x)
        buff_size=len(data_buff)/4
        unpack_format=">%dL" % buff_size
        index=0
        for y in struct.unpack(unpack_format,data_buff):
            buff1=struct.pack("I",y)
            buff2=struct.unpack("f",buff1)[0]
            if (index % 2 == 0):
                data[rt_counter][1][peak_counter][0]=float(buff2)
            else:
                data[rt_counter][1][peak_counter][1]=float(buff2)
                peak_counter+=1
            index+=1
        rt_counter+=1

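I have been reading about multiprocessing and figured I would try it to see whether it gives a big performance boost. I rewrote my function into two (a helper and a "caller"), like so: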

def numpy_array(data, peaks):
    processors=mp.cpu_count #Might as well throw this directly in the mp.Pool (just for clarity for now)
    pool = mp.Pool(processes=processors)
    chunk_size=len(peaks)/processors
    for i in range(processors):
        counter = i*chunk_size
        chunk=peaks[i*chunk_size:(i+1)*chunk_size-1]
        pool.map(decode(data,chunk,counter))

def decode(data,chunk,counter):
    for x in chunk:
        peak_counter=0
        data_buff=base64.b64decode(x)
        buff_size=len(data_buff)/4
        unpack_format=">%dL" % buff_size
        index=0
        for y in struct.unpack(unpack_format,data_buff):
            buff1=struct.pack("I",y)
            buff2=struct.unpack("f",buff1)[0]
            if (index % 2 == 0):
                data[counter][1][peak_counter][0]=float(buff2)
            else:
                data[counter][1][peak_counter][1]=float(buff2)
                peak_counter+=1
            index+=1
        print data[counter][1][10][0]
        counter+=1

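The program runs but only uses 100-110% of the CPU (according to top), and once it finishes it throws TypeError: map() takes at least 3 arguments (2 given). Could anyone with more multiprocessing experience give me a hint about what to look out for (what could be causing the TypeError)? And what might be causing my low CPU usage?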

--Code after incorporating the answer--

def decode((data,chunk,counter)):
    print len(chunk), counter
    for x in chunk:
        peak_counter=0
        data_buff=base64.b64decode(x)
        buff_size=len(data_buff)/4
        unpack_format=">%dL" % buff_size
        index=0
        for y in struct.unpack(unpack_format,data_buff):
            buff1=struct.pack("I",y)
            buff2=struct.unpack("f",buff1)[0]
            if (index % 2 == 0):
                data[counter][1][peak_counter][0]=float(buff2)
            else:
                data[counter][1][peak_counter][1]=float(buff2)
                peak_counter+=1
            index+=1
        counter+=1

def numpy_array(data, peaks):
    """Fills the NumPy array 'data' with m/z-intensity values acquired
    from b64 decoding and unpacking the binary string read from the 
    mzXML file, which is stored in the list 'peaks'.

    The m/z values are assumed to be ordered without validating this
    assumption.

    Note: This function uses multi-processing
    """
    processors=mp.cpu_count()
    pool = mp.Pool(processes=processors)
    chunk_size=int(len(peaks)/processors)
    map_parameters=[]
    for i in range(processors):
        counter = i*chunk_size
        chunk=peaks[i*chunk_size:(i+1)*chunk_size-1]
        map_parameters.append((data,chunk,counter))
    pool.map(decode,map_parameters)

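So far this latest version "works" in that it fills the array inside the processes (where the array contains values), but once all processes are done, accessing the array yields only zero values, because each process gets a local copy of the array.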
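One possible way around the local-copy problem is to have each worker return what it decoded and let the parent process write the values into its own array. The sketch below is written for Python 3 and uses only the standard library; the names decode_chunk, make_tasks, store and fill_array are hypothetical and not part of the original code. It assumes each string holds big-endian 32-bit floats in alternating m/z and intensity order, as the decoding loops above imply, and it slices the list so that no element is skipped (the `-1` in the slices above leaves out the last entry of every chunk).

```python
import base64
import multiprocessing as mp
import struct


def decode_chunk(task):
    """Worker: decode one chunk and RETURN the values instead of writing them."""
    start_row, chunk = task
    decoded = []
    for encoded in chunk:
        raw = base64.b64decode(encoded)
        # ">%df" reads big-endian 32-bit floats directly.
        values = struct.unpack(">%df" % (len(raw) // 4), raw)
        # Even positions are m/z values, odd positions are intensities.
        decoded.append(list(zip(values[0::2], values[1::2])))
    return start_row, decoded


def make_tasks(peaks, n_chunks):
    """Split peaks into (start_row, chunk) tuples that cover every element."""
    chunk_size = max(1, -(-len(peaks) // n_chunks))  # ceiling division
    return [(start, peaks[start:start + chunk_size])
            for start in range(0, len(peaks), chunk_size)]


def store(data, results):
    """Parent side: copy the returned values into the parent's own array."""
    for start_row, decoded in results:
        for offset, pairs in enumerate(decoded):
            target = data[start_row + offset][1]
            for i, (mz, intensity) in enumerate(pairs):
                target[i][0] = mz
                target[i][1] = intensity


def fill_array(data, peaks, processes=None):
    """Decode in worker processes, then write the results in the parent."""
    processes = processes or mp.cpu_count()
    with mp.Pool(processes=processes) as pool:
        results = pool.map(decode_chunk, make_tasks(peaks, processes))
    store(data, results)
```

In a script, fill_array(data, peaks) would normally be called from under an `if __name__ == "__main__":` guard. The returned lists are pickled and sent back to the parent, so for very large spectra that transfer can eat into the speed-up; shared memory would be an alternative, which this sketch does not attempt.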

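Something like this should work.

Note that pool.map takes a function and a list of parameters for that function, one entry per call. In your original example you were just calling decode directly inside the numpy_array function, so the decoding ran serially in the parent process (hence the roughly 100% CPU usage), and pool.map then received only decode's return value with no iterable, which raised the TypeError.

The function must take only one argument, hence the arguments are packed into a tuple, and hence the rather odd-looking double brackets in decode (this is called tuple unpacking).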

def numpy_array(data, peaks):
    processors=4
    pool = mp.Pool(processes=processors)
    chunk_size=len(data)/processors
    print range(processors)
    map_parameters = [] # new
    for i in range(processors):
        counter = i*chunk_size
        chunk=peaks[i*chunk_size:(i+1)*chunk_size-1]
        map_parameters.append((data,chunk,counter)) # new
    pool.map(decode, map_parameters) # new

def decode((data,chunk,counter)): # changed
    for x in chunk:
        peak_counter=0
        data_buff=base64.b64decode(x)
        buff_size=len(data_buff)/4
        unpack_format=">%dL" % buff_size
        index=0
        for y in struct.unpack(unpack_format,data_buff):
            buff1=struct.pack("I",y)
            buff2=struct.unpack("f",buff1)[0]
            if (index % 2 == 0):
                data[counter][1][peak_counter][0]=float(buff2)
            else:
                data[counter][1][peak_counter][1]=float(buff2)
                peak_counter+=1
            index+=1
        print data[counter][1][10][0]
        counter+=1
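The code above is Python 2: tuple parameters in a def, as in decode((data,chunk,counter)), were removed in Python 3. A minimal sketch of two Python 3 equivalents follows, using a toy workload rather than the real decoding; scale, scale_star and run are hypothetical names chosen for illustration. The first worker takes one tuple and unpacks it in the body, the second takes separate parameters and is driven by Pool.starmap (available since Python 3.3).

```python
import multiprocessing as mp


def scale(task):
    """Worker for Pool.map: ONE argument, unpacked inside the body."""
    offset, values = task
    return [offset + v for v in values]


def scale_star(offset, values):
    """Same work with separate parameters, for use with Pool.starmap."""
    return [offset + v for v in values]


def run(tasks, processes=2):
    """Pass the function object and the task list; do not call the worker here."""
    with mp.Pool(processes=processes) as pool:
        via_map = pool.map(scale, tasks)               # one tuple per call
        via_starmap = pool.starmap(scale_star, tasks)  # tuple spread into arguments
    return via_map, via_starmap
```

Calling run([(0, [1, 2]), (10, [3, 4])]) from under an `if __name__ == "__main__":` guard should hand each tuple to a worker process. In both variants the pool receives the function itself plus an iterable of arguments, which is the difference from pool.map(decode(data,chunk,counter)) in the question.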
