Python multiprocessing: TypeError: __new__() missing 1 required positional argument: 'path'

2024-02-28

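I am currently trying to run parallel processes in Python 3.5 using the joblib library with the multiprocessing backend. However, every time it runs I get this error: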

Process ForkServerPoolWorker-5:
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/anaconda3/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/lib/python3.5/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/opt/anaconda3/lib/python3.5/site-packages/joblib/pool.py", line 362, in get
    return recv()
  File "/opt/anaconda3/lib/python3.5/multiprocessing/connection.py", line 251, in recv
    return ForkingPickler.loads(buf.getbuffer())
TypeError: __new__() missing 1 required positional argument: 'path'
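
The last frame of the traceback is `ForkingPickler.loads`, so the worker fails while unpickling a task it received, not while running `func`. The traceback does not say which object is responsible. One way this exact message can arise is a class whose `__new__` requires a `path` argument but that does not tell pickle how to supply it. The sketch below assumes a hypothetical `PathHolder` class (not from the question) and calls the reconstruction recipe returned by `__reduce_ex__` directly, which is the step `pickle.loads` performs for protocol 2 and later.

```python
class PathHolder:
    """Hypothetical class whose __new__ needs an argument."""

    def __new__(cls, path):
        self = super().__new__(cls)
        self.path = path
        return self


class PicklablePathHolder(PathHolder):
    """Same class, but it tells pickle which arguments __new__ needs."""

    def __getnewargs__(self):
        return (self.path,)


def rebuild(obj):
    """Re-create obj from its reduce recipe, as unpickling would."""
    reduced = obj.__reduce_ex__(2)
    constructor, args, state = reduced[0], reduced[1], reduced[2]
    new_obj = constructor(*args)  # ends up calling cls.__new__(cls, *newargs)
    if state:
        new_obj.__dict__.update(state)
    return new_obj


# Without __getnewargs__, __new__ is called with no path and raises TypeError.
try:
    rebuild(PathHolder("/tmp/data"))
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

# With __getnewargs__, the same reconstruction succeeds.
restored = rebuild(PicklablePathHolder("/tmp/data"))
```

If an object like this sits inside the arguments or the instance sent to the workers, defining `__getnewargs__` (or `__reduce__`) on its class is one possible remedy; whether that applies here depends on which object is actually failing.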

Here is the joblib code I use to run it:

from joblib import Parallel, delayed
results = Parallel(n_jobs=6)(
    delayed(func)(i) for i in array)

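By default, the backend is multiprocessing. When I change the backend to "threading" the code runs fine, but for my use case threading is inefficient compared with multiprocessing.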
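
A likely reason the threading backend is unaffected: threads share the parent's memory, so tasks are handed over as ordinary object references, whereas a process pool must pickle every task and unpickle it in the worker. The sketch below uses only the standard library (`ThreadPool` and `pickle`, not joblib), and the lambda is an assumed stand-in for whatever object fails with the real data.

```python
import pickle
from multiprocessing.pool import ThreadPool

square = lambda x: x * x  # a lambda cannot be pickled by the standard pickler

# Thread workers receive the function by reference, so nothing is pickled.
with ThreadPool(2) as pool:
    thread_results = pool.map(square, [1, 2, 3])

# A process pool would first have to serialize the task, which fails here.
try:
    pickle.dumps(square)
    picklable = True
except Exception:
    picklable = False
```

This suggests the problem lies in what is being sent to the workers rather than in the parallel call itself.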

I have also tried using multiprocessing directly with the following code, but I still get the same error:

from multiprocessing import Pool
with Pool(5) as p:
    results = p.map(func, array)

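EDIT: Here is a smaller example of how I am trying to use joblib. My implementation wraps the joblib call inside a class method. The code below works, but when I run the same functions on my actual data I get the error above. Could the functions below cause problems when used with a large amount of data?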

import numpy as np
import pandas as pd
from joblib import Parallel, delayed
import multiprocessing

class ABTest(object):


    def __init__(self, df, test_comps, test_name):
        self.df = df
        self.test_comps = test_comps
        self.test_name = test_name

        self.variant_dict = {}
        for variant in self.df['test_variant'].unique():
            self.variant_dict[variant] = self.df[self.df['test_variant'] == variant]


        self.test_cols = ['clicks', 'opens', 'buys']
        self.cpu_count = multiprocessing.cpu_count()


    def bootstrap_ci(self, arguments):
        '''
        Finds the confidence interval for the difference in means for
        two test groups using bootstrap

        In: self
            arguments (list) - A list with elements [test_comp, cols], where test_comp
                                is a tuple of two test variants and cols is a list
                                of columns to bootstrap means for.  A single argument
                                must be used for parallel processing
        Out: results (matrix) - confidence interval information for the difference
                        in means of the two groups
        Creates: None
        Modifies: None
        '''
        test_comp = arguments[0]
        cols = arguments[1]

        test_a_df = self.variant_dict[test_comp[0]]
        test_b_df = self.variant_dict[test_comp[1]]

        results = []

        print('Getting Confidence Intervals for Test Groups: {}, {}...'.format(test_comp[0], test_comp[1]))

        for col in cols:
            test_a_sample_mean = []
            test_b_sample_mean = []

            test_a_len = test_a_df.shape[0]
            test_b_len = test_b_df.shape[0]

            for j in range(5000):
                # Get sample means for both test variants
                test_a_bs_mean = test_a_df[col].sample(n=test_a_len, replace=True).mean()
                test_a_sample_mean.append(test_a_bs_mean)

                test_b_bs_mean = test_b_df[col].sample(n=test_b_len, replace=True).mean()
                test_b_sample_mean.append(test_b_bs_mean)

            test_a_s = pd.Series(test_a_sample_mean)
            test_b_s = pd.Series(test_b_sample_mean)

            # Gets confidence interval for the difference in distribution of means
            test_diffs = test_b_s-test_a_s
            z = test_diffs.quantile([.025, 0.05, 0.5, 0.95, 0.975])

            results.append([self.test_name, test_comp[0], test_comp[1], col, z.iloc[0], z.iloc[1], z.iloc[2], z.iloc[3], z.iloc[4]])

        return results


    def run_parallel(self, func, array):
        '''
        Runs a function (func) on each item in array and returns the results

        In:
            func (function that takes one argument) - the function to run in parallel
            array (list or array-like object) - the array to iterate over
        Out: results (list) - The results of running each item in array through func
        Creates: None
        Modifies: None
        '''
        # Never uses more than 6 cores
        n_jobs = min(self.cpu_count - 1, 6)
        results = Parallel(n_jobs=n_jobs)(
            delayed(func)(i) for i in array)

        return results


    def confidence_intervals(self):
        results = self.run_parallel(self.bootstrap_ci, [(x, self.test_cols) for x in self.test_comps])

        results = np.array([y for x in results for y in x])

        return results

if __name__ == '__main__':
    columns = ['id', 'test_variant', 'clicks', 'opens', 'buys']
    data = [[0, 'control', 10, 60, 2], \
            [1, 'test_1', 5, 50, 1], \
            [2, 'test_2', 11, 50, 3], \
            [3, 'control', 8, 55, 1], \
            [4, 'test_1', 5, 40, 0], \
            [5, 'test_2', 15, 100, 5], \
            [6, 'control', 2, 30, 0], \
            [7, 'test_1', 1, 60, 1], \
            [8, 'test_2', 11, 50, 3], \
            [9, 'control', 10, 60, 2], \
            [10, 'test_1', 5, 50, 1], \
            [11, 'test_2', 11, 50, 3], \
            [12, 'control', 10, 60, 2], \
            [13, 'test_1', 5, 50, 1], \
            [14, 'test_2', 11, 50, 3], \
            [15, 'control', 10, 60, 2], \
            [16, 'test_1', 5, 50, 1], \
            [17, 'test_2', 11, 50, 3], \
            [18, 'control', 10, 60, 2], \
            [19, 'test_1', 5, 50, 1], \
            [20, 'test_2', 11, 50, 3]]


    df = pd.DataFrame(data, columns=columns)
    test_comps = [['control', 'test_1'], ['control', 'test_2'], ['test_1', 'test_2']]

    ab = ABTest(df, test_comps, test_name='test')
    results = ab.confidence_intervals()
    print(results)
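
Because `run_parallel` is passed the bound method `self.bootstrap_ci`, the multiprocessing backend has to pickle the whole `ABTest` instance, including `df` and `variant_dict`, for every task. With the real data, one of those attributes may hold an object that does not survive a pickle round trip. The helper below is a hypothetical debugging aid (`find_unpicklable` is not part of joblib or of the code above) that tries each attribute in turn; the `SimpleNamespace` with a lock is only a stand-in for the real instance.

```python
import pickle
import threading
import types


def find_unpicklable(obj):
    """Return {attribute name: exception} for attributes that fail a pickle round trip."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.loads(pickle.dumps(value))
        except Exception as exc:  # pickling errors come in several types
            failures[name] = exc
    return failures


# Stand-in object: a lock cannot be pickled, the other attributes can.
example = types.SimpleNamespace(test_name="test", cols=["clicks"], lock=threading.Lock())
failures = find_unpicklable(example)
```

On the real object this would be called as `find_unpicklable(ab)` before `ab.confidence_intervals()`; an attribute that shows up in the result is a candidate for the object behind the `'path'` error.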


