"Can't pickle" error when using multiprocessing on Windows

2023-11-22

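I'm writing a multiprocessing program to process a large .CSV file in parallel on Windows.

I found this nice example for a similar problem. When I run it under Windows, I get an error saying that csv.reader is not picklable.

I figure I could open the CSV file in the reader subprocess and just send it the file name from the parent process. However, I would like to pass an already opened CSV file (as the code is supposed to do), with a specific state, i.e. really use a shared object.

Any idea how to do this under Windows, or what is missing there?

This is the code (reposted here for readability):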

"""A program that reads integer values from a CSV file and writes out their
sums to another CSV file, using multiple processes if desired.
"""

import csv
import multiprocessing
import optparse
import sys

NUM_PROCS = multiprocessing.cpu_count()

def make_cli_parser():
    """Make the command line interface parser."""
    usage = "\n\n".join(["python %prog INPUT_CSV OUTPUT_CSV",
            __doc__,
            """
ARGUMENTS:
    INPUT_CSV: an input CSV file with rows of numbers
    OUTPUT_CSV: an output file that will contain the sums\
"""])
    cli_parser = optparse.OptionParser(usage)
    cli_parser.add_option('-n', '--numprocs', type='int',
            default=NUM_PROCS,
            help="Number of processes to launch [DEFAULT: %default]")
    return cli_parser

class CSVWorker(object):
    def __init__(self, numprocs, infile, outfile):
        self.numprocs = numprocs
        self.infile = open(infile)
        self.outfile = outfile
        self.in_csvfile = csv.reader(self.infile)
        self.inq = multiprocessing.Queue()
        self.outq = multiprocessing.Queue()

        self.pin = multiprocessing.Process(target=self.parse_input_csv, args=())
        self.pout = multiprocessing.Process(target=self.write_output_csv, args=())
        self.ps = [ multiprocessing.Process(target=self.sum_row, args=())
                        for i in range(self.numprocs)]

        self.pin.start()
        self.pout.start()
        for p in self.ps:
            p.start()

        self.pin.join()
        i = 0
        for p in self.ps:
            p.join()
            print "Done", i
            i += 1

        self.pout.join()
        self.infile.close()

    def parse_input_csv(self):
            """Parses the input CSV and yields tuples with the index of the row
            as the first element, and the integers of the row as the second
            element.

            The index is zero-index based.

            The data is then sent over inqueue for the workers to do their
            thing.  At the end the input thread sends a 'STOP' message for each
            worker.
            """
            for i, row in enumerate(self.in_csvfile):
                row = [ int(entry) for entry in row ]
                self.inq.put( (i, row) )

            for i in range(self.numprocs):
                self.inq.put("STOP")

    def sum_row(self):
        """
        Workers. Consume inq and produce answers on outq
        """
        tot = 0
        for i, row in iter(self.inq.get, "STOP"):
                self.outq.put( (i, sum(row)) )
        self.outq.put("STOP")

    def write_output_csv(self):
        """
        Open outgoing csv file then start reading outq for answers
        Since I chose to make sure output was synchronized to the input there
        is some extra goodies to do that.

        Obviously your input has the original row number so this is not
        required.
        """
        cur = 0
        stop = 0
        buffer = {}
        # For some reason csv.writer works badly across threads so open/close
        # and use it all in the same thread or else you'll have the last
        # several rows missing
        outfile = open(self.outfile, "w")
        self.out_csvfile = csv.writer(outfile)

        #Keep running until we see numprocs STOP messages
        for works in range(self.numprocs):
            for i, val in iter(self.outq.get, "STOP"):
                # verify rows are in order, if not save in buffer
                if i != cur:
                    buffer[i] = val
                else:
                    # if so, write it out and make sure no waiting rows exist
                    self.out_csvfile.writerow( [i, val] )
                    cur += 1
                    while cur in buffer:
                        self.out_csvfile.writerow([ cur, buffer[cur] ])
                        del buffer[cur]
                        cur += 1

        outfile.close()

def main(argv):
    cli_parser = make_cli_parser()
    opts, args = cli_parser.parse_args(argv)
    if len(args) != 2:
        cli_parser.error("Please provide an input file and output file.")

    c = CSVWorker(opts.numprocs, args[0], args[1])

if __name__ == '__main__':
    main(sys.argv[1:])

When running under Windows, this is the error I receive:

Traceback (most recent call last):
  File "C:\Users\ron.berman\Documents\Attribution\ubrShapley\test.py", line 130, in <module>
    main(sys.argv[1:])
  File "C:\Users\ron.berman\Documents\Attribution\ubrShapley\test.py", line 127, in main
    c = CSVWorker(opts.numprocs, args[0], args[1])
  File "C:\Users\ron.berman\Documents\Attribution\ubrShapley\test.py", line 44, in __init__
    self.pin.start()
  File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Python27\lib\multiprocessing\forking.py", line 271, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Python27\lib\multiprocessing\forking.py", line 193, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 419, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 681, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\multiprocessing\forking.py", line 66, in dispatcher
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 401, in save_reduce
    save(args)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 548, in save_tuple
    save(element)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 419, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 681, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 396, in save_reduce
    save(cls)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 753, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <type '_csv.reader'>: it's not the same object as _csv.reader
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python27\lib\multiprocessing\forking.py", line 374, in main
    self = load(from_parent)
  File "C:\Python27\lib\pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "C:\Python27\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "C:\Python27\lib\pickle.py", line 880, in load_eof
    raise EOFError
EOFError

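The problem you're running into is caused by using methods of the CSVWorker class as process targets, when that class has members that can't be pickled. Those open files are never going to work.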
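Some background on why this only shows up on Windows: Windows has no fork(), so multiprocessing has to pickle the Process object and send it to the child. A bound method such as self.parse_input_csv drags the whole instance along with it, including self.in_csvfile. A minimal sketch of the failing step in isolation, written for Python 3 (on the Python 2.7 install in the traceback the exception type is pickle.PicklingError, on Python 3 it is TypeError); the io.StringIO input and the name "input.csv" are only stand-ins for a real file:

```python
import csv
import io
import pickle

# A csv.reader wraps a live iterator over an open file-like object.
reader = csv.reader(io.StringIO("1,2,3\n"))

try:
    pickle.dumps(reader)
    picklable = True
except Exception as exc:  # PicklingError on Python 2.7, TypeError on Python 3
    picklable = False
    print("cannot pickle the reader:", exc)

# A plain file name, by contrast, survives the round trip unchanged.
filename = pickle.loads(pickle.dumps("input.csv"))
```

This is why handing the child a file name works where handing it an open reader does not.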

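What you want to do is split that class into two classes: one that coordinates all the worker subprocesses, and another that actually does the computational work. The worker processes would take file names as arguments and open the individual files as needed, or at least wait until their worker methods are called and open the files only then. They can also take multiprocessing.Queue objects as arguments or as instance members; those are safe to pass around.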

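To an extent, you're already doing this: your write_output_csv method opens its file in the subprocess, but your parse_input_csv method expects to find an already-open, ready file as an attribute of self. Do it consistently the other way and you should be in good shape.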
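One way the advice above could look in practice is sketched below. It is an illustration, not a definitive rewrite: it targets Python 3 (the question's code is Python 2.7), it uses plain module-level functions for the workers rather than a second class, and the names run_pipeline and pending are invented here. The point it tries to show is that the children receive only file names and queues, and every file is opened inside the process that uses it.

```python
import csv
import multiprocessing


def parse_input_csv(infile, inq, numprocs):
    """Reader: open the input file here, in the child, and feed rows to inq."""
    with open(infile, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            inq.put((i, [int(entry) for entry in row]))
    # One STOP sentinel per summing worker.
    for _ in range(numprocs):
        inq.put("STOP")


def sum_row(inq, outq):
    """Worker: consume rows from inq and put (index, sum) on outq."""
    for i, row in iter(inq.get, "STOP"):
        outq.put((i, sum(row)))
    outq.put("STOP")


def write_output_csv(outfile, outq, numprocs):
    """Writer: open the output file here and write rows in input order."""
    cur = 0
    pending = {}  # results that arrived ahead of their turn
    with open(outfile, "w", newline="") as f:
        writer = csv.writer(f)
        # Keep reading until every worker has sent its STOP.
        for _ in range(numprocs):
            for i, val in iter(outq.get, "STOP"):
                pending[i] = val
                while cur in pending:
                    writer.writerow([cur, pending.pop(cur)])
                    cur += 1


def run_pipeline(numprocs, infile, outfile):
    """Coordinator: hands only file names and queues to the children."""
    inq = multiprocessing.Queue()
    outq = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=parse_input_csv,
                                args=(infile, inq, numprocs)),
        multiprocessing.Process(target=write_output_csv,
                                args=(outfile, outq, numprocs)),
    ]
    procs += [multiprocessing.Process(target=sum_row, args=(inq, outq))
              for _ in range(numprocs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

On Windows the call to run_pipeline still has to sit under the `if __name__ == '__main__':` guard, as main() does in the question, because each child re-imports the main module. The trade-off is the one the question already anticipates: the reader always starts from the beginning of the file, so any "specific state" of an open reader would have to be passed as plain data (for example a row offset) and re-established in the child.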
