Using fork in Perl to spread load to multiple cores

2023-11-08

原文链接:https://perlmaven.com/fork

If you have a big task to do that needs a lot of computation, but can be split up in several independent smaller tasks, you can reduce the overall execution time by spreading the work to several CPUs or cores.

On a Linux/Unix system this can be done using fork(). On Windows, fork() is emulated by perl and it might not fully work the same way.

Skeleton fork-ing

First let’s see how forking works.

When you call the fork() function of perl, it creates a copy of the current process and from that point on, there are going to be two processes executing the same script in parallel. In both “instance” of the script there will be all the variables that already existed in the original process, but there is no connection between the variables of the two processes. There is no direct way for one process to update (or even to check) the variables in the other process.

When we call fork() it can return 3 different values: undef, 0, or some other number.

It will return undef if the fork did not succeed. For example because we reach the maximum number of processes in the operating system. In that case we don’t have much to do. We might report it and we might wait a bit, but that’s about what we can do.

If the fork() succeeds, from that point there are two processes doing the same thing. The difference is that in original process, that we also call parent process, the fork() call will return the process ID of the newly created process. In the newly created (or cloned) process, that we also call child process, fork() will return 0.

The variable $$ contains the Process ID of the current process. Let’s see the following script:

use strict;
use warnings;
use 5.010;
 
say "PID $$";
my $pid = fork();
die if not defined $pid;
say "PID $$ ($pid)";

If we runt this script we will get output like this:

PID 63778
PID 63778 (63779)
PID 63779 (0)

Before calling fork, we got a PID (63778), after forking we got two lines, both printed by the last line in the code. The first printed line came from the same process as the first print (it has the same PID), the second printed line came from the child process (with PID 63779). The first one received a $pid from fork containing the number of the child process. The second, the child process got the number 0.

Fork and wait

There are a couple of issue we need to improve.

First of all, usually we want the parent process and the child process to do different things. We also usually want the parent process to wait till all of its child processes have finished working, and exit only after that. Otherwise they child processes will become so-called zombies.

Let’s see the next example:

use strict;
use warnings;
use 5.010;
 
my $name = 'Foo';
 
say "PID $$";
my $pid = fork();
die if not defined $pid;
if (not $pid) {
   say "In child  ($name) - PID $$ ($pid)";
   $name = 'Qux';
   # sleep 2;
   say "In child  ($name) - PID $$ ($pid)";
   exit;
}
 
say "In parent ($name) - PID $$ ($pid)";
$name = 'Bar';
# sleep 2;
say "In parent ($name) - PID $$ ($pid)";
 
my $finished = wait();
say "In parent ($name) - PID $$ finished $finished";

In the child process $pid will be 0, and so not $pid will be true. The child process will execute the block of the if-statement and will exit at the end.

In the parent process $pid will contain the process id of the child which is a non-negative number and thus not $pid will be false. The parent process will skip the block of the if-statement and will execute the code after it. At one point the parent process will call wait(), that will only return after the child process exits.

There is also a variable called $name that had a value assigned before forking. If you look at the output below, you will see that the variable $name kept its value in both the parent and the child process after the fork, but we could change it in both processes independently.

PID 64071
In parent (Foo) - PID 64071 (64072)
In parent (Bar) - PID 64071 (64072)
In child  (Foo) - PID 64072 (0)
In child  (Qux) - PID 64072 (0)
In parent (Bar) - PID 64071 finished 64072

The above code also has two calls to sleep commented out. They are there so you can enable each one of them separately to observe two things:

If the sleep is enabled in the child process (inside the if block) then the parent will arrive to the wait call much sooner than the child finishes. You will see it really waits there and the last print line from the parent will only appear after the child process finished.

On the other hand, if you enable the sleep in the parent process only, then the child will exit long before the parent reaches the call to wait. So when the parent finally calls wait, it will return immediately and return the process id of the child that has finished earlier.

This is important, as this means the signal that the parent process received marking the end of the child process has also waited for the parent to “collect” it. This will be especially important in the next example, where we create several child processes and we want to make sure we wait for all of them.

use strict;
use warnings;
use 5.010;
 
say "Process ID: $$";
 
 
my $n = 3;
my $forks = 0;
for (1 .. $n) {
  my $pid = fork;
  if (not defined $pid) {
     warn 'Could not fork';
     next;
  }
  if ($pid) {
    $forks++;
    say "In the parent process PID ($$), Child pid: $pid Num of fork child processes: $forks";
  } else {
    say "In the child process PID ($$)"; 
    sleep 2;
    say "Child ($$) exiting";
    exit;
  }
}
 
for (1 .. $forks) {
   my $pid = wait();
   say "Parent saw $pid exiting";
}
say "Parent ($$) ending";

In the $fork variable we count how many times we managed to fork successfully. Normally it is the same number as we wanted to fork, but in case one of the forks is not successful we don’t want to wait for too many child processes.

As the child processes exit, each one of them sends a signal to the parent. These signals wait in a queue (handled by the operating system) and the call to wait() will return then next item from that queue. IF the queue is empty it will wait for a new signal to arrive. So in the last part of the code we call wait exactly the same time as the number of successful forks.

In the for loop, we called fork $n times. In the part of the parent process (in the if-block), we just counted the forks. In the child process (in the else-block) we are supposed to do the real work. Here replaced by a call to sleep.

I am sure you are also familiar with people who go to work and then sleep there…

Anyway, the output will look like this:

Process ID: 66428
In the parent process PID (66428), Child pid: 66429 Num of fork child processes: 1
In the parent process PID (66428), Child pid: 66430 Num of fork child processes: 2
In the child  process PID (66429)
In the parent process PID (66428), Child pid: 66431 Num of fork child processes: 3
In the child  process PID (66430)
In the child  process PID (66431)
Child (66431) exiting
Child (66429) exiting
Child (66430) exiting
Parent saw 66430 exiting
Parent saw 66431 exiting
Parent saw 66429 exiting
Parent (66428) ending

wait and waitpid

The wait function will wait for any child process to exit and will return the process id of the one it saw. On the other hand there is a waitpid call that we don’t use in our examples, but that will wait for the end of a specific child process based on its process ID (PID).

Spreading load to several cores

Lastly, let’s see an example how we can spread load to several cores in your CPU. For this probably the best tool to use is the htop command in Linux. Please make sure you install it (yum install htop on Redhat/Fedora/CentOS, or apt-get install htop on Debian/Ubuntu). Then open a separate terminal and launch htop. At the top it should show several numbered rows. One for each core in your computer. The vertical sticks show the load.
在这里插入图片描述
The following script is just testing if we can create load on several cores on the same machine. Tested on a MacBook htop showed 100% on all 4 cores.

The script will fork $n = 8 times. Each child process will run a loop 10,000,000 times generating and adding together two random numbers. Of course, we could have find some useful thing to do instead of just generating random numbers, but this is a simple way to create load on the CPU.

use strict;
use warnings;
 
my $n = 8;
for (1 .. $n) {
  my $pid = fork;
  if (not $pid) {
    my $i = 0;
    while ($i < 10_000_000) {
      my $x = rand;
      my $y = rand;
      my $z = $x+$y;
      $i++;
    }
    exit;
  }
}
 
for (1 .. $n) {
   wait();
}

Run the script again and look at the output of htop:

All 4 cores are now filled with vertical sticks meaning all 4 cores have 100% CPU usage.
在这里插入图片描述
Communication
The big issue with forking is that once we forked, the two processes have no common variables. If the child process does something useful (as opposed to the above example) and wants to tell the parent process about the results it cannot do it directly.

There are several solutions to this, for example they can communicate via a socket. This will enable two way and long term communication, but it can be a bit complex. If we only want the child process to send the results back to the parent we can let the child save the results in a file and the parent can read the file after the child terminates.

We will see it in another article: Speed up calculation by running in parallel
在这里插入图片描述

本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

Using fork in Perl to spread load to multiple cores 的相关文章

  • fork() 是如何工作的?

    我对fork真的很陌生 这段代码中的pid在做什么 有人可以解释一下 X 行和 Y 行的结果吗 include
  • 何时计划 (SELECT) 查询?

    在 PostgreSQL 中 什么时候计划 SELECT 查询 Is it 在报表准备时 或者 在处理 SELECT 开始时 或者 别的东西 我问的原因是 Stackoverflow 上有一个问题 相同的查询 两种不同的方式 性能截然不同
  • perl - 子进程向父进程发送信号

    我编写了以下代码来测试孩子和父母之间的信号传递 理想情况下 当子进程向父进程发出 SIGINT 时 父进程应该在新的迭代中返回并等待用户输入 我在 perl 5 8 中观察到了这一点 但在 perl 5 6 1 我被要求使用 中 父级实际上
  • Perl 脚本的 shebang 行应该使用什么?

    哪一个用作 Perl 脚本的 shebang 行更好或更快 perl perl exe fullpath perl perl exe partialpath perl perl exe 并且 当使用 perl 当它在特定系统上运行时 我如何
  • Perl `join` 生成多行字符串

    我有这个程序来对两个数组进行排序 usr bin perl w movies movies txt open FHD movies die could not open movies n movies
  • 子进程中的变量修改

    我正在研究科比和奥哈拉伦的作品Computer Systems A Programmer s Perspective 练习 8 16 要求程序的输出如下 我更改了它 因为他们使用了一个你可以在他们的网站上下载的头文件 include
  • 如何检测FTP文件传输完成?

    我正在编写一个脚本 用于轮询 FTP 站点上的文件并在可用时将它们下载到本地 文件由各个来源方随机存放到 FTP 站点 我需要一种方法能够在下载之前检测 FTP 站点上的文件是否已被源方完全传输 关于如何解决这个问题有什么想法吗 如果您可以
  • 使用perl创建层次结构文件

    我的任务是使用 perl 创建父子层次结构文件 示例输入文件 制表符分隔 记录将以随机顺序排列在文件中 父项 可能出现在 子项 之后 S5 S3 S5 S8 ROOT S1 S1 S7 S2 S5 S3 S4 S1 S2 S4 77 S2
  • 如何加速我的 Perl 程序?

    这确实是两个问题 但它们非常相似 为了简单起见 我想我应该把它们放在一起 Firstly 给定一个已建立的 Perl 项目 除了简单的代码优化之外 还有哪些不错的方法可以加速它 Secondly 用Perl从头开始编写程序时 有哪些好的方法
  • Perl:模板工具包的替代品

    我使用模板工具包来扩展现有的领域特定语言 verilog 已经超过 3 年了 虽然总的来说我对此感到满意 但主要的刺激性是 当出现语法 undef 错误时 错误消息不包含用于调试错误的正确行号信息 例如我会收到一条消息 指示 0 未定义 因
  • 如何只读取文件的第一行

    我已经用谷歌搜索了一段时间 但我找不到只读取文件第一行的函数 我需要读取文本文件的第一行并从中提取日期 Perl 新手 open my file lt filename txt my firstLine lt file gt close f
  • Perl 三元条件运算符内部赋值问题

    我的程序中的这段 Perl 代码给出了错误的结果 condition a 2 a 3 print a 无论价值如何 condition就是 输出总是3 为什么呢 Perl 中对此进行了解释文档 http perldoc perl org p
  • 在 Perl 中解析 RSS/Atom 的最佳库是什么? [关闭]

    Closed 这个问题正在寻求书籍 工具 软件库等的推荐 不满足堆栈溢出指南 help closed questions 目前不接受答案 我注意到XML RSS 解析器 http search cpan org dist XML RSS P
  • 如何使用 Perl 和正则表达式将 SQL 文档转换为 ColdFusion 脚本?

    我需要将 SQL 语句文档转换为 ColdFusion 文档 我对正则表达式只有一点经验 而且我是 Perl 超级新手 我昨天刚刚自学了它的基础知识 所以我可以完成这项任务 我正在尝试用 Perl 编写的脚本匹配和替换模式 该脚本保存为 B
  • 在 Perl 中,如何制作数组的深层复制? [复制]

    这个问题在这里已经有答案了 可能的重复 在 Perl 中制作数据结构深层复制的最佳方法是什么 https stackoverflow com questions 388187 whats the best way to make a dee
  • 如何使用 Perl 在 Unix 中获取文件创建时间

    如何使用 perl 在 unix 中获取文件创建时间 我有这个命令显示文件的最后修改时间 perl MPOSIX le print strftime d b Y H M localtime lstat 9 for ARGV file txt
  • cat/Xargs/命令 VS for/bash/命令

    Linux 101 Hacks 一书的第 38 页建议 cat url list txt xargs wget c 我通常这样做 for i in cat url list txt do wget c i done 除了长度之外 还有什么东
  • 为什么“fork()”调用没有在无限循环中优化? [关闭]

    Closed 这个问题需要多问focused help closed questions 目前不接受答案 考虑到 C 11 1 10 24 in intro multithread 该实现可以假设任何线程最终都会执行以下操作之一 终止 调用
  • 如何在 Perl 中将多个哈希值合并为一个哈希值?

    在 Perl 中 我如何得到这个 VAR1 999 gt 998 gt 908 906 0 998 907 VAR1 999 gt 991 gt 913 920 918 998 916 919 917 915 912 914 VAR1 99
  • 在 Perl 中查找标量变量的数据类型

    我有一个接受用户输入的函数 输入可以是整数 浮点数或字符串 我有三个重载函数 应该根据输入数据的数据类型调用它们 例如 如果用户输入一个整数 比如100 则应该调用具有整数参数的函数 如果用户输入字符串 例如 100 则应调用具有字符串参数

随机推荐

  • 有序单链表转换成二叉平衡搜索树

    题目 Given a singly linked list where elements are sorted in ascending order convert it to a height balanced BST 关键词 有序单链表
  • hibernate 注解 ,视图无主键,怎么配置联合主键

    我之前用myeclipse 反向生成了2个pojo 但是hql查询有问题 生成的类 Service类 我传入一个值查询的时候 这好像是我底层的 sessionFactory getCurrentSession createQuery hql
  • decimals数据格式化

    文章目录 decimals数据格式化 1 保留小数 1 1 iOS 2 去除小数点后多余的 0 2 1 iOS 2 2 C decimals数据格式化 格式化数据 以便移动端UI显示 1 保留小数 无小数部分 则保留整数 有小数部分保留两位
  • soul 网关源码解析

    一 soul网关引入的依赖分析 从上图可以看到我红线划分五个依赖区域 1 soul common包 这里不是很重要 我们大概看一下他的作用就好了 从上图中可以看出 这个包里主要定义了一个常量 枚举类 配置类 自定义的DTO对象 2 soul
  • CSS背景靠右对齐,并且背景图片右边刘10px

    margin right 10px float right 或者 Background Image url 图像路径 X坐标 Y坐标 no repeat 或者 padding right 10px float right
  • Qt -- 14Lambda表达式和信号功能

    视频学习链接 https www bilibili com video BV1g4411H78N p 14 在Qt中 使用Lambda表达式配合信号使用 非常方便 Lambda表达式是C 11中最重要的新特性之一 在QT5 4 包括 以前的
  • 从零开始学前端(三)

    上一篇我们已经写了一个带图片的网页 我们接着练一下其他的常用标签 声明为 HTML5 文档 元素是 HTML 页面的根元素 元素包含了文档的元 meta 数据 如 定义网页编码格式为 utf 8
  • 2023面试真题之浏览器篇

    人生当中 总有一个环节 要收拾你一下 让你尝一尝生活的铁拳 大家好 我是柒八九 今天 我们继续2023前端面试真题系列 我们来谈谈关于浏览器的相关知识点 如果 想了解该系列的文章 可以参考我们已经发布的文章 如下是往期文章 文章list 2
  • Conflux Studio 安装教学

    在 Conflux Studio 详解 中 烤仔从安装 教程 功能预览三个方面向大家介绍了 Conflux Studio 本次 由黑曜石实验室的 CEO Phil 向大家展示如何使用 Conflux Studio 进行一个完整的 Confl
  • 使用jenkins进行项目部署

    前言 由于近期接手了前端的项目 在项目打包部署的时候 手动操作构建打包部署等等步骤非常繁琐 所以自己尝试使用jenkins帮助自己解决这一烦恼 之前有用过 但只是使用而已 这次借机自己搭建配置一下 本以为很简单但是在自己使用的过程也多多少少
  • DCMM GBT 36073-2018 数据管理能力成熟度评估模型(Word版)

    ICS 35 240 70 L 67 中华人民共和国国家标准 GB T 36073 2018 数据管理能力成熟度评估模型 Data management capability maturity assessment model 2018 0
  • QT::槽函数关联的三种方式

    1 第一种方法 首先在头文件中定义 private slots void show l 在 cpp中进行connect QtGuiApplication3 QtGuiApplication3 QWidget parent QMainWind
  • C与C++的不同--------extern

    extern可以置于变量或者函数前 以表示变量或者函数的定义在别的文件中 提示编译器遇到此变量和函数时在其他模块中寻找其定义 另外 extern也可用来进行链接指定 C 语言的创建初衷是 a better C 但是这并不意味着C 中类似C语
  • 分布式MySQL数据库TDSQL架构分析

    分布式MySQL数据库TDSQL架构分析 发表于 11小时前 次阅读 来源 程序员电子刊 0 条评论 作者 雷海林 MySQL TDSQL 腾讯 架构 width 22 height 16 src http hits sinajs cn A
  • 区块链平台开发

    太晚了 明天写
  • MVCC 脑图 数据库事务并发版本控制

    学习MySQL MVCC时做的脑图 记个笔记
  • 来了来了,2023年某中大厂真实面经!

    300万字 全网最全大数据学习面试社区等你来 本篇文章的面经是我辅导的一个同学的真实面试经历 2023年校招的宝子们拿走快看 第一个面经来自某头部大厂 1 做过的项目细节和遇到的问题 30分钟 所以说大家要对简历中的项目细节了如指掌 2 实
  • 如何在一个vue项目中集成electron框架

    将electron框架集成到vue文件中 不改变vue文件的原有结构 1 在vue文件中安装electron框架 运行代码 vue add electron builder nde为v16的 electron版本一般选择 electron
  • WSL2 使用桥接网络(不使用代理,局域网可独立IP访问)

    1 一切开始之前首先需要启动 WSL 直接在命令行运行运行 wsl 即可 这样 WSL 的网卡才会被自动创建出来 2 查看 网卡 管理员权限运行 PowerShell 运行 Get NetAdapter获取所有的网卡信息 注意这里的网卡不能
  • Using fork in Perl to spread load to multiple cores

    原文链接 https perlmaven com fork If you have a big task to do that needs a lot of computation but can be split up in severa