How do I pass the SLURM job ID as an input argument to Python?

2024-01-06

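I am new to using SLURM to train a batch of convolutional neural networks. To keep track of all the trained CNNs easily, I would like to pass the SLURM job ID as an input argument to Python. Passing other variables as arguments works fine. However, I cannot get hold of the SLURM job ID in order to pass it.

I have already tried ${SLURM_JOBID}, ${SLURM_JOB_ID}, %j and %J. I also tried writing these Slurm environment variables to a variable before passing them to Python.

Here is my latest code: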

#!/bin/bash

# --- info to user
echo "script started ... "

# --- setup environment
module purge            # clean up
module load python/3.6
module load nvidia/10.0
module load cudnn/10.0-v7 

# --- display information
HOST=`hostname`
echo "This script runs the CNN. Slurm scheduled it on node $HOST"
echo "I am interested of all environment variables Slurm adds:"
env | grep -i slurm

# --- start running ... 
echo " --- run --- "

# --- define some varibles
dc="dice"
sm="softmax"

# --- run a job using a slurm batch script
for layer in {3..15..2}
  do
    sbatch -N 1 -n 1 --mem=20G --mail-type=END --gres=gpu:V100:3 --wrap="singularity --noslurm tensorflow_19.03-py3.simg python run_CNN_dynlayer.py ${SLURM_JOBID} ${layer} ${dc}"
    sleep 1 # pause 1s to be kind to the scheduler...
    echo "jobid: "+${SLURM_JOBID}
    echo " --- next --- "
  done    

The terminal output looks like this:

femonk@rarp1 [CNN] ./run_CNN_test.slurm
script started ... 
This script runs the CNN. Slurm scheduled it on node rarp1
I am interested of all environment variables Slurm adds:
SLURM_ACCOUNT=AI
PYTHONPATH=/cluster/slurm/lib64/python3.6/site-packages:/cluster/slurm/lib64/python3.6/site-packages:/cluster/slurm/lib64/python3.6/site-packages:
 --- run --- 
Submitted batch job 3182711
jobid: 
 --- next --- 
femonk@rarp1 [CNN] 

Does anyone know what is wrong with my code? Many thanks in advance.


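The SLURM_JOBID environment variable is only available to the processes of the job, not to the process that submits the job. The job ID is returned by the sbatch command, so if you want it in a variable, you need to assign it yourself.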

  do
    # \$ defers expansion, so the job (not this submitting shell) fills in its own ID
    SLURM_JOBID=$(sbatch --parsable -N 1 -n 1 --mem=20G --mail-type=END --gres=gpu:V100:3 --wrap="singularity --noslurm tensorflow_19.03-py3.simg python run_CNN_dynlayer.py \${SLURM_JOBID} ${layer} ${dc}")
    sleep 1 # pause 1s to be kind to the scheduler...
    echo "jobid: ${SLURM_JOBID}"
    echo " --- next --- "
  done

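Note the use of command substitution, $(), and of the --parsable argument to sbatch, which makes sbatch print only the job ID (followed by a semicolon and the cluster name on multi-cluster setups).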
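If the submission loop were driven from Python rather than bash, the same --parsable output could be captured there. The following is only a sketch: parse_parsable and submit are hypothetical helper names, and the code assumes the output is either a bare job ID or a job ID followed by a semicolon and a cluster name.

```python
import subprocess


def parse_parsable(output):
    """Extract the job ID from `sbatch --parsable` output.

    Assumes the output is "<jobid>" or "<jobid>;<cluster>".
    """
    return int(output.strip().split(";")[0])


def submit(wrapped_command):
    """Submit a wrapped command with sbatch and return its job ID.

    Hypothetical helper; it needs sbatch on the PATH and is not called here.
    """
    result = subprocess.run(
        ["sbatch", "--parsable", "--wrap={}".format(wrapped_command)],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str (works on Python 3.6)
    )
    return parse_parsable(result.stdout)
```

Only parse_parsable can be exercised without a scheduler; submit is shown to indicate where the parsing would fit.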

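Also note that the line Submitted batch job 3182711, which is currently printed, will disappear, because the output of sbatch is now used to populate the SLURM_JOBID variable.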
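Since the variable does exist inside the job, the Python script could also read it directly instead of receiving it as an argument. This is a minimal sketch under one assumption that the question does not confirm: that the Slurm variables are still visible to the Python process inside the singularity container. The function name job_id_from_env is hypothetical.

```python
import os


def job_id_from_env(environ):
    """Return the Slurm job ID as a string, or None when not inside a job.

    SLURM_JOB_ID is the current name; SLURM_JOBID is the older alias.
    """
    return environ.get("SLURM_JOB_ID") or environ.get("SLURM_JOBID")


if __name__ == "__main__":
    job_id = job_id_from_env(os.environ)
    if job_id is None:
        print("not running inside a Slurm job")
    else:
        print("running as Slurm job " + job_id)
```

Run on the login node this reports that no job is active, which matches the empty "jobid:" line in the question's output.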
