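I am launching a SLURM job with a script, and the script has to work relative to its own location, which it obtains internally with SCRIPT_LOCATION=$(realpath $0)
However, SLURM copies the script into the slurmd spool folder and runs it from there, which breaks every later step.
Is there any way to get the original location of the script used for the SLURM job, as it was before being moved/copied?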
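The script is located in a network shared folder at /storage/software_folder/software_name/scripts/this_script.sh
and it must:
- get its own location
- go up to the software_name folder
- copy the software_name folder to the local folder /node_folder on the node
- run another script from the copied folder: /node_folder/software_name/scripts/launch.sh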
My script is:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=my_partition_name
# getting location of software_name
SHARED_PATH=$(dirname $(dirname $(realpath $0)))
# separating the software_name from path
SOFTWARE_NAME=$(basename $SHARED_PATH)
# target location to copy project
LOCAL_SOFTWARE_FOLDER='/node_folder'
# corrected path for target
LOCAL_PATH=$LOCAL_SOFTWARE_FOLDER/$SOFTWARE_NAME
# Copying software folder from network storage to local
cp -r $SHARED_PATH $LOCAL_SOFTWARE_FOLDER
# running the script
sh $LOCAL_PATH/scripts/launch.sh
It works fine when I run it on the node itself (without SLURM) via: sh /storage/software/scripts/this_script.sh
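If I run it with SLURM via: sbatch /storage/software/scripts/this_script.sh
it gets assigned to one of the nodes, but:
- before running, it is copied to /var/spool/slurmd/job_number/slurm_script
From there everything goes wrong, because $(dirname $(dirname $(realpath $0))) returns /var/spool/slurmd
Is it possible to get the original location (/storage/software_folder/software_name/) inside the script when it is started with SLURM?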
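To make the failure concrete, here is a minimal sketch of the two-levels-up logic applied to both paths. The function name software_root is hypothetical, and job_number stands in for the real job ID.

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's own logic: go two levels up
# from a script path. dirname only manipulates the string, so the paths
# do not need to exist on the machine running this.
software_root() {
    dirname "$(dirname "$1")"
}

# Path as seen when the script is started directly with sh
software_root /storage/software_folder/software_name/scripts/this_script.sh
# prints /storage/software_folder/software_name

# Path as seen under SLURM (job_number stands in for the real job ID)
software_root /var/spool/slurmd/job_number/slurm_script
# prints /var/spool/slurmd
```

The logic itself is fine; only the input path changes once SLURM runs its spooled copy.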
P.S. All machines are running Fedora 30 (x64).
UPDATE 1
It was suggested to run: sbatch -D /storage/software_folder/software_name ./scripts/this_script.sh
and to use SHARED_PATH="${SLURM_SUBMIT_DIR}" inside the script itself.
But this raises the error: sbatch: error: Unable to open file ./scripts/this_script.sh
I also tried absolute paths: sbatch -D /storage/software_folder/software_name /storage/software_folder/software_name/scripts/this_script.sh
It tries to run, but:
- in this case it uses the specified folder only for creating the output file
- the software still does not run
- echo "${SLURM_SUBMIT_DIR}" inside the script prints /home/username_who_started_script instead of /storage/software_folder/software_name
Any other suggestions?
UPDATE 2
I also tried using #SBATCH --chdir=/storage/software_folder/software_name inside the script, but in that case echo "${SLURM_SUBMIT_DIR}" returns /home/username_who_started_script, or / if run as root.
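These results are consistent with how the sbatch documentation describes SLURM_SUBMIT_DIR: it is the directory sbatch was invoked from, and -D/--chdir only sets the job's working directory. A small diagnostic sketch (the function name print_location_info is hypothetical) prints the relevant values so a direct run and an sbatch run can be compared side by side:

```shell
#!/bin/bash
# Hypothetical diagnostic: print the values discussed above so they can be
# compared between `sh this_script.sh` and `sbatch this_script.sh`.
print_location_info() {
    echo "\$0               = $0"
    echo "realpath \$0      = $(realpath "$0" 2>/dev/null)"
    echo "PWD              = $PWD"
    # The SLURM_* variables are only set inside a SLURM job
    echo "SLURM_SUBMIT_DIR = ${SLURM_SUBMIT_DIR:-<unset>}"
    echo "SLURM_JOB_ID     = ${SLURM_JOB_ID:-<unset>}"
}

print_location_info
```

Outside SLURM the last two lines show <unset>. Inside a job, PWD should reflect --chdir while SLURM_SUBMIT_DIR should not.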
UPDATE 3
The approach with ${SLURM_SUBMIT_DIR} works only if the task is run as:
cd /storage/software_folder/software_name
sbatch ./scripts/this_script.sh
But this does not seem like a proper solution. Are there any other ways?
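One way to avoid the cd step might be to resolve the path at submit time and forward it to the job through sbatch's --export option. This is a minimal sketch, not a tested recipe: the wrapper name submit_with_path and the variable name ORIGINAL_SCRIPT_PATH are made up for illustration, and it assumes the path contains no commas, since --export uses commas as separators.

```shell
#!/bin/bash
# Hypothetical wrapper: submit a script and hand its resolved path to the job
# in an environment variable (ORIGINAL_SCRIPT_PATH is a made-up name).
# SBATCH_BIN can be overridden, e.g. with `echo`, for a dry run.
# Assumption: the resolved path contains no commas.
submit_with_path() {
    local script_path
    script_path=$(realpath "$1") || return 1
    shift
    "${SBATCH_BIN:-sbatch}" \
        --export="ALL,ORIGINAL_SCRIPT_PATH=${script_path}" \
        "$script_path" "$@"
}

# Dry-run usage (prints the sbatch arguments instead of submitting):
#   SBATCH_BIN=echo submit_with_path /storage/software_folder/software_name/scripts/this_script.sh
```

Inside the job script, the location could then be read with SCRIPT_PATH="${ORIGINAL_SCRIPT_PATH:-$(realpath "$0")}", which falls back to realpath when the script is started directly.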
SOLUTION
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=my_partition_name
# check if script is started via SLURM or bash
# if with SLURM: the variable '$SLURM_JOB_ID' will exist
# `[ -n "$SLURM_JOB_ID" ]` checks that $SLURM_JOB_ID is not an empty string
# (the quotes matter: with an unquoted empty variable, `[ -n ]` is always true)
if [ -n "$SLURM_JOB_ID" ]; then
# check the original location through scontrol and $SLURM_JOB_ID
SCRIPT_PATH=$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}')
else
# otherwise: started with bash. Get the real location.
SCRIPT_PATH=$(realpath $0)
fi
# getting location of software_name
SHARED_PATH=$(dirname "$(dirname "$SCRIPT_PATH")")
# separating the software_name from path
SOFTWARE_NAME=$(basename $SHARED_PATH)
# target location to copy project
LOCAL_SOFTWARE_FOLDER='/node_folder'
# corrected path for target
LOCAL_PATH=$LOCAL_SOFTWARE_FOLDER/$SOFTWARE_NAME
# Copying software folder from network storage to local
cp -r $SHARED_PATH $LOCAL_SOFTWARE_FOLDER
# running the script
sh $LOCAL_PATH/scripts/launch.sh
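
The awk extraction above takes everything after the first = on the Command line. If the job was submitted with script arguments, they may appear after the path on that line. Here is a minimal sketch of a more defensive parser. It assumes the line looks like "   Command=/path/to/script args" and that the path contains no spaces; the function name extract_job_command is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `scontrol show job` output on stdin and print
# only the script path from the Command= line.
# Assumption: the path contains no whitespace; anything after the first
# whitespace is treated as script arguments and dropped.
extract_job_command() {
    sed -n 's/^[[:space:]]*Command=\([^[:space:]]*\).*/\1/p' | head -n 1
}

# Example with canned text standing in for real scontrol output
sample='   Command=/storage/software_folder/software_name/scripts/this_script.sh arg1
   WorkDir=/home/username_who_started_script'
printf '%s\n' "$sample" | extract_job_command
# prints /storage/software_folder/software_name/scripts/this_script.sh
```

In the job script it would be used as SCRIPT_PATH=$(scontrol show job "$SLURM_JOB_ID" | extract_job_command).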