Original link: https://hpc.ncsu.edu/Documents/lsf_scripts.php
Many workflows involve submitting multiple compute jobs with slightly different parameters. Users can create a more efficient and reproducible workflow by using LSF job arrays or shell scripts to automate submission to LSF. Some basic scripts are provided as examples.
Caution!
Syntax errors, logic errors, or miscalculation of resources in scripts that do multiple batch submissions may result in the following violations of the Acceptable Use Policy:
- Infinite loops that crash LSF
- Too many jobs creating enough simultaneous I/O to crash the file system
- Taking all available software licenses, making them unavailable to other users
IN THOSE CASES, A USER WILL BE REQUIRED TO KILL ALL JOBS.
- To kill all jobs, use bkill 0
BEFORE USING AN AUTOMATED SUBMISSION SCRIPT:
- Test the logic of codes and scripts by adding echo statements and running with the bsub command commented out.
- Always do an initial test that submits only a single job. Next, test with only a few jobs for a short amount of time, and monitor the usage and output.
- When in doubt, contact HPC staff.
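The first recommendation above can be sketched as follows. This is a minimal illustration written in bash rather than tcsh, and the names DRY_RUN, submit, and my_program are hypothetical; they are not part of the example scripts provided on this page. With DRY_RUN left at its default of 1, the script only prints the bsub commands it would run.

```shell
#!/bin/bash
# Sketch only: DRY_RUN, submit, and my_program are hypothetical names.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really call bsub

# Print the bsub command in dry-run mode, otherwise execute it.
submit() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "[dry run] bsub $*"
    else
        bsub "$@"
    fi
}

# Start with a short list; widen it only after a single-job test succeeds.
for param in 1 2 3; do
    submit -n 1 -W 00:10 -o "out_${param}.%J" ./my_program "$param"
done
```

Once the printed commands look right, running the script with DRY_RUN=0 and a single-item list gives the single-job test recommended above.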
Job arrays for multiple job submissions
LSF job arrays allow a user to submit multiple jobs to LSF as defined by an array of integers. If the workflow allows for inputs, outputs, and parameters to be fully characterized by a single number, job arrays are the most efficient way of submitting multiple jobs.
In the sample batch script below, LSF will spawn 25 serial jobs, and each will execute the line source ./echo_hostname.csh $LSB_JOBINDEX. Here, $LSB_JOBINDEX is the job array index (an integer from 1 to 25) for each job.
#!/bin/tcsh
#BSUB -J My_array[1-25] #job name AND job array
#BSUB -n 1 #number of cores
#BSUB -W 00:10 #walltime limit: hh:mm
#BSUB -o Output_%J_%I.out #output - %J is the job-id %I is the job-array index
#BSUB -e Error_%J_%I.err #error - %J is the job-id %I is the job-array index
source ./echo_hostname.csh $LSB_JOBINDEX
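When a job needs more than a bare integer, one common pattern is to use the array index to select a line from a parameter file. The sketch below illustrates that idea under stated assumptions: the parameter file, its contents, and the helper get_params are hypothetical, the syntax is bash rather than tcsh, and the index falls back to 1 when LSB_JOBINDEX is unset so that the logic can be checked outside LSF.

```shell
#!/bin/bash
# Sketch only: the parameter file and get_params are hypothetical.

# Print line number $1 of file $2 (one parameter set per line).
get_params() {
    sed -n "${1}p" "$2"
}

# Demo parameter file so the sketch is self-contained.
# In a real workflow this file would already exist and would not be
# rewritten by every job in the array.
param_file=$(mktemp)
printf '%s\n' "2020 modelA" "2021 modelA" "2020 modelB" > "$param_file"

# LSF sets LSB_JOBINDEX inside an array job; default to 1 when it is unset.
index=${LSB_JOBINDEX:-1}
params=$(get_params "$index" "$param_file")
echo "Job index $index would run with parameters: $params"

rm -f "$param_file"
```

With this layout, the upper bound of the job array would need to match the number of lines in the parameter file.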
Job array - serial example
The script job_array_serial.csh submits 25 jobs that run the program echo_hostname.csh, which prints the hostname of the node it is running on and its index within the job array. To submit, type bsub < job_array_serial.csh. The scripts and the resulting output can be viewed here:
job_array_serial.csh
echo_hostname.csh
output
To avoid copy/paste errors, copy these files from the apps directory:
/usr/local/apps/examples/scripts/job_arrays
Job array - parallel example
The script job_array.csh demonstrates that multiple parallel jobs can be submitted in the same way as multiple serial jobs. The sample code hello_omp.F90 is a hybrid MPI-OpenMP example that prints the hostname of the node each thread is running on. To use it, first compile the sample code, then type bsub < job_array.csh. The script job_array.csh contains instructions for compiling the sample code. The scripts and the resulting output can be viewed here:
job_array.csh
hello_omp.F90
output
To avoid copy/paste errors, copy these files from the apps directory:
/usr/local/apps/examples/scripts/job_arrays
Sample automation scripts using loops to submit multiple jobs to LSF
If a workflow needs something more complex than job arrays allow, job submission can instead be automated with scripts that call bsub in a loop.
Basic script for multiple job submissions
The script multiple_jobs.csh uses bsub to run the program run.csh, which prints the hostname of the node it is running on. It can also be used to test whether an LSF batch script distributes jobs to the intended hosts. The scripts and the resulting output can be viewed here:
multiple_jobs.csh
run.csh
output
To avoid copy/paste errors, copy these files from the apps directory:
/usr/local/apps/examples/scripts/basic/
R script for multiple job submissions
The script R_loops.csh defines a set of years and models, then uses bsub and Rscript to run the R script codehpc.R once for each scenario. The scripts and the resulting output can be viewed here:
R_loops.csh
codehpc.R
output
To avoid copy/paste errors, copy these files from the apps directory:
/usr/local/apps/examples/scripts/R_loops/
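A nested loop over years and models, as described above, can be sketched as follows. This is not the content of R_loops.csh: the year and model values, the MAX_JOBS limit, the build_commands helper, and the way arguments are passed to codehpc.R are all assumptions made for illustration, and the syntax is bash rather than tcsh. The script only prints the bsub commands, and it stops if the loops would generate more jobs than the limit.

```shell
#!/bin/bash
# Sketch only: values, MAX_JOBS, and build_commands are hypothetical.
MAX_JOBS=${MAX_JOBS:-10}

# Print one bsub command per (year, model) pair.
# $1 = space-separated years, $2 = space-separated models.
# Passing year and model as Rscript arguments is an assumption.
build_commands() {
    for year in $1; do
        for model in $2; do
            echo "bsub -n 1 -W 00:30 -o out_${year}_${model}.%J Rscript codehpc.R $year $model"
        done
    done
}

commands=$(build_commands "2020 2021" "modelA modelB")
count=$(printf '%s\n' "$commands" | grep -c .)

# Guard against a loop that generates far more jobs than intended.
if [ "$count" -gt "$MAX_JOBS" ]; then
    echo "Refusing to submit $count jobs (limit $MAX_JOBS)" >&2
    exit 1
fi

# Review this output before turning the echoed lines into real submissions.
printf '%s\n' "$commands"
```

Counting the commands before anything is submitted is one way to catch a miscalculated loop before it reaches the scheduler.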