Submitting Jobs with Slurm

Processing computational tasks with Cheaha at the terminal requires submitting jobs to the Slurm scheduler. Slurm offers two commands to submit jobs: sbatch and srun. Always use sbatch to submit jobs to the scheduler unless you need an interactive terminal. Otherwise, use srun only inside an sbatch script, to launch job steps.

The command sbatch accepts script files as input. Scripts should be written in an available shell language on Cheaha, typically bash, and should include the appropriate Slurm directives at the top of the script telling the scheduler the requested resources. Read on to learn more about how to use Slurm effectively.

Important

Much of the information and examples on this page require a working knowledge of terminal commands and the shell. If you are unfamiliar with the terminal then please see our Shell page for more information and educational resources.

Common Slurm Terminology

  • Node: A self-contained computing device, forming the basic unit of the cluster. A node has multiple CPUs and memory, and some nodes have GPUs. Jobs requiring multiple nodes must use a protocol such as MPI to communicate between them.
    • Login nodes: Gateway for researcher access to computing resources, shared among all users. DO NOT run research computation tasks on the login node.
    • Compute nodes: Dedicated nodes for running research computation tasks.
  • Core: A single unit of computational processing, not to be confused with a CPU, which may have many cores.
  • Partition: A logical subset of nodes sharing computational features. Different partitions have different resource limits, priorities, and hardware.
  • Job: A collection of commands that require computational resources to perform. Can be interactive with srun or submitted to the scheduler with srun or sbatch.
  • Array Job: An array of jobs which all have the same plan for execution, but may vary in terms of input and output. Only available in non-interactive batch mode via sbatch.
  • Job ID: The unique number representing the job, returned by srun and sbatch. Stored in $SLURM_JOB_ID within a job.
  • Job Index Number: For array jobs, the index of the currently running job within the array. Stored in $SLURM_ARRAY_TASK_ID within a job.

Slurm Flags and Environment Variables

Slurm has many flags a researcher can use when creating a job, but a short list of the most important ones is described below. It is highly recommended to be as explicit as possible with flags and not rely on system defaults. Explicitly using the flags below makes your scripts more portable, shareable and reproducible.

| Flag | Short | Environment Variable | Description | sbatch | srun |
|---|---|---|---|---|---|
| --job-name | -J | SBATCH_JOB_NAME | Name of job stored in records and visible in squeue. | sbatch | srun |
| | | SLURM_JOB_ID | Job ID number of running job or array task. May differ from SLURM_ARRAY_JOB_ID depending on array task index. | sbatch | srun |
| --output | -o | SBATCH_OUTPUT | Path to file storing text output. | sbatch | srun |
| --error | -e | SBATCH_ERROR | Path to file storing error output. | sbatch | srun |
| --partition | -p | SBATCH_PARTITION | Partition to submit job to. More details below. | sbatch | srun |
| --time | -t | SBATCH_TIMELIMIT | Maximum allowed runtime of job. Allowed formats below. | sbatch | srun |
| --nodes | -N | | Number of nodes needed. Set to 1 if your software does not use MPI or if unsure. | sbatch | srun |
| --ntasks | -n | SLURM_NTASKS | Total number of tasks for the job, which is the same as tasks per node when --nodes=1. Mostly used for bookkeeping and calculating total cpus. If unsure set to 1. | sbatch | srun |
| --cpus-per-task | -c | SLURM_CPUS_PER_TASK | Number of needed cores per task. With one node, cores per node equals -n times -c. | sbatch | srun |
| | | SLURM_CPUS_ON_NODE | Number of cpus available on this node. | sbatch | srun |
| --mem | | SLURM_MEM_PER_NODE | Amount of RAM needed per node in MB. Can specify 16 GB using 16384 or 16G. | sbatch | srun |
| --gres | | SBATCH_GRES | Used to request GPUs per node. For 2 GPUs per node use --gres=gpu:2. | sbatch | srun |
| --array | | SBATCH_ARRAY_INX | Comma-separated list of similar tasks to run. More details below. | sbatch | n/a |
| | | SLURM_ARRAY_JOB_ID | Parent Job ID number of array task. Same for all array tasks submitted with same script. May differ from SLURM_JOB_ID depending on array task index. | sbatch | n/a |
| | | SLURM_ARRAY_TASK_COUNT | Total number of array tasks. | sbatch | n/a |
| | | SLURM_ARRAY_TASK_ID | Current array task index. | sbatch | n/a |
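
As a minimal sketch of how the environment variables in the table might be read inside a job script, the snippet below sizes a thread count from SLURM_CPUS_PER_TASK. The job name env-demo and the `:-` fallback values are assumptions added for illustration, so that the snippet also runs outside Slurm where these variables are unset. OMP_NUM_THREADS only matters if your software uses OpenMP.

```bash
#!/bin/bash
#SBATCH --job-name=env-demo
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=1G
#SBATCH --partition=express
#SBATCH --time=00:05:00

# Fall back to placeholder values when run outside of a Slurm job
# (assumption for illustration; inside a job Slurm sets these itself).
job_id="${SLURM_JOB_ID:-not-in-a-job}"
cpus="${SLURM_CPUS_PER_TASK:-1}"

# Match the thread count to the cores actually requested.
export OMP_NUM_THREADS="$cpus"

echo "Job ID: ${job_id}"
echo "Threads available to each task: ${OMP_NUM_THREADS}"
```

Reading the value from the environment, rather than repeating the number 4 in the payload, keeps the script consistent if the --cpus-per-task request is changed later.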

Available Partitions for --partition

Please see Cheaha Hardware for more information. Remember, the smaller your resource request, the sooner your job will get through the queue.

Requesting GPUs

Please see the GPUs page for more information.

Dynamic --output and --error File Names

The --output and --error flags can use dynamic job information as part of the name:

  • %j is the Job ID, equal to $SLURM_JOB_ID.
  • %A is the main Array Job ID, equal to $SLURM_ARRAY_JOB_ID.
  • %a is the Array job index number, equal to $SLURM_ARRAY_TASK_ID.
  • %x is the --job-name, equal to $SLURM_JOB_NAME.

For example if using --job-name=my-job, then to create an output file like my-job-12345678 use --output=%x-%j.

If also using --array=0-4, then to create an output file like my-job-12345678-0 use --output=%x-%A-%a.
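
A small sketch of how these patterns relate to the variables available inside the job. The .out and .err suffixes, the placeholder fallback values, and the omission of the other resource flags are choices made here for brevity. The fallbacks exist only so the script can also be run outside Slurm.

```bash
#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --array=0-4
#SBATCH --output=%x-%A-%a.out
#SBATCH --error=%x-%A-%a.err

# Rebuild the name Slurm derives from %x-%A-%a using the matching variables.
# The values after ":-" are placeholders used only outside of a Slurm job.
job_name="${SLURM_JOB_NAME:-my-job}"
array_job_id="${SLURM_ARRAY_JOB_ID:-12345678}"
task_id="${SLURM_ARRAY_TASK_ID:-0}"

output_file="${job_name}-${array_job_id}-${task_id}.out"
echo "stdout for this task is written to ${output_file}"
```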

Batch Jobs with sbatch

Important

The following examples assume familiarity with the Linux terminal. If you are unfamiliar with the terminal then please see our Shell page for more information and educational resources.

Batch jobs are typically submitted using scripts with sbatch. Using sbatch this way is the preferred method for submitting jobs to Slurm on Cheaha. It is more portable, shareable, reproducible and scripts can be version controlled using Git.

For batch jobs, flags are typically included as directive comments at the top of the script like #SBATCH --job-name=my-job. Read on to see examples of batch jobs using sbatch.

A Simple Batch Job

Below is an example batch job script. To test it, copy and paste it into a plain text file testjob.sh in your Home Directory on Cheaha. Run it at the terminal by navigating to your home directory by entering cd ~ and then entering sbatch testjob.sh. Shortly afterward, two text files with .out and .err suffixes will be produced in your home directory.

#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --partition=express
#SBATCH --time=00:10:00
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err

echo "Hello World"
echo "Hello Error" 1>&2

There is a lot going on in the above script, so let's break it down. There are three main chunks of this script:

  1. Line 1 is the interpreter directive: #!/bin/bash. This tells the shell what application to use to execute this script. All sbatch scripts on Cheaha should start with this line.
  2. Lines 3-11 are the sbatch flags which tell the scheduler what resources you need and how to manage your job.

    • Line 3: The job name is test.
    • Lines 4-7: The job will have 1 node, with 1 core and 1 GB of memory.
    • Line 8: The job will be on the express partition.
    • Line 9: The job will be no longer than 10 minutes, and will be terminated if it runs over.
    • Line 10: Any standard output (stdout) will be written to the file test_$SLURM_JOB_ID.out in the directory where sbatch was run (your home directory if you followed the steps above), where $SLURM_JOB_ID is the Job ID the scheduler assigns at submission. The name comes from %x equal to test, the --job-name, and %j equal to the Job ID.
    • Line 11: Any error output (stderr) will be written to a different file test_$SLURM_JOB_ID.err in the same directory.
  3. Lines 13 and 14 are the payload, or tasks to be run. They will be executed in order from top to bottom just like any shell script. In this case, the script simply writes "Hello World" to the --output file and "Hello Error" to the --error file. The 1>&2 redirects stdout (file descriptor 1) to stderr (file descriptor 2) for that command.
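
To see the stdout/stderr split without submitting a job, the same two commands can be run locally with the redirections written by hand. This only illustrates what --output and --error do for you. The temporary directory and the file names are assumptions made for the sketch.

```bash
#!/bin/bash
# Scratch directory standing in for the job's working directory.
workdir="$(mktemp -d)"

# Send stdout and stderr of the payload to separate files, as Slurm does
# with --output and --error.
{
    echo "Hello World"
    echo "Hello Error" 1>&2   # 1>&2 sends this line to stderr instead of stdout
} > "$workdir/test.out" 2> "$workdir/test.err"

echo "test.out contains: $(cat "$workdir/test.out")"
echo "test.err contains: $(cat "$workdir/test.err")"
```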

Batch Array Jobs With Known Indices

Building on the job script above, below is an array job. Array jobs are useful when you need to perform the same analysis on slightly different inputs with no interaction between those analyses. We call this situation "pleasingly parallel". We can take advantage of an array job using the variable $SLURM_ARRAY_TASK_ID, which will have an integer in the set of values we give to the --array flag.

To test the script below, copy and paste it into a plain text file testarrayjob.sh in your Home Directory on Cheaha. Run it at the terminal by navigating to your home directory by entering cd ~ and then entering sbatch testarrayjob.sh. Shortly afterward, 20 text files (10 with the .out suffix and 10 with the .err suffix) will be produced in your home directory.

#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --partition=express
#SBATCH --time=00:10:00
#SBATCH --output=%x_%A_%a.out
#SBATCH --error=%x_%A_%a.err
#SBATCH --array=0-9

echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

This script is very similar to the one above, but will submit 10 jobs to the scheduler that all do slightly different things. Each of the 10 jobs will have the same amount and type of resources allocated, and can run in parallel. The 10 jobs come from --array=0-9. Each job prints its own index, one of the numbers in the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, depending on which job is running. The output files will look like test_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}.out or .err. The value of ${SLURM_ARRAY_JOB_ID} is the main Job ID given to the entire array submission.

Scripts can be written to take advantage of the $SLURM_ARRAY_TASK_ID indexing variable. For example, a project could have a list of participants that should be processed in the same way, and the analysis script uses the array task ID as an index to pull out one entry from that list for each job. Many common programming languages can interact with shell variables like $SLURM_ARRAY_TASK_ID, or the values can be passed to a program as an argument.
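
The participant-list idea can be sketched as follows. The participant labels and the job name are hypothetical, and the fallback to index 0 is an assumption added so the script can be tried outside an array job.

```bash
#!/bin/bash
#SBATCH --job-name=participants
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --partition=express
#SBATCH --time=00:10:00
#SBATCH --output=%x_%A_%a.out
#SBATCH --error=%x_%A_%a.err
#SBATCH --array=0-3

# Hypothetical participant labels; bash arrays are indexed from 0,
# which matches --array=0-3 above.
participants=(sub-01 sub-02 sub-03 sub-04)

# Default to index 0 when run outside of an array job (for local testing).
task_id="${SLURM_ARRAY_TASK_ID:-0}"
participant="${participants[$task_id]}"

echo "Array task ${task_id} is processing ${participant}"
```

In a real analysis, the final echo would be replaced by the command that processes one participant, with the selected label passed to it as an argument.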

You can override the --array flag stored in the script when you call sbatch. To do so, pass another --array flag along with the script name like below. This allows you to rerun only subsets of your array script.

# submit jobs with index 0, 3, and 7
sbatch --array=0,3,7 testarrayjob.sh

# submit jobs with index 0, 2, 4, and 6
sbatch --array=0-6:2 testarrayjob.sh

For more details on using sbatch please see the official documentation.

Note

If you are using bash or shell arrays, it is crucial to note they use 0-based indexing. Plan your --array flag indices accordingly.
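
One way to keep the --array range and a bash array in step is to derive the range from the array length. The sketch below uses a made-up list of input files and only prints the flag; it does not submit anything.

```bash
#!/bin/bash
# Hypothetical input files, one per array task.
inputs=(alpha.csv beta.csv gamma.csv)

# Valid bash indices run from 0 to length-1.
last_index=$(( ${#inputs[@]} - 1 ))
array_flag="--array=0-${last_index}"

echo "First input (index 0): ${inputs[0]}"
echo "Last input (index ${last_index}): ${inputs[$last_index]}"
echo "Matching flag: ${array_flag}"
```

With three inputs the matching flag is --array=0-2. A range of 1-3 would skip the first file and leave the last task with an empty value.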

Batch Array Jobs With Dynamic or Computed Indices

For a practical example with dynamic indices, please visit our Practical sbatch Examples.

Interactive Jobs with srun

Jobs should be submitted to the Slurm job scheduler either using a batch job or an Open OnDemand (OOD) interactive job.

You can also use srun for short interactive tasks, such as creating an Anaconda environment, and for running parallel tasks within an sbatch script.

Warning

A limitation of srun is that the job dies if your internet connection drops, and you may have to run it again.

We recommend against using srun for any scientific or research computing or data analysis. Use a batch job or an Open OnDemand (OOD) interactive job instead.

Let us see how to acquire a compute node quickly using srun. You can run an interactive job using the srun command with the --pty /bin/bash flag. Here is an example:

$ srun --ntasks=2 --time=01:00:00 --mem-per-cpu=8G --partition=medium --job-name=test_srun --pty /bin/bash

srun: job 21648044 queued and waiting for resources
srun: job 21648044 has been allocated resources

The above example requests an interactive session on the medium partition for up to one hour, with --ntasks=2 and 8 GB of RAM per CPU, to run short tasks.

srun for running parallel jobs

srun is used to run executables in parallel from within an sbatch script. Let us see an example where srun is used to launch multiple (parallel) instances of a command.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --job-name=srun_test
#SBATCH --partition=long
#SBATCH --time=05:00
#SBATCH --mem=4G

srun hostname

In the script above, we have asked for two nodes with --nodes=2, and each node will run a single instance of hostname because we requested --ntasks-per-node=1. The output for the above script is:

c0187
c0188

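Here is another example, which runs different independent programs simultaneously on different resources within a batch job. Multiple srun steps can execute simultaneously as long as together they do not exceed the resources reserved for the job. Here, step 1 executes on node 1 with --ntasks=4 while step 2 executes on node 2 with --ntasks=4. The trailing & sends each step to the background so the next one can start right away, and wait keeps the job alive until both finish. Note that --nodes=1 -r1 in step 2 defines the number of nodes and their relative position within the nodes assigned to the job (-r1 selects the second node, because relative positions start at 0).

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
#SBATCH --partition=amd-hdr100
#SBATCH --time=05:00
#SBATCH --mem-per-cpu=1G

# Partitioning of resources for two different tasks
# STEP 1: four tasks on the first node, started in the background
srun --nodes=1 --ntasks=4 hostname &
# STEP 2: four tasks on the second node (-r1), started in the background
srun --nodes=1 -r1 --ntasks=4 uname -a &
# Wait for both background steps to finish before the job exits
wait

Here is the output for running multiple srun steps in a single job, i.e., executing the hostname and uname -a tasks simultaneously but on different nodes. Because the steps run concurrently, the order of the lines may differ between runs.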

c0203
c0203
c0203
c0203
Linux c0204 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Mar 25 21:21:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux c0204 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Mar 25 21:21:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux c0204 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Mar 25 21:21:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux c0204 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Mar 25 21:21:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
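
In any bash script, including an sbatch script, a command blocks the next one until it finishes unless it is sent to the background with &, and wait then pauses until all background commands are done. The sketch below shows that pattern with plain shell commands standing in for srun steps, so it can be tried without Slurm. The step contents and log file names are made up for illustration.

```bash
#!/bin/bash
logdir="$(mktemp -d)"

# Each subshell stands in for one srun step; "&" starts it in the background.
( sleep 1; echo "step 1 finished" > "$logdir/step1.log" ) &
( sleep 1; echo "step 2 finished" > "$logdir/step2.log" ) &

# Without this, the script could exit before the background steps complete.
wait

cat "$logdir/step1.log" "$logdir/step2.log"
```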

In general, srun can also launch other kinds of parallel jobs, such as MPI, OpenMP, and hybrid MPI/OpenMP, although on Cheaha the MPI case is restricted as described in the note below. For more details on using srun, please see the official documentation.

Important

srun has been disabled for use with MPI. We have removed this functionality due to an open vulnerability: https://nvd.nist.gov/vuln/detail/CVE-2023-41915. The vulnerability could allow an attacker to escalate privileges to root and/or access data they do not have permissions for.

Instead of srun, please load one of the OpenMPI modules with an appropriate version. Please contact Support with any questions or concerns.

Environment Setup and Module Usage in Job Submission

Before submitting a job using sbatch, it's crucial to establish a tailored environment, including software installations and loading necessary modules containing the required software packages. We highly recommend the practice of putting module reset before any module load calls in job scripts. The module system modifies the environment whenever the module list changes, and Slurm jobs inherit the environment from whatever called sbatch or srun. The module reset command normalizes the initial environment for the script, improving repeatability and minimizing the risk of hard-to-diagnose module conflicts. For examples and further information, please see best practice for loading modules.
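
A minimal sketch of that ordering inside a job script is shown below. The module name Anaconda3 and the job name are hypothetical placeholders, so check the modules actually available on the cluster before using them. The `command -v` guard is an addition made here so the sketch can also be run on a machine without a module system.

```bash
#!/bin/bash
#SBATCH --job-name=module-demo
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --partition=express
#SBATCH --time=00:05:00

# The guard is only here so the sketch can run where no module system exists.
if command -v module >/dev/null 2>&1; then
    module reset            # start from the default module environment
    module load Anaconda3   # hypothetical module name; check `module avail`
    module list             # record what was loaded in the job output
    modules_ready="yes"
else
    echo "module command not found; run this on the cluster" 1>&2
    modules_ready="no"
fi
```

Putting module reset first means the job does not depend on whatever modules happened to be loaded in the shell that called sbatch.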

Graphical Interactive Jobs

It is highly recommended to use the Open OnDemand web portal for interactive apps. Interactive sessions for certain software such as MATLAB and RStudio can be created directly from the browser while an HPC Desktop is available to access all of the other software on Cheaha. A terminal is also available through Open OnDemand.

It is possible to use other remote desktop software, such as VNC, to start and interact with jobs. These methods are not officially supported and we do not have the capacity to help with remote desktop connections. Instead, please consider switching your workflow to use the Open OnDemand HPC Desktop. If you are unable to use this method, please contact Support.

Estimating Compute Resources

Being able to estimate how many resources a job will need is critical. Requesting many more resources than necessary bottlenecks the cluster by reserving unused resources for an inefficient job, preventing other jobs from using them. However, requesting too few resources will slow down the job or cause it to fail.

Questions to ask yourself when requesting job resources:

  1. Can my scripts take advantage of multiple CPUs?
    1. For instance, RStudio generally works on a single thread. Requesting more than 1 CPU here would not improve performance.
  2. How large is the data I'm working with?
  3. Do my pipelines keep large amounts of data in memory?
  4. How long should my job take?
    1. For example, do not request 50 hours for a 15 hour process. Include a reasonable buffer to account for unexpected processing delays, but do not request the maximum time on a partition if that's unnecessary.
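
As a rough illustration of the time question, the arithmetic for adding a buffer to a measured runtime can be sketched as below. The 20 percent margin is an arbitrary assumption for the example, not a Cheaha recommendation, and the snippet only prints a --time value in hours:minutes:seconds form.

```bash
#!/bin/bash
expected_hours=15     # measured or estimated runtime of the process
buffer_percent=20     # assumed safety margin, not an official recommendation

# Integer arithmetic, rounding up to the next whole hour.
requested_hours=$(( (expected_hours * (100 + buffer_percent) + 99) / 100 ))

time_flag="$(printf -- '--time=%02d:00:00' "$requested_hours")"
echo "$time_flag"
```

For the 15 hour process from the list above this prints --time=18:00:00, well below the 50 hours in the counterexample.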

Note

Reasonable overestimation of resources is better than underestimation. However, gross overestimation may cause admins to contact you about adjusting resources for future jobs.

To get the most out of your Cheaha experience and ensure your jobs get through the queue as fast as possible, please read about Job Efficiency.

Faster Queuing with Job Efficiency

Please see our page on Job Efficiency for more information on making the best use of cluster resources to minimize your queue wait times.