Managing Jobs¶

When jobs are submitted, researchers can monitor their status using Slurm commands. Additionally, researchers can get information about completed jobs regarding their CPU and memory usage during execution for planning future jobs. Both of these cases should be a regular part of using Cheaha for researchers.

In case jobs were submitted by accident or the code was written incorrectly, they can also be cancelled.

Monitoring Queued Jobs With `squeue`¶

Queued and currently running jobs can be monitored using the squeue command. The basic command to list all jobs for a specific researcher is:

squeue -u $USER

By default, the output of squeue has one row for each job a researcher has submitted, with the following columns: job ID (JOBID), partition (PARTITION), job name (NAME), BlazerID (USER), job state (ST), total run time (TIME), number of nodes (NODES), and the list of nodes used or the reason the job is still waiting (NODELIST(REASON)).

For array jobs, the JobID will be formatted as jobid_arrayid.
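When many jobs are queued, a per-state count can be easier to read than the full listing. The following is a minimal sketch, assuming a POSIX shell. The function name `jobs_by_state` is hypothetical; `-h` (no header) and `-o "%T"` (full state name) are standard squeue options.

```shell
# Count a researcher's queued and running jobs by state.
# jobs_by_state is a hypothetical helper name, not part of Slurm.
jobs_by_state() {
    # -h drops the header row; -o "%T" prints only the full state name
    squeue -h -u "${1:-$USER}" -o "%T" | sort | uniq -c
}
```

Calling `jobs_by_state` with no argument summarizes your own jobs.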

More information is available at the Official Documentation.

Cancelling Jobs With `scancel`¶

Cancelling queued and currently running jobs can be done using the scancel command. Importantly, this will only cancel jobs that were initiated by the researcher running the command. scancel is very flexible in how it behaves:

# cancel a single job or an entire job array
scancel <jobid>

# cancel specific job array tasks, specified as a single number or a range
scancel <jobid_arrayid>

# cancel all jobs on a partition for the user
scancel -p <partition>

# cancel all jobs for a researcher
scancel -u $USER

Warning

Cancelling all jobs will also cancel the interactive jobs created on the Open OnDemand portal.
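To avoid cancelling running interactive sessions by accident, the cancellation can be narrowed. A minimal sketch follows, assuming a POSIX shell. The helper names are hypothetical; `--state` is a standard scancel filter, and the bracketed range selects a subset of array tasks.

```shell
# Cancel only jobs that have not started yet, leaving jobs that are
# already running untouched. Hypothetical helper name.
cancel_pending() {
    scancel -u "$USER" --state=PENDING
}

# Cancel a contiguous range of tasks in one job array.
# Usage: cancel_array_range <jobid> <first_task> <last_task>
cancel_array_range() {
    # Quoting keeps the shell from treating the brackets as a glob pattern
    scancel "${1}_[${2}-${3}]"
}
```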

More information is available at the Official Documentation.

Reviewing Past Jobs With `sacct`¶

If you are planning a new set of jobs and are estimating resource requests, it is useful to review similar jobs that have already completed. To list past jobs for a researcher, use the sacct command. Common use cases and information are detailed below. Full details are available at the Official Documentation.

Tip

To minimize queue wait times and make best use of resources, please review job efficiency using seff. See our Job Efficiency page for more information.

Review Jobs by JobID¶

The basic form is to use -j along with a JobID to list information about that job.

sacct -j <jobid>

This command will output basic information such as the ID, Name, Partition, Allocated CPUs, and State for the given JobID.

You can also review multiple jobs using a comma-separated list of JobIDs.

Jobs can have matching extern and/or batch job entries as well. These are not especially helpful for most researchers. You can remove these entries using the -X flag.

sacct -j <jobid> -X
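The comma-separated form and the -X flag combine naturally. As a small sketch, assuming a POSIX shell, the hypothetical helper `review_jobs` builds the comma-separated list from its arguments:

```shell
# Show one summary row per job for any number of JobIDs.
# review_jobs is a hypothetical helper name, not part of Slurm.
review_jobs() {
    # Inside the command substitution, "$*" joins the arguments with the
    # first character of IFS, so "123 456" becomes "123,456".
    ids=$(IFS=,; echo "$*")
    sacct -X -j "$ids"
}
```

For example, `review_jobs 12345678 12345679` runs `sacct -X -j 12345678,12345679`.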

Review Jobs Submitted Between Specific Timepoints¶

If you do not remember the JobID, you can use the -S and -E flags to retrieve jobs that were in any state between the given start datetime and end datetime.

For example, to retrieve jobs from the month of July 2021, any of the following equivalent commands could be used:

sacct -S 070121 -E 073121
sacct -S 07/01/21 -E 07/31/21
sacct -S 2021-07-01 -E 2021-07-31
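One detail worth checking is the end boundary. Assuming a date given without a time is interpreted as 00:00:00, an end date of 2021-07-31 would stop at the start of July 31. The sketch below sidesteps the question by ending the window at the first day of the following month. It assumes a POSIX shell, and `jobs_in_month` is a hypothetical helper name.

```shell
# List jobs for one calendar month, given as YYYY-MM.
# jobs_in_month is a hypothetical helper name, not part of Slurm.
jobs_in_month() {
    year=${1%-*}
    month=${1#*-}
    month=${month#0}    # strip a leading zero so 08 and 09 compare cleanly
    if [ "$month" -eq 12 ]; then
        next_year=$((year + 1)); next_month=1
    else
        next_year=$year; next_month=$((month + 1))
    fi
    start=$(printf '%04d-%02d-01' "$year" "$month")
    end=$(printf '%04d-%02d-01' "$next_year" "$next_month")
    # The window ends at the first day of the next month
    sacct -X -S "$start" -E "$end"
}
```

For example, `jobs_in_month 2021-07` runs `sacct -X -S 2021-07-01 -E 2021-08-01`.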

Customizing the Output¶

You can add -o with a list of output fields to customize the information you see.

sacct -o jobid,start,end,state,alloccpu,reqmem

This command will output the JobID, the start time, the end time, the state, the number of allocated CPUs, and the requested memory for each job listed. All potential output fields can be seen using sacct --helpformat. Their descriptions can be found on the sacct documentation under Job Accounting Fields.

You may also use the format <field>%<width> to make a column <width> characters wide. This is sometimes necessary for TRES fields and nodelist, among others. For example, alloctres%40 makes the field 40 characters wide.
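Field selection and column widths can be combined with a JobID filter. The following sketch assumes a POSIX shell; `job_summary` is a hypothetical helper name, and the chosen widths are arbitrary illustrations.

```shell
# Print a one-row summary for a job with wider name and TRES columns.
# job_summary is a hypothetical helper name, not part of Slurm.
job_summary() {
    # %20, %30 and %40 widen columns that are often truncated by default
    sacct -X -j "$1" -o jobid%20,jobname%30,state,elapsed,alloctres%40
}
```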

Formatting the Output¶

You can format the output of sacct using a delimiter with the flags --parsable2 and --delimiter=<delim>. Any number of characters may be used as a delimiter. The default is |. It is not recommended to use , as that is used in comma-separated lists throughout sacct fields.
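Delimited output is convenient for post-processing with standard tools. As a sketch, assuming a POSIX shell with awk, the hypothetical helper `not_completed` lists jobs whose state is anything other than COMPLETED. The `-n` flag (no header) is a standard sacct option.

```shell
# List the JobID and state of every job that did not complete normally.
# not_completed is a hypothetical helper name; extra arguments such as
# -S and -E are passed through to sacct unchanged.
not_completed() {
    sacct -X -n -P -o jobid,jobname,state "$@" |
        awk -F'|' '$3 != "COMPLETED" { print $1 "\t" $3 }'
}
```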

`sacct` Flags¶

| Flag | Short | Description | Docs |
|---|---|---|---|
| **FILTERING** | | | |
| `--user` | `-u` | Jobs from a specific user. Please only use your own BlazerID. | sacct |
| `--allocations` | `-X` | Show jobs only, not steps. | sacct |
| `--starttime` | `-S` | Jobs in any state after a given time. See Time Formatting. | sacct |
| `--endtime` | `-E` | Jobs in any state before a given time. See Time Formatting. | sacct |
| `--state` | `-s` | Jobs with a given state. See States. | sacct |
| `--jobs` | `-j` | Show only the JobIDs supplied in a comma-separated list. | sacct |
| **FORMATTING** | | | |
| `--format` | `-o` | Show only the Fields supplied in a comma-separated list. | sacct |
| `--helpformat` | `-e` | Show a list of available Fields. | sacct |
| `--parsable2` | `-P` | Output as delimited data with `--delimiter` if supplied, default is `\|`. | sacct |
| `--delimiter` | n/a | Characters to delimit field values. | sacct |
| `--json` | n/a | Output as JSON. (Not yet available on Cheaha). | sacct |
| `--yaml` | n/a | Output as YAML. (Not yet available on Cheaha). | sacct |
| `--noconvert` | n/a | Keep uniform units, e.g. all M instead of M and G. See Units. | sacct |

A complete list of flags is available at Official Documentation.

`sacct` Fields¶

| Field | Description | Same As... | Job | Step | Docs |
|---|---|---|---|---|---|
| **METADATA** | | | | | |
| jobid | Slurm assigned job ID number. | jobid format | yes | yes | sacct |
| jobname | User assigned job name. | `--job-name` | yes | yes | sacct |
| state | Current state of the job. | states | yes | yes | sacct |
| partition | Partition job was submitted to. | `--partition` | yes | yes | sacct |
| ntasks | Number of requested tasks. | `--ntasks` | yes | yes | sacct |
| nodelist | List of nodes used. | `--nodelist` if supplied | yes | yes | sacct |
| **TIME** | | | | | |
| submit | Submit time as YYYY-MM-DDTHH:MM:SS. | n/a | yes | yes | sacct |
| start | Start time as YYYY-MM-DDTHH:MM:SS. | n/a | yes | yes | sacct |
| end | End time as YYYY-MM-DDTHH:MM:SS. | n/a | yes | yes | sacct |
| elapsed | Elapsed time as DD-HH:MM:SS. | n/a | yes | yes | sacct |
| **RESOURCES REQUESTED** | | | | | |
| reqcpus | CPUs requested. | cpu calculation | yes | yes | sacct |
| reqmem | Memory requested. Uses a `c` suffix for per core (e.g. 10Gc) and an `n` suffix for per node (e.g. 10Gn). | `--mem-per-cpu` or `--mem` | yes | no | sacct |
| reqnodes | Nodes requested. | `--nodes` | yes | yes | sacct |
| reqtres | All requested resources. May be used to review GPUs. | tres explanation | yes | yes | sacct |
| **RESOURCES ALLOCATED** | | | | | |
| alloccpus | CPUs allocated. | cpu calculation | yes | yes | sacct |
| allocnodes | Nodes allocated. | `--nodes` | yes | yes | sacct |
| alloctres | All allocated resources. May be used to review GPUs. | tres explanation | yes | yes | sacct |
| averss | Average resident set size (memory) in bytes across tasks. | resident set size | no | yes | sacct |
| maxrss | Maximum resident set size (memory) in bytes across tasks. | resident set size | no | yes | sacct |

A complete list of fields is available at the Official Documentation.

Slurm Common Reference¶

Slurm JobID Formatting¶

JobID numbers are assigned automatically by the scheduler in the order submissions are received. All jobs have a single, unique JobID number associated with them. Some features will cause JobID numbers to be reported differently than their actual value.

For non-array jobs submitted with sbatch, salloc, or with srun outside of a job context, the unique JobID number is reported directly.
For array jobs submitted with sbatch, the array is assigned a master ID like 12345678, and each task is reported as <master-job-id>_<task-id>. An example might be 12345678_987. Each task still has a unique JobID number.
For job steps submitted with srun inside of a job context, the JobID is reported as <job-id>.<step-id>. Batch jobs also generate a .batch step and a .extern step. An example might be 12345678.batch.
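The three reported forms above can be told apart with simple pattern matching. The following is a minimal sketch, assuming a POSIX shell; `describe_jobid` is a hypothetical helper name.

```shell
# Describe which kind of JobID string was reported.
# describe_jobid is a hypothetical helper name, not part of Slurm.
describe_jobid() {
    case "$1" in
        *.*) echo "step ${1#*.} of job ${1%%.*}" ;;      # e.g. 12345678.batch
        *_*) echo "task ${1#*_} of array ${1%%_*}" ;;    # e.g. 12345678_987
        *)   echo "job $1" ;;                            # plain JobID
    esac
}
```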

Slurm Time Formatting¶

Slurm formats time in two different ways: (1) time points and (2) durations. Time points are used whenever a single point in time is needed, such as the start or end of a job. Durations are needed for job requests and reported for elapsed times.

Units are given shorthand designations:

YYYY four-digit year.
MM two-digit month or two-digit minutes, depending on placement.
DD two-digit day.
HH two-digit hour.
SS two-digit seconds.
AM|PM literally AM or PM.

Square brackets [] indicate the contents are optional.

Time points may be formatted like any of the following.

HH:MM[:SS][AM|PM]
MMDD[YY][-HH:MM[:SS]]
MM.DD[.YY][-HH:MM[:SS]]
MM/DD[/YY][-HH:MM[:SS]]
YYYY-MM-DD[THH:MM[:SS]]

Duration requests are made like any of the following.

MM[:SS]
[HH:]MM:SS
DD-HH[:MM[:SS]]

Durations are reported like the following.

[DD-[HH:]]MM:SS
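Reported durations are easier to compare once converted to seconds. The sketch below assumes a POSIX shell with awk and handles the reported form `[DD-[HH:]]MM:SS`; `duration_to_seconds` is a hypothetical helper name.

```shell
# Convert a reported duration such as 1-02:03:04 into seconds.
# duration_to_seconds is a hypothetical helper name, not part of Slurm.
duration_to_seconds() {
    printf '%s\n' "$1" | awk '{
        days = 0; rest = $0
        # Split off the optional day count before the dash
        if (index(rest, "-") > 0) { split(rest, a, "-"); days = a[1]; rest = a[2] }
        n = split(rest, t, ":")
        # Three parts mean HH:MM:SS, two parts mean MM:SS
        if (n == 3) { h = t[1]; m = t[2]; s = t[3] } else { h = 0; m = t[1]; s = t[2] }
        print ((days * 24 + h) * 60 + m) * 60 + s
    }'
}
```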

Slurm States¶

Job states report on where the job is in the overall Slurm process. If all goes well, you will see jobs move through the following states:

PENDING
RUNNING
A terminal state depending on what happens
1. COMPLETED if the job finished normally and returned exit code zero
2. CANCELLED if the researcher cancels the job
3. FAILED if there is a software error or non-zero exit code
4. TIMEOUT if the job ran past its requested time limit

Other states are possible. A complete list of job states is available at the Official Documentation.
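Scripts that act on job state often only need to know which phase a job is in. As a sketch, assuming a POSIX shell, the hypothetical helper `job_phase` groups the states listed above. The `CANCELLED*` pattern is an assumption to cover reports that append who cancelled the job.

```shell
# Map a Slurm state name onto a coarse phase.
# job_phase is a hypothetical helper name, not part of Slurm.
job_phase() {
    case "$1" in
        PENDING) echo "waiting" ;;
        RUNNING) echo "running" ;;
        COMPLETED|CANCELLED*|FAILED|TIMEOUT) echo "finished" ;;
        *) echo "other" ;;   # any state not covered in this section
    esac
}
```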

Slurm Units¶

Slurm uses flexible units for memory to keep reports compact. It always prefers the shortest possible representation, and will choose the largest units by default. Other units may be used, and there are flags to allow reporting in uniform units.

The memory units are K, M, G, and T for kilo, mega, giga, and tera, respectively. All are in bytes. Slurm uses factors of 1024 between units, for example:

\[ \begin{aligned} 1\textrm{T} &=1024\textrm{G}\\ &=1024^{2}\textrm{M}\\ &=1024^{3}\textrm{K} \end{aligned} \]
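To compare values reported in mixed units, they can be normalized to a single unit. The following sketch assumes a POSIX shell with awk and converts a compact value to megabytes using the 1024 convention. `mem_to_mb` is a hypothetical helper name. Treating a bare number as megabytes and dropping a trailing `c` or `n` (as in reqmem values like 10Gc) are assumptions of this sketch.

```shell
# Convert a compact memory value such as 64G or 10Gc to megabytes.
# mem_to_mb is a hypothetical helper name, not part of Slurm.
mem_to_mb() {
    value=${1%[cn]}                                   # drop per-core/per-node suffix
    unit=$(printf '%s' "$value" | tr -d '0-9.')       # what remains is the unit letter
    number=${value%"$unit"}
    case "$unit" in
        K)    factor=0.0009765625 ;;                  # 1/1024
        M|"") factor=1 ;;
        G)    factor=1024 ;;
        T)    factor=1048576 ;;                       # 1024^2
        *)    echo "unknown unit: $unit" >&2; return 1 ;;
    esac
    awk -v n="$number" -v f="$factor" 'BEGIN { print n * f }'
}
```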

TRES Explained¶

The abbreviation TRES stands for "trackable resources". Any resource made available by Slurm that is trackable is recorded in the Slurm database and can be recovered using sacct. The fields reqtres and alloctres can be used to review CPUs, memory, nodes and GPUs. The data is stored as a comma-separated list of <resource>=<quantity> pairs, and all values are totals across the entire job, not per node or per task. An example might look like:

billing=8,cpu=8,gres/gpu=2,mem=64G,node=1
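Because the TRES string is a list of `<resource>=<quantity>` pairs, a single value can be pulled out with standard text tools. A minimal sketch, assuming a POSIX shell with awk; `tres_get` is a hypothetical helper name.

```shell
# Extract one resource quantity from a TRES string.
# Usage: tres_get <tres-string> <resource>
# tres_get is a hypothetical helper name, not part of Slurm.
tres_get() {
    # Put each pair on its own line, then print the value whose key matches
    printf '%s\n' "$1" | tr ',' '\n' | awk -F= -v key="$2" '$1 == key { print $2 }'
}
```

With the example string above, `tres_get "billing=8,cpu=8,gres/gpu=2,mem=64G,node=1" gres/gpu` prints the GPU count.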

RSS Explained¶

The abbreviation RSS stands for "resident set size", and is related to memory usage by jobs in Slurm. Memory usage is challenging to record accurately. Recording memory requires a request to the operating system for usage at a single point in time, which itself uses computational resources. A balance must be struck between resolution in time and computational overhead.

The difficulty with recording memory usage contributes to difficulty diagnosing root causes of out of memory errors, bus errors, and segmentation faults.

RSS is recorded by Slurm in the sacct fields averss and maxrss. These values are both reported in bytes, rather than the usual compact memory units.

Slurm Resource Calculations¶

Calculating CPUs¶

\[ \begin{aligned} \textrm{Total CPUs} &=\left(\textrm{--cpus-per-task}\right) \left(\textrm{--ntasks-per-node}\right) \left(\textrm{--nodes}\right)\\ &=\left(\frac{\textrm{CPU}}{\textrm{Task}}\right) \left(\frac{\textrm{Task}}{\textrm{Node}}\right) \left(\textrm{Node}\right) \end{aligned} \]

Example:

For a job with --cpus-per-task=16 --ntasks-per-node=2 --nodes=3:

\[ \begin{aligned} \textrm{Total CPUs} &=16\times 2\times 3\\ &=96 \end{aligned} \]
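The same arithmetic can be checked in the shell before submitting. This sketch assumes a POSIX shell and follows the per-unit form of the formula (CPUs per task, tasks per node, nodes); `total_cpus` is a hypothetical helper name.

```shell
# Total CPUs = (CPUs per task) x (tasks per node) x (nodes).
# Usage: total_cpus <cpus_per_task> <tasks_per_node> <nodes>
# total_cpus is a hypothetical helper name, not part of Slurm.
total_cpus() {
    echo $(( $1 * $2 * $3 ))
}
```

For the example above, `total_cpus 16 2 3` prints 96.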

Calculating Memory¶

\[ \begin{aligned} \textrm{Total Memory} &=\left(\textrm{--mem}\right) \left(\textrm{--nodes}\right)\\ &=\left(\frac{\textrm{Memory}}{\textrm{Node}}\right) \left(\textrm{Node}\right)\\ \\ \textrm{Total Memory} &=\left(\textrm{--mem-per-cpu}\right) \left(\textrm{--cpus-per-task}\right) \left(\textrm{--ntasks-per-node}\right) \left(\textrm{--nodes}\right)\\ &=\left(\frac{\textrm{Memory}}{\textrm{CPU}}\right) \left(\frac{\textrm{CPU}}{\textrm{Task}}\right) \left(\frac{\textrm{Task}}{\textrm{Node}}\right) \left(\textrm{Node}\right) \end{aligned} \]

Examples:

For a job with --mem=40G --nodes=2:

\[ \begin{aligned} \textrm{Total Memory} &=\left(\textrm{--mem}\right) \left(\textrm{--nodes}\right)\\ &=40\textrm{G}\times 2\\ &=80\textrm{G} \end{aligned} \]

For a job with --mem-per-cpu=10G --cpus-per-task=8 --ntasks-per-node=2 --nodes=2:

\[ \begin{aligned} \textrm{Total Memory} &=\left(\textrm{--mem-per-cpu}\right) \left(\textrm{--cpus-per-task}\right) \left(\textrm{--ntasks-per-node}\right) \left(\textrm{--nodes}\right)\\ &=10\textrm{G}\times 8\times 2\times 2\\ &=320\textrm{G} \end{aligned} \]
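Both memory calculations can be sketched as shell arithmetic, assuming a POSIX shell and whole-number values in G. The helper names are hypothetical, and the arguments follow the per-unit form of the formulas (memory per node or per CPU, CPUs per task, tasks per node, nodes).

```shell
# Total memory when requesting memory per node.
# Usage: total_mem_by_node <mem_per_node_G> <nodes>
total_mem_by_node() {
    echo $(( $1 * $2 ))
}

# Total memory when requesting memory per CPU.
# Usage: total_mem_by_cpu <mem_per_cpu_G> <cpus_per_task> <tasks_per_node> <nodes>
total_mem_by_cpu() {
    echo $(( $1 * $2 * $3 * $4 ))
}
```

For the examples above, `total_mem_by_node 40 2` prints 80 and `total_mem_by_cpu 10 8 2 2` prints 320, both in G.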