Slurm: Cheatsheet / Summary

  • Submit batch jobs: sbatch.

  • Run/launch a parallel job: srun.

  • Request interactive job sessions: sinteractive.

  • Cancelling jobs:
    • Cancel a single job: scancel jobid or scancel -n jobname.

    • Cancel all your jobs: scancel -u username.

    • Cancel all your pending jobs: scancel -t PD.

  • Job control and monitoring: scontrol, squeue and showq.

  • Detailed status of a currently running job: sstat and sjob.

  • Job usage summary: jobreport.

  • Nodes info and cluster status: sinfo, showbf and qinfo -v.

  • Job and job steps accounting data: sacct.
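The commands above are typically chained into a submit, monitor, cancel, account cycle. The following is a minimal sketch of that cycle using only the standard Slurm commands (sbatch, squeue, scontrol, scancel, sacct); the script name job.sh and the helper function job_id_from_sbatch_output are hypothetical names chosen for illustration, and the site-specific wrappers (sinteractive, showq, sjob, jobreport, showbf, qinfo) are left out because their options are not described here.

```shell
#!/bin/bash
# Sketch of a submit / monitor / cancel / account cycle.
# Assumption: job.sh is a batch script in the current directory (hypothetical name).

# Extract the job ID from sbatch's default "Submitted batch job <id>" message.
job_id_from_sbatch_output() {
    awk '/^Submitted batch job/ {print $4}'
}

if command -v sbatch >/dev/null 2>&1; then
    # Submit the batch script and remember its job ID.
    jobid=$(sbatch job.sh | job_id_from_sbatch_output)

    # Queue state of this job only.
    squeue -j "$jobid"

    # Full scheduler record for the job (partition, nodes, limits, ...).
    scontrol show job "$jobid"

    # Uncomment to cancel the job.
    # scancel "$jobid"

    # Accounting data for the job and its steps.
    sacct -j "$jobid" --format=JobID,JobName,State,Elapsed
else
    echo "Slurm commands not found; run this on a cluster login node."
fi
```

As an alternative to parsing the message, sbatch --parsable prints the job ID without the surrounding text.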

Slurm keywords

Some common sbatch Options/Directives

Each entry below gives the short format, the long format, and a description. "N/A" means the option has no short format.

  • -N count, --nodes=count: Allocate count nodes to your job.

  • N/A, --ntasks-per-node=count: Run count MPI tasks per node.

  • -c count, --cpus-per-task=count: Number of logical cores (CPUs) per MPI task. This usually does not need to be set. Defaults to 1.

  • -t DD-HH:MM:SS, --time=DD-HH:MM:SS: Maximum wallclock time for your job. Always specify it.

  • N/A, --mem=count: Allow your job to use up to count MB of memory on each node in your job.

  • N/A, --mem-per-cpu=count: Allow your job to use up to count MB of memory for each CPU in your job.

  • N/A, --tmp=X: Request temporary file space on the local disk (SSD or NVMe) on each node in your job, e.g. --tmp=20GB. The environment variable $JOBFS points to this directory.

  • -J job_name, --job-name=job_name: Name of the job. job_name may be up to 15 printable, non-whitespace characters.

  • -e filename, --error=filename: Write STDERR to filename.

  • -o filename, --output=filename: Write STDOUT to filename. By default both standard output and standard error are directed to a file named "slurm-%j.out", where "%j" is replaced with the job id. See the -i option for filename specification options.

  • -i filename_pattern, --input=filename_pattern: Connect the batch script's standard input directly to the file named by filename_pattern. The filename pattern may contain one or more replacement symbols, which are a percent sign "%" followed by a letter (e.g. %j). Supported replacement symbols are:
    • %j: Job id.
    • %N: Node name. Only one file is created, so %N will be replaced by the name of the first compute node in the job, which is the one that runs the script.

  • N/A, --mail-type=events --mail-user=address: Send mail to the user given by --mail-user when the listed events occur. Valid event values are: BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage out completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of time limit). Multiple values may be given as a comma-separated list. Mail notifications on job BEGIN, END and FAIL apply to a job array as a whole rather than generating individual email messages for each task in the job array.

  • -D directory_name, --workdir=directory_name: Set the working directory of the batch script to directory_name before it is executed. The path can be given as a full path or as a path relative to the directory where the sbatch command is executed. Recent Slurm releases name the long option --chdir instead of --workdir.
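The options above are normally written as #SBATCH directives at the top of a batch script, before the first executable command. The following is a minimal sketch that writes such a script and submits it if sbatch is available. The file name job.sh, the job name demo_job, and all resource values are hypothetical and should be adapted to your job and to your site's limits.

```shell
#!/bin/bash
# Write a minimal batch script (job.sh is a hypothetical file name).
# The quoted 'EOF' keeps the shell from expanding anything inside the script.
cat > job.sh <<'EOF'
#!/bin/bash
# Job name (at most 15 printable, non-whitespace characters).
#SBATCH --job-name=demo_job
# Two nodes with four MPI tasks on each.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
# Wallclock limit in DD-HH:MM:SS form: 10 minutes.
#SBATCH --time=0-00:10:00
# Up to 4000 MB of memory on each node.
#SBATCH --mem=4000
# Separate files for STDOUT and STDERR; %j is replaced by the job id.
#SBATCH --output=demo_job-%j.out
#SBATCH --error=demo_job-%j.err

# Launch the parallel job step: one hostname call per task.
srun hostname
EOF

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch job.sh
else
    echo "sbatch not found; job.sh was written but not submitted."
fi
```

The same options can also be passed on the command line, for example sbatch --time=0-00:10:00 job.sh, in which case they override the directives in the script.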

See https://slurm.schedmd.com/sbatch.html for more information.
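When a time limit is computed in seconds, it has to be converted into the DD-HH:MM:SS form used by -t / --time. The following is a small sketch of that arithmetic; the function name seconds_to_slurm_time is hypothetical.

```shell
#!/bin/bash
# Convert a number of seconds into the D-HH:MM:SS form accepted by --time.
seconds_to_slurm_time() {
    local total=$1
    local days=$(( total / 86400 ))
    local hours=$(( (total % 86400) / 3600 ))
    local minutes=$(( (total % 3600) / 60 ))
    local seconds=$(( total % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
}

# 93784 s = 1 day, 2 hours, 3 minutes, 4 seconds
seconds_to_slurm_time 93784   # prints 1-02:03:04
```

The result can be passed directly on the command line, for example sbatch --time="$(seconds_to_slurm_time 93784)" job.sh, where job.sh is again a hypothetical script name.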