Slurm: basics

Brief Command Summary

  • Job control and monitoring: scontrol, squeue, and showq.
  • Submit batch jobs: sbatch yourJobScript.
  • Request interactive job sessions: sinteractive.
  • Launch a job: srun.
  • Node information and cluster status: sinfo.
  • Detailed status of a currently running job: sstat.
  • Job and job steps accounting data: sacct.
  • Cancel a job: scancel jobid.
    • Cancel all your jobs: scancel -u username.
    • Cancel all your pending jobs: scancel -t PD.
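
The commands above are usually chained: submit, note the job ID, then monitor or cancel by that ID. The following is a minimal sketch of that cycle, not a site-recommended tool. The function name submit_and_report and the script name job.sh are hypothetical; it assumes sbatch's --parsable option, which prints only the job ID (optionally followed by ";" and the cluster name).

```shell
#!/bin/bash
# Sketch: submit a batch script, then inspect it with the commands listed above.
# submit_and_report and job.sh are hypothetical names used for illustration.

submit_and_report() {
    local script="$1"
    local jobid

    # --parsable makes sbatch print just the job ID (optionally "jobid;cluster").
    jobid=$(sbatch --parsable "$script") || return 1
    jobid=${jobid%%;*}    # drop the cluster name if one was appended

    echo "Submitted job $jobid"
    squeue -j "$jobid"    # queue state of this job only
}

# Typical follow-up commands once you have the job ID:
#   sstat -j <jobid>      # live statistics while the job runs
#   sacct -j <jobid>      # accounting data, also after the job has finished
#   scancel <jobid>       # cancel the job
#
# Usage: submit_and_report job.sh
```

Keeping the job ID in a variable avoids copying it by hand from the sbatch output and lets every later command target exactly one job.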

Slurm keywords

Some common sbatch Options/Directives

Short Format | Long Format | Description
-N count | --nodes=count | Allocate [count] nodes to your job.
N/A | --ntasks-per-node=count | Run [count] MPI tasks per node.
-c count | --cpus-per-task=count | Number of logical cores (CPUs) per MPI task. You usually do not need to set this; it defaults to 1.
-t DD-HH:MM:SS | --time=DD-HH:MM:SS | Maximum wallclock time for your job. Always specify it.
N/A | --mem=count | Allow your job to use up to [count] MB of memory on each node in your job.
N/A | --mem-per-cpu=count | Allow your job to use up to [count] MB of memory for each CPU in your job.
N/A | --tmp=X | Request temporary file space on the local disk (SSD or NVMe) on each node in your job, e.g. --tmp=20GB. The environment variable $JOBFS points to this directory.
-J job_name | --job-name=job_name | Name of the job: up to 15 printable, non-whitespace characters.
-e filename | --error=filename | Write STDERR to filename.
-o filename | --output=filename | Write STDOUT to filename. By default, both standard output and standard error are written to a file named "slurm-%j.out", where "%j" is replaced with the job ID. See the -i option for the filename pattern symbols.
-i filename_pattern | --input=filename_pattern | Connect the batch script's standard input directly to the file named by filename_pattern. The pattern may contain one or more replacement symbols, each a percent sign "%" followed by a letter. Supported replacement symbols include %j (job ID) and %N (node name). Only one file is created, so %N is replaced by the name of the first compute node in the job, which is the one that runs the script.
N/A | --mail-type=events, --mail-user=address | Send email when the listed events occur. Valid event values are: BEGIN, END, FAIL, REQUEUE, ALL (equivalent to BEGIN, END, FAIL, REQUEUE, and STAGE_OUT), STAGE_OUT (burst buffer stage out completed), TIME_LIMIT, TIME_LIMIT_90 (reached 90 percent of time limit), TIME_LIMIT_80 (reached 80 percent of time limit), and TIME_LIMIT_50 (reached 50 percent of time limit). Multiple values may be given as a comma-separated list. The recipient is set with --mail-user. Notifications on job BEGIN, END, and FAIL apply to a job array as a whole rather than producing one message per array task.
-D directory_name | --workdir=directory_name | Set the working directory of the batch script to directory_name before it is executed. The path can be absolute or relative to the directory where the sbatch command is executed. Recent Slurm releases name the long option --chdir.
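
In a job script, these options are written as "#SBATCH" lines at the top of the file, before the first executable command. The sketch below writes such a script to disk so it can be inspected before submission. The file name example_job.sh, the program ./my_mpi_program, the mail address, and all resource values are hypothetical placeholders, not site defaults.

```shell
#!/bin/bash
# Sketch: write a small job script that uses several of the options above.
# example_job.sh and ./my_mpi_program are hypothetical names, and the
# resource values are placeholders to adapt to your cluster.

cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=0-01:00:00
#SBATCH --mem=4000
#SBATCH --output=mpi_test-%j.out
#SBATCH --error=mpi_test-%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.org

# Everything below runs on the first allocated node.
echo "Job $SLURM_JOB_ID started on $(hostname)"

# srun starts the 2 x 4 = 8 MPI tasks requested above.
srun ./my_mpi_program
EOF

# Submit it with: sbatch example_job.sh
```

Options given on the sbatch command line take precedence over the "#SBATCH" lines, so a single run can be adjusted without editing the file, for example sbatch --time=0-02:00:00 example_job.sh.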