Migrating from Torque to Slurm

Job submission options:

| Option | Torque (qsub) | Slurm (sbatch) |
| --- | --- | --- |
| Script directive | #PBS | #SBATCH |
| Job name | -N <name> | --job-name=<name> or -J <name> |
| Account | -A <account> | --account=<account> or -A <account> |
| Queue | -q <queue> | --partition=<queue> |
| QOS | -l qos=<qos> | --qos=<qos> |
| Wall time limit | -l walltime=<hh:mm:ss> | --time=<hh:mm:ss> |
| Node count | -l nodes=<count> | --nodes=<count> or -N <count> |
| Process count per node | -l ppn=<count> | --ntasks-per-node=<count> |
| Core count (per process) |  | --cpus-per-task=<cores> |
| Memory limit | -l mem=<limit> | --mem=<limit> (memory per node, in megabytes (MB)) |
| Minimum memory per processor | -l pmem=<limit> | --mem-per-cpu=<memory> |
| Request GPUs | -l gpus=<count> | --gres=gpu:<count> |
| Request specific nodes | -l nodes=<node>[,node2[,...]] | -w, --nodelist=<node>[,node2[,...]] or -F, --nodefile=<node file> |
| Job array | -t <array indices> | --array=<indexes> or -a <indexes>, where <indexes> is a range (0-15), a list (0,6,16-32), or a step function (0-15:4) |
| Standard output file | -o <file path> | --output=<file path> (path must exist) |
| Standard error file | -e <file path> | --error=<file path> (path must exist) |
| Combine stdout/stderr to stdout | -j oe | --output=<combined out and err file path> |
| Copy environment | -V | --export=ALL (default); --export=NONE to not export the environment |
| Copy environment variable | -v <variable[=value][,variable2=value2[,...]]> | --export=<variable[=value][,variable2=value2[,...]]> |
| Job dependency | -W depend=after:jobID[:jobID...] | --dependency=after:jobID[:jobID...] |
| Job dependency | -W depend=afterok:jobID[:jobID...] | --dependency=afterok:jobID[:jobID...] |
| Job dependency | -W depend=afternotok:jobID[:jobID...] | --dependency=afternotok:jobID[:jobID...] |
| Job dependency | -W depend=afterany:jobID[:jobID...] | --dependency=afterany:jobID[:jobID...] |
| Request event notification | -m <events> | --mail-type=<events>. Note: multiple mail-type requests may be specified in a comma-separated list, e.g. --mail-type=BEGIN,END,FAIL |
| Email address | -M <email address> | --mail-user=<email address> |
| Defer job until the specified time | -a <date/time> | --begin=<date/time> |
| Node exclusive job | qsub -n | --exclusive |
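
As a rough illustration of how the options above fit together, the sketch below shows a small Slurm batch script with the equivalent Torque directives noted in comments. The job name, account, partition, resource values, and log file name are placeholders chosen for this example, not site defaults, and the fallback values only exist so the script also runs outside a job.

```shell
#!/bin/bash
# Hypothetical job: every name and value below is a placeholder.
# Torque: #PBS -N example_job
#SBATCH --job-name=example_job
# Torque: #PBS -A example_account
#SBATCH --account=example_account
# Torque: #PBS -q standard
#SBATCH --partition=standard
# Torque: #PBS -l walltime=01:00:00
#SBATCH --time=01:00:00
# Torque: #PBS -l nodes=2:ppn=4
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
# Torque: #PBS -l mem=4000mb (Slurm: memory per node, in MB)
#SBATCH --mem=4000
# Torque: #PBS -j oe (Slurm: stdout and stderr share one file)
#SBATCH --output=example_job.log

# Outside a Slurm job these variables are unset, so fall back to placeholders.
job_id="${SLURM_JOB_ID:-not-in-a-job}"
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"

summary="job ${job_id} submitted from ${submit_dir}"
echo "$summary"
```

The Torque equivalents are kept on their own comment lines so that each #SBATCH line contains nothing but the option itself.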

Job environment

Slurm propagates the module environment of a user's current environment (the environment of the shell from which the user calls sbatch) through to the worker nodes, with some exceptions noted in the following text. By default, Slurm does not source the files ~/.bashrc or ~/.profile when requesting resources via sbatch (although it does when running srun / salloc). So, if you have a standard environment set in either of these files or in your current shell, you can do one of the following:

  1. Add the command #SBATCH --get-user-env to your job script (i.e. the module environment is propagated).
  2. Source the configuration file in your job script:
    Sourcing your .bashrc file
    < #SBATCH statements >
    source ~/.bashrc
  3. You may want to remove the influence of any other current environment variables by adding #SBATCH --export=NONE to the script. This removes all set/exported variables and then acts as if #SBATCH --get-user-env has been added (module environment is propagated).

Common job commands:

| Command | Torque | Slurm |
| --- | --- | --- |
| Submit a job | qsub <job script> | sbatch <job script> |
| Delete a job | qdel <job ID> | scancel <job ID> |
| Job status (all) | qstat | squeue |
| Job status (by job) | qstat <job ID> | squeue -j <job ID> |
| Job status (by user) | qstat -u <user> | squeue -u <user> |
| Job status (detailed) | qstat -f <job ID> or checkjob <job ID> | scontrol show job -dd <job ID> |
| Show expected start time | showstart <job ID> | squeue -j <job ID> --start |
| Queue list / info | qstat -q [queue] | scontrol show partition [queue] |
| Node list | pbsnodes -a | scontrol show nodes |
| Node details | pbsnodes <node> | scontrol show node <node> |
| Hold a job | qhold <job ID> | scontrol hold <job ID> |
| Release a job | qrls <job ID> | scontrol release <job ID> |
| Cluster status | qstat -B | sinfo |
| Start an interactive job | qsub -I <args> | salloc <args> or srun --pty <args> |
| X forwarding | qsub -I -X <args> | srun --pty <args> --x11 (available in Slurm 17.11 and later) |
| Read stdout messages at runtime | qpeek <job ID> | No equivalent command / not needed. Use the --output option instead. |
| Monitor or review a job's resource usage |  | sacct -j <job_num> --format JobID,jobname,NTasks,nodelist,CPUTime,ReqMem,Elapsed (see sacct for all format options) |
| View job batch script |  | scontrol write batch_script <jobID> [filename] |
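
A common pattern that combines several of the commands above is to capture the job ID at submission and reuse it for status queries. The sketch below assumes sbatch's usual "Submitted batch job <ID>" message; the function name `job_id_from_sbatch_output` and the script name `job.sh` are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: pull the job ID (fourth word) out of sbatch's
# "Submitted batch job <ID>" message.
job_id_from_sbatch_output() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Only talk to the scheduler when Slurm is installed and the
# placeholder script job.sh exists in the current directory.
if command -v sbatch >/dev/null 2>&1 && [ -f job.sh ]; then
    job_id="$(job_id_from_sbatch_output "$(sbatch job.sh)")"
    squeue -j "$job_id"            # Torque: qstat <job ID>
    squeue -j "$job_id" --start    # Torque: showstart <job ID>
    # scancel "$job_id"            # Torque: qdel <job ID>
fi
```

Parsing the message keeps the example close to what a user sees at the prompt; it is one possible approach rather than the only way to obtain the ID.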

| Info | Torque | Slurm | Notes |
| --- | --- | --- | --- |
| Version | $PBS_VERSION |  | Can extract from sbatch --version |
| Batch or interactive | $PBS_ENVIRONMENT |  |  |
| Batch server | $PBS_SERVER |  |  |
| Submit directory | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | Slurm jobs start from the submit directory by default. |
| Node file | $PBS_NODEFILE |  | A filename and path that lists the nodes a job has been allocated |
| Node list | cat $PBS_NODEFILE | $SLURM_JOB_NODELIST | The Slurm variable has a different format to the PBS one. To get a list of nodes use: scontrol show hostnames $SLURM_JOB_NODELIST |
| Job array index | $PBS_ARRAYID | $SLURM_ARRAY_TASK_ID | Only set when submitting a job array (with -a or --array) |
| Number of nodes allocated | $PBS_NUM_NODES | $SLURM_JOB_NUM_NODES |  |
| Number of processes | $PBS_NP | $SLURM_NTASKS |  |
| Number of processes per node | $PBS_NUM_PPN | $SLURM_TASKS_PER_NODE |  |
| List of allocated GPUs | $PBS_GPUFILE |  |  |
| Requested tasks per node |  | $SLURM_NTASKS_PER_NODE |  |
| Requested CPUs per task |  | $SLURM_CPUS_PER_TASK |  |
| Scheduling priority |  | $SLURM_PRIO_PROCESS |  |
| Hostname | $HOSTNAME | $HOSTNAME == $SLURM_SUBMIT_HOST | Unless a shell is invoked on an allocated resource, the HOSTNAME variable is propagated (copied) from the submit machine's environment and will be the same on all allocated nodes. |
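
To make the variables above concrete, the following sketch reads a few of them inside a job script, with fallbacks so that it also runs outside a job. The input naming scheme `input_<index>.dat` and the function name `input_for_task` are invented for illustration.

```shell
#!/bin/bash
# Hypothetical mapping from a job array index to an input file name.
input_for_task() {
    printf 'input_%s.dat\n' "$1"
}

# $SLURM_ARRAY_TASK_ID is only set for job arrays; default to 0 otherwise.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
input_file="$(input_for_task "$task_id")"

# Torque scripts often begin with: cd $PBS_O_WORKDIR
# Slurm jobs already start in the submit directory.
echo "task ${task_id} in ${SLURM_SUBMIT_DIR:-$PWD} would read ${input_file}"

# Expand the compact node list to one host name per line,
# the closest equivalent of: cat $PBS_NODEFILE
if [ -n "${SLURM_JOB_NODELIST:-}" ] && command -v scontrol >/dev/null 2>&1; then
    scontrol show hostnames "$SLURM_JOB_NODELIST"
fi
```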

Common commands:

| Command | Torque | Slurm |
| --- | --- | --- |
| View accounts you can submit to | mdiag -u <uniqname> | sacctmgr show assoc user=$USER |
| View users with access to an account | mdiag -a <account> | sacctmgr show assoc account=<account> |
| View default submission account and wckey |  | sacctmgr show User <account> |
| View standard out and error of running job | qpeek <jobID> | See the job output section of the User Guide for your cluster: Beta or Great Lakes |

View standard out and error of running jobs

On clusters running Torque, the job's standard output and error are written to a file on the compute node. The `qpeek` command is used on the current clusters to connect you to that file on the compute node so you can see progress while the job is running. The output and error files are copied to their final location when the job completes.

Slurm, on the other hand, writes all job output to the final location from the start to the finish of the job. Once the job starts and output begins to be written, the file will appear, and you can use the `tail` command to look at its contents. This is what `qpeek` does, except that it first connects to the compute node. Output may appear with some delay, both because of buffering (output is accumulated into blocks that are efficient to write to disk) and because a file written to network disk on one node takes some time to become visible on another. For further details, see the job output section of the User Guide for your cluster: Beta or Great Lakes.
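
For example, progress can be checked with `tail` directly from the login node. The sketch below assumes the job was submitted without an --output option, so that Slurm uses its default file name `slurm-<job ID>.out` in the submit directory; the function name `peek_job_output` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: show the last lines of a job's output file.
# Usage: peek_job_output <job ID> [line count]
peek_job_output() {
    out_file="slurm-$1.out"    # default name when --output is not given
    if [ -f "$out_file" ]; then
        tail -n "${2:-10}" "$out_file"
    else
        echo "no output file yet for job $1" >&2
        return 1
    fi
}

# To keep following the file while the job writes to it:
#   tail -f slurm-<job ID>.out
```

If the job script sets --output, substitute that path for the default file name.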

MAM vs Slurm Accounting, Billing and Resource Limits

Moab Accounting Manager (MAM) and Slurm handle billing, accounting, and resource limits in different ways.

In MAM, there are accounts which contain groups of users. These accounts have one or more allocations associated with them. Each allocation has a balance of CPU seconds which was placed in the allocation at creation time. When a job is run on an account, MAM debits the number of CPU seconds that the job requests from the balance of an associated allocation. If no allocation associated with the account has a sufficient balance to run the job to completion, the job will not start. In order to run more jobs, an additional allocation with an additional balance must be created.
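
As a worked illustration of that debit, assume the charge is simply the number of requested cores multiplied by the requested wall time in seconds. This charge model is an assumption made for the example; the actual MAM charge rules are not described here, and the job size and allocation balance below are made-up numbers.

```shell
#!/bin/bash
# Hypothetical request: 2 nodes x 8 processes per node for 1 hour.
nodes=2
ppn=8
walltime_seconds=3600

# Assumed charge model: cores * requested seconds.
requested_cpu_seconds=$(( nodes * ppn * walltime_seconds ))

# Made-up allocation balance, in CPU seconds.
balance=50000

if [ "$requested_cpu_seconds" -le "$balance" ]; then
    echo "job can start; ${requested_cpu_seconds} CPU seconds would be debited"
else
    echo "job will not start; it needs ${requested_cpu_seconds} CPU seconds but only ${balance} remain"
fi
```

With these numbers the request comes to 57,600 CPU seconds, more than the 50,000 remaining, so the job would not start until an additional allocation is created.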

In the ARC-TS implementation of Slurm, there are also accounts which contain groups of users. However, there are no allocations associated with Slurm*. Instead, users simply submit their jobs and their account is billed for what they use at the end of the month. Account owners or their designee may request monthly spending limits to prevent accidental spending.

*At least not with accounting and billing. Slurm does use the term allocation to refer to a set of resources allocated in a job.


Slurm Definitions:

Job/Allocation: a set of resources assigned to an individual user for a specified amount of time.
Job Step: a set of (possibly parallel) tasks within a job.
Trackable RESources (TRES): resources for which usage is tracked on the cluster (CPU, memory, GPU).


Adapted from: https://confluence.csiro.au/display/SC/Reference+Guide%3A+Migrating+from+Torque+to+SLURM