Compute jobs on Flux are scheduled using PBS (Portable Batch System) software.

An important consideration when creating your PBS script is file input and output. If you reference your input files a lot, it is worth your time to add commands to your PBS script to create a local directory in /tmp, do your work there, then remove it when your job completes. For example,

 mkdir /tmp/$PBS_JOBID
 cd /tmp/$PBS_JOBID
 #
 # Your program runs here
 #
 cd /tmp
 rm -rf $PBS_JOBID

You can copy your files to that directory before starting your program. PBS collects STDOUT and STDERR from your program, and we highly recommend letting PBS handle your output rather than redirecting it to your home directory. When PBS handles the output, it writes your files to the local disk while the job is running, rather than to the remote disk mounted on /home, and copies them to your home directory at the end of the job. Writing to the local disk will improve the performance of your program (/home is accessed over the network, which is almost always slower than the disk in the compute node), but it does mean that you cannot see the output from your program while it is running. Also, if there is a problem with the remote file system while your program is running, your program will be unaffected and will continue to run.

With parallel programs, keep in mind that the local disk is local to each compute node, so if each task needs to read and write files, you need to distribute them to or gather them from all of the compute nodes as necessary. Finally, if your program uses scratch files, it is very worthwhile to set the scratch directory to a local disk (with parallel programs, make sure that the data isn’t shared between nodes, or this won’t work).
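The copy-in, run, copy-out pattern described above can be wrapped in a small shell function so that the local directory is removed even if your program fails. This is a minimal sketch, not a site-provided tool: the function name run_in_local_dir, the SRC_DIR and SCRATCH_BASE variables, and the convention that result files end in .out are all hypothetical. STDOUT and STDERR are still left for PBS to handle, as recommended above.

```shell
# Sketch only: run_in_local_dir, SRC_DIR, SCRATCH_BASE and the *.out
# naming convention are hypothetical, not Flux defaults.
run_in_local_dir() {
    # Outside a PBS job PBS_JOBID is unset, so fall back to a placeholder.
    work_dir=${SCRATCH_BASE:-/tmp}/${PBS_JOBID:-interactive.$$}
    src_dir=${SRC_DIR:-$HOME/your_stuff}

    mkdir -p "$work_dir" || return 1
    # Remove the local directory when the shell exits, even after an early exit.
    trap 'cd / && rm -rf "$work_dir"' EXIT

    # Copy the input files (including dotfiles) onto the node-local disk.
    cp -R "$src_dir"/. "$work_dir"/ || return 1
    cd "$work_dir" || return 1

    "$@"          # run the program passed as arguments
    rc=$?

    # Copy result files back to the network home directory once, at the end.
    for f in *.out; do
        [ -e "$f" ] && cp "$f" "$src_dir"/
    done
    return "$rc"
}
```

In a job script you would call it as run_in_local_dir ./your-program. The trap on EXIT means the cleanup also runs if the script stops early with exit, which a final rm -rf line would miss.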

The Queues and Accounts

When you submit a job, there are three things that need to be identified properly for it to run. These are the queue, the account, and the qos. When you get authorized to use the resources associated with a project and an account, you’ll receive e-mail with what you should use for those three attributes. You should check your PBS script carefully before submitting to make sure that it is asking for resources from the right place. In the examples that follow, we’ll assume that the project is example_flux.

Flux

For Flux, the queue is “flux”.

For Flux, the qos is “flux”.

#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux

Flux large memory

For the Flux large memory machines, the queue is “fluxm”.

For the Flux large memory machines, the qos is “flux”. Note, there is no “m” on the end; just flux.

#PBS -A example_fluxm
#PBS -l qos=flux
#PBS -q fluxm

Resources

Job attributes

You can request specific attributes, such as the number of nodes, memory, or job runtime. Memory requests should be set per process using the pmem attribute. Set the walltime to a value slightly longer than you expect the job to run. To request 2 nodes with 2 processors each, each process using a maximum of 2000 megabytes of memory, for 1 hour, use:

#PBS -l nodes=2:ppn=2,pmem=2000mb,walltime=1:00:00

When requesting memory, remember that every node has a little less memory available to jobs than the total amount reported by the system. For example, our nodes with 48 GB of memory really only have about 47 GB available to run jobs. So, a node with 12 processors would not be able to run 12 jobs each asking for 4 GB of memory; the twelfth would wait because of the system overhead. Keep this in mind if you are asking for the same number of processors as a node has. If you ask for memory per processor, requesting 4000mb instead of 4gb may make the time your job waits in the queue a little shorter. Or, if you are requesting a whole node, you might use, say,

#PBS -l nodes=1:ppn=12,mem=47gb,walltime=1:00:00

to ensure that it will be able to run on a node with 12 processors.
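The arithmetic behind the 4gb versus 4000mb advice can be checked in a few lines of shell. This sketch assumes, as stated above, that about 47 GB of a 48 GB node is available to jobs, and that PBS size units are binary (1gb = 1024mb); fits_on_node is a hypothetical helper, and the per-node total it checks is ppn multiplied by pmem.

```shell
# Sketch: does ppn x pmem fit in the memory a node makes available to jobs?
# Assumptions: ~47 GB usable per 48 GB node, and 1gb = 1024mb in PBS units.
USABLE_MB=$((47 * 1024))   # 48128 mb

# fits_on_node PPN PMEM_MB: prints the total, returns 0 if it fits.
fits_on_node() {
    total=$(( $1 * $2 ))
    if [ "$total" -le "$USABLE_MB" ]; then verdict="fits"; else verdict="does not fit"; fi
    echo "$1 x $2mb = ${total}mb: $verdict in ${USABLE_MB}mb"
    [ "$total" -le "$USABLE_MB" ]
}

fits_on_node 12 4096 || true   # 4gb each: prints "12 x 4096mb = 49152mb: does not fit in 48128mb"
fits_on_node 12 4000           # prints "12 x 4000mb = 48000mb: fits in 48128mb"
```

The same multiplication gives the per-node total for the earlier nodes=2:ppn=2,pmem=2000mb request: 2 x 2000mb = 4000mb on each node.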

Generic resource requests

Many of our licensed software packages are licensed for limited quantities, and so you need to include a request for the licenses you need when you submit your job. Software licenses are tracked by means of what is called a generic resource, which is specified by using the gres option. An example requesting one license for the statistical program SAS might look like

#PBS -l nodes=1,mem=4gb,gres=sas:1,walltime=1:00:00

If you needed to request four Matlab distributed computing toolbox licenses, you would add

gres=matlab_distrib_comp_engine:4

to the #PBS -l line in your PBS script.

The Flux software web page for each package will list the gres needed to request a license, and when you load a module for a package that requires a gres, an example will be printed.

You can see all attributes and resources that can be requested for a job by reading the pbs_resources man page (man pbs_resources).

The Importance of Estimating Your Job’s Runtime

You need to estimate how long your job will run. If you do not specify the wall clock time required by your run (e.g. walltime=45:00 for 45 minutes), PBS will terminate your job after 15 minutes. However, if you specify an excessively long runtime, your job may be delayed in the queue longer than it should be. Therefore, please try to estimate your wall clock runtime accurately; a modest overestimate (10 to 20%) is probably ideal.
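One way to turn an expected runtime into a padded walltime value is shown below. It is a sketch: pad_walltime is a hypothetical helper, its 15% default is simply a value inside the 10 to 20% range suggested above, and it rounds up to the next whole second.

```shell
# Sketch: pad an expected runtime (in seconds) by a percentage and print it
# in the HH:MM:SS form accepted by walltime=. Default padding (15%) is an assumption.
pad_walltime() {
    expected_s=$1
    pct=${2:-15}
    # Integer ceiling of expected_s * (100 + pct) / 100
    padded=$(( (expected_s * (100 + pct) + 99) / 100 ))
    printf '%02d:%02d:%02d\n' $((padded / 3600)) $((padded % 3600 / 60)) $((padded % 60))
}

# A run expected to take 2 hours (7200 s), padded by 15%:
pad_walltime 7200    # prints 02:18:00
```

The result can then be used directly, e.g. #PBS -l walltime=02:18:00.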

How to Write a PBS Batch Script

Example PBS script to run an MPI-enabled program

An MPI example for user uniqname (using 14 processes) would look something like the following.

#!/bin/sh
#PBS -S /bin/sh
#PBS -N your-mpi-job
#PBS -l procs=14,mem=1gb,walltime=1:00:00
#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -M uniqname@umich.edu
#PBS -m abe
#PBS -j oe
#PBS -V
#
echo "I ran on:"
cat $PBS_NODEFILE
cd ~/your_stuff

# Use mpirun to run with 14 cores
mpirun ./your-mpi-program

The PBS script parameters

#PBS -S /bin/sh Script is to run under /bin/sh (see below)
#PBS -N your-mpi-job Name of the job in the queue is “your-mpi-job”. This can be anything as long as it is less than 13 characters long; you should make it descriptive so you know which of your jobs are running and queued.
#PBS -l procs=4,walltime=30:00,pmem=1gb Note that this example is not exactly the same as in the example script above. This PBS directive asks for 4 processors (not required to be on the same node), each using 1 GB of memory, with a maximum running time of 30 minutes. Note that pmem=1gb specifies the memory per processor and that is different from the mem=1gb used in the example script above, which specifies 1 GB total memory for the job, that is for all processors combined.
#PBS -A example_flux Specifies that the account to use for this job is example_flux.
#PBS -l qos=flux Note that you can include multiple lines using the -l parameter to make things easier to read and keep organized. Here we specify which qos to use separately from the job attributes like memory and walltime.
#PBS -q flux Submit to the queue named flux. If you are using nyx, then the queue should only be cac. If you are using the large-memory machines, then you would use fluxm.
#PBS -M uniqname@umich.edu Send e-mail to the address uniqname@umich.edu.
#PBS -m abe Send e-mail when the job aborts (a), begins (b), and ends (e).
#PBS -j oe If you do not use the -j option, then PBS will normally create two files for your job: the base name of the files will be the job name, in this case your-mpi-job, followed by the letter o for output or e for error, followed by the job number of your job. So, for example, if you submitted the job above, PBS would create your-mpi-job.o1234567 and your-mpi-job.e1234567, where 1234567 is the job number. To join your STDOUT (o) and STDERR (e) into the output (o) file, use -j oe; reverse the order (-j eo) to write both to the error (e) file.
#PBS -V Copy the environment variables of your current shell, including those set by the modules you have loaded, to the job’s environment.

For complete information on PBS flags, you should read the qsub manual page using the man qsub command. For further information on PBS, use man pbs.

MPI (mpirun) parameters

-np Number of processes.
-stdin filename Use filename as standard input.
-t Test but do not execute.
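Rather than hard-coding the value passed to -np, a job script can read it from $PBS_NODEFILE, which lists each allocated node once per processor assigned on it. This is a sketch with hypothetical helper names (count_procs, count_nodes). Many MPI builds with PBS integration pick up the allocation without -np, which the example script above relies on, so treat the explicit -np as optional.

```shell
# Sketch: derive process and node counts from the PBS node file.
# Each line of $PBS_NODEFILE is one allocated processor; repeated host
# names mean several processors on the same node.
count_procs() { wc -l < "${1:-$PBS_NODEFILE}" | tr -d ' '; }
count_nodes() { sort -u "${1:-$PBS_NODEFILE}" | wc -l | tr -d ' '; }

# In a job script you might then write:
#   echo "Running $(count_procs) processes on $(count_nodes) nodes"
#   mpirun -np "$(count_procs)" ./your-mpi-program
```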

Example PBS script to run an OpenMP-enabled program

#!/bin/sh
#PBS -S /bin/sh
#PBS -N openmp_job
#PBS -l nodes=1:ppn=2,mem=1gb,walltime=90:00
#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -M uniqname@umich.edu
#PBS -m abe
#PBS -V
#
echo "I ran on:"
cat $PBS_NODEFILE
#
# Create a local directory to run from and copy your files to it.
# Let PBS handle your output
mkdir /tmp/${PBS_JOBID}
cd /tmp/${PBS_JOBID}
cp -r ~/your_stuff/* .

export OMP_NUM_THREADS=2
./your-openmp-program

# Clean up your files
cd ~
/bin/rm -rf /tmp/${PBS_JOBID}

Please be aware that for OpenMP jobs, all the processors must be on the same machine, so you must use nodes=1:ppn=X, where X is the number of processors you want. You cannot use procs=X, as that will not ensure the processors are on the same machine.

Also note that OMP_NUM_THREADS must be less than or equal to the number of processors you request.
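Instead of hard-coding export OMP_NUM_THREADS=2, the thread count can be taken from the allocation so that it always matches ppn. The sketch below (set_omp_threads is a hypothetical helper) counts the lines in $PBS_NODEFILE, uses that count when OMP_NUM_THREADS is not already set, and refuses to continue when an existing value is larger than the number of processors allocated. That check matters because #PBS -V copies your login environment, including any OMP_NUM_THREADS set there, into the job. It assumes a single-node nodes=1:ppn=X request, so every line of the node file refers to the same machine.

```shell
# Sketch: keep OMP_NUM_THREADS <= processors allocated on this node.
# Assumes nodes=1:ppn=X, so $PBS_NODEFILE has X lines for one host.
set_omp_threads() {
    allocated=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
    # Default to the allocation if nothing was set (or inherited via -V).
    : "${OMP_NUM_THREADS:=$allocated}"
    if [ "$OMP_NUM_THREADS" -gt "$allocated" ]; then
        echo "OMP_NUM_THREADS=$OMP_NUM_THREADS exceeds $allocated allocated processors" >&2
        return 1
    fi
    export OMP_NUM_THREADS
}
```

In the script above, set_omp_threads || exit 1 could take the place of the export OMP_NUM_THREADS=2 line.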

You may find it necessary to adjust the stack size if you run low on stack space due to the default stack size of 2 MB. For example, to raise it to 8 MB:

export MPSTKZ=8M

Example: Serial Code

If you have serial code – a program that uses one processor on one machine – just set procs=1.

#PBS -N serial_job
#PBS -l procs=1,walltime=24:00,mem=1gb
#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -M uniqname@umich.edu
#PBS -m abe
#PBS -V
#
# Let PBS handle your output
sas input.sas

In this script, STDOUT and STDERR will be directed into the files serial_job.oNNNNNNN and serial_job.eNNNNNNN, where NNNNNNN is the job number, because #PBS -j oe was not specified.

How to Submit a PBS Batch Script

To submit a PBS script, use the qsub command:

qsub your-scriptname.pbs

where your-scriptname.pbs is the name of your PBS script. You do not have to use the .pbs extension on the PBS script name, but you may find that it helps to keep things organized to do so.

Note that PBS runs your script under your login shell unless told otherwise (for example, with #PBS -S /bin/sh). One benefit of running your script with /bin/sh is that /bin/csh and /bin/tcsh are arguably broken in the way they handle terminal-disconnected jobs. Using /bin/csh or /bin/tcsh is fine, but you will receive warnings like this at the beginning of your output file:

Warning: no access to stty (Bad file descriptor). Thus no job control in this shell.

How to Check the Status of a PBS Batch Job

To check the status of your job with job number NNNNNNN in the queue, use

qstat NNNNNNN

To see all jobs in the fluxm queue, use

qstat -a fluxm

To see detailed info on the job, use

qstat -f NNNNNNN

Scheduling

We are using the Moab scheduler to implement various scheduling requirements.

How to Cancel a PBS Batch Job

If you realize that you made a mistake in your script file, or you have modified your program since you submitted your job, you can cancel the job; you need its job number (or Job Id). To cancel a job, use

qdel NNNNNNN

If you encounter an error while using qdel, add the -W force flag.

qdel -W force NNNNNNN

How to Submit an Interactive job

There are times when you may require interactive access to resources for purposes outside of the acceptable uses of the login nodes (debugging code, post processing, tests that require large amounts of computing resources or time, etc.). To submit an interactive job to the cluster, use the -I (capital eye) flag with the qsub command:

qsub -I -V -A example_account -l procs=1,qos=example_qos,walltime=24:00:00,pmem=768mb -q flux

This command will submit a request to the resource manager to find you the requested resources and once they’re available you will receive a prompt on the master host of the job. At that point you can proceed as required.

How to Query the PBS Queues

To see the names of the available queues and their current parameters, use

qstat -q

The notable parameters in the output are the queue names (in the Queue column) and the maximum wall clock time limits (in the Walltime column).

How to Use Job Dependencies

You can create dependency trees in order to satisfy different requirements for your workflow.  You might need to run jobA and when jobA completes successfully, jobB is available and should run automatically.  You could set up such a dependency in the following way.

First,

qsub jobA

Say the system returns PBS job id XXXXXX. You then use that to set up the dependency. You want jobB to run only if jobA finishes without errors, so you would

qsub -W depend=afterok:XXXXXX jobB
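When submitting from a script, you can capture the job ID that qsub prints instead of copying it by hand. This is a sketch: submit_after_ok and the QSUB override variable are hypothetical names (QSUB defaults to the real qsub and exists only so the logic can be tried without a cluster), and jobA and jobB are the script names from the example above.

```shell
# Sketch: submit FIRST, then submit SECOND so that it runs only if FIRST
# finishes without errors. QSUB is a hypothetical override hook.
submit_after_ok() {
    first=$1
    second=$2
    # qsub prints the new job's ID on stdout; $(...) strips the newline.
    id=$(${QSUB:-qsub} "$first") || return 1
    ${QSUB:-qsub} -W depend=afterok:"$id" "$second"
}
```

Running submit_after_ok jobA jobB submits jobA, then submits jobB with an afterok dependency on jobA's ID, and prints jobB's ID.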

You can do similar operations with job arrays (which are explained in the next section). First,

qsub jobArrayA

The system returns PBS job id XXXXXX[], where the brackets indicate it is an array. Once you have the job id, you can

qsub -W depend=afterokarray:XXXXXX[] jobArrayB

There are several dependency types (before, after, afterok, afternotok, etc.). For more examples, see http://docs.adaptivecomputing.com/torque/4-2-7/help.htm#topics/2-jobs/jobSubmission.htm

How to Use Job Arrays

Job arrays in PBS are an easy way to submit multiple similar jobs. The only difference in them is the array index, which you can use in your PBS script to run each task with a different set of parameters, load different data files, or any other operation that requires a unique index.

To submit a PBS job with 10 elements, use the -t option in your PBS script.

#PBS -t 1-10

If your assigned job number is 5432, the elements in the array will be 5432[1], 5432[2], 5432[3], …, 5432[10]. In each script the environment variable PBS_ARRAYID is set to the numbers 1 through 10.
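Besides selecting numbered input files, PBS_ARRAYID can pick one line out of a parameter file, so each array element runs with a different set of arguments. In this sketch the file name params.txt, its one-set-of-arguments-per-line layout, and the helper name params_for_element are hypothetical.

```shell
# Sketch: print line number $PBS_ARRAYID of a parameter file
# (hypothetical default name: params.txt, one argument set per line).
params_for_element() {
    sed -n "${PBS_ARRAYID:?PBS_ARRAYID is not set}p" "${1:-params.txt}"
}

# In the job script (left unquoted on purpose so the line splits into arguments):
#   ./myprogram $(params_for_element)
```

With #PBS -t 1-10, element 3 would receive the third line of params.txt.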

Note that each array element will appear as a separate job in the queue, and the normal scheduling rules apply to each element.

You can delete individual array elements from the queue by specifying the element number.

qdel 5432[4]

You can also delete the entire array by using the base job number as the argument to the qdel command.

qdel 5432[]

which will delete all remaining array elements.

To view the status of the entire job array, run qstat with the -t option.

qstat -t 5432[]

Suppose you want to run the same program 10 times, each time using a different input file. This is most easily done by including a number in each input file's name, as in file-1, file-2, etc. A sample PBS script that runs the same executable with 10 input files named this way would look like:

#!/bin/sh
#PBS -N yourjobname
#PBS -l nodes=1,walltime=05:00:00
#PBS -S /bin/sh
#PBS -M uniqname@umich.edu
#PBS -t 1-10
#PBS -m abe
#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux
#PBS -j oe
#PBS -V

cd /path/to/my/program
./myprogram -input=file-${PBS_ARRAYID}

For each value specified with -t, PBS runs the script with that value substituted for PBS_ARRAYID; in this case, the input files are file-1, file-2, etc.
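If one of the numbered input files is missing, the script above will still start myprogram. A slightly more defensive variant checks first and exits with a nonzero status, so the element is reported as failed (and an afterokarray dependency on the array should not be released). input_for_element is a hypothetical helper name, and the file-N naming follows the example above.

```shell
# Sketch: return the input file name for this array element, or fail
# with a message on stderr if it is not readable.
input_for_element() {
    f="file-${PBS_ARRAYID:?PBS_ARRAYID is not set}"
    if [ ! -r "$f" ]; then
        echo "array element $PBS_ARRAYID: cannot read $f" >&2
        return 1
    fi
    printf '%s\n' "$f"
}

# In the job script:
#   infile=$(input_for_element) || exit 1
#   ./myprogram -input="$infile"
```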

For more information on job arrays, please see the Adaptive Computing TORQUE documentation on them: http://docs.adaptivecomputing.com/torque/4-2-7/help.htm#topics/2-jobs/jobSubmission.htm