Utility programs

ARC-TS provides a number of local utilities written specifically to help users of the Flux and Armis clusters with common tasks.

cancel-my-jobs

Deletes all jobs belonging to the user who runs it.  This command is essentially equivalent to

$ qdel $(qselect -u ${USER})

but it also performs some error checking and does not try to delete jobs that have already finished.

Usage:
$ cancel-my-jobs
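
As a rough illustration of the kind of checking described above, the following shell function is a minimal sketch and not the actual source of cancel-my-jobs. The function name cancel_my_jobs_sketch is hypothetical, and the sketch assumes that the Torque qselect option -s (filter by job state) is available on the cluster and that the listed states cover every unfinished job.

```shell
# Hypothetical sketch; the real cancel-my-jobs may work differently.
cancel_my_jobs_sketch() {
    user="${USER:-$(id -un)}"

    # Assumption: -s EHQRTW selects every job state except "completed",
    # so finished jobs are never passed to qdel.
    job_ids=$(qselect -u "${user}" -s EHQRTW) || {
        echo "qselect failed; is the scheduler reachable?" >&2
        return 1
    }

    # Nothing to do if the user has no unfinished jobs.
    if [ -z "${job_ids}" ]; then
        echo "No active jobs found for ${user}."
        return 0
    fi

    # Intentionally unquoted so that each job ID becomes its own argument.
    qdel ${job_ids}
}
```

Unlike the one-line qdel command, this version does not call qdel with an empty argument list when there is nothing to delete.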

freealloc

Displays free resources (unused cores and memory) for an allocation.  This is useful for finding out whether resources are currently available for a job to start right away, or whether the job will wait in the queue for resources to become available.

Usage:
$ freealloc [-h] [--jobs] allocation_name

Required arguments:
allocation_name        Name of the allocation

Optional arguments:
-h, --help                  show this help message and exit
--jobs                      display core and memory usage for each job

Example:
$ freealloc lsa_flux
59 of 120 cores in use, 61 cores available
254 GB of 480 GB memory in use, 226 GB memory available
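
The output shown above can also be checked from a script before submitting a job. The following is a minimal sketch that assumes freealloc always prints its two summary lines in exactly the format of the example; the function names free_cores, free_mem_gb, and job_fits are hypothetical helpers, not part of freealloc.

```shell
# Read freealloc output on stdin and print the number of available cores.
free_cores() {
    sed -n 's/.*, \([0-9][0-9]*\) cores available.*/\1/p'
}

# Read freealloc output on stdin and print the available memory in GB.
free_mem_gb() {
    sed -n 's/.*, \([0-9][0-9]*\) GB memory available.*/\1/p'
}

# job_fits CORES MEM_GB FREEALLOC_OUTPUT
# Succeeds only if both numbers were found and both are large enough.
job_fits() {
    avail_cores=$(printf '%s\n' "$3" | free_cores)
    avail_mem=$(printf '%s\n' "$3" | free_mem_gb)
    [ -n "${avail_cores}" ] && [ -n "${avail_mem}" ] &&
        [ "${avail_cores}" -ge "$1" ] && [ "${avail_mem}" -ge "$2" ]
}

# Example use (commented out so the block has no side effects):
# job_fits 16 64 "$(freealloc lsa_flux)" && echo "A 16-core, 64 GB job should fit now"
```

Free cores and memory in the allocation do not guarantee an immediate start, since the service itself may still be busy.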

freenodes

Displays how many nodes and cores are available and in use for each cluster service (Standard Flux, Larger Memory Flux, and so on).  This is useful for determining how busy a service is overall.  In rare cases where an allocation has resources (cores and memory) available but the service is busy, jobs may wait in the queue until jobs running in other allocations finish and return the resources they are using.

Usage:
$ freenodes

idlenodes

Displays idle whole nodes that have a given property/feature.  Empty output means that no nodes of the requested type are idle.  This is useful for researchers who request whole nodes of a specific type: it shows whether a job will start right away or wait in the queue for resources to become available.  Flux Operating Environment users can also use idlenodes --all to display the status of all of the nodes that belong to them.  See http://arc-ts.umich.edu/software/torque/job-requirements/ for more information about node properties.

Usage:
$ idlenodes [-h] [-a] account_name [property_name]

Arguments:
account_name   display nodes usable by this account/allocation
property_name  (optional) display only nodes having this property

Optional arguments:
-h, --help     show this help message and exit
-a, --all      display all nodes, not just idle ones

Examples:
Show completely idle Haswell (24-core) nodes usable by jobs submitted to lsa_flux
$ idlenodes lsa_flux haswell

Show the state of all nodes usable by lsa_fluxm
$ idlenodes --all lsa_fluxm

Show the status of all Glotzer group nodes with K20X GPUs
$ idlenodes --all sglotzer_fluxoe k20x
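
Because empty output means that no matching nodes are idle, a script can test for that case directly. The following is a small sketch; the helper name have_idle_nodes is hypothetical, and it relies only on the empty-output behavior described above, not on any particular output format.

```shell
# Hypothetical helper: succeeds if idlenodes prints anything at all
# for the given account and (optional) property.
have_idle_nodes() {
    [ -n "$(idlenodes "$@")" ]
}

# Example use (commented out so the block has no side effects):
# if have_idle_nodes lsa_flux haswell; then
#     echo "At least one whole Haswell node is idle"
# else
#     echo "No idle Haswell nodes; a whole-node job would wait in the queue"
# fi
```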

maxwalltime or tto

Displays the maximum walltime that you can request in a job and still have the job start right away (assuming that the other resources the job requests are available).  This is useful in the time leading up to cluster maintenance and upgrades: jobs that request more walltime than remains before the maintenance starts will wait in the queue until after the maintenance has been completed.  “tto”, which stands for “time to outage”, is another name for the “maxwalltime” command.

Usage:
$ maxwalltime
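
The underlying arithmetic is simply the time remaining until the maintenance starts. The following sketch illustrates that calculation only; it is not how maxwalltime itself is implemented, and it assumes the maintenance start time is already known as a Unix timestamp. The function names seconds_to_walltime and remaining_walltime are hypothetical.

```shell
# Convert a number of seconds to the HH:MM:SS form used for walltime requests.
seconds_to_walltime() {
    total=$1
    printf '%d:%02d:%02d\n' \
        $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

# remaining_walltime NOW_EPOCH OUTAGE_EPOCH
# Prints the time left before the outage, or 0:00:00 if it has already begun.
remaining_walltime() {
    remaining=$(($2 - $1))
    if [ "${remaining}" -le 0 ]; then
        echo "0:00:00"
    else
        seconds_to_walltime "${remaining}"
    fi
}

# Example use (commented out; OUTAGE_EPOCH is a placeholder you would supply):
# remaining_walltime "$(date +%s)" "${OUTAGE_EPOCH}"
```

A job that requests more walltime than this value would have to wait until after the maintenance.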

try

Runs a command repeatedly until it succeeds (as determined by the exit code the command returns).  This is useful in several situations, such as when the cluster scheduler is having problems.

Usage:
$ try command [arguments]

Example:
$ try freealloc lsa_flux
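
The retry behavior can be pictured as a simple loop around the command. The following is a minimal sketch, not the actual source of try: the function name try_sketch and the TRY_DELAY variable are hypothetical, and the real utility may wait a different amount of time between attempts or limit the number of retries.

```shell
# Hypothetical sketch of a retry-until-success loop.
# TRY_DELAY (seconds between attempts) is an assumed knob, default 5.
try_sketch() {
    delay="${TRY_DELAY:-5}"

    # Run the command with its arguments; a zero exit code ends the loop.
    until "$@"; do
        echo "Command failed (exit code $?); retrying in ${delay}s..." >&2
        sleep "${delay}"
    done
}

# Example use (commented out so the block has no side effects):
# try_sketch freealloc lsa_flux
```

Note that a loop like this never gives up, so it is only appropriate for commands that are expected to succeed eventually.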