Managing software with Lmod

Why software needs managing

Almost all software requires that you modify your environment in some way. Your environment consists of the running shell, typically bash on Flux, and the set of environment variables that are set. The most familiar environment variable to most people is the PATH variable, which lists all the directories in which the shell will search for a command, but there may be many others, depending on the particular software package.

Beginning in July 2016, Flux uses a program called Lmod to resolve the changes needed to accommodate having many versions of the same software installed. We use Lmod to help manage conflicts among the environment variables across the spectrum of software packages. Lmod can be used to modify your own default environment settings, and it is also useful if you install software for your own use.

Basic Lmod usage

Listing, loading, and unloading modules

Lmod provides the module command, an easy mechanism for changing the environment as needed to add or remove software packages from your environment.

Load the modules you need before submitting a job to the cluster, not from within a PBS submit script.

A module is a collection of environment variable settings that can be loaded or unloaded. When you first log into Flux, the system will look to see if you have defined a default module set, and if you have, it will restore that set of modules. See below for information about module sets and how to create them. To see which modules are currently loaded, you can use the command

$ module list

Currently Loaded Modules:
  1) intel/16.0.3   2) openmpi/1.10.2/intel/16.0.3   3) StdEnv

We try to make the names of the modules as close to the official name of the software as we can, so you can see what is available by using, for example,

$ module av matlab

------------------------ /sw/arcts/centos7/modulefiles -------------------------

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching
any of the "keys".

where av stands for avail (available). To make the software found available for use, you use

$ module load matlab

(You can also use add instead of load, if you prefer.) If you need to use software that is incompatible with Matlab, you would remove it using

$ module unload matlab

More ways to find modules

In the output from module av matlab, module suggests a couple of alternate ways to search for software. When you use module av, it will match the search string anywhere in the module name; for example,

$ module av gcc

------------------------ /sw/arcts/centos7/modulefiles -------------------------
   fftw/3.3.4/gcc/4.8.5                          hdf5-par/1.8.16/gcc/4.8.5
   fftw/3.3.4/gcc/4.9.3                   (D)    hdf5-par/1.8.16/gcc/4.9.3 (D)
   gcc/4.8.5                                     hdf5/1.8.16/gcc/4.8.5
   gcc/4.9.3                                     hdf5/1.8.16/gcc/4.9.3     (D)
   gcc/5.4.0                              (D)    openmpi/1.10.2/gcc/4.8.5
   gromacs/5.1.2/openmpi/1.10.2/gcc/4.9.3        openmpi/1.10.2/gcc/4.9.3
   gromacs/5.1.2/openmpi/1.10.2/gcc/5.4.0 (D)    openmpi/1.10.2/gcc/5.4.0  (D)

   D:  Default Module

However, if you are looking for just gcc, that is more than you really want. So, you can use one of two commands. The first is

$ module spider gcc

      GNU compiler suite


     Other possible modules matches:
        fftw/3.3.4/gcc  gromacs/5.1.2/openmpi/1.10.2/gcc  hdf5-par/1.8.16/gcc  ...

  To find other possible module matches do:
      module -r spider '.*gcc.*'

  For detailed information about a specific "gcc" module (including how to load
the modules) use the module's full name.
  For example:

     $ module spider gcc/5.4.0

That is probably more like what you are looking for if you really are searching just for gcc. That also gives suggestions for alternate searching, but let us return to the first set of suggestions, and see what we get with keyword searching.

At the time of writing, if you were to use module av to look for Python, you would get this result.

[bennet@flux-build-centos7 modulefiles]$ module av python

------------------------ /sw/arcts/centos7/modulefiles -------------------------

However, we have Python distributions that are installed that do not have python as part of the module name. In this case, module spider will also not help. Instead, you can use

$ module keyword python

The following modules match your search criteria: "python"

  anaconda2: anaconda2/4.0.0
    Python 2 distribution.

  anaconda3: anaconda3/4.0.0
    Python 3 distribution.

  epd: epd/7.6-1
    Enthought Python Distribution

  python-dev: python-dev/3.5.1
    Python is a general purpose programming language

To learn more about a package enter:

   $ module spider Foo

where "Foo" is the name of a module

To find detailed information about a particular package you
must enter the version if there is more than one version:

   $ module spider Foo/11.1

That displays all the modules that have been tagged with the python keyword or where python appears in the module name.

More about software versions

Note that Lmod will indicate the default version in the output from module av, which will be loaded if you do not specify the version.

$ module av gromacs

------------------------ /sw/arcts/centos7/modulefiles -------------------------
   gromacs/5.1.2/openmpi/1.10.2/gcc/5.4.0 (D)

   D:  Default Module

When loading modules with complex names, for example, gromacs/5.1.2/openmpi/1.10.2/gcc/5.4.0, you can specify up to the second-from-last element to load the default version. That is,

$ module load gromacs/5.1.2/openmpi/1.10.2/gcc

will load gromacs/5.1.2/openmpi/1.10.2/gcc/5.4.0

To load a version other than the default, specify the version as it is displayed by the module av command; for example,

$ module load gromacs/5.1.2/openmpi/1.10.2/gcc/4.9.3

When unloading a module, only the base name need be given; for example, if you loaded either gromacs module,

$ module unload gromacs

Module prerequisites and named sets

Some modules rely on other modules. For example, the gromacs module has many dependencies, some of which conflict with the default modules. To load it, you might first clear all modules with module purge, then load the dependencies, then finally load gromacs.

$ module list
Currently Loaded Modules:
  1) intel/16.0.3   2) openmpi/1.10.2/intel/16.0.3   3) StdEnv

$ module purge
$ module load gcc/5.4.0 openmpi/1.10.2/gcc/5.4.0 boost/1.61.0 mkl/11.3.3
$ module load gromacs/5.1.2/openmpi/1.10.2/gcc/5.4.0
$ module list
Currently Loaded Modules:
  1) gcc/5.4.0                  4) mkl/11.3.3
  2) openmpi/1.10.2/gcc/5.4.0   5) gromacs/5.1.2/openmpi/1.10.2/gcc/5.4.0
  3) boost/1.61.0

That’s a lot to do each time. Lmod provides a way to store a set of modules and give it a name. So, once you have the above list of modules loaded, you can use

$ module save my_gromacs

to save the whole list under the name my_gromacs. We recommend that you make each set fully self-contained, and that you use the full name/version for each module (to prevent problems if the default version of one of them changes). To switch to a saved set, use the combination

$ module purge
$ module restore my_gromacs
Restoring modules to user's my_gromacs

To see a list of the named sets you have (which are stored in ${HOME}/.lmod.d), use

$ module savelist
Named collection list:
  1) my_gromacs

and to see which modules are in a set, use

$ module describe my_gromacs
Collection "my_gromacs" contains: 
   1) gcc/5.4.0                   4) mkl/11.3.3
   2) openmpi/1.10.2/gcc/5.4.0    5) gromacs/5.1.2/openmpi/1.10.2/gcc/5.4.0
   3) boost/1.61.0

How to get more information about the module and the software

We try to provide some helpful information about the modules. For example,

$ module help openmpi/1.10.2/gcc/5.4.0
------------- Module Specific Help for "openmpi/1.10.2/gcc/5.4.0" --------------

OpenMPI consists of a set of compiler 'wrappers' that include the appropriate
settings for compiling MPI programs on the cluster.  The most commonly used
of these are


Those are used in the same way as the regular compiler program, for example,

    $ mpicc -o hello hello.c

will produce an executable program file, hello, from C source code in hello.c.

In addition to adding the OpenMPI executables to your path, the following
environment variables set by the openmpi module.


For some generic information about the program you can use

$ module whatis openmpi/1.10.2/gcc/5.4.0
openmpi/1.10.2/gcc/5.4.0      : Name: openmpi
openmpi/1.10.2/gcc/5.4.0      : Description: OpenMPI implementation of the MPI protocol
openmpi/1.10.2/gcc/5.4.0      : License information:
openmpi/1.10.2/gcc/5.4.0      : Category: Utility, Development, Core
openmpi/1.10.2/gcc/5.4.0      : Package documentation:
openmpi/1.10.2/gcc/5.4.0      : ARC examples: /scratch/data/examples/openmpi/
openmpi/1.10.2/gcc/5.4.0      : Version: 1.10.2

and for information about what the module will set in the environment (in addition to the help text), you can use

$ module show openmpi/1.10.2/gcc/5.4.0
[ . . . .  Help text edited for space -- see above . . . . ]
whatis("Name: openmpi")
whatis("Description: OpenMPI implementation of the MPI protocol")
whatis("License information:")
whatis("Category: Utility, Development, Core")
whatis("Package documentation:")
whatis("ARC examples: /scratch/data/examples/openmpi/")
whatis("Version: 1.10.2")

where the lines to attend to are the prepend_path(), setenv(), and prereq(). There is also an append_path() function that you may see. The prereq() function sets the list of other modules that must be loaded before the one being displayed. The rest set or modify the environment variable listed as the first argument; for example,

prepend_path("PATH", "/sw/arcts/centos7/openmpi/1.10.2-gcc-5.4.0/bin")

adds /sw/arcts/centos7/openmpi/1.10.2-gcc-5.4.0/bin to the beginning of the PATH environment variable.
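
To make the effect of prepend_path() concrete, here is a minimal shell sketch of the same operation on a colon-separated variable. The function name prepend_dir and the variable DEMO_PATH are made up for this illustration; Lmod does this work internally, and its implementation differs.

```shell
#!/bin/sh
# Illustration only: mimic what prepend_path("PATH", dir) does to a
# colon-separated variable. prepend_dir and DEMO_PATH are hypothetical names.

prepend_dir() {
    # $1 = directory to add, $2 = current value of the variable
    if [ -z "$2" ]; then
        printf '%s\n' "$1"
    else
        printf '%s:%s\n' "$1" "$2"
    fi
}

DEMO_PATH="/usr/bin:/bin"
DEMO_PATH=$(prepend_dir "/sw/arcts/centos7/openmpi/1.10.2-gcc-5.4.0/bin" "$DEMO_PATH")
echo "$DEMO_PATH"
```

Because the shell searches the directories in PATH from left to right, a prepended directory takes precedence over any later directory that holds a command of the same name.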

Interactive PBS jobs

You can request an interactive PBS job for any activity for which the environment needs to be the same as for a batch job. Interactive jobs are also what you should use if you have something that needs more resources than are appropriate to use on a login node. When the interactive job starts, you will get a prompt, and you will have access to all the resources assigned to the job. This is a requirement to test or debug, for example, MPI jobs that run across many nodes.

Submitting an interactive job

There are two ways you can submit an interactive job to PBS: by including all of the options on the command line, or by listing the options in a PBS script and submitting the script with the command-line option specifying an interactive job.

Submitting an interactive job from the command line

To submit from the command line, you need to specify all of the PBS options that would normally be specified by directives in a PBS script. The translation from script to command line is simply to take a line, say,

#PBS -A example_flux

remove the #PBS, and the rest is the option you should put on the command line. More options will be needed, but that would lead to

$ qsub -A example_flux

For an interactive job, several options that are appropriate in a PBS script may be left off. Since you will have a prompt, you probably don’t need to use the options to send you mail about job status. The options that must be present are the accounting options, the resource options for number of nodes, processors, memory, and walltime, and the -V option to ensure that all the nodes get the correct environment. The -I flag signals that the job should run as an interactive job. (Note: in the example that follows, the \ character indicates that the following line is a continuation of the one on which it appears.)

$ qsub -I -V -A example_flux -q flux \
   -l nodes=2:ppn=2,pmem=1gb,walltime=4:00:00,qos=flux

The above example requests an interactive job, using the account example_flux and two nodes with two processors each, each processor with 1 GB of memory, for four hours. The prompt will change to something that says the job is waiting to start, followed by a prompt on the first of the assigned nodes.

qsub: waiting for job to start
[grundoon@nyx5555 ~]$

If at some point before the interactive job has started you decide you do not want to use it, Ctrl-C will cancel it, as in

^CDo you wish to terminate the job and exit (y|[n])? y
Job is being deleted

When you have completed the work for which you requested the interactive job, you can just log out of the compute node, either with exit or with logout, and you will return to the login node prompt.

[grundoon@nyx5555 ~]$ exit
[grundoon@flux-login1 ~]$

Submitting an interactive job using a file

To recreate the same interactive job as above, you could create a file, say interactive.pbs, with the following lines in it

#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux

#PBS -l nodes=2:ppn=2,pmem=1gb,walltime=4:00:00

then submit the job using

$ qsub -I interactive.pbs
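
The correspondence between script directives and command-line options described in this section can be sketched in a few lines of shell. The function name pbs_options_from_script is made up for this example. The sketch simply strips the #PBS prefix from every directive line and prints the resulting command; it does not run qsub.

```shell
#!/bin/sh
# Illustration only: turn the "#PBS" lines of a script into qsub options.
# pbs_options_from_script is a hypothetical helper, not a PBS command.

pbs_options_from_script() {
    # Keep only directive lines, drop the "#PBS" prefix, join with spaces.
    sed -n 's/^#PBS[[:space:]]\{1,\}//p' "$1" | paste -s -d ' ' -
}

# Recreate interactive.pbs from the text in a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux

#PBS -l nodes=2:ppn=2,pmem=1gb,walltime=4:00:00
EOF

options=$(pbs_options_from_script "$script")
rm -f "$script"

# Print the equivalent command line instead of submitting anything.
echo "qsub -I $options"
```

The printed command carries the same account, queue, and resource options as the file-based submission, which is why the two approaches are interchangeable.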

Linking libraries with applications

Using external libraries with compiled programs

Libraries are collections of functions that are already compiled and that can be included in your program without your having to write the functions yourself or compile them separately.

Why you might use libraries

Saving yourself time by not having to write the functions is one obvious reason to use a library. Additionally, many of the libraries focus on high performance and accuracy. Many of the libraries are very well-tested and proven. Others can add parallelism to computationally intensive functions without you having to write your own parallel code. In general, libraries can provide significant performance or accuracy dividends with a relatively low investment of time. They can also be cited in publications to assure readers that the fundamental numerical components of your work are fully tested and stable.

Compiling and linking with libraries

To use libraries you must link them with your own code. When you write your own code, the compiler turns that into object code, which is understandable by the machine. Even though most modern compilers hide it from you, there is a second step, called linking, in which the object code created for you is glued together with all the standard functions you include and any external libraries.

When linking libraries that are not included with your compiler, you must tell the compiler/linker where to find the file that contains the library, typically .so and/or .a files. For libraries that require prototypes (C/C++, etc.), you must also tell the preprocessor/compiler where to find the header (.h) files. Fortran modules are also needed, if you are compiling Fortran code.

Environment variables from the module

When we install libraries on Flux, we usually create modules for them that will set the appropriate environment variables to make it easier for you to provide the right information to the compiler and the linker.

The naming scheme is, typically, a prefix indicating the library, for example, FFTW, followed by a suffix to indicate the variable's function, for example, _INCLUDE for the directory containing the header files. So, for example, the module for FFTW3 includes the variables FFTW_INCLUDE and FFTW_LIB for the include and library directories, respectively. We also, typically, set a variable to the top level of the library path, for example, FFTW_ROOT. Some configuration schemes want that and infer the rest of the directory structure relative to it.

Libraries can often be tied to specific versions of a compiler, so you will want to run

$ module av

to see which compilers and versions are supported.

One other variable that is often set by the library module is the LD_LIBRARY_PATH variable, which is used when you run the program to tell it where to find the libraries needed at run time. If you compile and link against an external library, you will almost always need to load the library module when you want to run the program so that this variable gets set.

To see the variable names that a module provides, you can use the show option to the module command to show what is being set by the module. Here is an edited example of what that would print if you were to run it for FFTW3.

[markmont@flux-login2 ~]$ module show fftw/3.3.4/gcc/4.8.5
FFTW consists of libraries for computation of the discrete Fourier transform
in one or more dimensions.  In addition to adding entries to the PATH, MANPATH,
and LD_LIBRARY_PATH, the following environment variables are created.

    FFTW_ROOT       The root of the FFTW installation folder
    FFTW_INCLUDE    The FFTW3 include file folder
    FFTW_LIB        The FFTW3 library folder, which includes single (float),
                    double, and long-double versions of the library, as well
                    as OpenMP and MPI versions.  To use the MPI libary, you
                    must load the corresponding OpenMPI module.

An example of usage of those variables on a compilation command is, for gcc and

    $ gcc -o fftw3_prb fftw3_prb-c -I${FFTW_INCLUDE} -L${FFTW_LIB} -lfftw3 -lm
    $ icc -o fftw3_prb fftw3_prb-c -I${FFTW_INCLUDE} -L${FFTW_LIB} -lfftw3 -lm

whatis("Name: fftw")
whatis("Description: Libraries for computation of discrete Fourier transform.")
whatis("License information:")
whatis("Category: Library, Development, Core")
whatis("Package documentation:")
whatis("Version: 3.3.4")

[markmont@flux-login2 ~]$

In addition to the environment variables being set, the show option also displays the names of other modules with which FFTW3 conflicts (in this case, just itself), and there may be links to documentation and the vendor web site (not shown above).
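
Since compile commands and run-time library lookups both depend on these variables, it can help to confirm that they are set before you build. The following is a minimal sketch; the helper name missing_vars is made up for this illustration, and the check only tests whether each variable is non-empty, not whether the directories it names exist.

```shell
#!/bin/sh
# Illustration only: confirm that the variables a library module is expected
# to set are present before compiling. missing_vars is a hypothetical helper.

missing_vars() {
    # Print the names of any listed variables that are unset or empty.
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$name"
        fi
    done
}

missing=$(missing_vars FFTW_ROOT FFTW_INCLUDE FFTW_LIB)
if [ -n "$missing" ]; then
    echo "Not set (is the fftw module loaded?):" $missing
else
    echo "FFTW variables are set; headers in $FFTW_INCLUDE, libraries in $FFTW_LIB"
fi
```

Run without the fftw module loaded, the sketch names the variables that are missing; with the module loaded, it reports the include and library directories.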

Compile and link in one step

Here is an example of compiling and linking a C program with the FFTW3 libraries.

gcc -I$FFTW_INCLUDE -L$FFTW_LIB mysource.c -lfftw3 -o myprogram

Here is a breakdown of the components of that command.

  • -I$FFTW_INCLUDE The -I option to the compiler indicates a location for header files and, in this case, points to a directory that holds the fftw3.h header file.
  • -L$FFTW_LIB The -L compiler option indicates a library location and, in this case, points to a directory that holds the libfftw3.a file and the corresponding shared-library files, which are the library files. Note, you will want to make sure that the -L option precedes the -l option.
  • mysource.c This is the source code that refers to the FFTW3 library functions; that is, your program.
  • -lfftw3 The -l compiler option indicates the name of a library that contains a function referenced in the source code. The linker will look through the directories added with -L, then the standard library paths the compiler came with, and it will link the first libfftw3.* file that it finds (the shared library if you are linking dynamically, and libfftw3.a if you are linking statically).
  • -o myprogram The -o option is followed by the name of the final, executable file, in this case myprogram.

Compile and link in multiple steps

Sometimes you will need or want to compile some files without creating the final executable program, for example, if you have many smaller source files that all combine to make a complete executable. Here is an example.

gcc -c -I$FFTW_INCLUDE source1.c 
gcc -c -I$FFTW_INCLUDE source2.c 
gcc -L$FFTW_LIB source1.o source2.o -o myprogram -lfftw3

The -c compiler option tells the compiler to compile an object file only. Note that only the -I option is needed if you are not linking. The header files are needed to create the object code, which contains references to the functions in the library.

The last line does not actually compile anything; rather, it links the components. The -L and -l options are the same as on the one-step compilation and linkage command and specify where the binary library files are located and which library to link. The -o option specifies the name of the final executable, in this case myprogram.

The location of the header files is only needed before linking. Thus the -I flags can be left off for the final step. Conversely, the -L and -l flags are only needed for the final link step, and so can be left off the compilation. Note that all the object files to be linked need to be named.

You will typically see this method used in large, complex projects, with many functions spread across many files with lots of interdependencies. This method minimizes the amount of time it takes to recompile and relink a program if only a small part of it is changed. This is best managed with make and makefiles.
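
The time saving comes from recompiling only the object files whose sources have changed. As a rough sketch of that decision, the following shell fragment applies the kind of timestamp comparison that make performs for each object file. The helper name needs_rebuild is made up for this illustration, and the touch command stands in for a real compilation.

```shell
#!/bin/sh
# Illustration only: the timestamp test that make applies to each object file.
# needs_rebuild is a hypothetical helper; a real project should use make.

needs_rebuild() {
    # $1 = source file, $2 = object file
    # Rebuild when the object is missing or older than its source.
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

workdir=$(mktemp -d)
touch "$workdir/source1.c"

if needs_rebuild "$workdir/source1.c" "$workdir/source1.o"; then
    first="compile"      # no object file yet
else
    first="skip"
fi

touch "$workdir/source1.o"   # stand-in for running: gcc -c source1.c
if needs_rebuild "$workdir/source1.c" "$workdir/source1.o"; then
    second="compile"
else
    second="skip"        # object is up to date
fi

echo "first pass: $first, second pass: $second"
rm -rf "$workdir"
```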

Submitting jobs using Torque PBS

Torque PBS (just PBS, hereafter) is the Portable Batch System, and it controls the running of jobs on the Flux cluster. PBS queues, starts, controls, and stops jobs. It also has utilities to report on the status of jobs and nodes. PBS and Moab, the scheduling software, are the core software that keep jobs running on Flux.


PBS is available in the core software modules, and the PBS commands are loaded automatically when you log in. If, for some reason, you remove the module, or otherwise clear your modules, you can reload it with

$ module load torque

PBS overview

PBS is a batch job manager, where a job is some set of computing tasks to be performed. For most people, the primary uses will be to put jobs into the queue to be run, to check on the status of jobs, and to delete jobs. Most of the time, you will write a PBS script, which is just a text file – a shell script with PBS directives that the shell will interpret as comments – that contains information about the job and the commands that do the desired work.

You will find it convenient to name the PBS scripts in a consistent way, and some find that using the .pbs extension clearly identifies, and makes it easy to list, PBS scripts. We will use that convention for our examples. Before we get to the contents of a PBS script, we will show the three primary PBS commands, after which we will look at the PBS directives.

Submitting a PBS script

Suppose you have a PBS script called test.pbs that you wish to run. The command qsub test.pbs will submit it to be run. The output will be a JobID, which is used if you need information about the job or wish to delete it. If you are having trouble with a job, it is always a good idea to include the JobID when you ask for help.

$ qsub test.pbs

Checking on job status

You only need to specify the numeric part of the JobID to get information about its status within the PBS system. For example,

$ qstat 1234567

Deleting a job

To delete a job, use

$ qdel 1234567
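
If you have the full JobID saved, for example in a variable inside a script, the numeric part can be extracted with a shell parameter expansion. This sketch assumes the JobID has the form number.servername; the server name shown is a made-up placeholder, and job_number is a hypothetical helper name.

```shell
#!/bin/sh
# Illustration only: keep the numeric part of a JobID for use with qstat or qdel.
# The server suffix below is a made-up placeholder.

job_number() {
    # Remove everything from the first "." onward.
    printf '%s\n' "${1%%.*}"
}

full_id="1234567.server.example.com"
echo "qstat $(job_number "$full_id")"
```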

A PBS script template

There are many PBS directives you can use to specify the characteristics of a job and how you and the PBS system will interact. A PBS directive is on a line beginning with #PBS. We will show an idealized template for a PBS script to illustrate some of the most commonly used PBS directives, then we will explain them. The example PBS script, call it test.pbs, contains these lines.

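####  PBS preamble

#PBS -N PBS_test_script
#PBS -M your_address@example.com
#PBS -m abe

#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux

#PBS -l nodes=4:ppn=2,pmem=2gb
#PBS -l walltime=1:15:00
#PBS -j oe
#PBS -V

####  End PBS preamble

#  Show the nodes assigned to the job, if PBS provided a node file
if [ -s "$PBS_NODEFILE" ] ; then
    echo "Running on"
    uniq -c "$PBS_NODEFILE"
fi

#  Change to the directory from which the job was submitted
if [ -d "$PBS_O_WORKDIR" ] ; then
    cd "$PBS_O_WORKDIR"
    echo "Running from $PBS_O_WORKDIR"
fi

#  Put your job commands after this line
echo "Hello, world."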

The lines that are not PBS directives but begin with the # character are comments and are ignored by PBS and by the shell. Once the first non-comment line is reached (in the template above, the line that begins with if), PBS stops looking for directives. It is, therefore, important to put all PBS directives before any commands.
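
To check a script for directives that have been placed too late, you can approximate this scanning rule in the shell. The sketch below is an assumption-laden illustration, not part of PBS: the helper name honored_directives is made up, and it assumes that blank lines and ordinary comments do not end the scan while any other line does.

```shell
#!/bin/sh
# Illustration only: list the "#PBS" lines that appear before the first
# command in a script. honored_directives is a hypothetical helper.

honored_directives() {
    awk '
        /^#PBS/          { print; next }  # directive: reported
        /^[[:space:]]*$/ { next }         # blank line: keep scanning
        /^[[:space:]]*#/ { next }         # ordinary comment: keep scanning
                         { exit }         # first command: stop scanning
    ' "$1"
}

script=$(mktemp)
cat > "$script" <<'EOF'
####  PBS preamble
#PBS -N PBS_test_script

#PBS -q flux
echo "Hello, world."
#PBS -m abe
EOF

honored=$(honored_directives "$script")
rm -f "$script"
printf '%s\n' "$honored"
```

In this demonstration the -m abe line comes after the echo command, so it is not listed; only the -N and -q directives are.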

You may find that grouping the PBS directives into blocks of related characteristics helps when reviewing a file for completeness and accuracy.

Roughly speaking, the three blocks, in order, are the attributes that control how you interact with the job, how the job gets paid for and routed, and what the resource characteristics of the job are.

All PBS directives start with #PBS, and each directive in a PBS script corresponds to a command line option to the qsub command. The #PBS -N PBS_test_script directive – which sets the job name – corresponds to adding -N PBS_test_script as an option to the qsub command. You can override the directives in the PBS script on the command line, or supplement them. So for example, you could use

$ qsub -N second_test test.pbs

and the name second_test would override the name set in test.pbs. The name should contain only letters, digits, the underscore, or the dash characters.

The next two directives (we will omit the #PBS portion for the remainder of this section), the -M and -m, control how PBS communicates job status to you by e-mail. Specify your own e-mail address after the -M.

The three letters after the -m directive specify under what conditions PBS should send a notification: b is when a job begins, e is when a job ends, and a is when a job aborts. You can also specify n for none to suppress all e-mail from PBS.

The second block contains directives that have, roughly, to do with how your job is paid for and which sets of resources it runs against. The -A directive specifies the account to use. This is set up for the person paying for the use. The -l qos option will always be flux unless you receive specific instructions to the contrary.

The -q specifies which queue to use. In general, the queue will match the account suffix, so if the account is default_flux, the queue would be specified using -q flux; similarly, if this were a large-memory account, default_fluxm, the queue should be specified -q fluxm; etc.

The exception to this rule is if you will be using software that is restricted to on-campus use, and you submit jobs from the flux-campus-login node, in which case the queue would be specified as -q flux-oncampus. Jobs can be submitted to the flux-oncampus queue only from the flux-campus-login node.

The last block contains the directives that determine the environment in which your job runs and what resources are allocated to it. The most commonly changed are the processor count and layout, the memory, and the maximum amount of time the job can run.

Let us save the most complicated options for last, and review these in reverse order. The -V option instructs PBS to set the environment on all of the compute nodes assigned to a job to be the same as the environment that was in effect when you submitted the job. This is very important to include to make sure that all the paths to programs and libraries, and any important environment variables, are set everywhere. Many obscure error messages are due to this option not being used.

PBS will normally create separate files for output and for errors. These are named job_name.oXXXXXXXX and job_name.eXXXXXXXX, where job_name is the name you specify with the -N option, XXXXXXXX is the numeric JobID PBS assigned the job, and the o and e represent output and error, respectively. If there are errors, or if your program writes informational messages to the error stream, then it can be helpful to combine the output and error streams so that the error messages appear in context with the output. It can be very difficult otherwise to determine exactly where in the course of a job an error occurred. This is what the -j oe option does: it joins the output and error streams into the output stream (specifying it as eo would combine them into the error stream).
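
The benefit of joining the streams can be seen with ordinary shell redirection, which is a close analogue of what -j oe does for a job. In this sketch, demo_job is a made-up stand-in for the commands in a job; it is not how PBS itself captures output.

```shell
#!/bin/sh
# Illustration only: the shell analogue of joining the error stream into the
# output stream. demo_job is a hypothetical stand-in for a job's commands.

demo_job() {
    echo "step 1 finished"
    echo "warning: step 2 skipped" >&2
    echo "step 3 finished"
}

# Separate streams: only the output stream is captured here.
separate=$(demo_job 2>/dev/null)

# Joined streams: the warning appears between the steps that surround it.
joined=$(demo_job 2>&1)

printf '%s\n' "$joined"
```

With the streams joined, the warning sits between step 1 and step 3, which shows where in the run it occurred.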

There are many options that can be specified with -l (dash ‘ell’). One of the simplest is walltime, which specifies the maximum clock time that your job will be allowed to run. Time is specified as days, hours, minutes, and seconds, in the form dd:hh:mm:ss. So, 15:00 requests 15 minutes (this should be the minimum that you request for a single job), 2:30:00 would request two and one-half hours, and 7:00:00:00 would request one week. Longer times are harder to schedule than shorter times, but it is better to ask for too much time than too little, so your job will finish.
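
As a check on how the fields are read, the following sketch converts a walltime string to seconds. The helper name walltime_to_seconds is made up for this illustration. It assumes that fields are aligned from the right, so that a missing day or hour field counts as zero, and that a single field means seconds.

```shell
#!/bin/sh
# Illustration only: convert a walltime of the form [[dd:]hh:]mm:ss to seconds.
# walltime_to_seconds is a hypothetical helper; PBS does this parsing itself.

walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        NF < 1 || NF > 4 { exit 1 }
        {
            # Align fields from the right; missing leading fields count as 0.
            n  = NF
            ss = $n
            mm = (n >= 2) ? $(n-1) : 0
            hh = (n >= 3) ? $(n-2) : 0
            dd = (n >= 4) ? $(n-3) : 0
            print ((dd * 24 + hh) * 60 + mm) * 60 + ss
        }'
}

for request in 15:00 2:30:00 7:00:00:00; do
    echo "$request = $(walltime_to_seconds "$request") seconds"
done
```

The three requests from the text come out as 900, 9000, and 604800 seconds: fifteen minutes, two and one-half hours, and one week.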

Finally, we get to the most complex of the -l options: selecting the number of nodes, processors, and memory. In the example above, the request is for nodes=4:ppn=2,pmem=2gb, which translates as “assign me 4 machines, each with 2 processors, and each processor should have 2 GB of memory”.

That is not exactly what it specifies; instead, it describes a sort of “worst-case” scenario. It really says that you would accept up to four physical machines, each of which has at least two processors. You could end up with eight processors on one machine with that request.
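
However the processors end up being distributed, the totals implied by a request are fixed, and they are worth computing before you submit. The sketch below does only that arithmetic; request_totals is a made-up helper name, and it does not query PBS.

```shell
#!/bin/sh
# Illustration only: totals implied by a request such as nodes=4:ppn=2,pmem=2gb.
# request_totals is a hypothetical helper; it does not query PBS.

request_totals() {
    # $1 = nodes, $2 = ppn (processors per node), $3 = pmem in GB per processor
    processors=$(( $1 * $2 ))
    memory_gb=$(( processors * $3 ))
    echo "$processors processors, $memory_gb GB total memory"
}

request_totals 4 2 2
```

For the template's request this gives 8 processors and 16 GB of memory in total.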