
Using General Purpose GPUs on Flux

February 24, 2016

What are GPGPUs?

GPGPUs are general-purpose graphics processing units. Originally, graphics processing units were very special purpose, designed to do the mathematical calculations needed to render high-quality graphics for games and other demanding graphical programs. People realized that those operations occur in other contexts and so started using the graphics card for other calculations. The industry responded by creating the general purpose cards, which generally has meant increasing the memory, numerical precision, speed, and number of processors.

GPGPUs are particularly good at matrix multiplication, random number generation, fast Fourier transforms (FFTs), and other numerically intensive and repetitive mathematical operations. They can deliver 5–10 times speed-up for many codes with careful programming.

Submitting Batch Jobs

The Flux GPGPU allocations are based on a single GPGPU accompanied by two CPUs, each with 4 GB of memory, for a total CPU memory pool of 8 GB per GPU. To use more memory or more CPUs with a GPU job, you must increase your allocation to make them available. For example, if your GPGPU program required 17 GB of memory, you would need an allocation for three GPGPUs to obtain enough CPU memory, even though your job only uses one GPGPU and one CPU.

To use a GPU, you must request one in your PBS script. To do so, use the gpus node attribute on your #PBS -l line. Here is an example that requests one GPU:

#PBS -l nodes=1:gpus=1,mem=2gb,walltime=1:00:00

Note that you must use nodes=1 and not procs=1 or the job will not run.

Also note that GPUs are available only with a GPU allocation, and those have names that end with _fluxg instead of _flux. Make sure that the line requesting the queue matches the GPU allocation name; i.e.,

#PBS -q fluxg

See our web page on Torque for more details on PBS scripts.
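Putting these pieces together, a minimal GPU batch script might look like the following sketch; the allocation name (example_fluxg) and program name (./my_gpu_program) are placeholders that you would replace with your own:

#!/bin/bash
#PBS -N gpu_example
#PBS -A example_fluxg
#PBS -q fluxg
#PBS -l nodes=1:gpus=1,mem=2gb,walltime=1:00:00
#PBS -j oe
#PBS -V

# Change to the directory from which the job was submitted
cd $PBS_O_WORKDIR

# Load the CUDA runtime libraries needed by the program
module load cuda

# Run the GPU program
./my_gpu_program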

Programming for GPGPUs

The GPGPUs on Flux are NVIDIA graphics processors and use NVIDIA’s CUDA programming language. This is a very C-like language (that can be linked with Fortran codes) that makes programming for GPGPUs straightforward. For more information on CUDA programming, see the documentation at http://www.nvidia.com/object/cuda_develop.html.
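As a rough illustration of what CUDA code looks like, here is a minimal sketch of a vector-addition program; the sizes are arbitrary and error checking is omitted for brevity:

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Each thread adds one pair of elements. */
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *a = (float *)malloc(bytes);
    float *b = (float *)malloc(bytes);
    float *c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

    /* Launch enough 256-thread blocks to cover all n elements. */
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", c[0]);   /* expect 3.0 */

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b); free(c);
    return 0;
}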

For more examples of applications that are well suited to CUDA, see NVIDIA’s CUDA pages at http://www.nvidia.com/object/cuda_home.html.

NVIDIA also makes special libraries available that make using the GPGPUs even easier. Two of these libraries are cuBLAS and cuFFT.

cuBLAS is a BLAS library that uses the GPGPU for matrix operations. For more information on the BLAS routines implemented by cuBLAS, see the documentation at http://developer.nvidia.com/cublas.
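For instance, a small program that computes y = alpha*x + y on the GPU with cuBLAS might look roughly like the following sketch; the vector length and values are arbitrary and error checking is omitted:

#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
    const int n = 1024;
    float alpha = 2.0f;
    float x[1024], y[1024];
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 1.0f; }

    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);

    /* Copy the host vectors to the GPU using cuBLAS helpers. */
    cublasSetVector(n, sizeof(float), x, 1, d_x, 1);
    cublasSetVector(n, sizeof(float), y, 1, d_y, 1);

    /* y = alpha * x + y, computed on the GPU. */
    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);

    cublasGetVector(n, sizeof(float), d_y, 1, y, 1);
    printf("y[0] = %f\n", y[0]);   /* expect 3.0 */

    cublasDestroy(handle);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}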

cuFFT is a set of FFT routines that use the GPGPU for their calculations. For more information on the FFT routines implemented by cuFFT, see the documentation at http://developer.nvidia.com/cufft.
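Similarly, a minimal cuFFT sketch that performs a single 1D complex-to-complex transform might look like this; the signal length and contents are arbitrary and error checking is omitted:

#include <stdio.h>
#include <cuda_runtime.h>
#include <cufft.h>

int main(void)
{
    const int N = 256;
    cufftComplex h_signal[256];
    for (int i = 0; i < N; i++) { h_signal[i].x = 1.0f; h_signal[i].y = 0.0f; }

    cufftComplex *d_signal;
    cudaMalloc(&d_signal, N * sizeof(cufftComplex));
    cudaMemcpy(d_signal, h_signal, N * sizeof(cufftComplex), cudaMemcpyHostToDevice);

    /* Plan a single 1D complex-to-complex transform of length N. */
    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, 1);

    /* Forward transform, computed in place on the GPU. */
    cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);

    cudaMemcpy(h_signal, d_signal, N * sizeof(cufftComplex), cudaMemcpyDeviceToHost);
    printf("bin 0: %f\n", h_signal[0].x);   /* DC bin for a constant input is N */

    cufftDestroy(plan);
    cudaFree(d_signal);
    return 0;
}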

To use the CUDA compiler (nvcc) or to link your code against one of the CUDA-enabled libraries, you will need to load a cuda module. There are typically several versions installed, and you can see which are available with

$ module av cuda

You can just load the cuda module to get the default, or you can load a specific version by specifying it, as in

$ module load cuda/6.0

Loading a cuda module will add the path to the nvcc compiler to your PATH, and it will set several other environment variables that can be used to link against cuBLAS, cuFFT, and other CUDA libraries in the library directory. You can use

$ module show cuda

to display which variables are set.
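For example, once a cuda module is loaded you can compile CUDA source files and link against these libraries with nvcc; the source and output file names below are hypothetical:

$ nvcc -o vecadd vecadd.cu
$ nvcc -o saxpy saxpy.cu -lcublas
$ nvcc -o fft fft.cu -lcufft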

CUDA-based applications can be compiled on the login nodes, but cannot be run there, since they do not have GPGPUs. To run a CUDA application, you must submit a job, as shown above.

TitanV GPUs

Flux now contains GPU nodes with NVIDIA Titan V GPU cards, which use the NVIDIA Volta architecture. These nodes are capable of running CUDA 9 codes.

These are the only nodes capable of running CUDA 9 codes. To ensure that older codes that may be incompatible with the new hardware do not run on them unintentionally, the new nodes are available to anyone with an active Flux GPU account, but only by explicitly requesting titanv within your PBS script. Flux GPU account names end with _fluxg.

To request one of these nodes, add the titanv feature to the resource request in your PBS script. If your current resource request looks like

#PBS -l nodes=1:gpus=1,mem=2gb,walltime=1:00:00

then to run a job on the new GPU nodes, add the titanv node property so that it looks like this:

#PBS -l nodes=1:gpus=1:titanv,mem=2gb,walltime=1:00:00

The Titan V GPU nodes will run older CUDA codes, but those codes should be recompiled for the new hardware to get optimum performance.

To access CUDA 9 on Flux, use

$ module load cuda/9.1
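To recompile an existing code for the Volta architecture (compute capability 7.0), you can add the corresponding architecture flag to your nvcc command; the file names here are hypothetical:

$ nvcc -arch=sm_70 -o vecadd vecadd.cu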