Matlab is capable of running parallel programs. There are two modes of parallelism: implicit and explicit.
Implicit parallelism is built into the software and is used automatically whenever possible, regardless of whether you ask for it. Matlab has implicit parallelism built into most of its matrix operations, for example. If your program spends a lot of time on matrix operations, it will usually run faster if you increase the number of cores available to Matlab.
If you choose to increase the number of cores to use implicit parallelism, you should check how efficiently those cores were used after your job completes. Most ARC-TS clusters should have the my_job_statistics command available, which will give you some information about core usage efficiency.
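As a rough illustration of implicit parallelism, the sketch below reports how many computational threads Matlab may use and times a large matrix multiplication, which Matlab multithreads internally. The matrix size is an arbitrary choice for illustration; timings will vary with the number of cores given to the job.

```matlab
% Report how many computational threads Matlab may use for implicit parallelism.
nThreads = maxNumCompThreads;
fprintf('Matlab may use up to %d computational threads\n', nThreads);

% Time a large matrix multiplication, an operation Matlab multithreads internally.
n = 2000;          % illustrative matrix size, not a recommendation
A = rand(n);
B = rand(n);
tic;
C = A * B;
elapsed = toc;
fprintf('%d-by-%d multiply took %.2f seconds\n', n, n, elapsed);
```

Running the same script in jobs that request different core counts gives a quick sense of whether extra cores help your workload.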
Explicit parallelism is programmed by you (or whoever wrote your Matlab code). Explicit parallelism can be broken down into three main types.
- Shared memory parallelism
  - Several copies of your program run, and they can all access the same data in memory. In our environment, this type of parallelism can only occur on a single node.
- Distributed memory parallelism
  - Several copies of your program run, and each copy needs its own copy of the data on which it will work, independent of the other copies. This can run on more than one node. You may need to explicitly program copying data to the node(s) where it will be used.
- Hybrid parallelism
  - This combines the two: data is copied to a different node, as in distributed memory parallelism, and is then processed on that node using shared memory parallelism. This method is advanced and offers many opportunities for errors and pitfalls. It is recommended only for advanced Matlab programmers.
ARC-TS parallel profiles
For explicit parallel processing, Matlab uses parallel profiles to define a processor pool. The profile contains information about the nodes to be used and the number of processors on each node, among other things. On the ARC-TS clusters, there will be three profiles available for you to use.
The local profile
The built-in local profile should be used for programs that run on a single node. Nothing special needs to be done to define this profile.
The local profile is always defined; it is the machine on which the initial Matlab process runs.
The current profile
The current profile is created by running the setupUmichClusters command, and it contains the nodes and processors requested in the Slurm job in which it runs. This profile should not be used when running on a login node. The local profile should be preferred for single-node jobs.
Creating the pool of workers
Both the local and the current profiles are used to create a pool of Matlab worker processes. It is usually best to assign the reference to a variable, as in the examples below. The examples show setting a variable, N, to a literal number. You can also query environment variables to set this number to the number of tasks available from the job scheduler. See the examples in the menu to the right for sample code.
To create a pool using the local profile, use this.
    % Create a local worker pool
    N = 4;
    thePool = parpool('local', N);
To create a pool using the current profile, use this.
    % Initialize the current profile
    setupUmichClusters

    % Create a current worker pool assuming 4 total tasks in the job
    N = 4;
    thePool = parpool('current', N);
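Instead of a literal number, N can be taken from the scheduler's environment. The following is a minimal sketch, assuming a single-node job in which Slurm has set the standard SLURM_NTASKS variable. The fallback to one worker when the variable is missing is our assumption, not an ARC-TS convention.

```matlab
% Read the task count that Slurm exports into the job's environment.
nTasks = str2double(getenv('SLURM_NTASKS'));

% Outside a Slurm job the variable is unset, getenv returns an empty string,
% and str2double returns NaN; fall back to one worker (an assumed default).
if isnan(nTasks) || nTasks < 1
    nTasks = 1;
end
N = nTasks;

% Create the pool with the number of tasks the job actually has.
thePool = parpool('local', N);
```

Sizing the pool from the job's own allocation keeps the Matlab code and the Slurm job script from drifting out of step when the request changes.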
Deleting the worker pool
When you are finished using a parallel pool created with either the local or the current profile, you should always close it explicitly to prevent problems that may arise if it is shut down improperly. Calling delete(thePool) will shut down the parallel pool, the reference to which was created by the commands above.
If you did not save the reference to the pool as a variable, you can use delete(gcp('nocreate')), which will query to get the reference to the current pool and delete it.
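One way to make sure the pool is closed even when the parallel work fails is to wrap the work in try/catch. This is only a sketch: the pool size of 2 and the toy loop body are placeholders for your own computation.

```matlab
% Open a small pool; 2 workers is an arbitrary illustrative size.
thePool = parpool('local', 2);

results = zeros(1, 8);
try
    % Placeholder parallel work: square each loop index.
    parfor i = 1:8
        results(i) = i^2;
    end
catch err
    % Close the pool before passing the error on.
    delete(thePool);
    rethrow(err);
end

% Normal path: close the pool explicitly when the work is done.
delete(thePool);
```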
Ideal candidates for parallel processing
Many files to be processed
One common example is when there are many files each of which needs to have the same operations performed on the data therein; for example, you have many hundreds of images, each of which needs some processing.
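The many-files case might be sketched with a parfor loop like the one below. The folder name images, the *.png pattern, and the mean-intensity calculation are all hypothetical stand-ins for your own data and processing, and the sketch assumes a worker pool is already open.

```matlab
% Hypothetical input folder and file pattern; replace with your own.
inputDir = 'images';
files = dir(fullfile(inputDir, '*.png'));
nFiles = numel(files);

% Preallocate one result slot per file.
meanIntensity = zeros(nFiles, 1);

% Each iteration handles one file independently of all the others.
parfor k = 1:nFiles
    img = imread(fullfile(files(k).folder, files(k).name));
    meanIntensity(k) = mean(double(img(:)));   % stand-in for real processing
end
```

Because no iteration depends on another, Matlab is free to hand the files to the workers in any order.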
Many unique iterations
Simulations, and in particular Monte Carlo techniques that randomly sample from data or randomly generate data, are good candidates for parallelism. Each iteration typically generates its own unique starting point and then applies the same procedure as every other iteration.
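As a small illustration of independent random iterations, the sketch below estimates pi by Monte Carlo sampling inside a parfor loop. The sample count is arbitrary, and the example is ours rather than taken from ARC-TS material.

```matlab
% Estimate pi by sampling random points in the unit square and counting
% how many fall inside the quarter circle of radius 1.
nSamples = 1e6;    % illustrative sample count
hits = 0;

parfor i = 1:nSamples
    p = rand(1, 2);                    % one random point per iteration
    hits = hits + (sum(p.^2) <= 1);    % reduction variable combined across workers
end

piEstimate = 4 * hits / nSamples;
fprintf('Estimated pi: %.4f\n', piEstimate);
```

The variable hits is a parfor reduction variable: each worker accumulates its own partial count and Matlab combines the partial counts when the loop finishes.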
The Matlab documentation uses the distributed job method quite extensively. We do not recommend this, as it essentially duplicates the functions of Slurm. It seems to us less well suited to work where you use a login node that is part of the cluster. It would make the most sense for submitting jobs to the cluster from user workstations, which we do not allow.
In the distributed job scheme, you specify some function for Matlab to execute, then Matlab will submit a batch job to the cluster scheduler for each instance of the function. For more details, see the Matlab distributed jobs web page (not yet ready). This is very infrequently used at UM.
If you wish to use this method on one of the ARC-TS clusters, please contact email@example.com before doing so, and we can discuss your method and provide you with examples and help for implementing this on an ARC-TS cluster.
The slurm profile
The slurm profile is also defined by using the setupUmichClusters command, and it defines Slurm job parameters that will be used when Matlab submits distributed jobs on your behalf. This profile is not recommended for use on ARC-TS clusters, but we make it available for those few programs that require it. It is almost always easier to do this kind of work with job scripts outside of Matlab.