How parallel for loops work

Parallel for loops are a way to take a group of independent tasks and compute more than one at a time. Matlab implements parallel for loops with a master Matlab process that takes the iterations of a for loop and assigns them to worker Matlab processes in a pool of workers created for this purpose. This example shows how to set up and use a parallel for loop that is capable of using processors on more than one machine.
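As a minimal sketch of what "independent tasks" means here (the variable names are illustrative and not taken from the example files), the loop below writes each result into its own slot of a preallocated vector, so no iteration needs a value computed by another. If no pool or toolbox is available, Matlab runs the same loop serially.

```matlab
% Minimal sketch: every iteration depends only on its own index k,
% so the iterations can be handed to workers in any order.
N = 8;
squares = zeros(1, N);     % preallocated output, indexed by the loop variable
parfor k = 1:N
    squares(k) = k^2;      % no iteration reads another iteration's result
end
disp(squares)
```

A loop in which iteration k needs the result of iteration k-1 does not have this property and cannot be converted to parfor as written.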

The spectral processing example

We will use the spec function from the spectral processing example to demonstrate how this can be done. The complete file can be found on Flux at


For this example, we begin by setting variables to contain the file names and precreating some data structures to hold the final results. The first command from spec_parfor_local.m that pertains to the parfor setup is

% If not inside a PBS job, use 4 processors
if isempty(getenv('PBS_NP'))
    NP = 4;
else
    NP = str2double(getenv('PBS_NP'));
end

That bit of code reads from the environment the number of processors that PBS assigned to this job. If there is no environment variable – that is, this is not being run from within PBS – then it defaults to setting the number of processors to four. This method can be used to create a Matlab script that can be run interactively outside of PBS or within it.

There are three cluster profiles that are defined at startup of Matlab 2015a and subsequent versions: local, current, and flux. When Matlab is run from within PBS, either the local or current profile is the correct one to use. For jobs where all the workers will be on one physical machine, the correct profile to use is local.

We initialize the pool of workers with the parpool command, as in

% Initialize the pool to use
myPool = parpool('local', NP);

The next bit of code in spec_parfor_local.m starts a timer, then runs the actual parallel for loop, which uses the pool just created.

tic;
parfor k = 1:N
   data_file = [ data_repository files{k} ];
   Y(:,k) = spec(data_file, k);
end
stop_time = toc;

We print the timing information (not shown here), and finally, we shut down the workers in the pool and delete the pool object.

% Shut down the parallel pool
delete(myPool);

Note that the delete command takes the name of the pool object we created with parpool. You should always delete your pool before you exit. Not deleting the pool prior to exiting could lead to warnings or errors on subsequent runs.
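If an earlier run did leave a pool behind, one possible guard is to check for it before creating a new one. This is a sketch and not part of spec_parfor_local.m; it relies on gcp('nocreate'), which returns the current pool without starting one, and the variable name leftover is illustrative.

```matlab
% Sketch: remove any pool left over from an earlier run before calling parpool.
leftover = gcp('nocreate');    % empty when no pool is currently running
if ~isempty(leftover)
    delete(leftover);
end
```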

Timing example using the blackjack simulation

This example shows how to run some timings to measure whether, and by how much, performance increases as processors are added. It also shows how to check that the worker pool was created and exit with a message if it was not. As written, the script reads the number of processors only from PBS_NP; to make it suitable to run outside of a PBS job, you would have to replace that line with the mechanism used in the spectral analysis example.

%%%%  We get from the environment the number of processors
NP = str2double(getenv('PBS_NP'));

%%%%  Create the pool for parfor to use
thePool = parpool('local', NP);

%%%%  That worked, right?  If not, exit
if isempty(thePool)
    error('pctexample:backslashbench:poolClosed', ...
         ['This example requires a parallel pool. ' ...
          'Manually start a pool using the parpool command or set ' ...
          'your parallel preferences to automatically start a pool.']);
end

%%%%  Some parameters
numHands = 2000;
numPlayers = 6;
poolSize = thePool.NumWorkers;

%%%%  Precreate and initialize our results vector
t1 = zeros(1, poolSize);

%%%%  Run simulation to see decreased time with increased processors
fprintf('Simulating each player playing %d hands.\n', numHands);
for n = 2:poolSize
    tic;
        pctdemo_aux_parforbench(numHands, poolSize*numPlayers, n);
    t1(n) = toc;
    fprintf('%d workers simulated %d players in %3.2f seconds.\n', ...
            n, poolSize*numPlayers, t1(n));
end

%%%%  Run one simulation many times to get average performance and std dev
numIter = 50;
t2 = zeros(1, numIter);
for i = 1:numIter
    tic;
        pctdemo_aux_parforbench(numHands, poolSize*numPlayers, poolSize);
    t2(i) = toc;
    if mod(i,20) == 0
        fprintf('Benchmark has run %d out of %d times.\n', i, numIter);
    end
end
[muhat, sigmahat, muci] = normfit(t2);
fprintf('\n\nMean:    %8.4f\nStdDev:  %8.4f\n', muhat, sigmahat);
fprintf('\n95%% CI for the mean\nLower:   %8.4f\nUpper:   %8.4f\n\n', ...
        muci(1), muci(2));

%%%%  Delete the pool explicitly to prevent future problems
delete(thePool);
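
To judge "by how much" performance increased, the times stored in t1 can be converted into relative speedups. The function below is a sketch, not part of the original script: relativeSpeedup is a hypothetical helper name, and it assumes t1(n) holds the time measured with n workers. Because the timing loop starts at n = 2, t1(1) is never filled in, so the two-worker run is used as the baseline.

```matlab
function [speedup, efficiency] = relativeSpeedup(t1)
% relativeSpeedup  Hypothetical helper, not part of the original example.
% t1(n) is the elapsed time with n workers; t1(1) is ignored because the
% timing loop starts at n = 2, which makes the 2-worker run the baseline.
if numel(t1) < 2
    error('relativeSpeedup:tooFewTimings', ...
          'At least two entries are needed (the pool had fewer than 2 workers).');
end
workers = 2:numel(t1);
speedup = t1(2) ./ t1(workers);          % values above 1 are faster than the baseline
efficiency = speedup ./ (workers / 2);   % 1.0 would be perfect scaling from 2 workers
end
```

Saved as relativeSpeedup.m, it could be called as [s, e] = relativeSpeedup(t1) once the timing loop has finished. Efficiency values that fall well below 1 as workers are added suggest that overhead is growing faster than the useful work.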