The R parallel package

The parallel package is meant to supersede both the multicore and the snow packages, which have been merged and given a new front end in parallel. This page documents the multicore-style functionality.

Multiprocessor parallelism

The parallel package enables you to use multiple processors on the same physical machine, the “shared memory” model of parallel computing. What follows are two examples: one that computes a distance matrix, modified from Yuen [1], and one that performs cross-validation, modified from Eugster et al. [2].
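Before the full examples, here is a minimal sketch (not part of the original examples) of the shared-memory workflow: detectCores() reports how many logical cores the machine has, and mclapply() is a drop-in parallel replacement for lapply() that forks worker processes.

```r
library(parallel)

# How many logical cores does this machine have?
detectCores()

# Square each element across two workers; results come back in input order,
# just as with lapply()
res <- mclapply(1:8, function(i) i^2, mc.cores = 2)
unlist(res)
```

Because mclapply() relies on forking, mc.cores greater than 1 works on Linux and macOS but not on Windows.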

Distance matrix

The first example creates a 4 by 4 matrix of random numbers, defines a function that calculates the distances between the elements and returns a distance object, and then applies that function to the matrix four times: first on a single processor, then on four processors. We’ll assume these commands are saved in a file called distance.R for batch processing.


library(parallel)

p <- 4
X <- matrix(rnorm(2^p), ncol = 2^(p/2))

lfun <- function(i) {
  d <- dist(X)
  return(d)
}

#  Sequential version
res <- lapply(1:4, lfun)

#  Parallel version on four cores
options(mc.cores = 4)
mcres <- mclapply(1:4, lfun)
all.equal(res, mcres)

Note that we are asking for four cores for the worker processes that do the calculations. When you write your PBS script, request one additional core so that the master process has a core of its own.
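If you prefer not to set a global option, the worker count can also be passed per call via mclapply()’s mc.cores argument, which defaults to getOption("mc.cores", 2L). A minimal sketch:

```r
library(parallel)

# Equivalent to calling options(mc.cores = 4) before the call
mcres <- mclapply(1:4, function(i) i^2, mc.cores = 4)
unlist(mcres)
```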

To run those commands under PBS, you will need to modify the PBS script shown above so that the #PBS -l line and the line that runs R look like this.

#PBS -l nodes=1:ppn=5,mem=1gb
  [ . . . . ]
R CMD BATCH --vanilla distance.R

For more information about the #PBS line, please see the PBS web page.

Cross validation example

The second example creates a random data set, splits it into 10 equally sized samples, fits a model with each sample held out in turn, calculates the squared differences between the predicted and observed values for the held-out sample, and collects the results in a list. Finally, the mean squared difference is calculated.

library(parallel)

n <- 100
x <- rnorm(n)
y <- x + rnorm(n)
dat <- data.frame(x, y)

K <- 10
samples <- split(sample(1:n), rep(1:K, length = n))

cvFun <- function(index) {
   fit <- lm(y ~ x, data = dat[-samples[[index]], ])
   pred <- predict(fit, newdata = dat[samples[[index]], ])
   return((pred - dat$y[samples[[index]]])^2)
}

#  Sequential version
cvSeq <- lapply(seq(along = samples), cvFun)

#  Parallel version
cvPar <- mclapply(seq(along = samples), cvFun)

#  All the elements are identical
all.equal(cvSeq, cvPar)

#  Mean squared prediction error
mean(unlist(cvPar))

There is a very good tutorial illustrating several methods of parallelizing R code: MJA Eugster, J Knaus, et al., “Hands-on tutorial for parallel computing with R”, Computational Statistics, Jun 2011, Vol 26, pp 219–239. It is accessible online via the UM library website if you are a UM affiliate. Log into the library website first, then follow the link (it may not work if you do not pre-authenticate to the library website).


[1] Robert Yuen,
[2] Adapted from MJA Eugster, J Knaus, et al., “Hands-on tutorial for parallel computing with R”, Computational Statistics, Jun 2011, Vol 26, pp 219–239.