Accessing Stata

Stata is part of the Flux software library. To use the standard Stata, you must first load the stata module.

$ module load stata

To access Stata/MP, which has multicore processing capability, you must first load the stata-mp module.

$ module load stata-mp

For details on the differences among the versions, see the web page “Which Stata is right for me?”. For the rest of this page, we will just use Stata to mean any version.

Stata versions

Stata/SE

The standard stata module provides both Stata/IC and Stata/SE. If you invoke Stata with just stata, you will run Stata/IC, which has a limitation on the number of independent variables in models; you need to run stata-se to get Stata/SE.

There are a limited number of licenses for Stata/SE on Flux. Be sure to use gres=stata:1 to request a license in your PBS script. See the section on running Stata from PBS below for details.

Stata/MP

Stata/MP is also available, and you must run it explicitly using stata-mp. If you do not have the stata-mp module loaded and try to run Stata/MP, you will get a licensing error. There are a limited number of licenses for Stata/MP on Flux. Be sure to use gres=stata-mp:1 to request a license in your PBS script. See the section on running Stata from PBS below for details.

Not all Stata commands parallelize, so not all of them will run faster in Stata/MP than they would in Stata/SE. For example, the regress command parallelizes very well, whereas the xtreg command does not. See the Stata/MP Performance Report for details on which procedures speed up and by how much.

The maximum number of processors that our Stata/MP can use is 16. When you create your PBS script, you must request those processors on the same node. This is done with the #PBS -l directive, as in this example that asks for eight processors

#PBS -l nodes=1:ppn=8
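Because Stata/MP on Flux cannot use more than 16 processors, and all of them must be on one node, it can help to generate the resource line rather than type it by hand. The following is a minimal sketch; the function name mp_nodes_line is hypothetical, and it only formats the same #PBS -l nodes=1:ppn=N directive shown above after checking that N is between 1 and 16.

```shell
# Hypothetical helper: print a single-node PBS resource line for Stata/MP.
# Stata/MP on Flux can use at most 16 processors, all on the same node.
MP_MAX_PROCS=16

mp_nodes_line() {
    local nprocs="$1"
    # Reject anything that is not a non-negative integer.
    case "$nprocs" in
        ''|*[!0-9]*) echo "ppn must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$nprocs" -lt 1 ] || [ "$nprocs" -gt "$MP_MAX_PROCS" ]; then
        echo "ppn must be between 1 and $MP_MAX_PROCS" >&2
        return 1
    fi
    # nodes=1 keeps every requested processor on one physical machine.
    printf '#PBS -l nodes=1:ppn=%s\n' "$nprocs"
}
```

For example, mp_nodes_line 8 prints the directive used above, while mp_nodes_line 20 prints an error and returns a non-zero status.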

Sample datasets from the manuals

The sample datasets used in the Stata documentation are available from the Stata Press web site: Datasets for Stata 13 manuals. Here is an example of downloading the systolic.dta dataset used in the ANOVA examples in the Stata Base Reference Manual. This will only work from a login node, as the compute nodes do not have access to the internet outside of UM.

curl -O http://www.stata-press.com/data/r12/systolic.dta

The curl command is a standard Linux program for downloading files from web sites. More information can be found at the curl man page online or by typing man curl on a login node.
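If you need several datasets, the download can be wrapped in a small loop. This is a minimal sketch to run on a login node; the function names are hypothetical, and the URL pattern http://www.stata-press.com/data/RELEASE/NAME.dta is taken from the curl example above (whether other release directories exist is an assumption).

```shell
# Hypothetical helpers for fetching manual datasets from a login node.
# URL layout copied from the curl example above; other releases are assumed.
stata_press_url() {
    local release="$1"   # e.g. r12
    local name="$2"      # dataset name without the .dta extension
    printf 'http://www.stata-press.com/data/%s/%s.dta\n' "$release" "$name"
}

fetch_stata_datasets() {
    local release="$1"; shift
    local name
    for name in "$@"; do
        # -O keeps the remote file name; --fail makes curl return an error
        # on HTTP failures instead of saving an error page as a .dta file.
        curl --fail -O "$(stata_press_url "$release" "$name")" || return 1
    done
}
```

Running fetch_stata_datasets r12 systolic performs the same download as the single curl command above.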

Additional help with Stata

Consulting for Statistics, Computing and Analytics Research (CSCAR) at the University of Michigan offers help using Stata. If you are having trouble with Stata itself, you can contact them for assistance. They can be reached by e-mail at stata-help@umich.edu. They also have telephone and walk-in support, and you can make an appointment with a consultant if your problem is statistical or complex. See the CSCAR website for contact information, hours, and location.

Running Stata interactively

Stata can be run interactively by simply typing

$ stata-se

(or stata for Stata/IC, or stata-mp for Stata/MP if the stata-mp module is loaded). The Stata prompt is a period (.), and the command to quit Stata is exit.

Running Stata in batch mode

Stata can be run in batch mode by preparing a file of Stata commands and starting Stata with the -b option, followed by do and the name of that file. Stata command files are commonly called “do files” because they conventionally use the .do extension.

Here is a very simple example of a .do file that loads the auto.dta sample dataset included with Stata, summarizes its variables, and runs a t test of mpg by foreign. We will assume you call this file simple.do for the rest of this example.

sysuse auto.dta
summarize
ttest mpg, by(foreign)
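If you prefer to stay in the shell, the do file can be created with a here-document. This minimal sketch simply writes the three Stata commands above to simple.do in the current directory.

```shell
# Write the three-line do file shown above into the current directory.
# Quoting 'EOF' stops the shell from expanding anything in the text.
cat > simple.do <<'EOF'
sysuse auto.dta
summarize
ttest mpg, by(foreign)
EOF
```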

To run this in batch, you would use

$ stata-se -b do simple.do

By default, when you run Stata in batch, it will create a log file with the same base name as the .do file; e.g., if you run the above command, it should create simple.log.
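Stata in batch mode may finish with a zero exit status even when the do file stopped on an error, so it is worth checking the log. When Stata stops on an error it writes the return code on a line such as r(111); in its output. The sketch below assumes that format, and assumes (as described above) that the log takes the do file's base name and is written to the current directory; the function name stata_log_ok is hypothetical.

```shell
# Hypothetical helper: check a Stata batch log for error return codes.
# Assumes errors appear on their own line as r(NNN);
stata_log_ok() {
    local dofile="$1"
    local log
    # Stata names the log after the do file: simple.do -> simple.log
    log="$(basename "$dofile" .do).log"
    if [ ! -f "$log" ]; then
        echo "no log file $log found" >&2
        return 2
    fi
    if grep -Eq '^[[:space:]]*r\([0-9]+\);' "$log"; then
        echo "error found in $log:" >&2
        grep -E '^[[:space:]]*r\([0-9]+\);' "$log" >&2
        return 1
    fi
    return 0
}
```

A typical use would be stata-se -b do simple.do; stata_log_ok simple.do, which returns 0 for a clean log, 1 if an error code was found, and 2 if no log exists.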

Running Stata from PBS

Running Stata under PBS is just like running it in batch from the command line, except that the batch command is placed in a PBS script that you submit with qsub.

We have a limited number of Stata licenses available, so please note that you must request licenses using the gres keyword when requesting resources.

To request Stata/SE licenses, you would use gres=stata:1 on the #PBS -l line.

To request Stata/MP licenses, you would use gres=stata-mp:1 on the #PBS -l line.
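The mapping from edition to license request is small enough to encode directly. This is a minimal sketch with a hypothetical function name; the two values are the ones given on this page, and a single license is requested whatever the processor count.

```shell
# Hypothetical helper: print the license request line for a Stata edition.
# Values from this page: gres=stata:1 for Stata/SE, gres=stata-mp:1 for
# Stata/MP. One license is requested regardless of processor count.
stata_gres_line() {
    case "$1" in
        se) echo '#PBS -l gres=stata:1' ;;
        mp) echo '#PBS -l gres=stata-mp:1' ;;
        *)  echo "unknown Stata edition: $1 (expected se or mp)" >&2
            return 1 ;;
    esac
}
```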

Assuming you have created simple.do from above, the following example PBS script would run it using Stata/SE.

####  PBS preamble
#PBS -N stata_test
#PBS -M uniqname@umich.edu
#PBS -m abe

#PBS -l procs=1,mem=1gb
#PBS -l gres=stata:1
#PBS -j oe
#PBS -V

#PBS -A example_flux
#PBS -l qos=flux
#PBS -q flux

####  End PBS preamble
#  Always include the next lines
#  List the CPUs assigned to this job (PBS_NODEFILE is a file, not a directory)
if [ -f "${PBS_NODEFILE}" ] ; then
   cat "${PBS_NODEFILE}"
fi
#  PBS starts jobs in your home directory; move to the directory you submitted from
if [ -n "${PBS_O_WORKDIR}" ] ; then
   cd "${PBS_O_WORKDIR}"
fi
#  Put your job commands after this line
stata-se -b do simple.do
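A mismatch between the Stata executable a script runs and the license it requests is an easy mistake to make when copying scripts. The following minimal pre-submission check is a sketch with a hypothetical function name: it looks for a stata-se or stata-mp command line and confirms the matching gres request from this page is present.

```shell
# Hypothetical pre-submission check: a script that runs stata-se should
# request gres=stata:1, and one that runs stata-mp gres=stata-mp:1.
check_stata_pbs() {
    local script="$1"
    if grep -Eq '^[[:space:]]*stata-mp[[:space:]]' "$script"; then
        grep -q 'gres=stata-mp:1' "$script" || {
            echo "$script runs stata-mp but lacks gres=stata-mp:1" >&2
            return 1
        }
    elif grep -Eq '^[[:space:]]*stata-se[[:space:]]' "$script"; then
        grep -q 'gres=stata:1' "$script" || {
            echo "$script runs stata-se but lacks gres=stata:1" >&2
            return 1
        }
    fi
    return 0
}
```

For example, check_stata_pbs stata_test.pbs && qsub stata_test.pbs would submit only if the check passes (the file name stata_test.pbs is just an illustration).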

If you have checked in the Stata/MP Performance Report that the Stata procedures you will be running can benefit from multiple processors, then you need to modify the PBS script above to request multiple processors and to request the appropriate license.

To request four processors, change the line requesting processors and memory to

#PBS -l nodes=1:ppn=4,pmem=1gb

which requests one physical machine with four available processors and 1 GB of memory per processor.

Change the license request to one for Stata/MP

#PBS -l gres=stata-mp:1

Note that only one license is needed regardless of the number of processors requested.

Finally, make sure the stata-mp module is loaded (without it Stata/MP reports a licensing error), and change the command that invokes Stata to

stata-mp -b do simple.do
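Stata/MP has a set processors command that limits how many cores it uses, and a do file can read arguments given after its name on the do line with args. Assuming both behave this way in batch mode on Flux, one approach is to count the processors PBS actually assigned and pass that count to the do file. The sketch below counts lines in $PBS_NODEFILE (Torque lists one line per assigned processor) and caps the result at 16; the function name mp_procs and the args line in simple.do are hypothetical.

```shell
# Hypothetical helper: number of processors PBS assigned, capped at 16.
# Torque lists one line per assigned processor in $PBS_NODEFILE.
mp_procs() {
    local nodefile="${1:-$PBS_NODEFILE}"
    local n
    n=$(grep -c '' "$nodefile") || return 1
    if [ "$n" -gt 16 ]; then
        n=16    # Stata/MP on Flux cannot use more than 16 processors
    fi
    echo "$n"
}

# simple.do would then start with these two Stata lines (assumed):
#     args nprocs
#     set processors `nprocs'
```

In the job script, the invocation line would then become stata-mp -b do simple.do "$(mp_procs)", so Stata/MP uses only the processors requested with ppn.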

As noted above, the maximum number of processors that Stata/MP can use is 16. Even for procedures that benefit from multiprocessing, requesting more processors does not guarantee faster processing. For example, a procedure may be 60% faster when run with 4 processors but only 50% faster when run with 14 processors. Consult the Stata/MP Performance Report to help decide how many processors to request.

Extending Stata

Stata is easily extensible. Extensions are much like do files, but they are “automatic do files” and use the .ado extension. If you write your own, put them in ${HOME}/ado/personal and Stata will find them.
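As a minimal sketch, the shell function below writes a one-command ado file into the personal directory. The command name hello and the function name are hypothetical. Stata expects the file name to match the program name, so hello.ado defines program hello; after installing it, typing hello at the Stata prompt should run it.

```shell
# Hypothetical example: install a tiny personal ado file.
# The target defaults to ${HOME}/ado/personal, where Stata looks for them.
install_hello_ado() {
    local dir="${1:-${HOME}/ado/personal}"
    mkdir -p "$dir" || return 1
    # The program name (hello) must match the file name (hello.ado).
    cat > "$dir/hello.ado" <<'EOF'
*! hello.ado: hypothetical example command
program define hello
    version 13
    display "Hello from a personal ado file"
end
EOF
}
```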

There are also two major repositories for user-contributed extensions to Stata: the Stata Journal archive and the Boston College Statistical Software Components archive (SSC).

Installing packages from these sources is straightforward, but it must be done from a login node, because the compute nodes do not have outside network connectivity; it is probably easiest to do with Stata running interactively. An extension only needs to be installed once to be available to all subsequent Stata jobs. Packages from either source are installed into your ${HOME}/ado/plus directory.
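Because jobs on compute nodes cannot install anything, it can be useful to confirm a package is already present before submitting. Stata normally files each installed command under a subdirectory of ${HOME}/ado/plus named after the command's first letter (for example ado/plus/l/levpet.ado). Assuming that layout, a quick shell check might look like this; the function name is hypothetical. From inside Stata, the ado command lists installed packages.

```shell
# Hypothetical helper: is an ado command present under ${HOME}/ado/plus?
# Assumes Stata's usual layout of one subdirectory per first letter.
ado_plus_installed() {
    local cmd="$1"
    local plus="${2:-${HOME}/ado/plus}"
    local first
    first=$(printf '%.1s' "$cmd")
    [ -f "$plus/$first/$cmd.ado" ]
}
```

For example, ado_plus_installed levpet || echo "install levpet from a login node first".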

Stata Journal extensions

There are a number of ways to search the Stata Journal web site; one of the most convenient is the findit command, which simply takes some keywords. Suppose, for example, you are told that Levinsohn has an extension called levpet for estimating production functions that you would like to use. Here is how to find and install it from the Stata Journal archive (remember, the Stata prompt is a dot). This is a two-step process: first you find the reference and where it can be obtained; second, you install it by name, telling Stata where to obtain it.

$ module load flux stata
$ stata-se
  [ . . . ]
. findit levpet levensohn
Keyword search
--------------

        Keywords:  levpet levensohn
  [ . . . . ]
1 package found (Stata Journal and STB listed first)
----------------------------------------------------

st0060 from http://www.stata-journal.com/software/sj4-2
    SJ4-2 st0060.  Production function estimation in Stata using ... /
    Production function estimation in Stata using inputs to control / for
    unobservables / by Amil Petrin / University of Chicago / NBER / / Brian P.
    Poi / StataCorp / / James Levinsohn / University of Michigan / NBER /

. net install st0060, from(http://www.stata-journal.com/software/sj4-2)
checking st0060 consistency and verifying not already installed...
installing into /home/bennet/ado/plus/...
installation complete.

Note that the top line of the description contains all the information you need to install the package.
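Since that first line has the form package from URL, the matching net install command can be built mechanically, which is handy if you keep your installs in a do file to rerun later. This is a minimal sketch with a hypothetical function name; it assumes exactly that three-field layout.

```shell
# Hypothetical helper: turn a findit result line such as
#   st0060 from http://www.stata-journal.com/software/sj4-2
# into the matching Stata "net install" command.
net_install_cmd() {
    local pkg from url rest
    # read splits on whitespace: package, the literal word "from", URL.
    read -r pkg from url rest <<EOF
$1
EOF
    if [ "$from" != "from" ] || [ -z "$url" ] || [ -n "$rest" ]; then
        echo "unexpected line: $1" >&2
        return 1
    fi
    printf 'net install %s, from(%s)\n' "$pkg" "$url"
}
```

Given the st0060 line above, this prints the same net install command used in the transcript.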

SSC extensions

Suppose someone suggested that there was a package called ranktest that you might find useful. There is a built-in Stata command ssc that you can use to find out about the extension and install it. The following commands would print a description of the ranktest package, then install it.

. ssc describe ranktest
. ssc install ranktest

You would install it once from the Stata prompt on a login node, and it will be available thereafter as a Stata command.
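When several SSC packages are needed, one option is to write the ssc install commands to a do file and run it in batch once on a login node. This is a minimal sketch; the function name and the file name install_ssc.do are hypothetical, and replace is the ssc install option that overwrites an older installed copy.

```shell
# Hypothetical helper: write a do file that installs SSC packages.
# Run the result on a login node, e.g. stata-se -b do install_ssc.do
write_ssc_do() {
    local out="$1"; shift
    local pkg
    : > "$out" || return 1          # start with an empty file
    for pkg in "$@"; do
        # replace updates a package that is already installed
        printf 'ssc install %s, replace\n' "$pkg" >> "$out"
    done
}
```

For example, write_ssc_do install_ssc.do ranktest followed by stata-se -b do install_ssc.do on a login node installs ranktest.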