What is ipyrad?

ipyrad is a toolbox for assembling and analyzing RAD-seq-type genomic data sets. It offers four assembly methods: denovo, reference, reference addition, and reference subtraction. Assembled data sets can be written in a variety of output formats, which facilitates downstream genomic analyses for both population genetic and phylogenetic studies. ipyrad also includes methods for visualizing and analyzing data and results.

 

Accessing ipyrad

ipyrad is part of the LSA contributed software library. To use it, you must load both the ipyrad module and its dependencies. Different versions of ipyrad may have different dependencies. Run “module avail ipyrad” to find out which versions of ipyrad are available on the cluster, and then use “module load” to load ipyrad and its dependencies. In the following example, the first attempt to load ipyrad fails because ipyrad’s dependencies have not been loaded, so we run the module load command again and include the necessary dependencies, as listed in the error message:

[markmont@flux-login2 ~]$ module avail ipyrad

--------------------------- /sw/lsa/centos7/modulefiles ---------------------------
   ipyrad/0.4.1    ipyrad/0.5.15    ipyrad/0.6.17
   ipyrad/0.4.4    ipyrad/0.6.8     ipyrad/0.7.17 (D)

  Where:
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any
of the "keys".

[markmont@flux-build ~]$ module load ipyrad
Lmod has detected the following error:  Cannot load module
"ipyrad/0.7.17" without these module(s) loaded:
   gcc/4.8.5 openmpi/1.8.8/gcc/4.8.5

While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    ipyrad/0.7.17    /sw/lsa/centos7/modulefiles/ipyrad/0.7.17.lua

[markmont@flux-login2 ~]$ module load gcc/4.8.5 openmpi/1.8.8/gcc/4.8.5 ipyrad
[markmont@flux-login2 ~]$

Running ipyrad

Running ipyrad in an interactive job

Start an interactive job. See the instructions for starting an interactive job on the cluster.

Inside the interactive job, run ipyrad:

  • To run ipyrad in a single-node interactive job using all of the cores the job requested on that node:
    ipyrad -p params-ASSEMBLYNAME.txt -s 123
  • To run ipyrad on multiple nodes using MPI (assuming you requested multiple nodes for your interactive job):
    ipyrad -p params-ASSEMBLYNAME.txt -s 123 --MPI

The above commands can be used either in an interactive job or in a PBS script. Replace “params-ASSEMBLYNAME.txt” with the name of your parameter file, and specify the steps (“-s 123”) and any other ipyrad command line options you want.

NOTE: ipyrad on Flux and Armis automatically detects the number of cores you requested for the job. You should not use the “-c” option to ipyrad unless you want to run ipyrad with fewer cores than are available to your job.

NOTE: ipyrad on Flux and Armis automatically creates an IPython/ipyparallel profile named “ARC_Connect” that contains the special configuration needed for Flux and Armis. This profile is for use with ipcluster. The “ipyrad” command will automatically start and stop the ipcluster as needed, so there should be no need to start ipcluster by hand under normal circumstances.
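
To check how many cores a job was actually given, one option is to count the entries in the file named by $PBS_NODEFILE, which PBS/Torque fills with one line per allocated core. The sketch below is only an illustration of that idea. The function names are hypothetical, and this is not necessarily how the ipyrad module on Flux and Armis performs its own detection.

```python
import os
from collections import Counter


def cores_per_node(nodefile_path):
    """Count how many lines (one per allocated core) each host has in a PBS node file."""
    with open(nodefile_path) as handle:
        hosts = [line.strip() for line in handle if line.strip()]
    return Counter(hosts)


def total_cores(nodefile_path):
    """Total number of cores listed in the node file."""
    return sum(cores_per_node(nodefile_path).values())


if __name__ == "__main__":
    # PBS sets PBS_NODEFILE inside a job; it is absent on a login node.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        per_node = cores_per_node(nodefile)
        print("{} core(s) on {} node(s)".format(sum(per_node.values()), len(per_node)))
    else:
        print("PBS_NODEFILE is not set; this is not running inside a PBS job")
```

If the total is larger than you want ipyrad to use, that is the case where the “-c” option applies.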

Running ipyrad in a non-interactive job

Use a standard PBS script and at the end include whichever ipyrad command is appropriate (single-node or MPI, see above).

Here is an example PBS script for a single-node ipyrad job:

####  PBS preamble

#PBS -N sample_job

# Change "bjensen" to your uniqname:
#PBS -M bjensen@umich.edu
#PBS -m abe

# Change the number of cores (ppn=4), amount of memory,
# and walltime to be what you need for your job:
#PBS -l nodes=1:ppn=4,mem=16000mb,walltime=04:00:00
#PBS -j oe
#PBS -V

# Change "example_flux" to the name of your Flux allocation:
#PBS -A example_flux
#PBS -q flux

####  End PBS preamble

#  Show list of CPUs you ran on, if you're running under PBS
if [ -n "$PBS_NODEFILE" ] ; then cat "$PBS_NODEFILE" ; fi

#  Change to the directory you submitted the job from
if [ -n "$PBS_O_WORKDIR" ] ; then cd "$PBS_O_WORKDIR" ; fi

#  Put your job commands below here.  Change "params-ASSEMBLYNAME.txt"
#  to be the name of your ipyrad parameter file:
echo "Started at: " $(date)
ipyrad -p params-ASSEMBLYNAME.txt -s 123
echo "Finished at: " $(date)
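
For a multi-node job, two lines of the script change together: the resource request and the ipyrad command, which takes the --MPI option described earlier. The following sketch shows that pairing. The helper pbs_lines is hypothetical and only formats text; it is not part of ipyrad or of the cluster software, and the default values are simply the ones used in the example script.

```python
def pbs_lines(params_file, steps="123", nodes=1, ppn=4,
              mem="16000mb", walltime="04:00:00"):
    """Return the '#PBS -l' line and the ipyrad command for a PBS script.

    Hypothetical helper for illustration only.
    """
    resources = "#PBS -l nodes={}:ppn={},mem={},walltime={}".format(
        nodes, ppn, mem, walltime)
    command = "ipyrad -p {} -s {}".format(params_file, steps)
    if nodes > 1:
        # Jobs that span several nodes use the MPI form of the command.
        command += " --MPI"
    return resources, command


# Print the two lines for a two-node job.
for line in pbs_lines("params-ASSEMBLYNAME.txt", nodes=2):
    print(line)
```

Choose the memory and walltime values to suit your own job, as in the single-node example.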

Running ipyrad in Jupyter

ARC-TS clusters support a simplified procedure for running ipyrad in Jupyter. You do not need to follow the generic ipyrad instructions for starting Jupyter manually and accessing it through an SSH tunnel.

  1. Load the ipyrad module for the version of ipyrad you want to use. For example,
    module load gcc/4.8.5 openmpi/1.8.8/gcc/4.8.5 ipyrad
  2. Start an interactive job.
    See the instructions for starting an interactive job on the cluster. For example,

    qsub -I -V -A example_flux -q flux -l nodes=1:ppn=4,mem=16000mb,walltime=04:00:00
  3. In the interactive job, run the command “jupyter-start”.
  4. On your local computer, open the URL printed by the jupyter-start command in your web browser. You may be prompted to choose your institution (select “University of Michigan”) and/or to log in (use your U-M uniqname, Kerberos password, and Duo).
  5. In Jupyter, go to the “IPython Clusters” tab. On the line for the “ARC_Connect” profile, click the “Start” button.

    NOTE: Do not fill in the number of engines. The number of cores you requested for your job will be automatically detected.

    NOTE: Make sure you ONLY click the “Start” button for the “ARC_Connect” profile. The profile named “default” will not work with these instructions, nor will any profile that has “ipyrad” in its name (profiles with “ipyrad” in their name are either for very old copies of ipyrad or for a version you installed yourself that has not been modified to work with Jupyter on Flux and Armis).

  6. Switch to the Jupyter “Files” tab and either create a new notebook using the “New” dropdown menu, or open an existing notebook.

    NOTE: The only type of notebook you should see under the “New” menu is “Python 2”. If you see other notebook types (such as Python 3, R, Julia, and SAS), then you probably forgot to load the ipyrad module in step 1, above, and are running a generic version of Jupyter that does not have access to ipyrad. In this case, stop Jupyter by running the command "jupyter-stop" in your interactive job, exit the interactive job, and then start over.

  7. In the notebook, run the following Python code so that the notebook can find and connect to the IPython cluster you just started. This code is specific to Flux and Armis: to support multiple concurrent jobs, it sets the cluster ID from the PBS job ID, falling back to the process ID and hostname when no job ID is set.
    import os
    import ipyrad as ip
    import ipyparallel as ipp
    ipyclient = ipp.Client(cluster_id=os.getenv("PBS_JOBID", str(os.getpid()) + '.' + os.uname()[1]))
  8. You can now use ipyrad in Jupyter normally.
  9. When you are done using ipyrad in Jupyter:
    1. Save, then Close and Halt your notebook.
    2. Go to the “IPython Clusters” tab in Jupyter and click the “Stop” button for the “ARC_Connect” profile.
    3. Close all Jupyter web browser windows.
    4. In your interactive job’s terminal window, shut Jupyter down by running the command “jupyter-stop”.
    5. In your interactive job’s terminal window, run “exit” to end the interactive job and return to the login node.
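
The cluster ID expression in step 7 can be easier to follow when it is split into a small function. The sketch below uses only the Python standard library and reproduces the same logic. The function name flux_cluster_id is hypothetical, and the optional environ argument exists only so the behavior can be checked outside a PBS job.

```python
import os


def flux_cluster_id(environ=None):
    """Return the cluster ID expression from step 7.

    Inside a PBS job this is the value of PBS_JOBID; otherwise it falls
    back to "<process id>.<hostname>".
    """
    if environ is None:
        environ = os.environ
    fallback = str(os.getpid()) + "." + os.uname()[1]
    return environ.get("PBS_JOBID", fallback)


print(flux_cluster_id())
```

The returned value is what step 7 passes as cluster_id to ipp.Client. Once the client is connected, len(ipyclient.ids) reports how many engines it can see, which should match the number of cores requested for the job.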

Additional information

Additional information is available on the ipyrad web site at http://ipyrad.readthedocs.io/index.html. For any Flux- or Armis-specific assistance running ipyrad, contact hpc-support@umich.edu.