Jupyter is a web application that allows you to create and share interactive documents, known as “notebooks”, that contain live code, equations, visualizations, widgets, and explanatory text.  A notebook contains code in a language you choose:  ARC Connect currently supports Jupyter notebooks for Python 2, Python 3, R, and Julia.  To enquire about additional languages, please contact hpc-support@umich.edu.

Jupyter enables collaboration by allowing you to share Jupyter notebooks with other researchers.  Sharing is typically done via the filesystem, Globus Connect Personal, GitHub, email, FileSender, Box, or the Jupyter Notebook Viewer.

[Screenshot: connect-jupyter]

Starting and ending Jupyter jobs

To start a Jupyter job, follow the steps in Starting a job.  Once your job has started, you will see the following in the Job tab:

[Screenshot: connect-jupyter-job-tab]

You can then use the green “Open in Browser” button to open Jupyter in another web browser tab.  The “Open in Browser” button will automatically log you in to Jupyter so that you do not need to manually enter the password.

You can use the URL and password information to share access with collaborators.  Note that your collaborators will be able to access and modify all of your files and notebooks and submit additional jobs.  If your collaborator only needs to view a notebook, send them the notebook via email (they can then view or run it themselves) or use the Jupyter Notebook Viewer.

WARNING: If you and your collaborator have the same notebook file (in the same directory on an ARC-TS cluster) open at the same time, the automatic saving (checkpointing) that Jupyter does can result in data loss.  To avoid this, you should close and halt the notebook in Jupyter before your colleague opens the notebook; your colleague should likewise close and halt the notebook before you open it again.

When you are done using Jupyter, you should save and close all open notebooks (using the “File -> Close and Halt” menu in each notebook) and then click the red “Terminate Jupyter” button in the ARC Connect Job tab.  If you click the “Terminate Jupyter” button before closing and halting all open notebooks, you could lose data.

Using Jupyter

When you open Jupyter, you will see the Jupyter dashboard:

[Screenshot: jupyter-dashboard-good]

If you already have a notebook available, click the notebook name (A) in order to open and run it.

To create a new notebook, open the “New” pull-down (B) and select the language (C) you want to use in the new notebook.

To access files residing under /scratch, /nfs, or /afs, click on the Home (house) icon (D) at the top of the file list.  If the directory is under /nfs, it may not appear automatically; you may need to open a terminal (E) in Jupyter first, access the directory (for example, by running ls /nfs/my_subdirectory_name), and then click on the Jupyter file manager refresh icon (F).
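The same trick works from a notebook cell instead of a terminal, since listing a directory from Python also triggers the automounter.  A minimal sketch (the directory name is the placeholder from the example above):

```python
import os

def automount_listing(path):
    """List path, triggering the NFS automounter; return None if the
    directory is not available."""
    if os.path.isdir(path):
        return os.listdir(path)
    return None

# Placeholder directory from the example above -- replace with your own.
listing = automount_listing('/nfs/my_subdirectory_name')
if listing is None:
    print('Directory is not mounted or does not exist')
else:
    # The directory is now mounted; click the refresh icon (F) in the
    # Jupyter file manager to make it appear there too.
    print(listing)
```

After the listing succeeds, the refresh icon (F) should show the directory in the file manager.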

Once you open a notebook (or create a new one via (B) and (C)), there is nothing special about using the notebook via ARC Connect.  The following resources may be useful in getting started:

Collaborating with other researchers

You can collaborate with other researchers by sharing your Jupyter job information with them.  Note that your collaborators will be able to access and modify all of your files and notebooks and submit additional jobs.  If your collaborator only needs to view a notebook, send them the notebook via email (they can then view or run it themselves) or use the Jupyter Notebook Viewer.  When you send a notebook to a collaborator, the output of every cell you have run is included, so a collaborator does not need access to the ARC-TS cluster on which the notebook was created unless they need to re-run the cells.


ARC Connect allows researchers to access U-M High Performance Computing (HPC) resources — currently only the Flux cluster — using a web browser or a Virtual Network Computing (VNC) client. The U-M Sensitive Data Guide identifies the types of sensitive regulated data that are permitted to be processed or maintained in Flux, which does not currently include data regulated under HIPAA or FISMA. It is your responsibility to know whether your data is regulated and whether and with whom you are allowed to share the data or access to the data.

You and your collaborators are also responsible for:

If you have any questions or need additional information or further clarification, consult with your departmental IT manager, Security Unit Liaison, or Information and Infrastructure Assurance before sharing access to your ARC Connect Jupyter jobs.


NOTE: Currently, due to data use agreement compliance requirements, researchers affiliated with institutions other than the University of Michigan need to be sponsored as an academic affiliate (that is, obtain a U-M uniqname and password) and connect to the U-M VPN before accessing ARC Connect.

IPython Clusters / ipyparallel

The “IPython Clusters” tab on the Jupyter dashboard allows you to easily start and stop ipyparallel clusters inside your Jupyter job.  ipyparallel is a Python package that lets you use many different methods (SPMD, MPMD, MPI, task farming, and others) to interactively parallelize your computations.

NOTE: You should only start an IPython Cluster if you plan on using “import ipyparallel” in your Python notebook and then using the various ipyparallel classes in your code to parallelize your computations.  If you will instead be directly using mpi4py or multi-core / multi-threaded functions in your notebooks, you should not start an IPython Cluster since it will just consume resources without being used.

To start an IPython Cluster, click on the “IPython Clusters” tab (A) and then click the Start button (B) for the ARC_Connect profile line (C).  Do not fill in the number of engines (D) — this will be set automatically to the number of cores you requested for your ARC Connect Jupyter job.  After clicking the Start button, please wait 60 seconds for the cluster to start up before trying to use it.

[Screenshot: jupyter-clusters]

WARNING: If you see a profile named “default” (as shown above), do not click its Start button: the default profile will start 16 engines all on the first node of your job, and none on any other node, regardless of how many cores are actually available to you on that node.  The resulting performance is likely to be very poor.  Only start clusters using the ARC_Connect profile.

TIP: If you prefer to manage your IPython cluster via the command line, you can open a Jupyter Terminal or a notebook and start the cluster from there using the following shell command:
ipcluster start --profile=ARC_Connect

The ARC_Connect profile uses the ipyparallel MPI Launcher to start engines on all of the job’s nodes, enabling your notebooks to use more cores than are available on any single node.  To permit you to run more than one Jupyter job at a time, the ARC_Connect profile assigns each ipyparallel cluster an ID equal to the PBS job ID, which you must specify when you use the cluster in your notebook:

import os
import ipyparallel as ipp

# The ARC_Connect profile uses the PBS job ID as the cluster ID
c = ipp.Client(cluster_id=os.getenv('PBS_JOBID'))

If you are using a notebook written for another environment, you will need to modify the creation of the Client object to specify the cluster_id as shown above.  If you see the following error message, it means that cluster_id was not specified correctly for the ARC_Connect profile (the key parts of the message are the connection file path and the final OSError):

Waiting for connection file: 
~/.ipython/profile_ARC_Connect/security/ipcontroller-client.json
OSError  Traceback (most recent call last)
<ipython-input-2-9493b1a800cf> in <module>()
----> 1 c = ipp.Client()
/usr/cac/rhel6/lsa/anaconda-arc-connect/latest/lib/python3.5/site-packages/ipyparallel/client/client.py in __init__(self, url_file, profile, profile_dir, ipython_dir, context, debug, sshserver, sshkey, password, paramiko, timeout, cluster_id, **extra_args)
389                      no_file_msg,
390                  ])
--> 391                  raise IOError(msg)
392      if url_file is None:
393          raise IOError(no_file_msg)

OSError: Connection file '~/.ipython/profile_ARC_Connect/security/ipcontroller-client.json' not found.
You have attempted to connect to an IPython Cluster but no Controller could be found.
Please double-check your configuration and ensure that a cluster is running.
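Once a cluster has started, the controller writes its connection file into the profile’s security directory, so you can poll for that file before creating the Client instead of waiting a fixed 60 seconds.  This is a stdlib-only sketch; the exact file name pattern for a cluster started with a cluster ID is an assumption, so check the security directory on your cluster if it differs:

```python
import os
import time

def wait_for_file(path, timeout=60):
    """Poll until path exists; return True if it appears before timeout."""
    deadline = time.time() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.time() >= deadline:
            return False
        time.sleep(1)

# Assumed naming: when a cluster is started with a cluster ID, ipyparallel
# embeds that ID in the connection file name.  Verify the actual name in
# ~/.ipython/profile_ARC_Connect/security if this does not match.
job_id = os.getenv('PBS_JOBID', '')
conn_file = os.path.expanduser(
    '~/.ipython/profile_ARC_Connect/security/'
    'ipcontroller-%s-client.json' % job_id)

if wait_for_file(conn_file, timeout=2):  # use ~60 seconds in practice
    print('Cluster is up; safe to create the Client')
else:
    print('Connection file not found yet:', conn_file)
```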


For further details on how to use ipyparallel, please refer to the ipyparallel documentation or David Masad’s ipyparallel tutorial.


Advanced Jupyter topics

Troubleshooting

If you need to see warning and error messages in order to diagnose problems, log files for ARC Connect Jupyter jobs are kept in the directory ~/.jupyter/arc-connect-logs. The command “ls -ltr” may be useful for sorting the files by the time they were modified in order to group files for each Jupyter job together.

For each Jupyter job, the following files are created.  Only the last file will be of interest to most people.

File name                                    Contents
connect_jupyter.config.31433-1464378185.py   Copy of the configuration file for just this job at the time it was submitted.
connect_jupyter.lock.19842767                Lock file created by the Jupyter server.
connect_jupyter.o19842767                    PBS output and error file.
connect_jupyter.log.19842767                 Jupyter server log file containing warnings and error messages from notebooks.
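If you would rather inspect the logs from Python than with “ls -ltr”, a small helper (a sketch) produces the same oldest-first ordering:

```python
import os

def logs_by_mtime(log_dir):
    """Return file names in log_dir, oldest first (like ``ls -ltr``)."""
    return sorted(
        os.listdir(log_dir),
        key=lambda name: os.path.getmtime(os.path.join(log_dir, name)))

# ARC Connect Jupyter job logs live here:
log_dir = os.path.expanduser('~/.jupyter/arc-connect-logs')
if os.path.isdir(log_dir):
    for name in logs_by_mtime(log_dir):
        print(name)
```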

Installing your own Jupyter kernels

If you think a kernel would be useful to both yourself and other researchers, send email to hpc-support@umich.edu to enquire about whether it can be installed centrally for all users of ARC Connect.  This may be the easiest option, particularly if a kernel has complex requirements or requires a lot of software that is not already installed on the cluster.

You can install new kernels yourself by following the links in the “Name” column of the IPython kernels for other languages page to get instructions for the particular kernel you are interested in.  Run the commands in a Jupyter Terminal window (not from a login node or VNC session).  You may need to modify some of the commands to install the kernel under your home directory (for example, run “pip install --user” instead of just “pip install”).  Once kernels have been installed, their configuration files will be available in the directory ~/.local/share/jupyter/kernels.
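To confirm which per-user kernels are installed, you can look for kernel.json files under that directory.  A sketch:

```python
import os

def list_user_kernels(base=None):
    """Return names of per-user kernels: subdirectories of base that
    contain a kernel.json configuration file."""
    if base is None:
        base = os.path.expanduser('~/.local/share/jupyter/kernels')
    if not os.path.isdir(base):
        return []
    return sorted(
        name for name in os.listdir(base)
        if os.path.exists(os.path.join(base, name, 'kernel.json')))

print(list_user_kernels())
```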

For example, to install a Bash kernel for Jupyter under your home directory, start a Jupyter job via ARC Connect, open a new Jupyter Terminal window, and run the following commands:

pip install --user bash_kernel
python -m bash_kernel.install

Customizing the Jupyter configuration

The first time you start a Jupyter job through ARC Connect, a Jupyter configuration file, ~/.jupyter/jupyter_notebook_config.py, will be created automatically if it does not already exist.  You can edit this file, and any changes you make will apply to future ARC Connect Jupyter jobs.
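For example, to have the Jupyter file manager start in a different directory, you could add a line such as the following (a sketch: NotebookApp.notebook_dir is a standard notebook server option, but the path shown is purely illustrative, and ARC Connect may override some settings when it starts the server):

```python
# In ~/.jupyter/jupyter_notebook_config.py -- the path is illustrative
c.NotebookApp.notebook_dir = '/scratch/example_flux/your_uniqname'
```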

Similarly, for ipyparallel clusters, the configuration files are under ~/.ipython/profile_ARC_Connect and any changes you make will apply to future IPython clusters.

Resetting the Jupyter configuration

To completely reset the configuration for Jupyter, end all Jupyter jobs, log in to a login node for the ARC-TS cluster you are using and then run the following commands from the login node shell prompt:

rm -rf ~/.jupyter.old ~/.ipython.old ~/.local/share/jupyter.old ~/.ipynb_checkpoints
mv ~/.jupyter ~/.jupyter.old
mv ~/.ipython ~/.ipython.old
mv ~/.local/share/jupyter ~/.local/share/jupyter.old

Then, the next time you start an ARC Connect Jupyter session, new configuration files will be created.

Because Jupyter itself is written in Python, certain modifications to your Python configuration may prevent Jupyter from running.  If you need to reset your Python configuration, log in to a login node for the ARC-TS cluster you are using and then run the following commands from the login node shell prompt:

rm -rf ~/.local.old
mv ~/.local ~/.local.old
rm -rf ~/.cache/{fontconfig,matplotlib,pip}