Getting Started

1. Get Duo

You must use Duo authentication to log on to Armis2.  Get more details on the Safe Computing Two-Factor page and enroll here.

2. Get an Armis2 user login

You must establish a user login on Armis2 by filling out this form. If you had a login on Armis, you should have one on Armis2.

3. Get an SSH Client & Connect to Armis2 Login Node

The login node (armis2.arc-ts.umich.edu) is the entry point into the cluster. It is accessible only from Ann Arbor, Dearborn, and Flint campus IP addresses and from the U-M VPN network, and it requires a valid user account and Duo authentication to log in. The login node is a shared resource, so users are expected not to monopolize it.

If you are trying to log in from off campus, or using an unauthenticated wireless network such as MGuest, connect to the U-M VPN first.

See the policies below governing appropriate use of the login nodes.

Mac or Linux:

Open Terminal and type:

ssh uniqname@armis2.arc-ts.umich.edu

Windows (using PuTTY):

Download and install PuTTY here.

Launch PuTTY, enter armis2.arc-ts.umich.edu as the host name, then click Open.

All Operating Systems:

At the “Enter a passcode or select one of the following options:” prompt, type the number of your preferred choice for Duo authentication.

4. Get files

You can use SFTP (best for simple transfers of small files) or Globus (best for large files or a commonly used endpoint) to transfer data to your /home directory.

SFTP: Mac or Windows using FileZilla
  1. Open FileZilla and click the “Site Manager” button
  2. Create a New Site, which you can name “Armis2” or something similar
  3. Select the “SFTP (SSH File Transfer Protocol)” option
  4. In the Host field, type armis2-xfer.arc-ts.umich.edu
  5. Select “Interactive” for Logon Type
  6. In the User field, type your uniqname
  7. Click “Connect”
  8. Enter your Kerberos password
  9. Select your Duo method (1-3) and complete authentication
  10. Drag and drop files between the two systems
  11. Click “Disconnect” when finished

On Windows, you can also use WinSCP with similar settings, available alongside PuTTY here.

SFTP: Mac or Linux using Terminal

To copy a single file, type:

scp localfile uniqname@armis2-xfer.arc-ts.umich.edu:~/remotefile

To copy an entire directory, type:

scp -r localdir uniqname@armis2-xfer.arc-ts.umich.edu:~/remotedir

These commands can also be reversed in order to copy files from Armis2 to your machine:

scp -r uniqname@armis2-xfer.arc-ts.umich.edu:~/remotedir localdir

You will need to authenticate via Duo to complete the file transfer.

5. Submit a job

This is a simple guide to get your jobs up and running. For more advanced Slurm features, see the Slurm User Guide for Armis2. If you are familiar with using the resource manager Torque, you may find the migrating from Torque to Slurm guide useful.

Batch Jobs

Most work on Armis2 is queued and is described through a batch script, which the sbatch command submits to Slurm. Run sbatch from a shared file system: your home directory, /scratch, or any directory under /nfs that you can normally use in a job on Armis. Output is written to the directory you submit from (jobName-jobID.log). Do not submit jobs from /tmp or any of its subdirectories. To submit a batch script, run:

$ sbatch myJob.sh
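Because jobs must not be submitted from /tmp, a small guard can check the working directory before calling sbatch. The following is only a sketch: the function name is_safe_submit_dir is hypothetical, and it enforces only the /tmp rule, not whether the directory is actually on a shared file system.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm or Armis2): succeeds unless the
# given directory is /tmp or lies beneath it.
is_safe_submit_dir() {
    case "$1" in
        /tmp|/tmp/*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Usage: submit only when sbatch and the script are available
# (i.e. on a login node) and the current directory passes the check.
if command -v sbatch >/dev/null 2>&1 && [ -f myJob.sh ]; then
    if is_safe_submit_dir "$PWD"; then
        sbatch myJob.sh
    else
        echo "Refusing to submit from $PWD; use your home directory or /scratch." >&2
    fi
fi
```

On a machine without Slurm the usage section is skipped, so the helper can be tried anywhere.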

The batch job script is composed of three main components:

  • The interpreter used to execute the script
  • #SBATCH directives that convey submission options
  • The application(s) to execute, along with their input arguments and options

Example:

#!/bin/bash
# The interpreter used to execute the script

# "#SBATCH" directives that convey submission options:

#SBATCH --job-name=example_job
#SBATCH --mail-type=BEGIN,END
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1000m 
#SBATCH --time=10:00
#SBATCH --account=test
#SBATCH --partition=standard

# The application(s) to execute along with its input arguments and options:

/bin/hostname
sleep 60
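When scripting around sbatch, it is often useful to capture the job ID it reports. The sketch below assumes sbatch's usual confirmation line, "Submitted batch job <id>", and the jobName-jobID.log naming mentioned above; the helper names job_id_from_sbatch and job_log_name are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; only sbatch itself comes from Slurm.

# Extract the numeric job ID from sbatch's confirmation line,
# e.g. "Submitted batch job 12345".
job_id_from_sbatch() {
    echo "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Build the log file name in the jobName-jobID.log form.
job_log_name() {
    echo "$1-$2.log"
}

# Usage on a login node (skipped when sbatch or the script is unavailable).
if command -v sbatch >/dev/null 2>&1 && [ -f myJob.sh ]; then
    id=$(job_id_from_sbatch "$(sbatch myJob.sh)")
    echo "Job $id submitted; expect output in $(job_log_name example_job "$id")"
fi
```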

How many nodes and processors you request will depend on the capability of your software and what it can do. There are four common scenarios.

Example: One Node, One Processor

This is the simplest case and is shown in the example above. The majority of software cannot use more than one processor. Examples of software for which this is the right configuration are SAS, Stata, R, many Python programs, and most Perl programs.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun hostname -s

Example: One Node, Multiple Processors

This is similar to what a modern desktop or laptop is likely to have. Software that can use more than one processor may be described as multicore, multiprocessor, or multithreaded. Some examples of software that can benefit from this are MATLAB and Stata/MP. You should read the documentation for your software to see if this is one of its capabilities.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun hostname -s

Example: Multiple Nodes, One Process per CPU

This is the classic MPI approach: multiple machines are requested, and MPI starts one process per processor on each node. This is the way most MPI-enabled software is written to work.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun hostname -s

Example: Multiple Nodes, Multiple CPUs per Process

This is often referred to as the “hybrid mode” MPI approach, where multiple machines are requested and each process is given multiple processors. MPI will start a parent process or processes on each node, and those in turn will be able to use more than one processor for threaded calculations.

#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun hostname -s
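In hybrid mode, each task usually has to be told how many threads it may start. The sketch below assumes an OpenMP-based application that honours OMP_NUM_THREADS, which the guide does not specify; it reads SLURM_CPUS_PER_TASK, which Slurm exports when --cpus-per-task is requested. The helper name threads_per_task and the executable my_hybrid_app are hypothetical.

```shell
#!/bin/bash
# Assumption: the application is threaded with OpenMP and honours
# OMP_NUM_THREADS. Slurm exports SLURM_CPUS_PER_TASK when the job
# requests --cpus-per-task; fall back to 1 thread otherwise.
threads_per_task() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

export OMP_NUM_THREADS="$(threads_per_task)"
echo "Each task will use $OMP_NUM_THREADS thread(s)"

# In the batch script this would be followed by the launch line, e.g.
#   srun ./my_hybrid_app      # hypothetical executable name
```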

Common Job Submission Options

| Description | Slurm directive (#SBATCH option) | Armis2 Usage |
|---|---|---|
| Job name | --job-name=<name> | --job-name=armis2job1 |
| Account | --account=<account> | --account=test |
| Queue | --partition=<partition_name> | --partition=standard. Available partitions: standard, gpu (GPU jobs only), largemem (large memory jobs only) |
| Wall time limit | --time=<hh:mm:ss> | --time=02:00:00 |
| Node count | --nodes=<count> | --nodes=2 |
| Process count per node | --ntasks-per-node=<count> | --ntasks-per-node=1 |
| Minimum memory per processor | --mem-per-cpu=<memory> | --mem-per-cpu=1000m |
| Request software license(s) | --licenses=<application>@slurmdb:<N> | --licenses=stata@slurmdb:1 (requests one license for Stata) |
| Request event notification | --mail-type=<events>. Note: multiple mail-type requests may be specified in a comma-separated list: --mail-type=BEGIN,END,NONE,FAIL,REQUEUE | --mail-type=BEGIN,END,FAIL |
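The options listed above can be combined into a script header programmatically. The sketch below is a hypothetical helper, sbatch_header, that prints a header from positional arguments; it is an illustration of how the directives fit together, not an ARC-provided tool.

```shell
#!/bin/bash
# Hypothetical helper: print a batch-script header built from the
# directives listed above. Arguments, in order:
#   job name, account, partition, wall time (hh:mm:ss), nodes,
#   tasks per node, memory per CPU
sbatch_header() {
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=$1
#SBATCH --account=$2
#SBATCH --partition=$3
#SBATCH --time=$4
#SBATCH --nodes=$5
#SBATCH --ntasks-per-node=$6
#SBATCH --mem-per-cpu=$7
EOF
}

# Example: the values from the "Armis2 Usage" column.
sbatch_header armis2job1 test standard 02:00:00 2 1 1000m
```

Redirecting the output to a file and appending the application commands yields a script that can be passed to sbatch.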

Please note that if your job is set to use more than one node, your code must be MPI-enabled in order to run across those nodes, and you must use srun rather than mpirun or mpiexec. More advanced job submission options can be found in the Slurm User Guide for Armis2.

Interactive Jobs

An interactive job is a job that returns a command line prompt (instead of running a script) when the job runs. Interactive jobs are useful when debugging or interacting with an application. The srun command is used to submit an interactive job to Slurm. When the job starts, a command line prompt will appear on one of the compute nodes assigned to the job. From here commands can be executed using the resources allocated on the local node.

[user@login ~]$ srun --pty /bin/bash 
srun: job 309 queued and waiting for resources 
srun: job 309 has been allocated resources 
[user@node0001 ~]$ hostname 
bn01.stage.arc-ts.umich.edu 
[user@node0001 ~]$

Jobs submitted with srun --pty /bin/bash will be assigned the cluster default values of 1 CPU and 1024MB of memory. If additional resources are required, they can be requested as options to the srun command. The following example job is assigned 2 nodes, each with 4 CPUs and 4GB of memory:

[user@login ~]$ srun --nodes=2 --ntasks-per-node=4 --mem-per-cpu=1GB --pty /bin/bash
srun: job 894 queued and waiting for resources
srun: job 894 has been allocated resources
[user@node0001 ~]$ srun hostname
node0001.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu

In the above example srun is used within the job from the first compute node to run a command once for every task in the job on the assigned resources. srun can be used to run on a subset of the resources assigned to the job. See the srun man page for more details.
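The totals in the interactive example follow from simple arithmetic on the request. The sketch below reproduces it with two hypothetical helpers, assuming one CPU per task (the Slurm default when --cpus-per-task is not given).

```shell
#!/bin/bash
# Hypothetical helpers that reproduce the arithmetic of the request above.

# Total tasks = nodes x tasks per node.
total_tasks() {
    echo $(( $1 * $2 ))
}

# Memory per node (in GB) = tasks per node x memory per CPU (in GB),
# assuming one CPU per task.
mem_per_node_gb() {
    echo $(( $1 * $2 ))
}

echo "Tasks in job:    $(total_tasks 2 4)"          # 2 nodes x 4 tasks per node
echo "Memory per node: $(mem_per_node_gb 4 1) GB"   # 4 tasks x 1 GB per CPU
```

The task total of 8 matches the eight hostname lines printed by srun hostname in the example.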

GPU and Large Memory Jobs

Jobs can request GPUs with the job submission options --partition=gpu and --gres=gpu:<count>. GPUs can be requested in both Batch and Interactive jobs.

Similarly, jobs can request nodes with large amounts of RAM with --partition=largemem.
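As an illustration, the GPU options can be placed in a batch script like any other directive. The sketch below prints such a script; the function name gpu_job_script, the account test, the resource values, and the use of nvidia-smi as a stand-in workload are all assumptions chosen for the example.

```shell
#!/bin/bash
# Hypothetical helper: print a minimal GPU batch script.
# Argument: number of GPUs to request (defaults to 1).
gpu_job_script() {
    local gpus="${1:-1}"
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --account=test
#SBATCH --partition=gpu
#SBATCH --gres=gpu:${gpus}
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00

srun nvidia-smi
EOF
}

# Print a script that requests two GPUs. To submit it on a login node:
#   gpu_job_script 2 > gpu_job.sh && sbatch gpu_job.sh
gpu_job_script 2
```

For an interactive GPU session, the same two options can be passed directly to srun together with --pty /bin/bash.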

Job Status

Most of a job’s specifications can be seen by invoking scontrol show job <jobID>. The job’s batch script can be written to a file by using scontrol write batch_script <jobID> output.txt. If no output file is specified, the script will be written to slurm-<jobID>.sh.

A job’s record remains in Slurm’s memory for 30 minutes after it completes.  scontrol show job will return “Invalid job id specified” for a job that completed more than 30 minutes ago.  At that point, one must invoke the sacct command to retrieve the job’s record from the Slurm database.
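The two lookups can be chained so that the accounting database is consulted only when the controller no longer holds the job. This is a sketch with a hypothetical wrapper name, show_job, and it assumes that scontrol exits with a non-zero status when it reports an invalid job ID.

```shell
#!/bin/bash
# Hypothetical wrapper: try the controller's in-memory record first and
# fall back to the accounting database when scontrol no longer knows the job.
# Assumption: scontrol exits with a non-zero status for an unknown job ID.
show_job() {
    local job_id="$1"
    scontrol show job "$job_id" 2>/dev/null || sacct -j "$job_id"
}

# Usage on a login node:
#   show_job 309
```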

To view TRES (Trackable RESource) utilization by user or account, use the following commands, substituting your own values for the dates, account, user names, and TRES type:

Shows TRES usage by all users on account during date range:

sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy account=test --tres type

Shows TRES usage by specified user(s) on account during date range:

sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy users=un1,un2 account=test --tres type

Lists users alphabetically along with TRES usage and total during date range:

sreport cluster AccountUtilizationByUser start=mm/dd/yy end=mm/dd/yy tree account=test --tres type

Possible TRES types:

  • cpu
  • mem
  • node
  • gres/gpu

For more reporting options, see the Slurm sreport documentation.
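To avoid retyping the report command, the placeholders can be filled in by a small function. The sketch below only prints the command for the first report shown above; the function name account_usage_cmd and the example dates are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) the sreport command for all users
# on an account. Arguments: account, start date, end date, TRES type.
# Dates use the mm/dd/yy form shown above.
account_usage_cmd() {
    echo "sreport cluster UserUtilizationByAccount start=$2 end=$3 account=$1 --tres $4"
}

# Example: CPU usage on the account "test" for January 2024
# (the account name and dates are placeholders).
account_usage_cmd test 01/01/24 01/31/24 cpu
```

Running the printed line on a login node produces the report.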

Software

The Armis2 cluster uses the Lmod modules system to provide access to centrally installed software. If you used a cluster at U-M previously, you should review the documentation for the module system, as we have changed the configuration to match that used at most national clusters and most other university clusters.

In particular, you should use the command module keyword to look for a module and do not use module available to search for software, as module available will only show software for which all the dependencies (or prerequisites) are already loaded.

So, to search for the software package FFTW, use

$ module keyword fftw

That will show which versions are installed and provide a command to determine what is needed to load it.

Please see our page on using the Lmod modules system for more details and examples.

There are two main categories of software available on the system: software that is installed as part of the installation of the operating system and software that is installed separately. No special action is needed to use the software installed with the operating system. The separately installed software is set up so that you will use a module to use it. The module will set up the environment and make the software available. We do it this way to enable having multiple versions of the same package and to avoid having conflicts between software packages that have mutually exclusive system requirements.

Requesting software licenses

Many of the software packages that are licensed for use on ARC clusters are licensed for a limited number of concurrent uses. If you will use one of those packages, then you must request a license or licenses in your submission script. As an example, to request one Stata license, you would use

#SBATCH --licenses=stata@slurmdb:1

The list of licensed software can be displayed on Armis2 by using the command

$ scontrol show licenses

Policies

ARMIS2 TERMS OF USAGE

  1. Limited data restoration. The data in your home directory can be restored from snapshots going back 3 days. Anything beyond 3 days cannot be retrieved. Data stored outside your home directory, such as on a group share, is subject to the data-lifetime policies that were set up when the respective Turbo NFS volume was purchased. You are responsible for mitigating your own risk. We suggest you store copies of hard-to-reproduce data in your home directory or on HIPAA-aligned storage you own or purchased from Turbo.
  2. System usage is tracked and is used for billing reports and capacity planning. Job metadata (for example, walltime, resource utilization, and software accessed) is stored and used to generate usage reports and to analyze patterns and trends. ARC-TS may report this metadata, including your individual metadata, to your adviser, department head, dean, or other administrator or supervisor for billing or capacity planning purposes.
  3. Maintaining the overall stability of the system is paramount to us. While we make every effort to ensure that every job completes in the most efficient and accurate way possible, the good of the whole is more important to us than the good of an individual. This may affect you, but mostly we hope it benefits you. System availability is based on our best efforts. We are staffed to provide support during normal business hours. We try very hard to provide support as broadly as possible, but cannot guarantee support on a 24 hour per day basis. Additionally, we perform system maintenance on a periodic basis, driven by the availability of software updates, staffing availability, and input from the user community. We do our best to schedule around your needs, but there will be times when the system is unavailable. We will announce scheduled outages at least one month in advance on the ARC-TS home page; we will announce unscheduled outages on that same page as quickly as we can, with as much detail as we have. You can also follow ARC-TS on Twitter at @ARCTS_UM.
  4. Armis2 is intended only for non-commercial, academic research and instruction. Commercial use of some of the software on Armis2 is prohibited by software licensing terms. Prohibited uses include product development or validation, software use supporting any service for which a fee is charged, and, in some cases, research involving proprietary data that will not be made available publicly, regardless of whether the research is published. Please contact hpc-support@umich.edu if you have any questions about this policy, or about whether your work may violate these terms.
  5. Data subject to export control and HIPAA regulations may be stored or processed on the cluster. The appropriate storage solution for export-controlled information or PHI that can be accessed on the Armis2 cluster is the Turbo-NFSv4 with Kerberos offering (see the Sensitive Data Restrictions for Turbo-NFSv4 with Kerberos for further details). It is your responsibility, not ARC’s, to be aware of and comply with all applicable laws, regulations, and university policies (e.g., ITAR, EAR, HIPAA) as part of any research activity that may raise compliance issues under those laws. For assistance with export-controlled research, contact the U-M Export Control Officer at exportcontrols@umich.edu. For assistance with HIPAA-related computational research, contact the ARC liaison to the Medical School at msis.help@umich.edu.

USER RESPONSIBILITIES

Users should make requests by email to hpc-support@umich.edu:

  • Renew allocations at least 2 business days before your current allocation expires in order to have the new allocation provisioned before the old one expires.
  • One day in advance, request users be added to Armis2 accounts you may administer.

Users are responsible for security and compliance related to sensitive code and/or data. Security and compliance are shared responsibilities. If you process or store sensitive university data, software, or libraries on the cluster, you are responsible for understanding and adhering to any relevant legal, regulatory or contractual requirements.

Users are responsible for maintaining MCommunity groups used for MReport authorizations.

Users must manage PHI (protected health information) appropriately and can use the following locations:

  • /home
  • /scratch
  • /tmp
  • Any appropriate PHI-compliant NFS volume mounted on Armis2

LOGIN NODE POLICIES

Appropriate uses for the login nodes:

  • Transferring small files to and from the cluster
  • Creating, modifying, and compiling code and submission scripts
  • Submitting and monitoring the status of jobs
  • Testing executables to ensure they will run on the cluster and its infrastructure. Processes are limited to a maximum of 15 minutes of CPU time to prevent runaway processes and overuse.

Any other uses of the login node may result in the termination of the process in violation. Any production processes (including post processing) should be submitted through the batch system to the cluster. If interactive use is required then you should submit an interactive job to the cluster.