1. Get Duo
2. Get an Armis2 user login
You must establish a user login on Armis2 by filling out this form. If you had a login on Armis, you should already have one on Armis2.
3. Get an SSH Client & Connect to Armis2 Login Node
The login node (armis2.arc-ts.umich.edu) is the entry point into the cluster. It is accessible only from Ann Arbor, Dearborn, and Flint campus IP addresses and from the U-M VPN network, and it requires a valid user account and Duo authentication to log in. The login node is a shared resource, so users are expected not to monopolize it.
If you are trying to log in from off campus, or using an unauthenticated wireless network such as MGuest, you have a couple of options:
- Install VPN software on your computer
- SSH to login.itd.umich.edu and continue with the Linux instructions
See the policies below governing appropriate use of the login nodes.
Mac or Linux:
Open Terminal and type:
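The command to type is not shown in this section. A minimal sketch, assuming your login name is your uniqname (as in the FileZilla steps later in this guide) and using the login node named above; the placeholder value "uniqname" is hypothetical and must be replaced with your own:

```shell
# Assumption: "uniqname" is a placeholder; replace it with your own U-M uniqname.
UNIQNAME="uniqname"
LOGIN_HOST="armis2.arc-ts.umich.edu"   # login node named in this guide
SSH_TARGET="${UNIQNAME}@${LOGIN_HOST}"

# Print the command so it can be checked before running it.
echo "ssh ${SSH_TARGET}"

# To actually connect (you will be prompted for your password and Duo), uncomment:
# ssh "${SSH_TARGET}"
```

The connection only succeeds from a campus network or the U-M VPN, as described above.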
Windows (using PuTTY):
Download and install PuTTY here.
Launch PuTTY, enter armis2.arc-ts.umich.edu as the host name, then click Open.
All Operating Systems:
4. Get files
You can use SFTP (best for simple transfers of small files) or Globus (best for large files or a commonly used endpoint) to transfer data to your /home directory.
SFTP: Mac or Windows using FileZilla
- Open FileZilla and click the “Site Manager” button
- Create a New Site, which you can name “Armis2” or something similar
- Select the “SFTP (SSH File Transfer Protocol)” option
- In the Host field, type armis2-xfer.arc-ts.umich.edu
- Select “Interactive” for Logon Type
- In the User field, type your uniqname
- Click “Connect”
- Enter your Kerberos password
- Select your Duo method (1-3) and complete authentication
- Drag and drop files between the two systems
- Click “Disconnect” when finished
On Windows, you can also use WinSCP with similar settings, available alongside PuTTY here.
SFTP: Mac or Linux using Terminal
To copy a single file, type:
scp localfile uniqname@armis2-xfer.arc-ts.umich.edu:~/remotefile
To copy an entire directory, type:
scp -r localdir uniqname@armis2-xfer.arc-ts.umich.edu:~/remotedir
These commands can also be reversed to copy files from Armis2 to your machine:
scp -r uniqname@armis2-xfer.arc-ts.umich.edu:~/remotedir localdir
You will need to authenticate via Duo to complete the file transfer.
5. Submit a job
This is a simple guide to get your jobs up and running. For more advanced Slurm features, see the Slurm User Guide for Armis2. If you are familiar with using the resource manager Torque, you may find the migrating from Torque to Slurm guide useful.
Most work is queued to run on Armis2 and is described in a batch script. The sbatch command submits a batch script to Slurm. Run it from a shared file system: your home directory, /scratch, or any directory under /nfs that you can normally use in a job on Armis. Output is written to this working directory (jobName-jobID.log). Do not submit jobs from /tmp or any of its subdirectories. To submit a batch script, run:
$ sbatch myJob.sh
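When submissions are scripted, it can help to capture the job ID for later use with other Slurm commands. The sketch below assumes sbatch's usual confirmation line, "Submitted batch job <jobID>"; parse_job_id is a hypothetical helper, and the sample line is illustrative rather than real output:

```shell
# Extract the numeric job ID from sbatch's confirmation line,
# which normally has the form "Submitted batch job <jobID>".
parse_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Sample confirmation line (309 is an illustrative ID, not real output).
sample_output="Submitted batch job 309"
job_id="$(parse_job_id "$sample_output")"
echo "Job ID: ${job_id}"

# On the cluster, the same helper would wrap a real submission:
# job_id="$(parse_job_id "$(sbatch myJob.sh)")"
```

Slurm also offers sbatch --parsable, which prints the job ID without the surrounding text and may make the helper unnecessary.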
The batch job script is composed of three main components:
- The interpreter used to execute the script
- #SBATCH directives that convey submission options
- The application(s) to execute along with its input arguments and options
#!/bin/bash
# The interpreter used to execute the script

# "#SBATCH" directives that convey submission options:

#SBATCH --job-name=example_job
#SBATCH --mail-type=BEGIN,END
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1000m
#SBATCH --time=10:00
#SBATCH --account=test
#SBATCH --partition=standard

# The application(s) to execute along with its input arguments and options:

/bin/hostname
sleep 60
How many nodes and processors you request depends on what your software is able to use. There are four common scenarios.
The first scenario, one node and one processor, is the simplest case and is shown in the example above. Most software cannot use more than this. Examples of software for which this is the right configuration include SAS, Stata, R, many Python programs, and most Perl programs.
#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun hostname -s
The second scenario, one node with multiple processors, is similar to what a modern desktop or laptop is likely to have. Software that can use more than one processor may be described as multicore, multiprocessor, or multithreaded. Some examples of software that can benefit from this are MATLAB and Stata/MP. Read the documentation for your software to see whether this is one of its capabilities.
#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun hostname -s
The third scenario is the classic MPI approach: multiple machines are requested, and MPI starts one process per processor on each node. Most MPI-enabled software is written to work this way.
#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun hostname -s
The fourth scenario is often referred to as the “hybrid mode” MPI approach, where multiple machines are requested and each process is given multiple processors. MPI starts a parent process or processes on each node, and those in turn can use more than one processor for threaded calculations.
#!/bin/bash
#SBATCH --job-name JOBNAME
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test
#SBATCH --partition=standard
#SBATCH --mail-type=NONE

srun hostname -s
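To compare what the four scenarios actually request, the arithmetic can be sketched as follows. The helpers total_cpus and report are hypothetical and not part of Slurm. The sketch assumes that an omitted --ntasks-per-node or --cpus-per-task counts as 1, and that --mem-per-cpu applies to every allocated CPU:

```shell
# total_cpus NODES TASKS_PER_NODE CPUS_PER_TASK
# Hypothetical helper: multiplies the three request values together.
total_cpus() {
    echo $(( $1 * $2 * $3 ))
}

# report LABEL NODES TASKS_PER_NODE CPUS_PER_TASK MEM_PER_CPU_GB
# Prints the CPU total and the memory total implied by --mem-per-cpu.
report() {
    cpus="$(total_cpus "$2" "$3" "$4")"
    echo "$1: ${cpus} CPUs, $(( cpus * $5 )) GB memory in total"
}

# The four example scripts above, all with --mem-per-cpu=1g.
report "one node, one processor"   1 1 1 1
report "one node, four processors" 1 4 1 1
report "two nodes, MPI"            2 4 1 1
report "two nodes, hybrid MPI"     2 4 4 1
```

The hybrid request is four times larger than the plain MPI request, which is worth keeping in mind when estimating queue time and usage.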
Common Job Submission Options
|Description||Slurm directive (#SBATCH option)||Armis2 Usage|
|Wall time limit||--time=<hh:mm:ss>||--time=02:00:00|
|Process count per node||--ntasks-per-node=<count>||--ntasks-per-node=1|
|Minimum memory per processor||--mem-per-cpu=<memory>||--mem-per-cpu=1000m|
|Request software license(s)||--licenses=<application>@slurmdb:<N>||--licenses=stata@slurmdb:1 (requests one license for Stata)|
|Request event notification||--mail-type=<events>||--mail-type=BEGIN,END|
Available partitions: standard, gpu (GPU jobs only), largemem (large memory jobs only)
Note: multiple mail-type requests may be specified in a comma-separated list, for example --mail-type=BEGIN,END.
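The --time values in this guide appear in two forms: 10:00 (which Slurm reads as minutes:seconds, so ten minutes) and hh:mm:ss values such as 00:15:00. As a sketch, a small helper can turn a number of minutes into the unambiguous hh:mm:ss form; to_walltime is a hypothetical name, not a Slurm command:

```shell
# to_walltime MINUTES
# Hypothetical helper: formats a whole number of minutes as hh:mm:ss
# for use with --time.
to_walltime() {
    total_minutes="$1"
    printf '%02d:%02d:00\n' $(( total_minutes / 60 )) $(( total_minutes % 60 ))
}

to_walltime 10    # ten minutes
to_walltime 15    # fifteen minutes
to_walltime 120   # two hours
```

For example, the result could be passed on the command line as sbatch --time="$(to_walltime 120)" myJob.sh.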
If your job uses more than one node, make sure your code is MPI-enabled so that it can run across those nodes, and use srun rather than mpirun or mpiexec. More advanced job submission options can be found in the Slurm User Guide for Armis2.
An interactive job is a job that returns a command line prompt (instead of running a script) when the job runs. Interactive jobs are useful when debugging or interacting with an application. The srun command is used to submit an interactive job to Slurm. When the job starts, a command line prompt will appear on one of the compute nodes assigned to the job. From here commands can be executed using the resources allocated on the local node.
[user@login ~]$ srun --pty /bin/bash
srun: job 309 queued and waiting for resources
srun: job 309 has been allocated resources
[user@node0001 ~]$ hostname
bn01.stage.arc-ts.umich.edu
[user@node0001 ~]$
Jobs submitted with srun --pty /bin/bash are assigned the cluster default values of 1 CPU and 1024MB of memory. If additional resources are required, they can be requested as options to the srun command. The following example job is assigned 2 nodes with 4 CPUs and 4GB of memory each:
[user@login ~]$ srun --nodes=2 --ntasks-per-node=4 --mem-per-cpu=1GB --pty /bin/bash
srun: job 894 queued and waiting for resources
srun: job 894 has been allocated resources
[user@node0001 ~]$ srun hostname
node0001.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
In the above example srun is used within the job from the first compute node to run a command once for every task in the job on the assigned resources. srun can be used to run on a subset of the resources assigned to the job. See the srun man page for more details.
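One way to confirm that tasks landed where expected is to count the hostname lines per node. The sketch below runs that count on the sample output from the session above, so it works without a cluster; inside a real job the same pipeline would be fed by srun hostname:

```shell
# Sample output copied from the interactive session above
# (2 nodes requested with 4 tasks per node).
sample_output="node0001.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0001.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu
node0002.armis2.arc-ts.umich.edu"

# Count how many tasks reported from each node.
printf '%s\n' "$sample_output" | sort | uniq -c

# Keep per-node counts in variables for a scripted check.
node1_tasks="$(printf '%s\n' "$sample_output" | grep -c '^node0001\.')"
node2_tasks="$(printf '%s\n' "$sample_output" | grep -c '^node0002\.')"

# Inside a job, the equivalent live check would be:
# srun hostname | sort | uniq -c
```

Each node should report as many lines as the --ntasks-per-node value, here four.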
GPU and Large Memory Jobs
Jobs can request GPUs with the job submission options --partition=gpu and --gres=gpu:<count>. GPUs can be requested in both Batch and Interactive jobs.
Similarly, jobs can request nodes with large amounts of RAM with --partition=largemem.
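A minimal sketch of a GPU batch script, combining the two options above with the directives used earlier in this guide. The job name gpu_example is hypothetical. The account value test follows the earlier examples. The script also assumes that Slurm exports CUDA_VISIBLE_DEVICES for GPU jobs, which is typical but depends on how the cluster is configured:

```shell
#!/bin/bash
#SBATCH --job-name=gpu_example
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1g
#SBATCH --time=00:15:00
#SBATCH --account=test

# Assumption: Slurm sets CUDA_VISIBLE_DEVICES for jobs that request a GPU.
# Outside a GPU job the variable is unset, so fall back to "none".
GPU_IDS="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: ${GPU_IDS}"
```

A large memory job would use the same layout with --partition=largemem and no --gres line.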
Most of a job’s specifications can be seen by invoking scontrol show job <jobID>. The job’s batch script can be written to a file with scontrol write batch_script <jobID> output.txt. If no output file is specified, the script is written to slurm-<jobID>.sh.
A job’s record remains in Slurm’s memory for 30 minutes after it completes. scontrol show job will return “Invalid job id specified” for a job that completed more than 30 minutes ago. At that point, one must invoke the sacct command to retrieve the job’s record from the Slurm database.
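The two lookups can be combined so that the database is consulted only when the in-memory record is gone. This is a sketch under assumptions: job_info is a hypothetical helper, it assumes scontrol exits with a non-zero status for an unknown job ID, and the sacct field list is only one reasonable selection:

```shell
# job_info JOBID
# Hypothetical helper: try the in-memory record first, then fall back
# to the Slurm database once the job has aged out.
job_info() {
    jobid="$1"
    # Assumption: scontrol returns non-zero for "Invalid job id specified".
    if scontrol show job "$jobid" 2>/dev/null; then
        return 0
    fi
    sacct -j "$jobid" --format=JobID,JobName,State,Elapsed,ExitCode
}

# Usage on the cluster (309 is an illustrative job ID):
# job_info 309
```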
To view TRES (Trackable RESource) utilization by user or account, use the following commands, substituting your own values for the dates, account, user names, and TRES type:
Shows TRES usage by all users on account during date range:
sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy account=test --tres type
Shows TRES usage by specified user(s) on account during date range:
sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy users=un1,un2 account=test --tres type
Lists users alphabetically along with TRES usage and total during date range:
sreport cluster AccountUtilizationByUser start=mm/dd/yy end=mm/dd/yy tree account=test --tres type
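As a sketch, the mm/dd/yy dates can be generated rather than typed by hand. This assumes GNU date (the -d option) as found on Linux, keeps the account name test from the examples above, and uses cpu as the TRES type; the 30-day window is an arbitrary choice. The command is only printed, not run:

```shell
# Build a 30-day reporting window in the mm/dd/yy form sreport expects.
# Assumption: GNU date, which supports -d with relative dates.
START="$(date -d '30 days ago' +%m/%d/%y)"
END="$(date +%m/%d/%y)"

# Print the command for review; copy it to the cluster to run it.
echo "sreport cluster UserUtilizationByAccount start=${START} end=${END} account=test --tres cpu"
```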
Possible TRES types:
For more reporting options, see the Slurm sreport documentation.