
Policies

Terms of Usage and User Responsibility

  1. Data is not backed up. None of the data on Beta is backed up. The data that you keep in your home directory, /tmp or any other filesystem is exposed to immediate and permanent loss at all times. You are responsible for mitigating your own risk. We suggest you store copies of hard-to-reproduce data on systems that are backed up, for example, the AFS filesystem maintained by ITS.
  2. Your usage is tracked and may be used for reports. We track a lot of job data and store it for a long time. We use this data to generate usage reports and look at patterns and trends. We may report this data, including your individual data, to your adviser, department head, dean, or other administrator or supervisor.
  3. Maintaining the overall stability of the system is paramount to us. While we make every effort to ensure that every job completes in the most efficient and accurate way possible, the stability of the cluster is our primary concern. This may affect you, but mostly we hope it benefits you. System availability is based on our best efforts. We are staffed to provide support during normal business hours. We try very hard to provide support as broadly as possible, but cannot guarantee 24-hours-a-day support. Additionally, we perform system maintenance on a periodic basis, driven by the availability of software updates, staffing availability, and input from the user community. We do our best to schedule around your needs, but there will be times when the system is unavailable. We will announce scheduled outages at least one month in advance on the ARC-TS home page; for unscheduled outages we will post updates as quickly as we can, with as much detail as we have, on that same page. You can also track ARC-TS on Twitter (@ARC-TS).
  4. Beta is intended only for non-commercial, academic research and instruction. Commercial use of some of the software on Beta is prohibited by software licensing terms. Prohibited uses include product development or validation, any service for which a fee is charged, and, in some cases, research involving proprietary data that will not be made available publicly. Please contact hpc-support@umich.edu if you have any questions about this policy, or about whether your work may violate these terms.
  5. You are responsible for the security of sensitive codes and data. If you will be storing export-controlled or other sensitive or secure software, libraries, or data on the cluster, it is your responsibility to ensure that it is secured to the standards set by the most restrictive governing rules. We cannot reasonably monitor everything that is installed on the cluster and cannot be responsible for it; that responsibility lies with you, the end user.
  6. Data subject to HIPAA regulations may not be stored or processed on the cluster.

USER RESPONSIBILITIES

Users must manage data appropriately in its various locations:

  • /home
  • /scratch
  • /tmp and /var/tmp
  • customer-provided NFS

SECURITY ON BETA / USE OF SENSITIVE DATA

The Beta high-performance computing system at the University of Michigan has been built to provide a preview of the Great Lakes HPC cluster. Beta has the same security stance as the Flux cluster.

Applications and data are protected by secure physical facilities and infrastructure as well as a variety of network and security monitoring systems. These systems provide basic but important security measures including:

  • Secure access – All access to Beta is via SSH or Globus. SSH has a long history of high security.
  • Built-in firewalls – All of the Beta computers have firewalls that restrict access to only what is needed.
  • Unique users – Beta adheres to the University guideline of one person per login ID and one login ID per person.
  • Multi-factor authentication (MFA) – For all interactive sessions, Beta requires both a UM Kerberos password and Duo authentication. File transfer sessions require a Kerberos password.
  • Private Subnets – Other than the login and file transfer computers that are part of Beta, all of the computers are on a network that is private within the University network and are unreachable from the Internet.
  • Flexible data storage – Researchers can control the security of their own data storage by securing their storage as they require and having it mounted via NFSv3 or NFSv4 on Beta. Another option is to make use of Beta’s local scratch storage, which is considered secure for many types of data. Note: Beta is not considered secure for data covered by HIPAA.

Software

There is a wide variety of software installed on Beta. For more information, please see the Beta software page.

There are two main categories of software available on the system: software installed as part of the operating system and software installed separately. No special action is needed to use the software installed with the operating system. Separately installed software is accessed through modules: loading a module sets up the environment and makes the software available. We do it this way so that multiple versions of the same package can coexist and so that packages with mutually exclusive system requirements do not conflict.
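
A typical module workflow looks like the following; the package name used here is only illustrative, so run module avail on Beta to see what is actually installed:

module avail            # list the software available through modules
module load gcc         # load a package (package name is illustrative)
module list             # show the modules currently loaded
module unload gcc       # remove a package from your environment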

See our page on using the Lmod modules system.

Getting Started

1. Get Duo

You must use Duo authentication to log on to Beta.  Get more details and enroll here.

2. Get a Flux user account

All Flux users have access to Beta.  You must establish an account on Flux by filling out this form.

3. Get an SSH Client & Connect to Beta

You must be on campus or on the VPN to connect to Beta.  If you are trying to log in from off campus, or over an unauthenticated wireless network such as MGuest, connect to the U-M VPN first.

Mac or Linux:

Open Terminal and type:

ssh uniqname@beta.arc-ts.umich.edu

Windows (using PuTTY):

Download and install PuTTY here.

Launch PuTTY and enter beta.arc-ts.umich.edu as the host name, then click Open.

All Operating Systems:

At the “Enter a passcode or select one of the following options:” prompt, type the number of your preferred choice for Duo authentication.

4. Get files

You can use SFTP (best for simple transfers of small files) or Globus (best for large files or a commonly used endpoint) to transfer data to your /home directory.

SFTP: Mac or Windows using FileZilla
  1. Open FileZilla and click the “Site Manager” button
  2. Create a New Site, which you can name “Beta” or something similar
  3. Select the “SFTP (SSH File Transfer Protocol)” option
  4. In the Host field, type beta.arc-ts.umich.edu
  5. Select “Interactive” for Logon Type
  6. In the User field, type your uniqname
  7. Click “Connect”
  8. Enter your Kerberos password
  9. Select your Duo method (1-3) and complete authentication
  10. Drag and drop files between the two systems
  11. Click “Disconnect” when finished

On Windows, you can also use WinSCP with similar settings, available alongside PuTTY here.

SFTP: Mac or Linux using Terminal

To copy a single file, type:

scp localfile uniqname@beta.arc-ts.umich.edu:~/remotefile

To copy an entire directory, type:

scp -r localdir uniqname@beta.arc-ts.umich.edu:~/remotedir

These commands can also be reversed in order to copy files from Beta to your machine:

scp -r uniqname@beta.arc-ts.umich.edu:~/remotedir localdir

You will need to authenticate via Duo to complete the file transfer.
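
If you prefer an interactive transfer session to individual scp commands, the standard sftp client also works against the same host; this is a minimal sketch, and the file names are placeholders:

sftp uniqname@beta.arc-ts.umich.edu
sftp> put localfile        # upload a file from your machine to Beta
sftp> get remotefile       # download a file from Beta to your machine
sftp> quit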

Globus: Windows, Mac, or Linux

Globus is a reliable, high-performance parallel file transfer service provided by many HPC sites around the world. It enables easy transfer of files from one system to another, as long as both are Globus endpoints.

  • The Globus endpoint for Beta is “umich#beta”.

How to use Globus

Globus Online is a web front end to the Globus transfer service. Globus Online accounts are free and you can create an account with your University identity.

  • Set up your Globus account and learn how to transfer files using the Globus documentation.  Select “University of Michigan” from the dropdown box to get started.
  • Once you are ready to transfer files, enter “umich#beta” as one of your endpoints.

Globus Connect Personal

Globus Online also allows for simple installation of a Globus endpoint for Windows, Mac, and Linux desktops and laptops.

  • Follow the Globus instructions to download the Globus Connect Personal installer and set up an endpoint on your desktop or laptop.

Batch File Copies

A non-standard use of Globus Online is copying files from one location to another on the same cluster. To do this, use the same endpoint (for example, umich#beta) for both the sending and receiving machines. Set up the transfer and Globus will take care of the rest. The service will email you when the copy is finished.

Command Line Globus

There are command-line tools for Globus that are intended for advanced users. If you wish to use these, contact HPC support.

5. Submit a job

This is a simple guide to get your jobs up and running. For more advanced Slurm features, see the Slurm User Guide for Beta. If you are familiar with using the resource manager Torque, you may find the migrating from Torque to Slurm guide useful.

Batch Jobs

The sbatch command is used to submit a batch script to Slurm. It is designed to reject the job at submission time if there are requests or constraints that Slurm cannot fulfill as specified. This gives the user the opportunity to examine the job request and resubmit it with the necessary corrections. To submit a batch script simply run:

$ sbatch myJob.sh

The batch job script is composed of four main components:

  • The interpreter used to execute the script
  • #SBATCH directives that convey submission options
  • The setting of environment and/or script variables (if necessary)
  • The application(s) to execute along with its input arguments and options

Example:

#!/bin/bash
# The interpreter used to execute the script

# "#SBATCH" directives that convey submission options:

#SBATCH --job-name=example_job
#SBATCH --mail-user=uniqname@umich.edu
#SBATCH --mail-type=BEGIN,END
#SBATCH --cpus-per-task=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=1000m 
#SBATCH --time=10:00
#SBATCH -A test
#SBATCH -p standard
#SBATCH --output=/home/%u/%x-%j.log

# The setting of environment and/or script variables (if necessary):
export EDITOR=/bin/vim

# The application(s) to execute along with its input arguments and options:

/bin/hostname
sleep 60
Common Job Submission Options

Option                          Slurm Command (sbatch)                          Beta Usage
Script directive                #SBATCH                                         #SBATCH
Job name                        --job-name=<name> / -J <name>                   -J betajob1
Account                         --account=<account> / -A <account>              -A test
Queue                           --partition=<name> / -p <name>                  -p standard, or -p gpu (GPU jobs only)
Wall time limit                 --time=<hh:mm:ss>                               --time=02:00:00
Node count                      --nodes=<count> / -N <count>                    -N 2
Process count per node          --ntasks-per-node=<count>                       --ntasks-per-node=1
Core count (per process)        --cpus-per-task=<cores>                         --cpus-per-task=1
Minimum memory per processor    --mem-per-cpu=<memory>                          --mem-per-cpu=1000m
Standard output file            --output=<file path> (path must exist)          --output=/scratch/pname_flux/uame/jobOut
Copy environment                --export=ALL (default) / --export=NONE          --export=ALL
                                (do not export environment)
Copy environment variable       --export=<variable=value,var2=val2>             --export=EDITOR=/bin/vim
Request event notification      --mail-type=<events> (multiple events may be    --mail-type=BEGIN,END,FAIL
                                given as a comma-separated list:
                                BEGIN,END,NONE,FAIL,REQUEUE)
Email address                   --mail-user=<email address>                     --mail-user=uniqname@umich.edu

Please note that if your job requests more than one node, your code must be MPI-enabled in order to run across those nodes; a sketch of a multi-node script follows. More advanced job submission options can be found in the Slurm User Guide for Beta.
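
As a sketch, a multi-node MPI batch script might look like the following; the module name (openmpi) and the executable (my_mpi_program) are illustrative and depend on what is installed on Beta and in your own directories:

#!/bin/bash
#SBATCH --job-name=mpi_example
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --mem-per-cpu=1000m
#SBATCH --time=10:00
#SBATCH -A test
#SBATCH -p standard

# Load an MPI implementation (module name is illustrative)
module load openmpi

# srun launches one copy of the MPI program per task, across both nodes
srun ./my_mpi_program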

Interactive Jobs

An interactive job is a job that returns a command line prompt (instead of running a script) when the job runs. Interactive jobs are useful when debugging or interacting with an application. The srun command is used to submit an interactive job to Slurm. When the job starts, a command line prompt will appear on one of the compute nodes assigned to the job. From here commands can be executed using the resources allocated on the local node.

[user@beta-login ~]$ srun --pty /bin/bash 
srun: job 309 queued and waiting for resources 
srun: job 309 has been allocated resources 
[user@bn01 ~]$ hostname 
bn01.stage.arc-ts.umich.edu 
[user@bn01 ~]$

Jobs submitted with srun --pty /bin/bash will be assigned the cluster default values of 1 CPU and 1024MB of memory. If additional resources are required, they can be requested as options to the srun command. The following example job is assigned 2 nodes with 4 CPUs and 4GB of memory each:

[user@beta-login ~]$ srun --nodes=2 --ntasks-per-node=4 --mem-per-cpu=1GB --cpus-per-task=1 --pty /bin/bash
srun: job 894 queued and waiting for resources
srun: job 894 has been allocated resources
[user@bn01 ~]$ srun hostname
bn01.stage.arc-ts.umich.edu
bn02.stage.arc-ts.umich.edu
bn01.stage.arc-ts.umich.edu
bn01.stage.arc-ts.umich.edu
bn01.stage.arc-ts.umich.edu
bn02.stage.arc-ts.umich.edu
bn02.stage.arc-ts.umich.edu
bn02.stage.arc-ts.umich.edu

In the above example, srun is used within the job, from the first compute node, to run a command once for every task in the job on the assigned resources. srun can also be used to run on a subset of the resources assigned to the job. See the srun man page for more details.

GPU Jobs

Jobs can request GPUs with the job submission options --partition=gpu and --gres=gpu:<count>. GPUs can be requested in both Batch and Interactive jobs.
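
For example, a batch script requesting a single GPU could include directives like these (the account name follows the earlier example and may differ for your group):

#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH -A test

The same options can be added to srun for an interactive GPU session:

srun --partition=gpu --gres=gpu:1 --pty /bin/bash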

Job Status

Most of a job’s specifications can be seen by invoking scontrol show job <jobID>. The job’s batch script can be written to a file by using scontrol write batch_script <jobID> output.txt. If no output file is specified, the script will be written to slurm<jobID>.sh.

A job’s record remains in Slurm’s memory for 30 minutes after it completes.  scontrol show job will return “Invalid job id specified” for a job that completed more than 30 minutes ago.  At that point, one must invoke the sacct command to retrieve the job’s record from the Slurm database.
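
For example, using the job ID from the interactive example above (the job ID and output file name are illustrative):

scontrol show job 894                                        # full specification of a recent job
scontrol write batch_script 894 job894.sh                    # save the job's batch script to job894.sh
sacct -j 894 --format=JobID,JobName,State,Elapsed,MaxRSS     # record from the Slurm database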

System Overview

Beta is a preview cluster that allows users to become familiar with SchedMD’s Slurm Workload Manager, which will be used in our upcoming Great Lakes cluster. All standard compute nodes in the Beta cluster have 16-core Intel(R) Xeon(R) E5-2670 processors at 2.60GHz and 64GB of memory. There is one GPU node with 8 K20x GPUs. Beta will leverage some existing infrastructure (minimal software via modules, storage, etc.).