Great Lakes FAQ

1. What is Great Lakes?

Great Lakes is an ARC-TS-managed HPC cluster available to faculty (PIs) and their students and researchers.  All computational work is scheduled via the Slurm workload manager and job scheduler.  For detailed hardware information, see the configuration page.  Great Lakes is not suitable for HIPAA or other sensitive data.

2. What forms do I need to fill out?

  1. The Principal Investigator (PI) needs to request a Slurm account, specifying the users who can access the account, the people who can administer it, and payment details.
  2. Each user given access to the account must request a user login.  Please refer to the Great Lakes User Guide for additional steps and usage information.

3. How can I get a trial account on Great Lakes?

If you are a PI who hasn’t used Great Lakes before, you are eligible for a limited trial account.  This account includes $150 worth of cluster time (see the rates page) and will no longer be able to run jobs after one month.  If interested, please contact HPC support, specify that you’d like a trial account, and include lists of users and administrators.

4. Will my Turbo storage be available on Great Lakes?

Since Turbo is a storage service independent of Great Lakes, users who used Turbo on Flux will still be able to access their data on Great Lakes.  The cost of Turbo will not change, and no data needs to be transferred.  If you have trouble accessing Turbo, please contact HPC support.

5. How do I submit jobs using a web interface?

Great Lakes uses Open OnDemand for web-based job submission.  Through OnDemand, users can manage the files in their home directory, view and delete active jobs, and open a web terminal session.  Users can also run MATLAB, Jupyter Notebooks, and RStudio, and launch a remote desktop.

You must be on campus or on the VPN to connect to Great Lakes OnDemand.  For more information, see the OnDemand section of the Great Lakes User Guide.

6. How do I view the resource usage on my account?

To view TRES (Trackable RESource) utilization by user or account, use the following commands, substituting your own dates, account name, user names, and TRES type for the placeholders; a worked example follows the list of TRES types:

Shows TRES usage by all users on an account during a date range:
sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy account=test --tres type

Shows TRES usage by specified user(s) on an account during a date range:
sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy users=un1,un2 account=test --tres type

Lists users alphabetically along with TRES usage and totals during a date range:
sreport cluster AccountUtilizationByUser start=mm/dd/yy end=mm/dd/yy tree account=test --tres type

Possible TRES types:

cpu
mem
node
gres/gpu
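
As a worked example, the following would show GPU usage for every user under a hypothetical root account named example_root during January 2024 (the account name and dates are placeholders, not real values on the cluster):

sreport cluster AccountUtilizationByUser start=01/01/24 end=01/31/24 tree account=example_root --tres gres/gpu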

To view disk usage and availability by user, type:

home-quota -u uniqname

For more reporting options, see the Slurm sreport documentation.

7. What is a “root (_root) account”?

Each PI or project has a collection of Slurm accounts that can be used for different purposes (e.g. different grants or areas of research) with different users.  These Slurm accounts are contained within the PI/project’s root account (e.g. researcher_root).  For example:

researcher_root
    researcher
        user1
        user2
    researcher1
        user2
        user3

These accounts can have different limits on them, and are also collectively limited for /scratch usage and overall cluster usage.
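
If you want to see how the accounts you belong to are nested, one option is Slurm’s sacctmgr command, sketched here under the assumption that ordinary users are allowed to query the accounting database on this cluster (the account name is the placeholder from the example above):

sacctmgr show associations account=researcher_root format=Account,User,GrpTRES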

8. As a PI, how can I limit usage on my account?

Principal Investigators can request that CPU, GPU, memory, billing units, and walltime be limited per user or group of users on their account.  For more information, see the Great Lakes policy documentation.

Limits must be requested by emailing HPC support.

9. As a PI, can I purchase my own nodes for Great Lakes?

PIs may purchase hardware for use on the Lighthouse cluster by emailing HPC support to develop a hardware plan.  Lighthouse utilizes the same Slurm job scheduler and infrastructure as Great Lakes, but purchased nodes can be used exclusively by the PI’s group.

10. What does my job status mean?

When listing your submitted jobs with squeue -u uniqname, the final column, titled “NODELIST(REASON)”, will give you the reason that the job is not running yet.  The possible reasons are:

Resources

This job is waiting for the resources (CPUs, Memory, GPUs) it requested to become available. Resources become available when currently running jobs complete. The job with Resources in the NODELIST(REASON) column is the top priority job and should be started next.

Priority

This job is not the top priority, so it must wait in the queue until it becomes the top priority job. Once it becomes the top priority job, the NODELIST(REASON) column will change to “Resources”. The priority of all pending jobs can be shown with the sprio command. A job’s priority is determined by two factors: fairshare and age. The fairshare factor in a job’s priority is influenced by the amount of resources that have been consumed by members of your Slurm account. More recent usage means a lower fairshare priority. The age factor is determined by the job’s queued time. The longer the job has been waiting in the queue, the higher the age priority.
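
As a sketch of how to inspect these two factors (both commands are standard Slurm; the exact columns shown may vary with the cluster’s configuration, and account_name is a placeholder):

sprio -l                    # per-job priority broken down into its factors, including age and fairshare
sshare -a -A account_name   # fairshare usage for all users of the specified Slurm account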

AssocGrpCpuLimit

This job was submitted with a Slurm account that has a limit set on the number of CPUs that may be used at one time. This limit is set for all jobs by all users of the same Slurm account. Once some of the jobs running under this Slurm account complete, this reason will change to Priority or Resources unless there is some other limit or dependency. All jobs running under a given Slurm account can be viewed by running squeue --account=account_name

AssocGrpGRES

This job was submitted with a Slurm account that has a limit set on the number of GPUs that may be used at one time. This limit is set for all jobs by all users of the same Slurm account. Once some of the jobs running under this Slurm account complete, this reason will change to Priority or Resources unless there is some other limit or dependency. All jobs running under a given Slurm account can be viewed by running squeue --account=account_name

AssocGrpMem

This job was submitted with a Slurm account that has a limit set on the amount of memory that may be used at one time. This limit is set for all jobs by all users of the same Slurm account. Once some of the jobs running under this Slurm account complete, this reason will change to Priority or Resources unless there is some other limit or dependency. All jobs running under a given Slurm account can be viewed by running squeue --account=account_name

AssocGrpBillingMinutes

This job was submitted with a Slurm account that has a limit set on the amount of monetary charges that may be accrued. Jobs that are pending with this reason will not start until the limit has been raised or the monthly bill has been processed.

Dependency

This job has a dependency on another job. It will not start until that dependency is met. The most common dependency is waiting for another job to complete.
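
For instance, a follow-up job can be told to wait for an earlier job to finish successfully by submitting it with a dependency flag (the job ID and script name below are placeholders):

sbatch --dependency=afterok:1234567 followup_job.sh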

QOSMinGRES

This job was submitted to the GPU partition, but did not request a GPU. This job will never start. It should be deleted and resubmitted to a different partition, or, if a GPU is needed, resubmitted to the GPU partition with a GPU request. A GPU can be requested by adding the following line to a batch script: #SBATCH --gres=gpu:1
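
A minimal sketch of the relevant batch-script lines, assuming the GPU partition is named gpu and using a placeholder account name (substitute your own account and resource requests):

#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --account=example1

The --gres=gpu:1 line requests a single GPU; increase the count only if your code can use more than one.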

11. How can I access on-campus restricted software?

From the Command Line

Log in to an on-campus login node with an SSH client by connecting to gl-campus-login.arc-ts.umich.edu.
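
For example, from a terminal (replace uniqname with your own login):

ssh uniqname@gl-campus-login.arc-ts.umich.edu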

From Open OnDemand

Open your browser (Firefox, Edge, or Chrome; an incognito or private tab is recommended) and navigate to greatlakes-oncampus.arc-ts.umich.edu.

If you have a problem not listed here, please send an email to hpc-support@umich.edu.

Order Service

Billing for the Great Lakes service began on January 6, 2020. Existing, active Flux accounts and logins have been added to the Great Lakes Cluster. Complete this form to get a new Great Lakes cluster login.

If you would like to create a Great Lakes Cluster account or have any questions, contact hpc-support@umich.edu with lists of users, admins, and a shortcode. Trial accounts are also available for new PIs.