1. What is Great Lakes?
Great Lakes is an ARC-TS-managed HPC cluster available to faculty (PIs) and their students and researchers. All computational work is scheduled through the Slurm resource manager and job scheduler. For detailed hardware information, see the configuration page. Great Lakes is not suitable for HIPAA or other sensitive data.
2. What forms do I need to fill out?
- The Principal Investigator (PI) needs to request a Slurm account, specifying the users who can access the account, the people who can administer it, and payment details.
- Each user given access to the account must request a user login. Please refer to the Great Lakes User Guide for additional steps and usage information.
3. How can I get a trial account on Great Lakes?
If you are a PI who hasn’t used Great Lakes before, you are eligible for a limited trial account. This account includes $150 worth of cluster time (see the rates page) and will be unable to run jobs after 1 month. If interested, please contact HPC support, state that you’d like a trial account, and include the lists of users and administrators.
4. Will my Turbo storage be available on Great Lakes?
Since Turbo is a storage service independent of Great Lakes, users who used Turbo on Flux can still access their data on Great Lakes. The cost of Turbo does not change, and no data needs to be transferred. If you have trouble accessing Turbo, please contact HPC support.
5. How do I view the resource usage on my account?
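To view TRES (Trackable RESource) utilization by user or account, use the following commands, replacing the placeholder values (the mm/dd/yy dates, the account name test, the usernames un1,un2, and the TRES type) with your own: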
Shows TRES usage by all users on the account during a date range:
sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy account=test --tres type
Shows TRES usage by the specified user(s) on the account during a date range:
sreport cluster UserUtilizationByAccount start=mm/dd/yy end=mm/dd/yy users=un1,un2 account=test --tres type
Lists users alphabetically along with their TRES usage and the total during a date range:
sreport cluster AccountUtilizationByUser start=mm/dd/yy end=mm/dd/yy tree account=test --tres type
Possible TRES types:
To view disk usage and availability by user, type:
home-quota -u uniqname
For more reporting options, see the Slurm sreport documentation.
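The sreport commands above can be wrapped in a small shell function so the date range does not have to be typed each time. This is a minimal sketch, not an ARC-TS tool: the function name usage_by_user is hypothetical, cpu is only an assumed example TRES type, and the default range (first of the current month through today) is an illustrative choice. The function prints the exact command and runs it only where sreport is installed.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical wrapper around the AccountUtilizationByUser report.
# "cpu" is an assumed example TRES type; pass your own as the second argument.
usage_by_user() {
  local account="${1:?usage: usage_by_user <account> [tres] [start] [end]}"
  local tres="${2:-cpu}"
  # Default range: first day of the current month through today (mm/dd/yy).
  local start="${3:-$(date +%m/01/%y)}"
  local end="${4:-$(date +%m/%d/%y)}"

  # Keep arguments in an array so each one reaches sreport unchanged.
  local cmd=(sreport cluster AccountUtilizationByUser
             "start=$start" "end=$end" tree "account=$account" --tres "$tres")

  # Print the exact command, then run it only where sreport is available
  # (for example, on a Great Lakes login node).
  echo "${cmd[*]}"
  if command -v sreport >/dev/null 2>&1; then
    "${cmd[@]}"
  fi
}

# Usage: usage_by_user test
#        usage_by_user test gres/gpu 01/01/24 01/31/24
```

Printing the command before running it makes it easy to copy the expanded form into a ticket to HPC support if the report looks wrong.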
6. What is a “root (_root) account”?
Each PI or project has a collection of Slurm accounts that can be used for different purposes (e.g. different grants or research focuses) with different users. These Slurm accounts are contained within the PI’s or project’s root account (e.g. researcher_root). For example:
researcher_root
    researcher
        user1
        user2
    researcher1
        user2
        user3
Each of these accounts can have its own limits, and all accounts under the same root account are also collectively limited in /scratch usage and overall cluster usage.
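To see how the accounts under a root account are arranged, Slurm’s sacctmgr can list associations in a hierarchical (tree) layout. The sketch below is an illustration under assumptions: the function name show_account_tree is hypothetical, researcher_root is the example name from the text, and whether a regular user may list every association depends on the site’s Slurm privacy settings.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper that prints the account/user tree under a
# root account. Visibility of other users' associations is site-dependent.
show_account_tree() {
  local root="${1:?usage: show_account_tree <root_account>}"
  # "tree" indents child accounts under their parents; "withsubaccounts"
  # includes every account nested below the one named in accounts=.
  local cmd=(sacctmgr show associations tree withsubaccounts
             "accounts=$root" format=Account,User)

  # Print the command, then run it only where sacctmgr is installed.
  echo "${cmd[*]}"
  if command -v sacctmgr >/dev/null 2>&1; then
    "${cmd[@]}"
  fi
}

# Usage: show_account_tree researcher_root
```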
7. As a PI, how can I limit usage on my account?
Principal Investigators can request that CPU, GPU, memory, billing units, and walltime be limited per user or group of users on their account. For more information, see the Great Lakes policy documentation.
Limits must be requested by emailing HPC support.
8. As a PI, can I purchase my own nodes for Great Lakes?
PIs may purchase hardware for use on the Lighthouse cluster by emailing HPC support to develop a hardware plan. Lighthouse utilizes the same Slurm job scheduler and infrastructure as Great Lakes, but purchased nodes can be used exclusively by the PI’s group.
9. What does my job status mean?
When listing your submitted jobs with squeue -u uniqname, the final column, titled “NODELIST(REASON)”, gives the reason that the job is not running yet. The possible statuses are:
This job is waiting for the resources (CPUs, memory, GPUs) it requested to become available. Resources become available when currently running jobs complete. The job with Resources in the NODELIST(REASON) column is the top priority job and should be started next.
This job is not the top priority, so it must wait in the queue until it becomes the top priority job. Once it does, the NODELIST(REASON) column will change to “Resources”. The priority of all pending jobs can be shown with the sprio command. A job’s priority is determined by two factors: fairshare and age. The fairshare factor is influenced by the amount of resources that members of your Slurm account have consumed: the more your account has used recently, the lower its fairshare priority. The age factor is determined by how long the job has been queued: the longer the job has been waiting, the higher its age priority.
This job was submitted with a Slurm account that has a limit set on the number of CPUs that may be used at one time. This limit is set for all jobs by all users of the same Slurm account. Once some of the jobs running under this Slurm account complete, this reason will change to Priority or Resources unless there is some other limit or dependency. All jobs running under a given Slurm account can be viewed by running
This job was submitted with a Slurm account that has a limit set on the number of GPUs that may be used at one time. This limit is set for all jobs by all users of the same Slurm account. Once some of the jobs running under this Slurm account complete, this reason will change to Priority or Resources unless there is some other limit or dependency. All jobs running under a given Slurm account can be viewed by running
This job was submitted with a Slurm account that has a limit set on the amount of memory that may be used at one time. This limit is set for all jobs by all users of the same Slurm account. Once some of the jobs running under this Slurm account complete, this reason will change to Priority or Resources unless there is some other limit or dependency. All jobs running under a given Slurm account can be viewed by running
This job was submitted with a Slurm account that has a limit set on the amount of monetary charges that may be accrued. Jobs that are pending with this reason will not start until the limit has been raised or the monthly bill has been processed.
This job has a dependency on another job. It will not start until that dependency is met. The most common dependency is waiting for another job to complete.
This job was submitted to the GPU partition but did not request a GPU, so it will never start. Delete it and either resubmit it to a different partition or, if a GPU is needed, resubmit it to the GPU partition with a GPU request. A GPU can be requested by adding the following line to a batch script:
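As a hedged illustration of a complete GPU job, the batch script below requests one GPU with the standard sbatch option --gpus=1. It is a sketch under assumptions, not the official Great Lakes template: the partition name gpu, the account test, the resource amounts, and the function name report_gpu are all placeholders, and the exact GPU line Great Lakes recommends may differ.

```shell
#!/bin/bash
# Sketch of a batch script that requests one GPU (hypothetical example).
# Assumptions: the GPU partition is named "gpu", "test" stands in for your
# Slurm account, and the resource amounts are illustrative only.
#SBATCH --job-name=gpu-example
#SBATCH --account=test
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=00:10:00

# Print where the job ran and which GPU(s) it can see.
report_gpu() {
  echo "Job ${SLURM_JOB_ID:-none} on $(uname -n)"
  # Slurm typically sets CUDA_VISIBLE_DEVICES for allocated GPUs.
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
  # nvidia-smi -L lists the visible GPUs when NVIDIA tools are installed.
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
  fi
}

report_gpu
```

Such a script would be submitted with sbatch followed by its filename. Slurm also accepts the older form #SBATCH --gres=gpu:1 for the same request; either way, the #SBATCH lines must come before the first executable command in the script.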
10. How can I access on-campus restricted software?
From the Command Line
Use an SSH client to log into the on-campus login node, gl-campus-login.arc-ts.umich.edu.
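From a terminal with OpenSSH installed, the login above is a single ssh command. The small wrapper below is only a convenience sketch; the function name campus_login is hypothetical and uniqname is a placeholder for your own U-M uniqname.

```shell
#!/usr/bin/env bash
# Sketch: log into the on-campus login node from a terminal.
# "uniqname" is a placeholder for your own U-M uniqname.
CAMPUS_LOGIN_HOST="gl-campus-login.arc-ts.umich.edu"

campus_login() {
  local uniqname="${1:?usage: campus_login <uniqname>}"
  ssh "${uniqname}@${CAMPUS_LOGIN_HOST}"
}

# Usage: campus_login uniqname
```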
From Open OnDemand
Open your browser (Firefox, Edge, or Chrome; an incognito tab is recommended) and navigate to greatlakes-oncampus.arc-ts.umich.edu.