Slurm partitions represent collections of nodes and are equivalent to Torque queues. For detailed Great Lakes hardware specifications, see the Configuration page.
- debug: Allows users to run short jobs for debugging purposes right away.
- Max walltime: 4 hours
- Max jobs per user: 1
- Highest priority
- standard: Standard compute nodes used for most work.
- Max walltime: 14 days
- Default partition if none specified
- gpu: Allows use of NVIDIA Tesla V100 GPUs.
- Max walltime: 14 days
- Max GPUs per account: 5
- viz: Allows use of NVIDIA Tesla P40 GPUs for visualization purposes.
- Max walltime: 1 day
- Max GPUs per account: 1
- largemem: Allows use of a compute node with 1.5 TB of RAM.
- Max walltime: 14 days
- Max memory: 33% of partition (currently 1.5 TB or 1 node)
- standard-oc: Identical to standard compute nodes, but configured with an additional software mount containing on-campus-exclusive software.
Slurm associations are a combination of cluster, account, and user names, and optionally a partition. An association can have limits (e.g., account ‘testaccount’ using partition ‘msbritt’ on cluster ‘greatlakes’ has a running job limit of X). TRES (Trackable Resources) are resources that can be tracked for usage or used to enforce limits; common examples include CPU, memory, and GRES for GPUs.
Current Great Lakes partition limits:
- standard: 3,456 CPU cores; 18 TB RAM
- gpu: 180 CPU cores; 945 GB RAM; 10 GPUs (Tesla V100)
- viz: 36 CPU cores; 189 GB RAM; 1 GPU (Tesla P40)
- largemem: 36 CPU cores; 1.5 TB RAM (1 node)
Limits can be set on the user association as well as the account association. This allows a PI to limit individual users or the collective set of users in an account as the PI sees fit. The following values can be limited if requested:
- MaxJobs: Maximum number of jobs each user is allowed to run at one time in this association
- MaxWall: Maximum wall clock time each job is able to use in this association
- MaxTRES (CPU, memory, GPU, or billing units): Maximum number of TRES each job is able to use in this association
- GrpTRES (billing units): Maximum number of TRES that running jobs can allocate in aggregate across this association and all of its child associations
Please contact ARC-TS if you would like to implement any of these limits.
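If any of these limits are in place on your associations, you can inspect them yourself with Slurm's `sacctmgr` utility. A sketch, shown as a non-runnable fragment since it requires a Great Lakes login node; the field names follow the `sacctmgr` man page and may vary slightly between Slurm versions:

```shell
# Show the limits attached to your own associations.
sacctmgr show associations where user="$USER" \
    format=Cluster,Account,User,Partition,MaxJobs,MaxWall,MaxTRESPerJob,GrpTRES
```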
A job will be charged based on the percentage of the node it uses, computed from its CPU, memory, and (if applicable) GPU usage. The charge is the maximum of the weighted charges for CPU, memory, and GPU. If you use one core and all of the machine's memory, or all of the cores and minimal memory, you'll be charged for the entire machine.
Up to four shortcodes can be used. If more than one shortcode is used, the amount charged to each can be split by percentage.
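The max-of-weighted-fractions rule can be sketched numerically. A minimal illustration; the node sizes below (36 cores, 180 GB) are illustrative figures, not a statement of actual Great Lakes hardware or rates:

```shell
# A job asking for 1 core but all of the memory is billed for the whole
# node, because the charge is the maximum of the weighted fractions.
cores_used=1;  cores_total=36
mem_used=180;  mem_total=180   # GB
cpu_frac=$(awk -v u="$cores_used" -v t="$cores_total" 'BEGIN { printf "%.2f", u/t }')
mem_frac=$(awk -v u="$mem_used" -v t="$mem_total" 'BEGIN { printf "%.2f", u/t }')
billed=$(awk -v a="$cpu_frac" -v b="$mem_frac" 'BEGIN { printf "%.2f", (a > b ? a : b) }')
echo "fraction of node billed: $billed"   # prints 1.00
```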
Great Lakes accounts can be initiated by sending an email to firstname.lastname@example.org.
Terms of Usage and User Responsibility
- Data is not backed up. None of the data on Great Lakes is backed up. The data that you keep in your home directory, /tmp, or any other filesystem is exposed to immediate and permanent loss at all times. You are responsible for mitigating your own risk. ARC-TS provides more durable storage on Turbo, Locker, and Data Den; for more information, see the ARC-TS storage pages.
- Your usage is tracked and may be used for reports. We track a lot of job data and store it for a long time. We use this data to generate usage reports and look at patterns and trends. We may report this data, including your individual data, to your adviser, department head, dean, or other administrator or supervisor.
- Maintaining the overall stability of the system is paramount to us. While we make every effort to ensure that every job completes in the most efficient and accurate way possible, the stability of the cluster is our primary concern. This may affect you, but mostly we hope it benefits you. System availability is based on our best efforts. We are staffed to provide support during normal business hours; we try very hard to provide support as broadly as possible, but cannot guarantee support 24 hours a day. Additionally, we perform system maintenance on a periodic basis, driven by the availability of software updates, staffing availability, and input from the user community. We do our best to schedule around your needs, but there will be times when the system is unavailable. Scheduled outages will be announced at least one month in advance on the ARC-TS home page; unscheduled outages will be announced there as quickly as we can, with as much detail as we have. You can also track ARC-TS on Twitter (@ARC-TS).
- Great Lakes is intended only for non-commercial, academic research and instruction. Commercial use of some of the software on Great Lakes is prohibited by software licensing terms. Prohibited uses include product development or validation, any service for which a fee is charged, and, in some cases, research involving proprietary data that will not be made available publicly. Please contact email@example.com if you have any questions about this policy, or about whether your work may violate these terms.
- You are responsible for the security of sensitive codes and data. If you will be storing export-controlled or other sensitive or secure software, libraries, or data on the cluster, it is your responsibility to ensure that it is secured to the standards set by the most restrictive governing rules. We cannot reasonably monitor everything that is installed on the cluster, and cannot be responsible for it; that responsibility rests with you, the end user.
- Data subject to HIPAA regulations may not be stored or processed on the cluster.
Users must manage data appropriately in their various locations:
- Home directory: 80 GB quota, mounted on Turbo
- Flux home directories: mounted read-only on Great Lakes through 2019; please move any needed data elsewhere during this time
- /scratch (more information below)
- Customer-provided NFS
SCRATCH STORAGE POLICIES
Every user has a /scratch directory for every Slurm account they are a member of. Additionally for that account, there is a shared data directory for collaboration with other members of that account. The account directory group ownership is set using the Slurm account-based UNIX groups, so all files created in the /scratch directory are accessible by any group member, to facilitate collaboration.
File quotas on /scratch are per root account (a PI or project account) and shared between child accounts (individual users):
- 10 TB storage limit
- 1 million file (inode) limit
If needed, these limits may be increased by contacting HPC support with an acceptable reason.
Users should keep in mind that /scratch has an auto-purge policy: any file that has not been accessed for 60 days is automatically deleted by the system. Scratch file systems are not backed up, so critical files should be backed up to another location.
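To see which of your files are at risk under the 60-day purge policy, you can search by access time. A sketch, with a hypothetical scratch path that you would replace with your actual account directory:

```shell
# List files under a scratch directory that have not been accessed in
# more than 60 days and are therefore purge candidates.
# The path is hypothetical -- substitute your actual account directory.
SCRATCH_DIR="/scratch/example_account_root/example_account/$USER"
[ -d "$SCRATCH_DIR" ] && find "$SCRATCH_DIR" -type f -atime +60
```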
LOGIN NODE USAGE
Appropriate uses for the Great Lakes login nodes include:
- Transferring small files to and from the cluster
- Ordinary data management tasks, such as moving files, creating directories, etc.
- Creating, modifying, and compiling code and submission scripts
- Submitting and monitoring the status of jobs
- Testing executables to ensure they will run on the cluster and its infrastructure
You should limit your use of the Great Lakes login nodes to programs that use 4 or fewer processors, use less than 16 GB of memory, and run for no longer than 5 minutes. Larger or longer processes may cause problems for other users of the login nodes. We reserve the right to terminate processes that are causing, or may cause, a problem. If your program needs to run for more than 5 minutes or requires more extensive resources, you should use an interactive job.
Any other use of the login nodes may result in termination of the offending process. Any production processing (including post-processing) should be submitted to the cluster through the batch system. If interactive use is required, submit an interactive job to the cluster.
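An interactive job can be requested with Slurm's `srun`. A minimal sketch, shown as a non-runnable fragment since it must be issued on a Great Lakes login node; the account name and resource figures are placeholders to adapt to your own allocation:

```shell
# Request an interactive shell on a compute node: 4 CPUs, 16 GB of
# memory, 1 hour of walltime. "example_account" is a placeholder for
# your own Slurm account.
srun --nodes=1 --cpus-per-task=4 --mem=16g --time=1:00:00 \
     --account=example_account --pty /bin/bash
```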
SECURITY ON GREAT LAKES & USE OF SENSITIVE DATA
Applications and data are protected by secure physical facilities and infrastructure as well as a variety of network and security monitoring systems. These systems provide basic but important security measures including:
- Secure access – All access to Great Lakes is via SSH or Globus. SSH has a long history of secure use.
- Built-in firewalls – All of the Great Lakes servers have firewalls that restrict access to only what is needed.
- Unique users – Great Lakes adheres to the University guideline of one person per login ID and one login ID per person.
- Multi-factor authentication (MFA) – For all interactive sessions, Great Lakes requires both a UM Kerberos password and Duo authentication. File transfer sessions require a Kerberos password.
- Private subnets – Other than the login and file transfer computers that are part of Great Lakes, all of the computers are on a network that is private within the University network and are unreachable from the Internet.
- Flexible data storage – Researchers can control the security of their own data storage by securing their storage as they require and having it mounted via NFSv3 or NFSv4 on Great Lakes. Another option is to make use of Great Lakes’ local scratch storage, which is considered secure for many types of data. Note: Great Lakes is not considered secure for data covered by HIPAA.