Slurm partitions represent collections of nodes and are the equivalent of Torque queues. Each PI’s standard compute nodes are identified by the PI’s uniqname and have a maximum job walltime of 14 days (which can be increased to up to 4 weeks at the PI’s request). During the transition from FOE to Lighthouse, each partition will include two test nodes so that workflows can be tested before the PI’s nodes are migrated into Lighthouse.
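As an illustrative sketch (not an official template), a batch script targeting a PI partition might look like the following; “msbritt” is the example uniqname used throughout this page, and the resource values are placeholders to adjust for your job:

```shell
#!/bin/bash
# Hypothetical Lighthouse job script. "msbritt" is an example
# partition name; substitute your PI's uniqname.
#SBATCH --job-name=example
#SBATCH --partition=msbritt
#SBATCH --time=14-00:00:00   # at or below the 14-day default walltime cap
#SBATCH --ntasks=1
#SBATCH --mem=4g

srun hostname
```

A script like this would be submitted with `sbatch job.sh` from a login node.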
Slurm associations are a combination of cluster, account, and user names, and optionally a partition. An association can carry limits (e.g., the account ‘testaccount’ using partition ‘msbritt’ on cluster ‘lighthouse’ has a running-job limit of X). TRES (Trackable Resources) are resources whose usage can be tracked or limited; common examples include CPU, memory, and GRES (e.g., GPUs).
Limits can be set on the user association as well as the account association. This allows a PI to limit individual users or the collective set of users in an account as the PI sees fit. Please contact ARC-TS if you would like to implement any of these limits.
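For illustration only, limits like those described above are expressed through sacctmgr associations. ARC-TS manages these settings; the account name, user name, and limit values below are all hypothetical examples, not real defaults:

```shell
# Example only: ARC-TS applies these on your behalf. "testaccount",
# "someuser", and the limit values shown are illustrative.

# Cap the account as a whole at 10 concurrently running jobs:
sacctmgr modify account where name=testaccount cluster=lighthouse \
    partition=msbritt set MaxJobs=10

# Cap an individual user within that account at 2 running jobs and
# 48 CPUs of trackable resources (TRES) per job:
sacctmgr modify user where name=someuser account=testaccount \
    cluster=lighthouse set MaxJobs=2 MaxTRES=cpu=48

# Inspect the resulting associations and their limits:
sacctmgr show assoc where cluster=lighthouse account=testaccount \
    format=Cluster,Account,User,Partition,MaxJobs,MaxTRES
```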
Terms of Usage and User Responsibility
- Data is not backed up. None of the data on Lighthouse is backed up. The data that you keep in your home directory, /tmp, or any other filesystem is exposed to immediate and permanent loss at all times. You are responsible for mitigating your own risk. ARC-TS provides more durable storage on Turbo, Locker, and Data Den; see the ARC-TS storage services documentation for more information.
- Your usage is tracked and may be used for reports. We track a lot of job data and store it for a long time. We use this data to generate usage reports and look at patterns and trends. We may report this data, including your individual data, to your adviser, department head, dean, or other administrator or supervisor.
- Maintaining the overall stability of the system is paramount to us. While we make every effort to ensure that every job completes in the most efficient and accurate way possible, the stability of the cluster is our primary concern. This may affect you, but mostly we hope it benefits you. System availability is based on our best efforts. We are staffed to provide support during normal business hours. We try very hard to provide support as broadly as possible, but cannot guarantee support 24 hours a day. Additionally, we perform system maintenance on a periodic basis, driven by the availability of software updates, staffing availability, and input from the user community. We do our best to schedule around your needs, but there will be times when the system is unavailable. For scheduled outages, we will announce them at least one month in advance on the ARC-TS home page; for unscheduled outages we will announce them as quickly as we can, with as much detail as we have, on that same page. You can also track ARC-TS on Twitter (@ARC-TS).
- Lighthouse is intended only for non-commercial, academic research and instruction. Commercial use of some of the software on Lighthouse is prohibited by software licensing terms. Prohibited uses include product development or validation, any service for which a fee is charged, and, in some cases, research involving proprietary data that will not be made available publicly. Please contact email@example.com if you have any questions about this policy, or about whether your work may violate these terms.
- You are responsible for the security of sensitive codes and data. If you will be storing export-controlled or other sensitive or secure software, libraries, or data on the cluster, it is your responsibility to ensure that it is secured to the standards set by the most restrictive governing rules. We cannot reasonably monitor everything that is installed on the cluster, and cannot be responsible for it; the responsibility remains with you, the end user.
- Data subject to HIPAA regulations may not be stored or processed on the cluster.
- For more information on HIPAA, see the ITS Guide.
- For questions about Protected Health Information (PHI), contact Michigan Medicine Corporate Compliance at compliance-Group@med.umich.edu.
Users must manage data appropriately in their various locations:
- /scratch (more information below)
- customer-provided NFS
SCRATCH STORAGE POLICIES
Scratch directories are implemented slightly differently than they were on Flux. The goal is to simplify data use while still giving flexibility in data sharing: a PI may have multiple groups working for them whose members should not be able to see or share each other’s data.
An initial /scratch directory is created for each Slurm account, with its name and group ownership based on the Slurm account/UNIX group name (e.g., /scratch/msbritt with UNIX group owner msbritt). When ordering additional Slurm accounts (e.g., msbritt1), the PI has the option on the order form to create a new /scratch directory if needed (/scratch/msbritt1). This new /scratch/msbritt1 will be owned by the initial UNIX group (msbritt) unless the PI requests a new UNIX group (msbritt1). ARC-TS generates a /scratch/msbritt/<uniqname>/ directory for every member of the UNIX group “msbritt”, with the group ownership set accordingly (i.e., “msbritt”).
Users may use /scratch with no set quota, but files that have not been accessed in more than 60 days are automatically purged. Scratch filesystems are not backed up; critical files should be copied to another location.
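Because the purge is based on access time, you can preview which files are at risk with find. A minimal sketch (a temporary directory stands in for a real /scratch/<account> path, and the filenames are invented for illustration):

```shell
# Sketch: list files that would be eligible for a 60-day purge.
# A temporary directory stands in for a real /scratch/<account> path.
scratch=$(mktemp -d)

touch -a -d "90 days ago" "$scratch/stale.dat"   # last accessed 90 days ago
touch "$scratch/recent.dat"                      # accessed just now

# Files not accessed in more than 60 days (candidates for purging):
find "$scratch" -type f -atime +60
```

Running the same `find` against your own scratch directory shows what an auto-purge pass would remove, so you can copy anything important elsewhere first.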
SECURITY ON LIGHTHOUSE/ USE OF SENSITIVE DATA
Applications and data are protected by secure physical facilities and infrastructure as well as a variety of network and security monitoring systems. These systems provide basic but important security measures including:
- Secure access – All access to Lighthouse is via SSH or Globus. SSH has a long history of secure use.
- Built-in firewalls – All of the Lighthouse computers have firewalls that restrict access to only what is needed.
- Unique users – Lighthouse adheres to the University guideline of one person per login ID and one login ID per person.
- Multi-factor authentication (MFA) – For all interactive sessions, Lighthouse requires both a UM Kerberos password and Duo authentication. File transfer sessions require a Kerberos password.
- Private Subnets – Other than the login and file transfer computers that are part of Lighthouse, all of the computers are on a network that is private within the University network and are unreachable from the Internet.
- Flexible data storage – Researchers can control the security of their own data storage by securing their storage as they require and having it mounted via NFSv3 or NFSv4 on Lighthouse. Another option is to make use of Lighthouse’s local scratch storage, which is considered secure for many types of data. Note: Lighthouse is not considered secure for data covered by HIPAA.