
ARC-TS Winter 2020 Maintenance

March 9, 2020 @ 6:00 am - March 11, 2020 @ 5:00 pm

Update: 03/17/2020 – Important changes implemented during maintenance
Maintenance is complete, and the following changes have been made on Great Lakes, Armis2, and Lighthouse:
  • Operating system update to CentOS 7.7
  • Slurm update to 19.05.5
  • Singularity update to 3.5.2
  • All Turbo volumes migrated to the new Turbo cluster

Update: 03/12/2020 3:00 PM – Maintenance has been extended into March 12 due to hardware issues
A bad node was discovered on the new Turbo Research Storage service during this maintenance, which prevented the Great Lakes, Armis2, and Lighthouse high-performance computing (HPC) clusters from coming online. ARC-TS staff are working to resolve the issue and bring everything back online as soon as possible.

In order to minimize disruption to researchers due to the Turbo storage upgrades, we have rescheduled the previously announced winter maintenance for Great Lakes, Lighthouse, Armis2, ConFlux, and their storage systems (/home and /scratch) to begin at 6 a.m. on March 9, 2020, and return to service on March 11, 2020. The Cavium/ThunderX Cluster maintenance will begin at 6 a.m. on March 9, 2020, and return to service on March 12, 2020.
Updates are being made to software, hardware, and operating systems to improve the performance and stability of ARC-TS services. We try to consolidate the required changes into two maintenance periods per year and work to complete these tasks quickly, as we understand the impact of the maintenance on your research.
As previously announced, the Flux and Armis login nodes and storage systems (/home and /scratch) will be unavailable starting on February 10, 2020.

Maintenance Details

Great Lakes, Lighthouse, and Armis2 maintenance tasks:

  • OS updates
  • Update to Slurm 19.05 (pending testing)
  • OFED (OpenFabrics) updates for InfiniBand
  • GPFS updates (Great Lakes)
  • Globus hardware update (Lighthouse; this will not change how users access Lighthouse via Globus)
  • Transition /home directories to new hardware (all clusters)
  • Transition /scratch directories to new hardware (Lighthouse, Armis2) 

ConFlux maintenance tasks:

  • OS updates
  • GPFS/ESS updates
  • CUDA update
  • InfiniBand switch software updates

Cavium Hadoop maintenance tasks:

  • Updating and patching the operating system

Locker and Data Den maintenance tasks:

  • Updating Locker Server software and storage firmware
  • Changing a setting to fix an NFS issue pertaining to recalls from tape

Turbo maintenance tasks:

  • Migration of ARC-TS-specific shares from the old Turbo cluster to the new Turbo cluster

Countdown to maintenance 

For Great Lakes, Lighthouse, and Armis2 HPC jobs, you can use the command “maxwalltime” to discover the amount of time remaining until the beginning of the maintenance. Jobs requesting more walltime than remains before the maintenance will be queued and started after the maintenance is completed.
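
The arithmetic behind this rule can be sketched in a few lines of Python. This is only an illustration and not how "maxwalltime" itself is implemented. The function names are hypothetical, and the start time is the one given in this announcement, assumed to be in the clusters' local time.

```python
from datetime import datetime, timedelta

# Maintenance start taken from this announcement (assumed local cluster time).
MAINTENANCE_START = datetime(2020, 3, 9, 6, 0)


def time_until_maintenance(now):
    """Return the time left before maintenance begins, never negative."""
    return max(MAINTENANCE_START - now, timedelta(0))


def can_start_before_maintenance(now, requested_walltime):
    """True if a job started at `now` would finish before maintenance begins."""
    return requested_walltime <= time_until_maintenance(now)


def format_walltime(delta):
    """Format a timedelta as D-HH:MM:SS, the style accepted by Slurm's --time."""
    total = int(delta.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For example, at 6:00 pm on March 7, 2020, a job requesting 24 hours of walltime fits in the remaining window, while a job requesting two days would wait in the queue until the maintenance is over.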

Status updates

We will post status updates on our Twitter feed (https://twitter.com/arcts_um) throughout the course of the maintenance and send an email to all HPC users when the maintenance has been completed. Updates will also be posted at http://arc-ts.umich.edu/winter-2020-maintenance/

Contact hpc-support@umich.edu if you have any questions.
