To accommodate updates to software, hardware, and operating systems, Flux, Armis, Great Lakes, Lighthouse, ConFlux, Cavium HPC, and their storage systems (/home and /scratch) will be unavailable starting at 6 a.m. on Monday, August 12, and will return to service on Wednesday, August 14. These updates will improve the performance and stability of ARC-TS services. We try to consolidate the required changes into two maintenance periods per year and work to complete these tasks quickly, as we understand the impact of the maintenance on your research.
Planned infrastructure maintenance tasks:
* Annual preventive maintenance and electrical work at Modular Data Center (Flux, Armis, Lighthouse)
* InfiniBand networking updates (firmware and software) (Flux/Armis/Lighthouse/ConFlux)
* Network switch upgrades to improve throughput between ARC-TS services (Turbo, Flux, Armis, Lighthouse, Great Lakes)
Flux, Armis, and Lighthouse maintenance tasks:
* /scratch storage hardware (Flux only)
* Hardware maintenance on InfiniBand
ConFlux maintenance was cancelled.
Cavium HPC updates:
* OS updates
* Slurm version upgraded to 18.08.7
For Flux, Lighthouse, and Armis HPC jobs, you can use the command “maxwalltime” to discover the amount of time remaining until the beginning of the maintenance. Jobs requesting more walltime than remains before the maintenance will be queued and started after the maintenance is completed.
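The walltime rule above comes down to simple date arithmetic. The sketch below is only an illustration and is not how "maxwalltime" or the scheduler is implemented: the function names are hypothetical, the year 2019 is taken from the maintenance page name, and times are assumed to be local.

```python
from datetime import datetime, timedelta

# Assumption: maintenance begins at 6 a.m. on Monday, August 12, 2019 (local time).
MAINTENANCE_START = datetime(2019, 8, 12, 6, 0)


def time_until_maintenance(now):
    """Return the time left before maintenance begins (zero once it has started)."""
    return max(MAINTENANCE_START - now, timedelta(0))


def fits_before_maintenance(requested_walltime, now):
    """Return True if a job starting at `now` would end before maintenance begins."""
    return requested_walltime <= time_until_maintenance(now)


if __name__ == "__main__":
    # Hypothetical submission time: exactly two days before the maintenance.
    submitted = datetime(2019, 8, 10, 6, 0)
    print("Time remaining:", time_until_maintenance(submitted))
    print("24-hour job fits:", fits_before_maintenance(timedelta(hours=24), submitted))
    print("72-hour job fits:", fits_before_maintenance(timedelta(hours=72), submitted))
```

In this sketch, a job that does not fit corresponds to one that would stay queued until the maintenance is completed.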
All Flux, Armis, Lighthouse, ConFlux, and Cavium HPC file systems will be unavailable during the maintenance. We encourage you to copy any data that might be needed during that time prior to the start of the maintenance.
We will post status updates on our Twitter feed (https://twitter.com/arcts_um) throughout the course of the maintenance and send an email to all HPC users when the maintenance has been completed. Updates will also be compiled at http://arc-ts.umich.edu/summer-2019-maintenance/. Please contact email@example.com if you have any questions.
Maintenance completed on 2019-08-14
Flux, Armis, Great Lakes (for Early Users), and Lighthouse were all returned to production.
Current outstanding issues:
- Beta is unavailable
- Several nodes are still down on Flux, including faculty-owned hardware in FOE. We are working to restore these as soon as possible.