To accommodate updates to software, hardware, and operating systems, Flux, Armis, ConFlux, Flux Hadoop, and their storage systems (/home and /scratch) will be unavailable starting at 7 a.m. on Tuesday, January 2nd, and will return to service on Friday, January 5th. These updates will improve the performance and stability of ARC-TS services. We try to consolidate the required changes into two maintenance periods per year and work to complete these tasks quickly, as we understand the impact of the maintenance on your research.
During this time, the following maintenance tasks are planned:
- In-rack Uninterruptible Power Supply (UPS) replacements for all racks in the Modular Data Center (MDC) (Flux/Armis/Flux Hadoop)
- Campus network hardware and software updates (Flux/Armis/Flux Hadoop)
- InfiniBand networking updates (firmware and software) (Flux/Armis/ConFlux)
- Operating system, compiler, and software updates (All clusters)
- Resource manager and job scheduling software updates (All clusters)
- Lmod default software version changes (Flux/Armis/ConFlux)
- Hadoop ecosystem updates, including migration from Cloudera 5.7 to Hortonworks Data Platform (HDP 2.6), Kerberos, WebHDFS, Apache Spark 2.x, SparkR, Apache NiFi, Apache Zeppelin notebooks, and support for RStudio integration (Flux Hadoop)
- Migration of NFS volumes, including /home and software volumes, from MiStorage to Turbo for more consistent performance (Flux/Flux Hadoop)
- Update firmware and software of the Lustre file systems that provide /scratch (Flux)
- Perform consistency checks on the Lustre file systems that provide /scratch (Flux)
- Update Elastic Storage Server to 5.2 (ConFlux)
For Flux HPC jobs, you can use the command “maxwalltime” to discover the amount of time remaining until the beginning of the maintenance. Jobs requesting more walltime than remains before the maintenance will be queued and started after the maintenance is completed.
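As a rough illustration of the rule above, the sketch below computes the time remaining before the outage and checks whether a requested walltime fits in that window. It is not the implementation of “maxwalltime” or of the scheduler; the function names are hypothetical, and the year 2018 is an assumption (the announcement gives only the weekday and date, and January 2 falls on a Tuesday in 2018). It also simplifies by treating the job as starting immediately at the given time, whereas a real job may wait in the queue first.

```python
from datetime import datetime, timedelta

# Assumed start of the outage: 7 a.m. on Tuesday, January 2.
# The year is an assumption; the announcement does not state it.
MAINTENANCE_START = datetime(2018, 1, 2, 7, 0)


def time_until_maintenance(now, start=MAINTENANCE_START):
    """Return the time remaining before the outage (zero once it has begun)."""
    return max(start - now, timedelta(0))


def fits_before_maintenance(requested_walltime, now, start=MAINTENANCE_START):
    """Return True if a job starting at `now` would end before the outage."""
    return requested_walltime <= time_until_maintenance(now, start)


if __name__ == "__main__":
    now = datetime(2017, 12, 30, 7, 0)  # hypothetical start time for the job

    print(time_until_maintenance(now))  # 3 days, 0:00:00

    # A 48-hour request fits in the 72 hours that remain.
    print(fits_before_maintenance(timedelta(hours=48), now))  # True

    # A 100-hour request does not fit, so such a job would stay queued
    # until the maintenance is completed.
    print(fits_before_maintenance(timedelta(hours=100), now))  # False
```

In practice, requesting a walltime somewhat shorter than the remaining window leaves room for time spent waiting in the queue.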
For Flux Hadoop, persistent data that is stored in the Hadoop Distributed Filesystem (HDFS) will be destroyed during the upgrade. If you have persistent data that must be preserved and need assistance making arrangements to find a storage location for this data, please contact firstname.lastname@example.org as soon as possible. If you have not used Flux Hadoop in the last six months, you will need to request access to use the updated cluster.
The UPS replacements are expected to take up to 15 business days and will not be completed by the end of the maintenance period. We will be replacing the UPSes in the highest priority racks first in order to return those systems to full capacity as quickly as possible. The remaining racks will run at partial capacity until their respective UPSes are replaced.
All Flux, Armis, ConFlux, and Flux Hadoop filesystems will be unavailable during the maintenance. If you will need any data stored on these systems during that time, we encourage you to copy it to another location before the maintenance begins.
We will post status updates on our Twitter feed ( https://twitter.com/arcts_um ) throughout the course of the maintenance and send an email to all HPC and Hadoop users when the maintenance has been completed. Please contact email@example.com if you have any questions.