Winter 2018 Maintenance

To accommodate updates to software, hardware, and operating systems, Flux, Armis, ConFlux, Flux Hadoop, and their storage systems (/home and /scratch) will be unavailable starting at 7 a.m. Tuesday, January 2nd, and will return to service on Friday, January 5th.  These updates will improve the performance and stability of ARC-TS services.  We try to consolidate the required changes into two maintenance periods per year and work to complete these tasks quickly, as we understand the impact of the maintenance on your research.

For up-to-date information on the maintenance, visit http://arc-ts.umich.edu/winter-2018-maintenance/

Service Disruption: MiStorage Silver (Value Storage) Maintenance

The ITS Storage team will be applying an operating system patch to the MiStorage Silver (Value Storage) environment, which provides home directories for both Flux and Flux Hadoop.  The ITS maintenance window will run from 11:00pm on December 2nd to 7:00am on December 3rd (8 hours total), and ARC-TS will begin preparing Flux, Armis, and Flux Hadoop for the maintenance at 10:00pm.  This update could disrupt the stability of the nodes and the jobs running on them.

The ITS status page for this incident is here:  http://status.its.umich.edu/report.php?id=141155

For Flux and Armis users: we have created a reservation on Flux so no jobs will be running or impacted.  We will remove the reservation after the ITS Storage team confirms that the update was successful.

For Flux Hadoop users:  The scheduler and user logins will be deactivated when the outage starts, and any user currently logged into the cluster will be logged out for the duration of the outage.  We will reactivate access once the ITS Storage team has given the all-clear that the update was successful.

Status updates will be posted on the ARC-TS Twitter feed: https://twitter.com/arcts_um.  If you have any questions, please email us at hpc-support@umich.edu.

Potential service disruption for Value Storage maintenance — Dec. 2

By | Flux, General Interest, Happenings, HPC, News

The ITS Storage team will be applying an operating system patch to the MiStorage Silver environment, which provides home directories for both Flux and Flux Hadoop.  The ITS maintenance window will run from 11:00pm on December 2nd to 7:00am on December 3rd (8 hours total).  This update could disrupt the stability of the nodes and the jobs running on them.

The ITS status page for this incident is here:  http://status.its.umich.edu/report.php?id=141155

For Flux users: we have created a reservation on Flux so no jobs will be running or impacted.  We will remove the reservation after the ITS Storage team confirms that the update was successful.

For Flux Hadoop users:  The scheduler and user logins will be deactivated when the outage starts, and any user currently logged into the cluster will be logged out for the duration of the outage.  We will reactivate access once the ITS Storage team has given the all-clear that the update was successful.

Status updates will be posted on the ARC-TS Twitter feed: https://twitter.com/arcts_um.  If you have any questions, please email us at hpc-support@umich.edu.

Summer HPC maintenance

To accommodate equipment repairs and upgrades to software, hardware, and operating systems, Flux, Armis, ConFlux, Flux Hadoop, and their storage systems (/home and /scratch) will be unavailable starting at 6 a.m. Saturday, July 29, and will return to service on Wednesday, August 2.

During this time, the following updates are planned:

  • Annual power maintenance at the Modular Data Center.  All systems will be powered off. (Flux/Armis/Flux Hadoop)
  • Campus network hardware and software updates (Flux/Armis/Flux Hadoop)
  • InfiniBand networking updates (firmware and software) (Flux/Armis/ConFlux)
  • Operating system and software updates (All clusters)
  • Resource manager and job scheduling software updates (Flux/Armis)
  • Migrate NFS volumes, including /home, from Value Storage to Turbo (Flux)
  • Update hardware and software of the Lustre file systems that provide /scratch (Flux)

For Flux HPC jobs, you can use the command “maxwalltime” to discover the amount of time remaining until the beginning of the maintenance. Jobs requesting more walltime than remains before the maintenance will be queued and started after the maintenance is completed.
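
The walltime rule above can be illustrated with a short Python sketch. This is not how "maxwalltime" itself is implemented; it only shows the arithmetic a user might do by hand. The function names are hypothetical, the year 2017 is an assumption (the announcement gives no year; July 29 fell on a Saturday in 2017), and treating a job that exactly fills the remaining time as eligible is also an assumption.

```python
from datetime import datetime, timedelta

# Assumption: 2017 is inferred because July 29 was a Saturday that year;
# the announcement itself does not state the year.
MAINTENANCE_START = datetime(2017, 7, 29, 6, 0)  # 6 a.m. Saturday, July 29


def time_until_maintenance(now, start=MAINTENANCE_START):
    """Return the time left before the maintenance begins (zero if it has begun)."""
    return max(start - now, timedelta(0))


def can_start_before_maintenance(requested_walltime, now, start=MAINTENANCE_START):
    """Return True if the requested walltime fits in the time that remains.

    Hypothetical helper: jobs that do not fit would stay queued until the
    maintenance is completed, as described in the announcement.
    """
    return requested_walltime <= time_until_maintenance(now, start)


if __name__ == "__main__":
    now = datetime(2017, 7, 27, 6, 0)  # example: two days before the outage
    print(time_until_maintenance(now))
    print(can_start_before_maintenance(timedelta(hours=24), now))
    print(can_start_before_maintenance(timedelta(hours=72), now))
```

In this example, a 24-hour job submitted two days before the outage fits in the remaining time, while a 72-hour job does not and would wait until the clusters return to service.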

All Flux, Armis, ConFlux, and Flux Hadoop filesystems will be unavailable during the maintenance. We encourage you to copy any data that might be needed during that time from Flux prior to the start of the maintenance.

We will post status updates on our Twitter feed ( https://twitter.com/arcts_um ) throughout the course of the maintenance and send an email to all HPC and Hadoop users when the maintenance has been completed.

HPC Maintenance

To accommodate upgrades to software and operating systems, Flux, Armis, and their storage systems (/home and /scratch) will be unavailable starting at 9am Saturday, January 7th, returning to service on Monday, January 9th.  Additionally, external Turbo mounts will be unavailable 11pm Saturday, January 7th, until 7am Sunday, January 8th.

During this time, the following updates are planned:

  • Operating system and software updates (minor updates) on Flux and Armis.  This should not require any changes to user software or processes.
  • Resource manager and job scheduling software updates.
  • Operating system updates on Turbo.

For HPC jobs, you can use the command “maxwalltime” to discover the amount of time before the beginning of the maintenance. Jobs that cannot complete prior to the beginning of the maintenance will be able to start when the clusters are returned to service.

We will post status updates on our Twitter feed ( https://twitter.com/arcts_um ) and send an email to all HPC users when the outage has been completed.

ARC-TS HPC Maintenance

Flux, Armis, Flux Hadoop, and their storage systems (/home, /scratch, and HDFS on Flux Hadoop) will be unavailable starting Saturday, July 16 at 2 p.m., with a return to service targeted for mid-day July 22nd. During this time, ARC-TS will update several key systems. Among other improvements, the updates will provide access to more current versions of popular software and libraries, allow new features and more consistent runtimes for job scheduling, and migrate two-factor authentication for the login servers to a new system.

NOTE: With the University migrating to Duo from RSA for multifactor authentication in July, ARC-TS will switch to Duo for access to our login nodes during this maintenance period. (Units will be leading the switch to Duo with their faculty, staff and students who currently use MTokens. Questions about this change should be directed to IT or administrative leaders in units. More information can be found here:  http://www.itcs.umich.edu/identity/2factor/)

The updates will consist of:

  • OS and supporting software updates for the cluster. This will be a major update, moving from the currently installed Red Hat version (RHEL 6.6) to CentOS 7.1. This will provide newer versions of commonly used software and libraries, and will help us deliver more user-facing features in the coming months.
  • Cluster management software will be updated and reconfigured. This will include Torque 6, which has a new set of resource options. The new Torque version will provide a better syntax for defining tasks, more consistent runtimes, and a platform for new features.
  • The Flux Hadoop environment will be updated to Cloudera 5.7, which now includes Hive-On-Spark.
  • /scratch on Flux will be updated and serviced.
  • The modules environment will transition from the current Environment Modules to a system called Lmod. The Lmod User Guide can be found here: https://www.tacc.utexas.edu/research-development/tacc-projects/lmod/user-guide.
    Many commands are the same, and we will document any significant differences.
  • The paths in which many software packages are installed will also change; e.g., folders like /home/software and /usr/cac will have new locations. This will also be documented.
  • Many default software versions will be changed and some older software packages and/or versions will be retired. In particular, OpenMPI and the compilers will all get updated to new versions.

Status updates will be posted on the ARC-TS Twitter feed: https://twitter.com/arcts_um