Winter HPC maintenance completed

Flux, Beta, Armis, Cavium, and ConFlux, and their storage systems (/home and /scratch), are back online after three days of maintenance. The completed updates will improve the performance and stability of ARC-TS services.

The following maintenance tasks were completed:

  • Preventative maintenance at the Modular Data Center (MDC), which required a full power outage
  • InfiniBand networking updates (firmware and software)
  • Ethernet networking updates (datacenter distribution layer switches)
  • Operating system and software updates
  • Migration of Turbo networking to new switches (affects /home and /sw)
  • Consistency checks on the Lustre file systems that provide /scratch
  • Firmware and software updates for the GPFS file systems (ConFlux)
  • Consistency checks on the GPFS file systems that provide /gpfs (ConFlux)

Please contact hpc-support@umich.edu if you have any questions.

Winter HPC maintenance scheduled for Jan. 6-9

To accommodate updates to software, hardware, and operating systems, Flux, Beta, Armis, Cavium, and ConFlux, and their storage systems (/home and /scratch) will be unavailable starting at 6 a.m. Sunday, January 6th and returning to service on Wednesday, January 9th.  These updates will improve the performance and stability of ARC-TS services. We try to encapsulate the required changes into two maintenance periods per year and work to complete these tasks quickly, as we understand the impact of the maintenance on your research.

During this time, the following maintenance tasks are planned:

  • Preventative maintenance at the Modular Data Center (MDC), which requires a full power outage
  • InfiniBand networking updates (firmware and software)
  • Ethernet networking updates (datacenter distribution layer switches)
  • Operating system and software updates
  • Potential updates to job scheduling software
  • Migration of Turbo networking to new switches (affects /home and /sw)
  • Consistency checks on the Lustre file systems that provide /scratch
  • Firmware and software updates for the GPFS file systems (ConFlux, starting 9 a.m. Monday, Jan. 7)
  • Consistency checks on the GPFS file systems that provide /gpfs (ConFlux, starting 9 a.m. Monday, Jan. 7)

You can use the command “maxwalltime” to discover the amount of time remaining until the beginning of the maintenance. Jobs requesting more walltime than remains before the maintenance will be queued and started after the maintenance is completed.
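
For example, a minimal sketch of that workflow (the job script name is a placeholder, and the qsub syntax assumes the PBS/Torque scheduler used on Flux):

    # Show how much walltime remains before the maintenance window begins
    maxwalltime

    # Request less walltime than remains so the job can start (and finish)
    # before the outage; myjob.pbs is a placeholder job script
    qsub -l walltime=24:00:00 myjob.pbs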

All filesystems will be unavailable during the maintenance. We encourage you to copy any data that might be needed during that time from Flux prior to the start of the maintenance.

We will post status updates on our Twitter feed (https://twitter.com/arcts_um) throughout the course of the maintenance and send an email to all users when the maintenance has been completed. Please contact hpc-support@umich.edu if you have any questions.

Cluster and storage maintenance set for Aug. 5-9

To accommodate updates to software, hardware, and operating systems, Flux, Armis, ConFlux, Flux Hadoop, and their storage systems (/home and /scratch) will be unavailable starting at 9 a.m. Sunday, August 5th and returning to service on Thursday, August 9th. These updates will improve the performance and stability of ARC-TS services.  We try to encapsulate the required changes into two maintenance periods per year and work to complete these tasks quickly, as we understand the impact of the maintenance on your research.

During this time, the following maintenance tasks are planned:

  • Operating system, compiler, and software updates (all clusters)
  • InfiniBand networking updates (firmware and software) (Flux/Armis/ConFlux)
  • Resource manager and job scheduling software updates (all clusters)
  • Lmod default software version changes (Flux/Armis/ConFlux)
  • HPC system upgrades to CUDA 9.X (Flux/Armis/ConFlux)
  • Software updates for the Lustre file systems that provide /scratch (Flux)
  • Elastic Storage Server updates (ConFlux)
  • Enabling 32-bit file IDs on home and software volumes (Flux/Armis)
  • Network switch maintenance (Turbo)

For Flux and Armis HPC jobs, you can use the command “maxwalltime” to discover the amount of time remaining until the beginning of the maintenance. Jobs requesting more walltime than remains before the maintenance will be queued and started after the maintenance is completed.

All Flux, Armis, ConFlux, and Flux Hadoop filesystems will be unavailable during the maintenance. We encourage you to copy any data that might be needed during that time from Flux prior to the start of the maintenance.
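
For example, a sketch of staging data off the cluster ahead of time with rsync (the uniqname, host, and paths are placeholders; substitute your own login or transfer host and directories):

    # Copy anything you will need during the outage from Flux scratch to
    # local storage; the uniqname, host, and paths below are placeholders
    rsync -av uniqname@flux-xfer.example.edu:/scratch/myproject/ ~/flux-backup/myproject/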

Turbo storage will be unavailable starting at 6 a.m. Monday, August 6th, and will return to service at 10 a.m.

We will post status updates on our Twitter feed (https://twitter.com/arcts_um) throughout the course of the maintenance and send an email to all HPC and Hadoop users when the maintenance has been completed. Updates will also be compiled at http://arc-ts.umich.edu/summer-2018-maintenance/. Please contact hpc-support@umich.edu if you have any questions.

Winter 2018 Maintenance

To accommodate updates to software, hardware, and operating systems, Flux, Armis, ConFlux, Flux Hadoop, and their storage systems (/home and /scratch) will be unavailable starting at 7 a.m. Tuesday, January 2nd and returning to service on Friday, January 5th.  These updates will improve the performance and stability of ARC-TS services.  We try to encapsulate the required changes into two maintenance periods per year and work to complete these tasks quickly, as we understand the impact of the maintenance on your research.

For up-to-date information on the maintenance, visit http://arc-ts.umich.edu/winter-2018-maintenance/

Service Disruption: MiStorage Silver (Value Storage) Maintenance

The ITS Storage team will be applying an operating system patch to the MiStorage Silver (Value Storage) environment, which provides home directories for both Flux and Flux Hadoop. The ITS maintenance window will run from 11:00 p.m. December 2nd to 7:00 a.m. December 3rd (8 hours total), and ARC-TS will prepare Flux, Armis, and Flux Hadoop for the maintenance starting at 10:00 p.m. This update may disrupt the stability of the nodes and the jobs running on them.

The ITS status page for this incident is here:  http://status.its.umich.edu/report.php?id=141155

For Flux and Armis users: we have created a reservation on Flux so that no jobs will be running or impacted. We will remove the reservation once the ITS storage team confirms a successful update.

For Flux Hadoop users: the scheduler and user logins will be deactivated when the outage starts, and any user logged into the cluster will be logged out for the duration of the outage. We will reactivate access once we receive the all-clear from the ITS storage team.

Status updates will be posted on the ARC-TS Twitter feed (https://twitter.com/arcts_um). If you have any questions, please email us at hpc-support@umich.edu.

Potential service disruption for Value Storage maintenance — Dec. 2

The ITS Storage team will be applying an operating system patch to the MiStorage Silver environment, which provides home directories for both Flux and Flux Hadoop. The ITS maintenance window will run from 11:00 p.m. December 2nd to 7:00 a.m. December 3rd (8 hours total). This update may disrupt the stability of the nodes and the jobs running on them.

The ITS status page for this incident is here:  http://status.its.umich.edu/report.php?id=141155

For Flux users: we have created a reservation on Flux so that no jobs will be running or impacted. We will remove the reservation once the ITS storage team confirms a successful update.

For Flux Hadoop users: the scheduler and user logins will be deactivated when the outage starts, and any user logged into the cluster will be logged out for the duration of the outage. We will reactivate access once we receive the all-clear from the ITS storage team.

Status updates will be posted on the ARC-TS Twitter feed (https://twitter.com/arcts_um). If you have any questions, please email us at hpc-support@umich.edu.

Summer HPC maintenance

To accommodate equipment repairs and upgrades to software, hardware, and operating systems, Flux, Armis, ConFlux, Flux Hadoop, and their storage systems (/home and /scratch) will be unavailable starting at 6 a.m. Saturday, July 29, returning to service on Wednesday, August 2.

During this time, the following updates are planned:

  • Annual power maintenance at the Modular Data Center; all systems will be powered off (Flux/Armis/Flux Hadoop)
  • Campus network hardware and software updates (Flux/Armis/Flux Hadoop)
  • InfiniBand networking updates (firmware and software) (Flux/Armis/ConFlux)
  • Operating system and software updates (all clusters)
  • Resource manager and job scheduling software updates (Flux/Armis)
  • Migration of NFS volumes, including /home, from Value Storage to Turbo (Flux)
  • Hardware and software updates for the Lustre file systems that provide /scratch (Flux)

For Flux HPC jobs, you can use the command “maxwalltime” to discover the amount of time remaining until the beginning of the maintenance. Jobs requesting more walltime than remains before the maintenance will be queued and started after the maintenance is completed.

All Flux, Armis, ConFlux, and Flux Hadoop filesystems will be unavailable during the maintenance. We encourage you to copy any data that might be needed during that time from Flux prior to the start of the maintenance.

We will post status updates on our Twitter feed (https://twitter.com/arcts_um) throughout the course of the maintenance and send an email to all HPC and Hadoop users when the maintenance has been completed.

HPC maintenance scheduled for January 7 – 9

To accommodate upgrades to software and operating systems, Flux, Armis, and their storage systems (/home and /scratch) will be unavailable starting at 9 a.m. Saturday, January 7th, returning to service on Monday, January 9th. Additionally, external Turbo mounts will be unavailable from 11 p.m. Saturday, January 7th, until 7 a.m. Sunday, January 8th.

During this time, the following updates are planned:

  • Operating system and software updates (minor updates) on Flux and Armis.  This should not require any changes to user software or processes.
  • Resource manager and job scheduling software updates.
  • Operating system updates on Turbo.

For HPC jobs, you can use the command “maxwalltime” to discover the amount of time before the beginning of the maintenance. Jobs that cannot complete prior to the beginning of the maintenance will be able to start when the clusters are returned to service.

We will post status updates on our Twitter feed (https://twitter.com/arcts_um) and send an email to all HPC users when the outage has been completed.