Great Lakes Update: August 2019

By | General Interest, Great Lakes, Happenings, HPC, News

Great Lakes cluster is available for general access

What is the current status of the Great Lakes cluster

Now that we have completed Early User testing, the Great Lakes cluster is available for general access to the University community. Until the migration from Flux is complete on November 25, 2019, there will be no charge for using the Great Lakes cluster.

Noteworthy Features

  • The Great Lakes cluster compute nodes use the new Intel Skylake processor. In particular, the Skylake CPUs on the standard and large memory compute nodes will provide researchers more consistent performance, regardless of how many other jobs are on the machine. 
  • The Great Lakes cluster has 20 GPU nodes, each of which contains two NVIDIA V100 GPUs, which are significantly faster than the K20 and K40 GPUs on Flux.
  • The HDR100 InfiniBand network will provide consistent 100Gb/s performance across all nodes. On Flux, this ranged from 40-100Gb/s, depending on the node your job used.
  • The high performance GPFS /scratch system, with a capacity of approximately two petabytes, is significantly faster than /scratch on Flux. 
  • The Torque-based batch job submission environment has been replaced with the Slurm resource manager. We expect this system to be significantly more responsive and quicker at starting jobs than was the case on Flux.
  • The Open OnDemand system will replace the ARC Connect environment for web-based file access, job submission, remote desktop, graphical Matlab, Jupyter Notebooks, and more. For more information, see the web-based access section in our user guide.

How do I get access?

Every Flux user has a login on the Great Lakes cluster; you should be able to log in via ssh to greatlakes.arc-ts.umich.edu. We have created Slurm accounts for each PI or project based on the current Flux accounts. You can see which Slurm accounts you have access to by running the command `my_accounts`.

Additionally, you can access the Great Lakes cluster via the web through our Open OnDemand portal. Here you can submit jobs, see submitted jobs, create Jupyter Notebooks and more. Please see the Great Lakes Cluster User Guide for more information.

Where do I read more about the Great Lakes cluster and how to use it?

The current documentation for the Great Lakes cluster, including configuration, user guides, and known issues can be found at https://arc-ts.umich.edu/greatlakes.

There is a schedule for upcoming training sessions on the CSCAR website, and we will communicate new sessions through Twitter and email.

Software

Almost all of the software packages available on Flux have been recompiled on the Great Lakes cluster to take advantage of the performance improvements expected from the Intel Skylake architecture. In most cases, the latest software version available is being provided. If you need older versions or additional packages, let us know via email at hpc-support@umich.edu.

We have also reorganized the software module structure to make it easier to find the packages you want to load and to load their prerequisites automatically. To search for packages, use the “module spider” command along with the name of the package or keywords. In many cases we have combined similar packages into “Collections” such as Chemistry and BioInformatics. The command “module load Chemistry” makes the packages in the Chemistry collection discoverable via the “module available” command. After loading a collection, you must then load each individual package within that collection that you would like to use.
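To make the collection behavior concrete, here is a toy Python model of it. It is only an illustration under simplifying assumptions, not how the real module system works internally, and the package names are placeholders rather than the actual Great Lakes catalog: collection names are always visible, loading a collection makes its packages visible to an `avail` query, each package still has to be loaded separately, and a `spider` search looks across every collection whether or not it is loaded.

```python
class ModuleEnv:
    """Toy model of a module tree organized into collections (hypothetical)."""

    def __init__(self, collections):
        # collections: dict mapping a collection name to a set of package names
        self.collections = collections
        self.loaded = []

    def avail(self):
        """What 'module available' would list: collection names plus packages of loaded collections."""
        visible = set(self.collections)
        for name in self.loaded:
            visible |= self.collections.get(name, set())
        return sorted(visible)

    def spider(self, keyword):
        """Search every collection, loaded or not, and report where matching packages live."""
        return {pkg: coll
                for coll, pkgs in self.collections.items()
                for pkg in pkgs
                if keyword.lower() in pkg.lower()}

    def load(self, name):
        """Load a collection or package; a package must already be visible."""
        if name not in self.avail():
            raise ValueError(f"{name} is not visible; load its collection first")
        if name not in self.loaded:
            self.loaded.append(name)


# Placeholder package names, not the real Great Lakes software catalog.
env = ModuleEnv({"Chemistry": {"chem-tool-a", "chem-tool-b"},
                 "BioInformatics": {"bio-tool-c"}})
print(env.spider("tool-a"))  # {'chem-tool-a': 'Chemistry'}
env.load("Chemistry")        # the collection's packages become visible...
env.load("chem-tool-a")      # ...but each one is still loaded individually
print(env.loaded)            # ['Chemistry', 'chem-tool-a']
```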

What are the rates? 

We are working with ITS and UM Finance on approved service rates. Current plans are to have proposed rates¹ identified by the end of August. As soon as this information is more concrete, we will provide an update on the Great Lakes cluster website and in our email communication. We understand that this information is necessary for planning purposes and apologize for any impact this has had on your budget planning.

What we can share at this time is the new approach to billing that will be used for the Great Lakes cluster. Unlike Flux, Great Lakes will not have monthly allocations that carry a fixed fee whether or not they are used. Instead, the monthly charge for an account will be calculated from the resources used by its jobs that month. The cost of each job will be based on the amount and type of resources the job reserves and how long the job runs. This should be a significantly more flexible system, and it won’t require updating allocations as your computing needs change over time.

¹ Rates are not considered final until they have been formally approved by OFA.
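As a rough illustration of usage-based billing, the sketch below totals an account's monthly charge as the sum over its jobs of (reserved resources × hourly rate) × hours run. The rates in `RATES` and the simple additive formula are hypothetical placeholders, since the real rates had not yet been approved; the actual Great Lakes calculation may weigh resources differently.

```python
from dataclasses import dataclass


@dataclass
class Job:
    cores: int      # CPU cores reserved
    mem_gb: float   # memory reserved, in GB
    gpus: int       # GPUs reserved
    hours: float    # wall-clock hours the job ran


# HYPOTHETICAL per-hour rates in dollars; not the approved Great Lakes rates.
RATES = {"core": 0.01, "mem_gb": 0.002, "gpu": 0.20}


def job_cost(job, rates=RATES):
    """Cost of one job: hourly price of everything it reserved, times hours run."""
    hourly = (job.cores * rates["core"]
              + job.mem_gb * rates["mem_gb"]
              + job.gpus * rates["gpu"])
    return hourly * job.hours


def monthly_charge(jobs, rates=RATES):
    """Account charge for a month: sum of its jobs' costs, with no fixed allocation fee."""
    return sum(job_cost(job, rates) for job in jobs)


jobs = [Job(cores=4, mem_gb=16, gpus=0, hours=10),
        Job(cores=1, mem_gb=4, gpus=1, hours=2)]
print(round(monthly_charge(jobs), 2))  # 1.16
```

With these placeholder rates, a 10-hour job reserving 4 cores and 16 GB, plus a 2-hour job reserving 1 core, 4 GB, and 1 GPU, come to about $1.16 for the month; a month with no jobs costs nothing.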

Flux to the Great Lakes cluster transition efforts

If you have not already, you should be developing a plan to migrate your work from Flux to the Great Lakes cluster.  If you need help in developing a plan, please contact us and we can provide assistance during this migration period. 

  • ARC-TS and academic unit support teams will be offering training sessions around campus. We will post a training schedule on the ARC-TS website and will also announce new sessions through Twitter and email.
  • To assist your transition, if you have any Turbo or MiStorage NFS mounts on Flux, those mounts will also be available on the Great Lakes cluster.  If you would prefer to not have those volumes mounted on the Great Lakes cluster, email us at hpc-support@umich.edu.

Ensure that your migration from Flux to the Great Lakes cluster is completed by November 25, 2019. No jobs on Flux will run after November 25, 2019.

Additional Information

We will be adding new capabilities in the coming weeks and months and will continue to communicate these capabilities by email as they become available. If you have any questions, email us at hpc-support@umich.edu.

Great Lakes Update: March 2019

By | Flux, General Interest, Great Lakes, Happenings, HPC, News

ARC-TS previously shared much of this information through the December 2018 ARC Newsletter and on the ARC-TS website. We have added some additional details surrounding the timeline for Great Lakes as well as for users who would like to participate in Early User testing.

What is Great Lakes?

The Great Lakes service is a next generation HPC platform for University of Michigan researchers, which will provide several performance advantages compared to Flux. Great Lakes is built around the latest Intel CPU architecture called Skylake and will have standard, large memory, visualization, and GPU-accelerated nodes.  For more information on the technical aspects of Great Lakes, please see the Great Lakes configuration page.

Key Features:

  • Approximately 13,000 Intel Skylake Gold cores providing AVX-512 capability and over 1.5 TFLOPS of performance per node
  • 2 PB scratch storage system providing approximately 80 GB/s performance (compared to 8 GB/s on Flux)
  • New InfiniBand network with improved architecture and 100 Gb/s to each node
  • Each compute node will have significantly faster I/O via SSD-accelerated storage
  • Large Memory Nodes with 1.5 TB memory per node
  • GPU Nodes with NVIDIA Volta V100 GPUs (2 GPUs per node)
  • Visualization Nodes with Tesla P40 GPUs
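To put the scratch bandwidth figures in perspective, here is a back-of-the-envelope calculation: reading 1 TB at a sustained 80 GB/s takes 12.5 seconds, versus 125 seconds at 8 GB/s. The helpers below are a sketch that assumes ideal, sustained aggregate throughput in decimal units (1 TB = 1000 GB); real jobs will see less, depending on file sizes, access patterns, and other load on the file system.

```python
def transfer_seconds(size_tb, bandwidth_gb_per_s):
    """Idealized time to move size_tb terabytes at a sustained bandwidth in GB/s (1 TB = 1000 GB)."""
    return size_tb * 1000 / bandwidth_gb_per_s


def gbit_to_gbyte(gbit_per_s):
    """Convert a network link speed in Gb/s to GB/s (8 bits per byte)."""
    return gbit_per_s / 8


print(transfer_seconds(1, 8))   # Flux /scratch: 125.0
print(transfer_seconds(1, 80))  # Great Lakes /scratch: 12.5
print(gbit_to_gbyte(100))       # one 100 Gb/s InfiniBand link: 12.5 (GB/s)
```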

Great Lakes will be using Slurm as the resource manager and scheduler, which will replace Torque and Moab on Flux. This will be the most immediate difference between the two clusters and will require some work on your part to transition from Flux to Great Lakes.

Another significant change is that we are making Great Lakes easier to use through a simplified accounting structure. Unlike Flux, where you need an account for each resource, on Great Lakes you can use the same account and simply request the resources you need, from GPUs to large memory.

There will be two primary ways to get access to compute time: 1) the on-demand model, which adds up the account’s job charges (reserved resources multiplied by the time used) and is billed monthly, similar to Flux On-Demand and 2) node purchases.  In the node purchase model, you will own the hardware which will reside in Great Lakes through the life of the cluster. You will receive an equivalent credit which you can use anywhere on the cluster, including on GPU and large memory nodes. We believe this will be preferable to buying actual hardware in the FOE model, as your daily computational usage can increase and decrease as your research requires. Send us an email at hpc-support@umich.edu if you have any questions or are interested in purchasing hardware on Great Lakes.

When will Great Lakes be available?

The ARC-TS team will prepare the cluster in April 2019 for an Early User period beginning in May, which will continue for approximately 4 weeks to ensure sufficient time to address any issues. General availability of Great Lakes should occur in June 2019. A more detailed timeline for the Great Lakes project is also available.

How does this impact me? Why Great Lakes?

After being the primary HPC cluster for the University for 8 years, Flux will be retired in September 2019.  Once Great Lakes becomes available to the University community, we will provide a few months to transition from Flux to Great Lakes.  Flux will be retired after that period due to aging hardware as well as expiring service contracts and licenses. We highly recommend preparing to migrate as early as possible so your research will not be interrupted.  Later in this email, we have suggestions for what you can do to make this migration process as easy as possible.

When Great Lakes becomes generally available to the University community, we will no longer be accepting new Flux accounts or allocations.  All new work should be focused on Great Lakes.

You can see the HPC timeline, including Great Lakes, Beta and Flux, here.

What is the current status of Great Lakes?

Today, the Great Lakes HPC compute hardware and high-performance storage system have been fully installed and configured. In parallel with this work, the ARC-TS and Unit Support team members have been readying the new service with new software and modules, as well as developing training to support the transition onto Great Lakes. A key feature of the new Great Lakes service is the just-released HDR InfiniBand from Mellanox. Today, the hardware is installed, but the firmware is still in its final stages of testing with the supplier, with a target delivery of mid-April 2019. Given the delays, ARC-TS and the suppliers have discussed an adjusted plan that allows quicker access to the cluster while supporting the future update once the firmware becomes available.

We are working with ITS Finance to define rates for Great Lakes.  We will update the Great Lakes documentation when we have final rates and let everyone know in subsequent communications.

What should I do to transition to Great Lakes?

We hope the transition from Flux to Great Lakes will be relatively straightforward, but to minimize disruptions to your research, we recommend you do your testing early. In October 2018, we announced the availability of the HPC cluster Beta to help users with this migration. Primarily, it allows users to migrate their PBS/Torque job submission scripts to Slurm. You can, and should, also explore the new module environments, as they have changed from their current configuration on Flux. Beta uses the same generation of hardware as Flux, so your performance will be similar to that on Flux. You should continue to use Flux for your production work; Beta is only for testing your Slurm job scripts, not for production work.

Every user on Flux has an account on Beta. You can log in to Beta at beta.arc-ts.umich.edu. You will have a new home directory on Beta, so you will need to migrate any scripts and data files you need to test your workloads into this new directory. Beta should not be used for any PHI, HIPAA, Export Controlled, or other sensitive data! We highly recommend that you use this time to convert your Torque scripts to Slurm and test that everything works as you would expect it to.
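As a starting point for converting Torque scripts, the sketch below rewrites a handful of common `#PBS` directives as their `#SBATCH` equivalents and marks anything it does not recognize with a `# TODO` comment for manual review. It is an illustration, not an ARC-TS tool: it handles only simple cases such as `-l nodes=N:ppn=M` and `walltime`, queue names passed with `-q` will still need to be changed to Great Lakes partition names, and the body of the script (for example, references to `$PBS_O_WORKDIR`) is left untouched.

```python
import re

# A few common one-to-one PBS -> Slurm directive mappings (illustrative subset).
SIMPLE_FLAGS = {
    "-N": "--job-name",
    "-A": "--account",
    "-q": "--partition",  # queue names differ between clusters; edit the value
    "-M": "--mail-user",
    "-o": "--output",
    "-e": "--error",
}


def convert_resources(spec):
    """Translate a '-l' resource list such as 'nodes=1:ppn=4,walltime=02:00:00'."""
    out = []
    for res in spec.split(","):
        key, _, val = res.partition("=")
        if key == "walltime":
            out.append(f"#SBATCH --time={val}")
        elif key == "nodes" and re.fullmatch(r"\d+:ppn=\d+", val):
            nodes, ppn = val.split(":ppn=")
            out.append(f"#SBATCH --nodes={nodes}")
            out.append(f"#SBATCH --ntasks-per-node={ppn}")
        else:
            out.append(f"# TODO: convert '-l {res}' by hand")
    return out


def convert_line(line):
    """Return the Slurm line(s) for one script line; non-PBS lines pass through."""
    m = re.match(r"#PBS\s+(-\w)\s*(.*)$", line.strip())
    if m is None:
        return [line]
    flag, value = m.group(1), m.group(2).strip()
    if flag in SIMPLE_FLAGS:
        return [f"#SBATCH {SIMPLE_FLAGS[flag]}={value}"]
    if flag == "-l":
        return convert_resources(value)
    return [f"# TODO: convert '{line.strip()}' by hand"]


def convert_script(text):
    """Convert every line of a Torque script, preserving non-directive lines."""
    out = []
    for line in text.splitlines():
        out.extend(convert_line(line))
    return "\n".join(out)
```

For example, `#PBS -l nodes=1:ppn=4,walltime=02:00:00` becomes `#SBATCH --nodes=1`, `#SBATCH --ntasks-per-node=4`, and `#SBATCH --time=02:00:00`, while `#PBS -j oe` is flagged for review. Always check the converted script against the Slurm documentation and test it on Beta before relying on it.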

To learn how to use Slurm, we have provided documentation on our Beta website.  Additionally, ARC-TS and academic unit support teams will be offering training sessions around campus. We will have a schedule on the ARC-TS website as well as communicate new sessions through Twitter and email.

If you have compiled software for use on Flux, we highly recommend that you recompile on Great Lakes once it becomes available.  Great Lakes is using the latest CPUs from Intel and by recompiling, your code may get performance gains by taking advantage of new capabilities on the new CPUs.
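Before recompiling, you may want to confirm that the node you are building on actually reports AVX-512 support. The sketch below assumes a Linux node and reads the `flags` line of `/proc/cpuinfo`, where `avx512f` denotes the AVX-512 Foundation instructions. If the flag is present, compiling on that node with an option such as GCC's `-march=native` lets the compiler use it; note that a binary built this way may not run on older CPUs, such as those in Flux.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the CPU feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def has_avx512(cpuinfo_text):
    """True if the CPU advertises AVX-512 Foundation ('avx512f')."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    info = Path("/proc/cpuinfo")
    if info.exists():  # only present on Linux
        print("AVX-512 available:", has_avx512(info.read_text()))
    else:
        print("/proc/cpuinfo not found; this check assumes a Linux node")
```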

Questions? Need Assistance?

Contact hpc-support@umich.edu

H.V. Jagadish appointed director of MIDAS

By | General Interest, Happenings, News

H.V. Jagadish has been appointed director of the Michigan Institute for Data Science (MIDAS), effective February 15, 2019.

Jagadish, the Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science at the University of Michigan, was one of the initiators of an earlier concept of a data science initiative on campus. With support from all academic units and the Institute for Social Research, the Office of the Provost and Office of the Vice President for Research, MIDAS was established in 2015 as part of the university-wide Data Science Initiative to promote interdisciplinary collaboration in data science and education.

“I have a longstanding passion for data science, and I understand its importance in addressing a variety of important societal issues,” Jagadish said. “As the focal point for data science research at Michigan, I am thrilled to help lead MIDAS into its next stage and further expand our data science efforts across disciplines.”

Jagadish replaces MIDAS co-directors Brian Athey and Alfred Hero, who completed their leadership appointments in December 2018.

“Professor Jagadish is a leader in the field of data science, and over the past two decades, he has exhibited national and international leadership in this area,” said S. Jack Hu, U-M vice president for research. “His leadership will help continue the advancement of data science methodologies and the application of data science in research in all disciplines.”

MIDAS has built a cohort of 26 active core faculty members and more than 200 affiliated faculty members who span all three U-M campuses. Institute funding has catalyzed several multidisciplinary research projects in health, transportation, learning analytics, social sciences and the arts, many of which have generated significant external funding. MIDAS also plays a key role in establishing new educational opportunities, such as the graduate certificate in data science, and provides additional support for student groups, including one team that used data science to help address the Flint water crisis.

As director, Jagadish aims to expand the institute’s research focus and strengthen its partnerships with industry.

“The number of academic fields taking advantage of data science techniques and tools has been growing dramatically,” Jagadish said. “Over the next several years, MIDAS will continue to leverage the university’s strengths in data science methodologies to advance research in a wide array of fields, including the humanities and social sciences.”

Jagadish joined U-M in 1999. He previously led the Database Research Department at AT&T Labs.

His research, which focuses on information management, has resulted in more than 200 journal articles and 37 patents. Jagadish is a fellow of the Association for Computing Machinery and the American Association for the Advancement of Science, and he served nine years on the Computing Research Association board.

Winter HPC maintenance completed

By | Beta, Flux, General Interest, Happenings, HPC, News

Flux, Beta, Armis, Cavium, and ConFlux, and their storage systems (/home and /scratch) are back online after three days of maintenance.  The updates that have been completed will improve the performance and stability of ARC-TS services. 

The following maintenance tasks were done:

  • Preventative maintenance at the Modular Data Center (MDC), which required a full power outage
  • InfiniBand networking updates (firmware and software)
  • Ethernet networking updates (datacenter distribution layer switches)
  • Operating system and software updates
  • Migration of Turbo networking to new switches (affected /home and /sw)
  • Consistency checks on the Lustre file systems that provide /scratch
  • Firmware and software updates for the GPFS file systems (ConFlux, starting 9 a.m., Monday, Jan. 7)
  • Consistency checks on the GPFS file systems that provide /gpfs (ConFlux, starting 9 a.m., Monday, Jan. 7)

Please contact hpc-support@umich.edu if you have any questions.

University of Michigan joins Ceph Foundation

By | General Interest, Happenings, News

Motivated by Ceph usage in the OSiRIS project, the University of Michigan has joined the Ceph Foundation as an Associate Member. We join other educational, government, and research organizations engaged in the Ceph Foundation at this membership level.

From the Foundation website: The Ceph Foundation exists to enable industry members to collaborate and pool resources to support the Ceph project community. The Foundation provides an open, collaborative, and neutral home for project stakeholders to coordinate their development and community investments in the Ceph ecosystem.


Great Lakes Update: December 2018

By | Flux, General Interest, Great Lakes, Happenings, News

What is Great Lakes?

The Great Lakes service is a next generation HPC platform for University of Michigan researchers. Great Lakes will provide several performance advantages compared to Flux, primarily in the areas of storage and networking. Great Lakes is built around the latest Intel CPU architecture called Skylake and will have standard, large memory, visualization, and GPU-accelerated nodes.  For more information on the technical aspects of Great Lakes, please see the Great Lakes configuration page.

Key Features:

  • Approximately 13,000 Intel Skylake Gold cores providing AVX-512 capability and over 1.5 TFLOPS of performance per node
  • 2 PB scratch storage system providing approximately 80 GB/s performance (compared to 8 GB/s on Flux)
  • New InfiniBand network with improved architecture and 100 Gb/s to each node
  • Each compute node will have significantly faster I/O via SSD-accelerated storage
  • Large Memory Nodes with 1.5 TB memory per node
  • GPU Nodes with NVIDIA Volta V100 GPUs (2 GPUs per node)
  • Visualization Nodes with Tesla P40 GPUs

Great Lakes will be using Slurm as the resource manager and scheduler, which will replace Torque and Moab on Flux. This will be the most immediate difference between the two clusters and will require some work on your part to transition from Flux to Great Lakes.

Another significant change is that we are making Great Lakes easier to use through a simplified accounting structure. Unlike Flux, where you need an account for each resource, on Great Lakes you can use the same account and simply request the resources you need, from GPUs to large memory.

There will be two primary ways to get access to compute time: 1) the pay-as-you-go model, similar to Flux On-Demand, and 2) node purchases. Node purchases will give you computational time commensurate with 4 years multiplied by the number of nodes you buy. We believe this will be preferable to buying actual hardware in the FOE model, as your daily computational usage can increase and decrease as your research requires. Additionally, you will not be limited by hardware failures on your specific nodes, as your jobs can run anywhere on Great Lakes. Send us an email at hpc-support@umich.edu if you have any questions or are interested in purchasing hardware on Great Lakes.
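The node-purchase arithmetic can be sketched in core-hours: nodes × cores per node × 4 years × hours per year. The default of 36 cores per node, the 8,760-hour year, and the flat core-hour accounting are assumptions for illustration only; the actual credit unit, and how GPU or large-memory usage is weighed against it, are not specified here.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap days for simplicity


def purchase_credit_core_hours(nodes, cores_per_node=36, years=4):
    """Core-hours equal to running the purchased nodes around the clock for `years` years.

    cores_per_node=36 is an ASSUMED node size, used here only for illustration.
    """
    return nodes * cores_per_node * years * HOURS_PER_YEAR


def remaining_credit(credit, jobs):
    """Deduct core-hours used by (cores, hours) jobs, wherever on the cluster they ran."""
    used = sum(cores * hours for cores, hours in jobs)
    return credit - used


credit = purchase_credit_core_hours(2)
print(credit)  # 2522880
```

Under these assumptions, two purchased nodes correspond to 2 × 36 × 4 × 8,760 = 2,522,880 core-hours, which could be drawn down by jobs anywhere on the cluster rather than only on the purchased hardware.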

When will Great Lakes be available?

The ARC-TS team will prepare the cluster in February/March 2019 for an Early User period which will continue for several weeks to ensure sufficient time to address any issues. General availability of Great Lakes should occur in April.

How does this impact me? Why Great Lakes?

After being the primary HPC cluster for the University for 8 years, Flux will be retired in September 2019.  Once Great Lakes becomes available to the University community, we will provide a few months to transition from Flux to Great Lakes.  Flux will be retired after that period due to aging hardware as well as expiring service contracts and licenses. We highly recommend preparing to migrate as early as possible so your research will not be interrupted.  Later in this email, we have suggestions for what you can do to make this migration process as easy as possible.

When Great Lakes becomes generally available to the University community, we will no longer be accepting new Flux accounts or allocations.  All new work should be focused on Great Lakes.

What is the current status of Great Lakes?

Today, the Great Lakes HPC compute hardware has been fully installed and the high-performance storage system configuration is in progress. In parallel with this work, the ARC-TS and Unit Support team members have been readying the new service with new software and modules, as well as developing training to support the transition onto Great Lakes. A key feature of the new Great Lakes service is the just-released HDR InfiniBand from Mellanox. Today, the hardware is available, but the firmware is still in its final stages of testing with the supplier, with a target delivery date of March 2019. Given the delays, ARC-TS and the suppliers have discussed an adjusted plan that allows quicker access to the cluster while supporting the future update once the firmware becomes available.

What should I do to transition to Great Lakes?

We hope the transition from Flux to Great Lakes will be relatively straightforward, but to minimize disruptions to your research, we recommend you do your testing early.  In October, we announced availability of the HPC cluster Beta in order to help users with this migration. Primarily, it allows users to migrate their PBS/Torque job submission scripts to Slurm.  You can also see the new Modules environments, as they have changed from their current configuration on Flux. Beta is using the same generation of hardware as Flux, so your performance will be similar to that on Flux.  You should continue to use Flux for your production work; Beta is only to help test your Slurm job scripts and not for any production work.

Every user on Flux has an account on Beta. You can log in to Beta at beta.arc-ts.umich.edu. You will have a new home directory on Beta, so you will need to migrate any scripts and data files you need to test your workloads into this new directory. Beta should not be used for any PHI, HIPAA, Export Controlled, or other sensitive data! We highly recommend that you use this time to convert your Torque scripts to Slurm and test that everything works as you would expect it to.

To learn how to use Slurm, we have provided documentation on our Beta website.  Additionally, ARC-TS and academic unit support teams will be offering training sessions around campus.  We’ll have a schedule on the ARC-TS website as well as communicate new sessions through Twitter and email.

If you have compiled software for use on Flux, we highly recommend that you recompile on Great Lakes once it becomes available.  Great Lakes is using the latest CPUs from Intel and by recompiling, your code may get performance gains by taking advantage of new capabilities on the new CPUs.

Questions? Need Assistance?

Contact hpc-support@umich.edu

Most CSCAR workshops will be free for the U-M community starting in January 2019

By | Educational, General Interest, Happenings, News

Beginning in January of 2019, most of CSCAR’s workshops will be offered free of charge to UM students, faculty, and staff.

CSCAR is able to do this thanks to funding from UM’s Data Science Initiative.  Registration for CSCAR workshops is still required, and seats are limited.

CSCAR requests that participants please cancel their registration if they decide not to attend a workshop for which they have previously registered.

Note that a small number of workshops hosted by CSCAR but taught by non-CSCAR personnel will continue to have a fee, and fees will continue to apply for people who are not UM students, faculty or staff.