Using machine learning and the Great Lakes HPC Cluster for COVID-19 research

By | General Interest, Great Lakes, HPC, News, Research, Uncategorized

A researcher in the College of Literature, Science, and the Arts (LSA) is pioneering two separate, ongoing efforts for measuring and forecasting COVID-19: pandemic modeling and a risk tracking site.

The projects are led by Sabrina Corsetti, a senior undergraduate student pursuing dual degrees in honors physics and mathematical sciences, and supervised by Thomas Schwarz, Ph.D., associate professor of physics. 

The modeling uses a machine learning algorithm that can forecast future COVID-19 cases and deaths. The weekly predictions are made using the ARC-TS Great Lakes High-Performance Computing Cluster, which provides the speed and dexterity to run the modeling algorithms and data analysis needed for data-informed decisions that affect public health. 

Each week, 51 processes (one for each state and one for the U.S.) are run in parallel (at the same time). “Running all 51 analyses on our own computers would take an extremely long time. The analysis places heavy demands on the hardware running the computations, which makes crashes somewhat likely on a typical laptop. We get all 51 done in the time it would take to do 1,” said Corsetti. “It is our goal to provide accurate data that helps our country.”
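The one-process-per-region layout Corsetti describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes the weekly run is submitted as a Slurm job array with indices 0 to 50, and the names `REGIONS` and `region_for_task` are hypothetical. `SLURM_ARRAY_TASK_ID` is the environment variable Slurm sets for each task of a job array.

```python
import os

# 50 state postal codes plus one national entry: 51 independent analyses.
STATES = (
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY"
).split()
REGIONS = ["US"] + STATES


def region_for_task(task_id: int) -> str:
    """Map a job-array index (0-50) to the region that task should model."""
    if not 0 <= task_id < len(REGIONS):
        raise ValueError(f"task id {task_id} is outside 0-{len(REGIONS) - 1}")
    return REGIONS[task_id]


if __name__ == "__main__":
    # Slurm exports SLURM_ARRAY_TASK_ID to every task of a job array.
    # Default to 0 so the script also runs outside the scheduler.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(f"Task {task_id} forecasts region {region_for_task(task_id)}")
```

Submitted with `sbatch --array=0-50`, a script like this would start one task per region, and the scheduler would run the tasks side by side as nodes become free.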

The predictions for the U.S. at the national and state levels are fed into the COVID-19 Forecasting Hub, which is led by the UMass-Amherst Influenza Forecasting Center of Excellence based at the Reich Lab. The weekly predictions generated by the hub are then read by the Centers for Disease Control and Prevention (CDC) for its weekly forecast updates.

The second project, a risk tracking site, involves COVID-19 data-acquisition from a Johns Hopkins University repository and the Michigan Safe Start Map. This is done on a daily basis, and the process runs quickly. It only takes about five minutes, but the impact is great. The data populates the COVID-19 risk tracking site for the State of Michigan that shows by county the total number of COVID-19 cases, the average number of new cases in the past week, and the risk level.
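As a rough illustration of the "average number of new cases in the past week" figure, the sketch below derives it from a series of cumulative daily totals. It is an assumption that the site computes the value this way; the function name `average_new_cases` and the sample numbers are made up for the example.

```python
from typing import Sequence


def average_new_cases(cumulative: Sequence[int], days: int = 7) -> float:
    """Average daily new cases over the last `days` days.

    `cumulative` holds one running case total per day, oldest first.
    """
    if len(cumulative) < days + 1:
        raise ValueError(f"need at least {days + 1} daily totals")
    # New cases in the window = latest total minus the total `days` days ago.
    new_cases = cumulative[-1] - cumulative[-1 - days]
    return new_cases / days


# Made-up cumulative totals for one county over eight consecutive days.
county_totals = [100, 110, 125, 130, 150, 160, 175, 205]
print(average_new_cases(county_totals))  # prints 15.0
```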

“Maintaining the risk tracking site requires us to reliably update its data every day. We have been working on implementing these daily updates using Great Lakes so that we can ensure that they happen at the same time each day. These updates consist of data pulls from the Michigan Safe Start Map (for risk assessments) and the Johns Hopkins COVID-19 data repository (for case counts),” remarked Corsetti.

“We are proud to support this type of impactful research during the global pandemic,” said Brock Palen, director of Advanced Research Computing – Technology Services. “Great Lakes provides quicker answers and optimized support for simulation, machine learning, and more. It is designed to meet the demands of the University of Michigan’s most intensive research.”

ARC-TS is a division of Information and Technology Services (ITS). 


Bring the power of the HPC clusters to your laptop 

By | Great Lakes, HPC, News

Open OnDemand (OOD) is a tool that brings to researchers and students the power of Great Lakes, the university’s flagship open-science, high-performance, computing cluster. 

Open OnDemand is a way for researchers and students to use a web interface to access the Advanced Research Computing – Technology Services (ARC-TS) Great Lakes and Lighthouse High-Performance Computing resources. Because users do not need to have any technical training, it’s as simple as going to a browser and logging in. Users can start working immediately. 

“It’s your laptop, but 1,000 times bigger,” said Brock Palen, director, ARC-TS. “Open OnDemand offers our customers the speed and capacity of the HPC clusters without investing hours in training.”

The benefits of OOD are many, including providing easy file management, command-line shell access to the HPC clusters, job management and monitoring, and graphical desktop environments and desktop interactive applications such as RStudio, MATLAB, and Jupyter Notebook.

“This system works well for a range of fields from engineering to the physical and social sciences. Open OnDemand has lowered the barrier to access powerful HPC clusters so that students and researchers can do incredibly innovative work,” said Matt Britt, ARC-TS HPC manager. 


ARC-TS is a division of Information and Technology Services (ITS).

3-2-1…blast off! COE students use ARC-TS HPC clusters for rocket design

By | Educational, General Interest, Great Lakes, Happenings, HPC, News

The MASA team has been working with ARC-TS and the Great Lakes High-Performance Computing Cluster to rapidly iterate simulations. What previously took six hours on another cluster takes 15 minutes on Great Lakes. (Image courtesy of MASA)

This article was written by Taylor Gribble, the ARC-TS summer 2020 intern. 

The Michigan Aeronautical Science Association (MASA) is a student-run engineering team at U-M that has been designing, building, and launching rockets since its inception in 2003. Since late 2017, MASA has focused on developing liquid-bipropellant rockets (rockets that react a liquid fuel with a liquid oxidizer to produce thrust) in an effort to remain at the forefront of collegiate rocketry. The team is made up of roughly 70 active members, including both undergraduate and graduate students, who participate year-round.

Since 2018, MASA has been working on the Tangerine Space Machine (TSM) rocket which aims to be the first student-built liquid-bipropellant rocket to ever be launched to space. When completed, the rocket’s all-metal airframe will stand over 25 feet tall. The TSM will reach an altitude of 400,000 feet and will fly to space at over five times the speed of sound.

MASA is building this rocket as part of the Base 11 Space Challenge which was organized by the Base 11 Organization to encourage high school and college students to get involved in STEM fields. The competition has a prize of $1 million, to be awarded to the first team to successfully reach space. MASA is currently leading the competition, having won Phase 1 of the challenge in 2019 with the most promising preliminary rocket design.

Since the start of the TSM project, MASA has made great strides towards achieving its goals. The team has built and tested many parts of the complete system, including custom tanks, electronics, and ground support equipment. In 2020, the experimental rocket engine designed by MASA for the rocket broke the student thrust record when it was tested, validating the work that the team had put into the test.

The team’s rapid progress was made possible in part by extensive, lightning-quick simulations run on the ARC-TS Great Lakes High-Performance Computing Cluster.

The student engineers are Edward Tang, Tommy Woodbury, and Theo Rulko, and they have been part of MASA for over two years.

Tang is MASA’s aerodynamics and recovery lead and a junior studying aerospace engineering with a minor in computer science. His team is working to develop advanced in-house flight simulation software to predict how the rocket will behave during its trip to space.

“Working on the Great Lakes HPC Cluster allows us to do simulations that we can’t do anywhere else. The simulations are complicated and can be difficult to run. We have to check it, and do it again; over and over and over,” said Tang. “The previous computer we used would take as long as six hours to render simulations. It took 15 minutes on Great Lakes.”


This image shows a Liquid Oxygen Dome Coupled Thermal-Structural simulation that was created on the ARC-TS Great Lakes HPC Cluster. (Image courtesy of MASA)

Rulko, the team’s president, is a junior studying aerospace engineering with a minor in materials science and engineering.

Just like Tang, Rulko has experience using the Great Lakes cluster. “Almost every MASA subteam has benefited from access to Great Lakes. For example, the Structures team has used it for Finite Element Analysis simulations of complicated assemblies to make them as lightweight and strong as possible, and the Propulsion team has used it for Computational Fluid Dynamics simulations to optimize the flow of propellants through the engine injector. These are both key parts of what it takes to design a rocket to go to space which we just wouldn’t be able to realistically do without access to the tools provided by ARC-TS.”

Rulko’s goals for the team include focusing on developing as much hardware/software as possible in-house so that members can control and understand the entire process. He believes MASA is about more than just building rockets; his goal for the team is to teach members about custom design and fabrication and to make sure that they learn the problem-solving skills they need to tackle real-world engineering challenges. “We want to achieve what no other student team has.”

MASA has recently faced unforeseen challenges due to the COVID-19 pandemic that threaten not only to delay the team’s timeline but also to derail its cohesiveness. “Because of the pandemic, the team is dispersed literally all over the world. Working with ARC-TS has benefitted the entire team. The system has helped us streamline and optimize our workflow, and has made it easy to connect to Great Lakes, which allows us to rapidly develop and iterate our simulations while working remotely from anywhere,” said Tang. “The platform has been key to allowing us to continue to make progress during these difficult times.”

Tommy Woodbury is a senior studying aerospace engineering. Throughout his time on MASA he has been able to develop many skills. “MASA is what has made my time here at Michigan a really positive experience. Having a group of highly-motivated and supportive individuals has undoubtedly been one of the biggest factors in my success transferring to Michigan.”


This image depicts the Liquid Rocket Engine Injector simulation. (Image courtesy of MASA)

ARC-TS is a division of Information and Technology Services. Great Lakes is available without charge for student teams and organizations that need HPC resources. This program aims to give students access to high-performance computing to advance their team’s mission.

Open OnDemand Update on Great Lakes and Lighthouse May 21, 2020

By | Great Lakes, HPC, News

We are migrating Open OnDemand from version 1.4 to 1.6 to fix a security issue on May 21, 2020. Users will not be able to use the service during the upgrade process but running jobs should continue to run, based on our testing. If you need access during this period of time, we recommend ending your existing job and resubmitting when the service is restored.

Lighthouse will be upgraded from 9 a.m. to 12:00 p.m.  (ITS Status Page Link)

Great Lakes will be upgraded from 1 p.m. to 5:00 p.m.  (ITS Status Page Link)

Great Lakes Update: August 2019

By | General Interest, Great Lakes, Happenings, HPC, News

Great Lakes cluster is available for general access

What is the current status of the Great Lakes cluster?

Now that we have completed Early User testing, the Great Lakes cluster is available for general access to the University community. Until the migration from Flux is complete on November 25, 2019, there will be no charge for using the Great Lakes cluster.

Noteworthy Features

  • The Great Lakes cluster compute nodes use the new Intel Skylake processor. In particular, the Skylake CPUs on the standard and large memory compute nodes will provide researchers more consistent performance, regardless of how many other jobs are on the machine. 
  • The Great Lakes cluster has 20 GPU nodes, each of which contains two NVidia V100 GPUs which are significantly faster than the K20 and K40 GPUs on Flux.
  • The HDR100 InfiniBand network will provide consistent 100Gb/s performance across all nodes. On Flux, this ranged from 40-100Gb/s, depending on the node your job used.
  • The high performance GPFS /scratch system, with a capacity of approximately two petabytes, is significantly faster than /scratch on Flux. 
  • The Torque-based batch job submission environment has been replaced with the Slurm resource manager. We expect this system to be significantly more responsive and quicker at starting jobs than was the case on Flux.
  • For web-based job submission, the Open OnDemand system will replace the ARC Connect environment for providing web based file access, job submission, remote desktop, graphical Matlab, Jupyter Notebooks, and more. For more information, see the web-based access section in our user guide. 

How do I get access?

Every Flux user has a login on the Great Lakes cluster; you should be able to log in via ssh to greatlakes.arc-ts.umich.edu. We have created Slurm accounts for each PI or project based on the current Flux accounts. You can see what Slurm accounts you have access to by running the command `my_accounts`.

Additionally, you can access the Great Lakes cluster via the web through our Open OnDemand portal. Here you can submit jobs, see submitted jobs, create Jupyter Notebooks and more. Please see the Great Lakes Cluster User Guide for more information.

Where do I read more about the Great Lakes cluster and how to use it?

The current documentation for the Great Lakes cluster, including configuration, user guides, and known issues can be found at https://arc-ts.umich.edu/greatlakes.

There is a schedule for upcoming training sessions on the CSCAR website, and we will communicate new sessions through Twitter and email.

Software

Almost all of the software packages available on Flux have been recompiled on the Great Lakes cluster for improved performance anticipated from the Intel SkyLake architecture. In most cases, the latest software version available is being provided. If you need older versions or need additional packages, let us know via email at arcts-support@umich.edu.

We have also reorganized the software module structure to make it easier to find packages you want to load as well as automatically loading prerequisites. To search for packages, use the “module spider” command along with the name of the package or keywords. In many cases we combined similar packages into “Collections” such as Chemistry and BioInformatics. The command “module load Chemistry” will make any Chemistry package available to you and packages in the Chemistry collection will then be discoverable via the “module available” command. After loading a specific collection, you must then load any individual packages within that collection that you would like to use.

What are the rates? 

We are working with ITS and UM Finance for approved service rates. Current plans are to have proposed rates [1] identified by end of August. As soon as this information is more concrete, we will provide an update on the Great Lakes cluster website and in our email communication. We understand that this information is necessary for planning purposes and apologize for any impacts this has had on your budget planning.

What can be shared at this time is the new approach to billing that will be used for the Great Lakes cluster. Unlike Flux, there are no monthly allocations with fixed fees regardless of whether they are used or not. On the Great Lakes cluster, the monthly charge for an account will be calculated based on the resources used by jobs each month. The cost calculation for each job will be based on the amount and type of resources the job reserves and how long the job runs. This should be a significantly more flexible system and won’t require updating allocations as your computing needs change over time.

[1] Rates are not considered final until they have been formally approved by OFA.
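To make the usage-based billing approach concrete, here is a minimal sketch of a per-job charge computed as reserved resources multiplied by run time and a rate. The rate values are placeholders invented for the example (no approved rates exist in this update), and the names `HYPOTHETICAL_RATES`, `job_cost`, and `monthly_charge` are illustrative, not part of any ARC-TS tool.

```python
# Placeholder rates in dollars per resource-hour. These are NOT ARC-TS rates;
# they exist only so the arithmetic can be shown.
HYPOTHETICAL_RATES = {"cpu_core": 0.01, "gpu": 0.10}


def job_cost(resource: str, amount: int, hours: float, rates=None) -> float:
    """Charge for one job: units reserved x hours run x rate for that type."""
    rates = HYPOTHETICAL_RATES if rates is None else rates
    return amount * hours * rates[resource]


def monthly_charge(jobs, rates=None) -> float:
    """Sum the charges of every job an account ran during the month.

    `jobs` is an iterable of (resource, amount, hours) tuples.
    """
    return sum(job_cost(res, amt, hrs, rates) for res, amt, hrs in jobs)


# Two made-up jobs: 36 CPU cores for 10 hours and 2 GPUs for 5 hours.
jobs = [("cpu_core", 36, 10.0), ("gpu", 2, 5.0)]
print(f"${monthly_charge(jobs):.2f}")
```

Under this model an account that runs nothing in a month is charged nothing, which is the contrast with fixed monthly allocations on Flux.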

Flux to the Great Lakes cluster transition efforts

If you have not already, you should be developing a plan to migrate your work from Flux to the Great Lakes cluster.  If you need help in developing a plan, please contact us and we can provide assistance during this migration period. 

  • ARC-TS and academic unit support teams will be offering training sessions around campus. We will have a training sessions schedule on the ARC-TS website. We also communicate new sessions through Twitter and email.
  • To assist your transition, if you have any Turbo or MiStorage NFS mounts on Flux, those mounts will also be available on the Great Lakes cluster.  If you would prefer to not have those volumes mounted on the Great Lakes cluster, email us at arcts-support@umich.edu.

Ensure that your migration from Flux to the Great Lakes cluster is completed by November 25, 2019. No jobs on Flux will run after November 25, 2019.

Additional Information

We will be adding new capabilities in the coming weeks and months and will continue to communicate these capabilities by email as they become available. If you have any questions, email us at arcts-support@umich.edu.

Great Lakes Update: March 2019

By | Flux, General Interest, Great Lakes, Happenings, HPC, News

ARC-TS previously shared much of this information through the December 2018 ARC Newsletter and on the ARC-TS website. We have added some additional details surrounding the timeline for Great Lakes as well as for users who would like to participate in Early User testing.

What is Great Lakes?

The Great Lakes service is a next generation HPC platform for University of Michigan researchers, which will provide several performance advantages compared to Flux. Great Lakes is built around the latest Intel CPU architecture called Skylake and will have standard, large memory, visualization, and GPU-accelerated nodes.  For more information on the technical aspects of Great Lakes, please see the Great Lakes configuration page.

Key Features:

  • Approximately 13,000 Intel Skylake Gold processors with AVX512 capability, providing over 1.5 TFlop of performance per node
  • 2 PB scratch storage system providing approximately 80 GB/s performance (compared to 8 GB/s on Flux)
  • New InfiniBand network with improved architecture and 100 Gb/s to each node
  • Each compute node will have significantly faster I/O via SSD-accelerated storage
  • Large Memory Nodes with 1.5 TB memory per node
  • GPU Nodes with NVidia Volta V100 GPUs (2 GPUs per node)
  • Visualization Nodes with Tesla P40 GPUs

Great Lakes will be using Slurm as the resource manager and scheduler, which will replace Torque and Moab on Flux. This will be the most immediate difference between the two clusters and will require some work on your part to transition from Flux to Great Lakes.

Another significant change is that we are making Great Lakes easier to use through a simplified accounting structure.  Unlike Flux where you need an account for each resource, on Great Lakes you can use the same account and simply request the resources you need, from GPUs to large memory.

There will be two primary ways to get access to compute time: 1) the on-demand model, which adds up the account’s job charges (reserved resources multiplied by the time used) and is billed monthly, similar to Flux On-Demand and 2) node purchases.  In the node purchase model, you will own the hardware which will reside in Great Lakes through the life of the cluster. You will receive an equivalent credit which you can use anywhere on the cluster, including on GPU and large memory nodes. We believe this will be preferable to buying actual hardware in the FOE model, as your daily computational usage can increase and decrease as your research requires. Send us an email at arcts-support@umich.edu if you have any questions or are interested in purchasing hardware on Great Lakes.

When will Great Lakes be available?

The ARC-TS team will prepare the cluster in April 2019 for an Early User period beginning in May, which will continue for approximately 4 weeks to ensure sufficient time to address any issues. General availability of Great Lakes should occur in June 2019.  We have a timeline for the Great Lakes project which will have more detail.

How does this impact me? Why Great Lakes?

After being the primary HPC cluster for the University for 8 years, Flux will be retired in September 2019.  Once Great Lakes becomes available to the University community, we will provide a few months to transition from Flux to Great Lakes.  Flux will be retired after that period due to aging hardware as well as expiring service contracts and licenses. We highly recommend preparing to migrate as early as possible so your research will not be interrupted.  Later in this email, we have suggestions for what you can do to make this migration process as easy as possible.

When Great Lakes becomes generally available to the University community, we will no longer be accepting new Flux accounts or allocations.  All new work should be focused on Great Lakes.

You can see the HPC timeline, including Great Lakes, Beta and Flux, here.

What is the current status of Great Lakes?

Today, the Great Lakes HPC compute hardware and high-performance storage system have been fully installed and configured. In parallel with this work, the ARC-TS and Unit Support team members have been readying the new service with new software and modules, as well as developing training to support the transition onto Great Lakes. A key feature of the new Great Lakes service is the just-released HDR InfiniBand from Mellanox. Today, the hardware is installed but the firmware is still in its final stages of testing with the supplier, with a target delivery of mid-April 2019. Given the delays, ARC-TS and the suppliers have discussed an adjusted plan that allows quicker access to the cluster while supporting the future update once the firmware becomes available.

We are working with ITS Finance to define rates for Great Lakes.  We will update the Great Lakes documentation when we have final rates and let everyone know in subsequent communications.

What should I do to transition to Great Lakes?

We hope the transition from Flux to Great Lakes will be relatively straightforward, but to minimize disruptions to your research, we recommend you do your testing early.  In October 2018, we announced availability of the HPC cluster Beta in order to help users with this migration. Primarily, it allows users to migrate their PBS/Torque job submission scripts to Slurm.  You can and should also see the new Modules environments, as they have changed from their current configuration on Flux. Beta is using the same generation of hardware as Flux, so your performance will be similar to that on Flux. You should continue to use Flux for your production work; Beta is only to help test your Slurm job scripts and not for any production work.
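As an illustration of what migrating a PBS/Torque job script to Slurm involves, the sketch below rewrites a handful of common `#PBS` directives as `#SBATCH` lines. It covers only a few widely used options, leaves anything it does not recognise untouched for manual review, and is not an ARC-TS tool; the function name `translate_directive` is hypothetical. Queue names in particular differ between clusters, so a translated `--partition` value still needs checking by hand.

```python
import re
from typing import List

# A few Torque/PBS options and the Slurm options that play the same role.
SIMPLE_OPTIONS = {
    "-N": "--job-name",
    "-A": "--account",
    "-q": "--partition",
    "-M": "--mail-user",
}


def translate_directive(line: str) -> List[str]:
    """Translate one '#PBS ...' line into one or more '#SBATCH ...' lines.

    Lines that are not recognised are returned unchanged so nothing is lost.
    """
    match = re.match(r"#PBS\s+(-\w)\s+(\S+)\s*$", line.strip())
    if not match:
        return [line]
    flag, value = match.groups()
    if flag in SIMPLE_OPTIONS:
        return [f"#SBATCH {SIMPLE_OPTIONS[flag]}={value}"]
    if flag == "-l":
        translated = []
        for item in value.split(","):
            key, _, val = item.partition("=")
            if key == "walltime":
                translated.append(f"#SBATCH --time={val}")
            elif key == "nodes":
                nodes, _, rest = val.partition(":")
                translated.append(f"#SBATCH --nodes={nodes}")
                if rest.startswith("ppn="):
                    translated.append(f"#SBATCH --ntasks-per-node={rest[4:]}")
            else:
                return [line]  # unknown resource: leave for manual review
        return translated
    return [line]


for pbs_line in ["#PBS -N sim", "#PBS -l nodes=2:ppn=8,walltime=01:00:00"]:
    print("\n".join(translate_directive(pbs_line)))
```

A converted script should still be tested on Beta, since options beyond these few, as well as the commands inside the script body, may also need changes.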

Every user on Flux has an account on Beta.  You can log in to Beta at beta.arc-ts.umich.edu.  You will have a new home directory on Beta, so you will need to migrate any scripts and data files you need to test your workloads into this new directory.  Beta should not be used for any PHI, HIPAA, Export Controlled, or any sensitive data!  We highly recommend that you use this time to convert your Torque scripts to Slurm and test that everything works as you would expect it to.

To learn how to use Slurm, we have provided documentation on our Beta website.  Additionally, ARC-TS and academic unit support teams will be offering training sessions around campus. We will have a schedule on the ARC-TS website as well as communicate new sessions through Twitter and email.

If you have compiled software for use on Flux, we highly recommend that you recompile on Great Lakes once it becomes available.  Great Lakes is using the latest CPUs from Intel and by recompiling, your code may get performance gains by taking advantage of new capabilities on the new CPUs.

Questions? Need Assistance?

Contact arcts-support@umich.edu.

Great Lakes Update: December 2018

By | Flux, General Interest, Great Lakes, Happenings, News

What is Great Lakes?

The Great Lakes service is a next generation HPC platform for University of Michigan researchers. Great Lakes will provide several performance advantages compared to Flux, primarily in the areas of storage and networking. Great Lakes is built around the latest Intel CPU architecture called Skylake and will have standard, large memory, visualization, and GPU-accelerated nodes. For more information on the technical aspects of Great Lakes, please see the Great Lakes configuration page.

Key Features:

  • Approximately 13,000 Intel Skylake Gold processors with AVX512 capability, providing over 1.5 TFlop of performance per node
  • 2 PB scratch storage system providing approximately 80 GB/s performance (compared to 8 GB/s on Flux)
  • New InfiniBand network with improved architecture and 100 Gb/s to each node
  • Each compute node will have significantly faster I/O via SSD-accelerated storage
  • Large Memory Nodes with 1.5 TB memory per node
  • GPU Nodes with NVidia Volta V100 GPUs (2 GPUs per node)
  • Visualization Nodes with Tesla P40 GPUs

Great Lakes will be using Slurm as the resource manager and scheduler, which will replace Torque and Moab on Flux. This will be the most immediate difference between the two clusters and will require some work on your part to transition from Flux to Great Lakes.

Another significant change is that we are making Great Lakes easier to use through a simplified accounting structure. Unlike Flux where you need an account for each resource, on Great Lakes you can use the same account and simply request the resources you need, from GPUs to large memory.

There will be two primary ways to get access to compute time: 1) the pay-as-you-go model similar to Flux On-Demand and 2) node purchases.  Node purchases will give you computational time commensurate to 4 years multiplied by the number of nodes you buy. We believe this will be preferable to buying actual hardware in the FOE model, as your daily computational usage can increase and decrease as your research requires. Additionally you will not be limited by hardware failures on your specific nodes, as your jobs can run anywhere on Great Lakes. Send us an email at arcts-support@umich.edu if you have any questions or are interested in purchasing hardware on Great Lakes.

When will Great Lakes be available?

The ARC-TS team will prepare the cluster in February/March 2019 for an Early User period which will continue for several weeks to ensure sufficient time to address any issues. General availability of Great Lakes should occur in April.

How does this impact me? Why Great Lakes?

After being the primary HPC cluster for the University for 8 years, Flux will be retired in September 2019.  Once Great Lakes becomes available to the University community, we will provide a few months to transition from Flux to Great Lakes. Flux will be retired after that period due to aging hardware as well as expiring service contracts and licenses. We highly recommend preparing to migrate as early as possible so your research will not be interrupted. Later in this email, we have suggestions for what you can do to make this migration process as easy as possible.

When Great Lakes becomes generally available to the University community, we will no longer be accepting new Flux accounts or allocations.  All new work should be focused on Great Lakes.

What is the current status of Great Lakes?

Today, the Great Lakes HPC compute hardware has been fully installed and the high-performance storage system configuration is in progress. In parallel with this work, the ARC-TS and Unit Support team members have been readying the new service with new software and modules, as well as developing training to support the transition onto Great Lakes. A key feature of the new Great Lakes service is the just-released HDR InfiniBand from Mellanox. Today, the hardware is available but the firmware is still in its final stages of testing with the supplier, with a target delivery date of March 2019. Given the delays, ARC-TS and the suppliers have discussed an adjusted plan that allows quicker access to the cluster while supporting the future update once the firmware becomes available.

What should I do to transition to Great Lakes?

We hope the transition from Flux to Great Lakes will be relatively straightforward, but to minimize disruptions to your research, we recommend you do your testing early.  In October, we announced availability of the HPC cluster Beta in order to help users with this migration. Primarily, it allows users to migrate their PBS/Torque job submission scripts to Slurm. You can also see the new Modules environments, as they have changed from their current configuration on Flux. Beta is using the same generation of hardware as Flux, so your performance will be similar to that on Flux. You should continue to use Flux for your production work; Beta is only to help test your Slurm job scripts and not for any production work.

Every user on Flux has an account on Beta.  You can log in to Beta at beta.arc-ts.umich.edu. You will have a new home directory on Beta, so you will need to migrate any scripts and data files you need to test your workloads into this new directory. Beta should not be used for any PHI, HIPAA, Export Controlled, or any sensitive data! We highly recommend that you use this time to convert your Torque scripts to Slurm and test that everything works as you would expect it to.

To learn how to use Slurm, we have provided documentation on our Beta website. Additionally, ARC-TS and academic unit support teams will be offering training sessions around campus. We’ll have a schedule on the ARC-TS website as well as communicate new sessions through Twitter and email.

If you have compiled software for use on Flux, we highly recommend that you recompile on Great Lakes once it becomes available. Great Lakes is using the latest CPUs from Intel and by recompiling, your code may get performance gains by taking advantage of new capabilities on the new CPUs.

Questions? Need Assistance?

Contact arcts-support@umich.edu.