
U-M selects Dell EMC, Mellanox and DDN to Supply New “Great Lakes” Computing Cluster

By | Flux, General Interest, Happenings, HPC, News

The University of Michigan has selected Dell EMC as lead vendor to supply its new $4.8 million Great Lakes computing cluster, which will serve researchers across campus. Mellanox Technologies will provide networking solutions, and DDN will supply storage hardware.

Great Lakes will be available to the campus community in the first half of 2019, and over time will replace the Flux supercomputer, which serves more than 2,500 active users at U-M for research ranging from aerospace engineering simulations and molecular dynamics modeling to genomics and cell biology to machine learning and artificial intelligence.

Great Lakes will be the first cluster in the world to use the Mellanox HDR 200 gigabit per second InfiniBand networking solution, enabling faster data transfer speeds and increased application performance.

“High-performance research computing is a critical component of the rich computing ecosystem that supports the university’s core mission,” said Ravi Pendse, U-M’s vice president for information technology and chief information officer. “With Great Lakes, researchers in emerging fields like machine learning and precision health will have access to a higher level of computational power. We’re thrilled to be working with Dell EMC, Mellanox, and DDN; the end result will be improved performance, flexibility, and reliability for U-M researchers.”

“Dell EMC is thrilled to collaborate with the University of Michigan and our technology partners to bring this innovative and powerful system to such a strong community of researchers,” said Thierry Pellegrino, vice president, Dell EMC High Performance Computing. “This Great Lakes cluster will offer an exceptional boost in performance, throughput and response to reduce the time needed for U-M researchers to make the next big discovery in a range of disciplines from artificial intelligence to genomics and bioscience.”

The main components of the new cluster are:

  • Dell EMC PowerEdge C6420 compute nodes, PowerEdge R640 high memory nodes, and PowerEdge R740 GPU nodes
  • Mellanox HDR 200Gb/s InfiniBand ConnectX-6 adapters, Quantum switches and LinkX cables, and InfiniBand gateway platforms
  • DDN GRIDScaler® 14KX® and 100 TB of usable IME® (Infinite Memory Engine) memory

“HDR 200G InfiniBand provides the highest data speed and smart In-Network Computing acceleration engines, delivering HPC and AI applications with the best performance, scalability and efficiency,” said Gilad Shainer, vice president of marketing at Mellanox Technologies. “We are excited to collaborate with the University of Michigan, Dell EMC and DataDirect Networks, in building a leading HDR 200G InfiniBand-based supercomputer, serving the growing demands of U-M researchers.”

“DDN has a long history of working with Dell EMC and Mellanox to deliver optimized solutions for our customers. We are happy to be a part of the new Great Lakes cluster, supporting its mission of advanced research and computing. Partnering with forward-looking thought leaders such as these is always enlightening and enriching,” said Dr. James Coomer, SVP Product Marketing and Benchmarks at DDN.

Great Lakes will provide a significant improvement in computing performance over Flux. For example, each compute node will have more cores, higher maximum clock speeds, and increased memory. The cluster will also have improved internet connectivity and file system performance, as well as NVIDIA GPUs with Tensor Cores, which offer far greater machine learning performance than prior generations of GPUs.
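
To illustrate why Tensor Cores matter for machine learning: they accelerate reduced-precision matrix math, which deep learning frameworks can request explicitly. A minimal PyTorch sketch (illustrative only, assuming a CUDA-capable GPU with Tensor Cores):

    # Minimal illustration: a half-precision matrix multiply. On GPUs with
    # Tensor Cores (Volta and newer), cuBLAS routes FP16 matmuls through
    # Tensor Cores for a large speedup over FP32.
    import torch

    device = torch.device("cuda")
    a = torch.randn(4096, 4096, device=device, dtype=torch.float16)
    b = torch.randn(4096, 4096, device=device, dtype=torch.float16)

    c = a @ b  # runs on Tensor Cores when available
    print(c.shape)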

“Users of Great Lakes will have access to more cores, faster cores, faster memory, faster storage, and a more balanced network,” said Brock Palen, Director of Advanced Research Computing – Technology Services (ARC-TS).

The Flux cluster was created approximately eight years ago, although many of its individual nodes are newer. Great Lakes represents an architectural overhaul that will result in better performance and efficiency. Based on extensive input from faculty and other stakeholders across campus, the new Great Lakes cluster will be designed to deliver similar services and capabilities as Flux, including the ability to accommodate faculty purchases of hardware, access to GPUs and large-memory nodes, and improved support for emerging uses such as machine learning and genomics.

ARC-TS will operate and maintain the cluster once it is built. Allocations of computing resources through ARC-TS include access to hundreds of software titles, as well as support and consulting from professional staff with decades of combined experience in research computing.

Updates on the progress of Great Lakes will be available at https://arc-ts.umich.edu/greatlakes/.

ARC-TS begins work on new “Great Lakes” cluster to replace Flux

By | Flux, Happenings, HPC, News

Advanced Research Computing – Technology Services (ARC-TS) is starting the process of creating a new, campus-wide computing cluster, “Great Lakes,” that will serve the broad needs of researchers across the University. Over time, Great Lakes will replace Flux, the shared research computing cluster that currently serves over 300 research projects and 2,500 active users.

“Researchers will see improved performance, flexibility and reliability associated with newly purchased hardware, as well as changes in policies that will result in greater efficiencies and ease of use,” said Brock Palen, director of ARC-TS.

The Great Lakes cluster will be available to all researchers on campus for simulation, modeling, machine learning, data science, genomics, and more. The platform will provide a balanced combination of computing power, I/O performance, storage capability, and accelerators.

ARC-TS is in the process of procuring the cluster; only minimal interruption to ongoing research is expected during the transition. A “Beta” cluster will be available to help researchers learn the new system before Great Lakes is deployed in the first half of 2019.

The Flux cluster is approximately 8 years old, although many of the individual nodes are newer. One of the benefits of replacing the cluster is to create a more homogeneous platform.

Based on extensive input from faculty and other stakeholders across campus, the new Great Lakes cluster will be designed to deliver similar services and capabilities as Flux, including the ability to accommodate faculty purchases of hardware, access to GPUs and large-memory nodes, and improved support for emerging uses such as machine learning and genomics. The cluster will consist of approximately 20,000 cores.

For more information, contact hpc-support@umich.edu, and see arc-ts.umich.edu/systems-services/greatlakes, where updates to the project will be posted.

Service Disruption: MiStorage Silver (Value Storage) Maintenance

The ITS Storage team will be applying an operating system patch to the MiStorage Silver (Value Storage) environment, which provides home directories for both Flux and Flux Hadoop. The ITS maintenance window will run from 11:00 p.m. on December 2 to 7:00 a.m. on December 3 (8 hours total), and ARC-TS will prepare Flux, Armis, and Flux Hadoop for the maintenance starting at 10:00 p.m. This update may be disruptive to the stability of the nodes and the jobs running on them.

The ITS status page for this incident is here:  http://status.its.umich.edu/report.php?id=141155

For Flux and Armis users: we have created a reservation on Flux so that no jobs will be running or impacted. We will remove the reservation once the ITS storage team confirms the update was successful.

For Flux Hadoop users: the scheduler and user logins will be deactivated when the outage starts, and any user currently logged into the cluster will be logged out for the duration of the outage. We will reactivate access once the ITS storage team gives the all-clear.

Status updates will be posted on the ARC-TS Twitter feed (https://twitter.com/arcts_um). If you have any questions, please email us at hpc-support@umich.edu.

CSCAR provides walk-in support for new Flux users

By | Data, Educational, Flux, General Interest, HPC, News

CSCAR now provides walk-in support during business hours for students, faculty, and staff seeking assistance in getting started with the Flux computing environment. CSCAR consultants can walk a researcher through the steps of applying for a Flux account, installing and configuring a terminal client, connecting to Flux, learning the basics of SSH and the Unix command line, and obtaining or accessing allocations.

In addition to walk-in support, CSCAR has several staff consultants with expertise in advanced and high performance computing who can work with clients on a variety of topics such as installing, optimizing, and profiling code.  

Support via email is also provided via hpc-support@umich.edu.  

CSCAR is located in room 3550 of the Rackham Building (915 E. Washington St.). Walk-in hours are from 9 a.m. – 5 p.m., Monday through Friday, except for noon – 1 p.m. on Tuesdays.

See the CSCAR web site (cscar.research.umich.edu) for more information.

HPC training workshops begin Thursday, Sept. 21

By | Educational, Events, General Interest, HPC, News

A series of training workshops in high performance computing will be held Sept. 21 through Oct. 31, 2017, presented by CSCAR in conjunction with Advanced Research Computing – Technology Services (ARC-TS). Sessions are held at East Hall, 530 Church St.; room numbers are listed with each workshop below.

Introduction to the Linux command line
This course will familiarize the student with the basics of accessing and interacting with Linux computers using the GNU/Linux operating system’s Bash shell, also known as the “command line.”
Dates: (Please sign up for only one)
• Thursday, Sept. 21, 9 a.m. – noon (full description | registration)
• Thursday, Sept. 28, 9 a.m. – noon (full description | registration)
Location:
East Hall, Room B250, 530 Church St.

Introduction to the Flux cluster and batch computing
This workshop will provide a brief overview of the components of the Flux cluster, including the resource manager and scheduler, and will offer students hands-on experience.
Dates: (Please sign up for only one)
• Thursday, Sept. 28, 1 – 4 p.m. (full description | registration)
• Monday, Oct. 2, 9 a.m. – noon (full description | registration)
Location:
East Hall, Room B254, 530 Church St.

Advanced batch computing on the Flux cluster
This course will cover advanced areas of cluster computing on the Flux cluster, including common parallel programming models and dependent and array scheduling, among other topics (a brief sketch of the scheduling patterns follows the session details below).
Dates: (Please sign up for only one)
• Tuesday, Oct. 10, 1 – 5 p.m. (full description | registration) Location: East Hall, Room B254, 530 Church St.
• Thursday, Oct. 12, 9 a.m. – noon (full description | registration) Location: East Hall, Room B254, 530 Church St.
• Friday, Oct. 13, 9 a.m. – noon (full description | registration) Location: East Hall, Room B250, 530 Church St.
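
As an illustration of the dependent and array scheduling patterns this course covers, here is a minimal Python sketch, assuming a Torque/PBS scheduler of the kind Flux uses; the .pbs script names are hypothetical:

    # Minimal sketch of dependent and array job submission via qsub
    # (Torque/PBS). The .pbs script names are hypothetical.
    import subprocess

    def qsub(*args):
        """Submit a job with qsub and return the job ID it prints."""
        result = subprocess.run(["qsub", *args],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()

    # Dependent scheduling: step2 starts only if step1 exits successfully.
    step1_id = qsub("step1.pbs")
    qsub("-W", "depend=afterok:" + step1_id, "step2.pbs")

    # Array scheduling: run the same script as ten tasks (indices 1-10),
    # each of which sees its own index in the PBS_ARRAYID variable.
    qsub("-t", "1-10", "array_step.pbs")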

Hadoop Workshop
Learn how to process large amounts (up to terabytes) of data using SQL and/or simple programming models available in Python, Scala, and Java (a minimal example follows the session details below).
Date:
• Tuesday, Oct. 31, 1 – 5 p.m. (full description | registration)
Location:
East Hall, Room B254, 530 Church St.
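
As a taste of the programming model the workshop teaches, here is a minimal PySpark sketch of SQL over a large dataset; the HDFS path and column names are hypothetical:

    # Minimal PySpark sketch: query a large dataset with plain SQL.
    # The HDFS path and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("workshop-demo").getOrCreate()

    # Read a potentially very large CSV into a distributed DataFrame.
    events = spark.read.csv("hdfs:///data/events.csv",
                            header=True, inferSchema=True)

    # Register it as a table and query it with SQL.
    events.createOrReplaceTempView("events")
    top_users = spark.sql("""
        SELECT user_id, COUNT(*) AS n_events
        FROM events
        GROUP BY user_id
        ORDER BY n_events DESC
        LIMIT 10
    """)
    top_users.show()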

Summer HPC maintenance

To accommodate equipment repairs and upgrades to software, hardware, and operating systems, Flux, Armis, ConFlux, Flux Hadoop, and their storage systems (/home and /scratch) will be unavailable starting at 6 a.m. Saturday, July 29, returning to service on Wednesday, August 2.

During this time, the following updates are planned:

  • Annual power maintenance at the Modular Data Center.  All systems will be powered off. (Flux/Armis/Flux Hadoop)
  • Campus network hardware and software updates (Flux/Armis/Flux Hadoop)
  • InfiniBand networking updates (firmware and software) (Flux/Armis/ConFlux)
  • Operating system and software updates (All clusters)
  • Resource manager and job scheduling software updates (Flux/Armis)
  • Migrate NFS volumes, including /home, from Value Storage to Turbo (Flux)
  • Update hardware and software of the Lustre file systems that provide /scratch (Flux)

For Flux HPC jobs, you can use the command “maxwalltime” to discover the amount of time remaining until the beginning of the maintenance. Jobs requesting more walltime than remains before the maintenance will be queued and started after the maintenance is completed.
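
The scheduler’s hold logic amounts to a simple check, sketched here in Python (illustrative only, not ARC-TS code):

    # Illustrative sketch of the walltime check described above: a job can
    # start now only if its requested walltime fits before the maintenance
    # window opens; otherwise it is queued until the maintenance completes.
    from datetime import datetime, timedelta

    MAINTENANCE_START = datetime(2017, 7, 29, 6, 0)  # 6 a.m. Saturday, July 29

    def can_start_now(requested_walltime: timedelta) -> bool:
        remaining = MAINTENANCE_START - datetime.now()
        return requested_walltime <= remaining

    # A 72-hour job submitted two days before the outage would be held:
    print(can_start_now(timedelta(hours=72)))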

All Flux, Armis, ConFlux, and Flux Hadoop filesystems will be unavailable during the maintenance. We encourage you to copy any data that might be needed during that time from Flux prior to the start of the maintenance.

We will post status updates on our Twitter feed (https://twitter.com/arcts_um) throughout the course of the maintenance and send an email to all HPC and Hadoop users when the maintenance has been completed.

ARC-TS Town Hall on Next Generation HPC Cluster

The University of Michigan is beginning the process of building our next-generation HPC platform, “Big House.” Flux, the shared HPC cluster, has reached the end of its useful life. Flux has served us well for more than five years, but as we move forward with its replacement, we want to make sure we’re meeting the needs of the research community.

ARC-TS will be holding a series of town halls to take input from faculty and researchers on the next HPC platform to be built by the University.  These town halls are open to anyone.

Your input will help ensure that U-M stays on course in providing HPC resources, so we hope you will make time to attend one of these sessions. If you cannot attend, please email hpc-support@umich.edu with any input you want to share.
