
Winter HPC maintenance completed

By | Beta, Flux, General Interest, Happenings, HPC, News

Flux, Beta, Armis, Cavium, and ConFlux, along with their storage systems (/home and /scratch), are back online after three days of maintenance. The completed updates will improve the performance and stability of ARC-TS services.

The following maintenance tasks were completed:

  • Preventative maintenance at the Modular Data Center (MDC), which required a full power outage
  • InfiniBand networking updates (firmware and software)
  • Ethernet networking updates (datacenter distribution layer switches)
  • Operating system and software updates
  • Migration of Turbo networking to new switches (affects /home and /sw)
  • Consistency checks performed on the Lustre file systems that provide /scratch
  • Firmware and software updates for the GPFS file systems (ConFlux, began 9 a.m., Monday, Jan. 7)
  • Consistency checks performed on the GPFS file systems that provide /gpfs (ConFlux, began 9 a.m., Monday, Jan. 7)

Please contact hpc-support@umich.edu if you have any questions.

University of Michigan joins Ceph Foundation

By | General Interest, Happenings, News

Motivated by Ceph usage in the OSiRIS project, the University of Michigan has joined the Ceph Foundation as an Associate Member. We join other educational, government, and research organizations engaged in the Ceph Foundation at this membership level.

From the Foundation website: The Ceph Foundation exists to enable industry members to collaborate and pool resources to support the Ceph project community. The Foundation provides an open, collaborative, and neutral home for project stakeholders to coordinate their development and community investments in the Ceph ecosystem.


Great Lakes Update: December 2018

By | Flux, General Interest, Great Lakes, Happenings, News

What is Great Lakes?

The Great Lakes service is a next-generation HPC platform for University of Michigan researchers. Great Lakes will provide several performance advantages over Flux, primarily in the areas of storage and networking. Great Lakes is built around the latest Intel CPU architecture, Skylake, and will have standard, large memory, visualization, and GPU-accelerated nodes. For more information on the technical aspects of Great Lakes, please see the Great Lakes configuration page.

Key Features:

  • Approximately 13,000 Intel Skylake Gold processor cores with AVX-512 capability, providing over 1.5 TFLOPS of performance per node
  • 2 PB scratch storage system providing approximately 80 GB/s performance (compared to 8 GB/s on Flux)
  • New InfiniBand network with improved architecture and 100 Gb/s to each node
  • Each compute node will have significantly faster I/O via SSD-accelerated storage
  • Large Memory Nodes with 1.5 TB memory per node
  • GPU Nodes with NVIDIA Volta V100 GPUs (2 GPUs per node)
  • Visualization Nodes with Tesla P40 GPUs

Great Lakes will be using Slurm as the resource manager and scheduler, which will replace Torque and Moab on Flux. This will be the most immediate difference between the two clusters and will require some work on your part to transition from Flux to Great Lakes.
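
For orientation, most day-to-day scheduler commands map nearly one-to-one between the two systems. The sketch below pairs common Slurm commands with their Torque equivalents in comments; the job script name and job ID are placeholders, and the Great Lakes and Beta documentation will remain the authoritative reference.

    # Submit a batch job (Torque: qsub job.pbs)
    sbatch job.slurm

    # List your queued and running jobs (Torque: qstat -u $USER)
    squeue -u $USER

    # Cancel a job (Torque: qdel <jobid>)
    scancel <jobid>

    # Show detailed information about a job (Torque: qstat -f <jobid>)
    scontrol show job <jobid>

    # Show node and partition status (Torque: pbsnodes -a)
    sinfo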

Another significant change is that we are making Great Lakes easier to use through a simplified accounting structure. Unlike Flux, where you need a separate account for each resource, on Great Lakes you can use a single account and simply request the resources you need, from GPUs to large memory.
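
As an illustration of that model, the sketch below shows how a single account could request GPUs or large memory purely through Slurm directives; the account and partition names are hypothetical placeholders, not confirmed Great Lakes names.

    #!/bin/bash
    # Request two GPUs under the same account used for ordinary jobs.
    # (Account and partition names here are hypothetical examples.)
    #SBATCH --account=example_account
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:2
    #SBATCH --time=08:00:00

    # A large-memory job would differ only in the resources requested, e.g.
    #   #SBATCH --partition=largemem
    #   #SBATCH --mem=500G

    ./run_analysis    # placeholder for your own workload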

There will be two primary ways to get access to compute time: 1) a pay-as-you-go model similar to Flux On-Demand, and 2) node purchases. Node purchases will give you computational time commensurate with four years of use of the number of nodes you buy. We believe this will be preferable to buying actual hardware in the FOE model, as your daily computational usage can increase and decrease as your research requires. Additionally, you will not be limited by hardware failures on your specific nodes, as your jobs can run anywhere on Great Lakes. Send us an email at hpc-support@umich.edu if you have any questions or are interested in purchasing hardware on Great Lakes.

When will Great Lakes be available?

The ARC-TS team will prepare the cluster in February/March 2019 for an Early User period, which will continue for several weeks to ensure sufficient time to address any issues. General availability of Great Lakes should follow in April.

How does this impact me? Why Great Lakes?

After serving as the University's primary HPC cluster for eight years, Flux will be retired in September 2019. Once Great Lakes becomes available to the University community, we will provide a few months to transition from Flux to Great Lakes. Flux will then be retired due to aging hardware as well as expiring service contracts and licenses. We highly recommend preparing to migrate as early as possible so your research will not be interrupted. Later in this post, we have suggestions for making the migration process as easy as possible.

When Great Lakes becomes generally available to the University community, we will no longer be accepting new Flux accounts or allocations.  All new work should be focused on Great Lakes.

What is the current status of Great Lakes?

Today, the Great Lakes HPC compute hardware is fully installed, and configuration of the high-performance storage system is in progress. In parallel with this work, ARC-TS and unit support team members have been readying the new service with new software and modules, as well as developing training to support the transition to Great Lakes. A key feature of the new Great Lakes service is the just-released HDR InfiniBand from Mellanox. Today, the hardware is available, but the firmware is still in its final stages of testing with the supplier, with a target delivery date of March 2019. Given the delays, ARC-TS and the suppliers have discussed an adjusted plan that allows quicker access to the cluster while supporting a future update once the firmware becomes available.

What should I do to transition to Great Lakes?

We hope the transition from Flux to Great Lakes will be relatively straightforward, but to minimize disruptions to your research, we recommend you do your testing early. In October, we announced the availability of the HPC cluster Beta to help users with this migration. Primarily, it allows users to migrate their PBS/Torque job submission scripts to Slurm. You can also see the new Modules environments, which have changed from their current configuration on Flux. Beta uses the same generation of hardware as Flux, so performance will be similar to what you see on Flux. You should continue to use Flux for your production work; Beta is only for testing your Slurm job scripts, not for production work.

Every user on Flux has an account on Beta. You can log in to Beta at beta.arc-ts.umich.edu. You will have a new home directory on Beta, so you will need to migrate any scripts and data files needed to test your workloads into this new directory. Beta should not be used for PHI, HIPAA, export-controlled, or any other sensitive data! We highly recommend that you use this time to convert your Torque scripts to Slurm and test that everything works as you expect.
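
As a starting point, here is a minimal sketch of a Torque-to-Slurm conversion, with the original #PBS directives noted in comments. The account and partition names are hypothetical placeholders; the Torque-to-Slurm documentation on the Beta website remains the authoritative reference.

    #!/bin/bash
    # Slurm translation of a simple Torque job script, for testing on Beta.
    #SBATCH --job-name=my_job            # Torque: #PBS -N my_job
    #SBATCH --account=example_account    # Torque: #PBS -A example_account (hypothetical name)
    #SBATCH --partition=standard         # Torque: #PBS -q flux (partition name is hypothetical)
    #SBATCH --nodes=1                    # Torque: #PBS -l nodes=1:ppn=4
    #SBATCH --ntasks-per-node=4
    #SBATCH --mem=16G                    # Torque: #PBS -l mem=16gb
    #SBATCH --time=02:00:00              # Torque: #PBS -l walltime=02:00:00
    #SBATCH --mail-type=BEGIN,END,FAIL   # Torque: #PBS -m abe

    # The body of the script stays the same.
    ./my_program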

To learn how to use Slurm, we have provided documentation on our Beta website. Additionally, ARC-TS and academic unit support teams will be offering training sessions around campus. We’ll post a schedule on the ARC-TS website and announce new sessions through Twitter and email.

If you have compiled software for use on Flux, we highly recommend that you recompile it on Great Lakes once it becomes available. Great Lakes uses the latest CPUs from Intel, and recompiling may yield performance gains by taking advantage of the new CPUs’ capabilities.
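
For example, with a reasonably recent GCC, a rebuild along the lines below lets the compiler target the Skylake AVX-512 instruction set; the source file name is a placeholder, and the best flags will depend on your code and compiler.

    # Rebuild on Great Lakes so the compiler targets the host CPU;
    # on Skylake nodes a recent GCC will enable AVX-512 this way.
    gcc -O3 -march=native -o my_program my_program.c

    # Or name the Skylake server architecture explicitly:
    gcc -O3 -march=skylake-avx512 -o my_program my_program.c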

Questions? Need Assistance?

Contact hpc-support@umich.edu

Most CSCAR workshops will be free for the U-M community starting in January 2019

By | Educational, General Interest, Happenings, News

Beginning in January 2019, most of CSCAR’s workshops will be offered free of charge to U-M students, faculty, and staff.

CSCAR is able to do this thanks to funding from U-M’s Data Science Initiative. Registration for CSCAR workshops is still required, and seats are limited.

CSCAR requests that participants please cancel their registration if they decide not to attend a workshop for which they have previously registered.

Note that a small number of workshops hosted by CSCAR but taught by non-CSCAR personnel will continue to have a fee, and fees will continue to apply for people who are not U-M students, faculty, or staff.

Eric Michielssen completes term as Associate Vice President for Research – Advanced Research Computing

By | General Interest, Happenings, News

Eric Michielssen will step down from his position as Associate Vice President for Research – Advanced Research Computing on December 31, 2018, after serving in that leadership role for almost six years. Dr. Michielssen will return to his faculty role in the Department of Electrical Engineering and Computer Science in the College of Engineering.

Under his leadership, Advanced Research Computing has helped empower computational discovery through the Michigan Institute for Computational Discovery and Engineering (MICDE), the Michigan Institute for Data Science (MIDAS), Advanced Research Computing-Technology Services (ARC-TS) and Consulting for Statistics, Computing and Analytics Research (CSCAR).

In 2015, Eric helped launch the university’s $100 million Data Science initiative, which enhances opportunities for researchers across campus to tap into the enormous potential of big data. He also serves as co-director of the university’s Precision Health initiative, launched last year to harness campus-wide research aimed at finding personalized solutions to improve the health and wellness of individuals and communities.

The Office of Research will convene a group to assess the University’s current and emerging needs in the area of research computing and how best to address them.

U-M approves new graduate certificate in computational neuroscience

By | Educational, General Interest, Happenings, News

The new Graduate Certificate in Computational Neuroscience will help bridge the gap between experimentally focused studies and quantitative modeling and analysis, giving graduate students a chance to broaden their skill sets in the diversifying field of brain science.

“The broad, practical training provided in this certificate program will help prepare both quantitatively focused and lab-based students for the increasingly cross-disciplinary job market in neuroscience,” said Victoria Booth, Professor of Mathematics and Associate Professor of Anesthesiology, who will oversee the program.

To earn the certificate, students will be required to take core computational neuroscience courses and cross-disciplinary courses outside of their home departments; participate in a specialized interdisciplinary journal club; and complete a practicum.

Cross-disciplinary courses will depend on a student’s focus: students in experimental neuroscience programs will take quantitative coursework, and students in quantitative science programs such as physics, biophysics, mathematics, and engineering will take neuroscience coursework.

The certificate was approved this fall, and will be jointly administered by the Neuroscience Graduate Program (NGP) and the Michigan Institute for Computational Discovery and Engineering (MICDE).

For more information, visit micde.umich.edu/comput-neuro-certificate. Enrollment is not yet open, but information sessions will be scheduled early next year. Please register for the program’s mailing list if you’re interested.

Along with the Graduate Certificate in Computational Neuroscience, U-M offers several other graduate programs aimed at training students in computational and data-intensive science, including:

  • The Graduate Certificate in Computational Discovery and Engineering, which is focused on quantitative and computing techniques that can be applied broadly to all sciences.
  • The Graduate Certificate in Data Science, which specializes in statistical and computational methods required to analyze large data sets.
  • The Ph.D. in Scientific Computing, intended for students who will make extensive use of large-scale computation, computational methods, or algorithms for advanced computer architectures in their doctoral studies. This degree is awarded jointly with an existing program, so that a student receives, for example, a Ph.D. in Aerospace Engineering and Scientific Computing.

 

U-M awarded a Clare Boothe Luce grant for fellowships to support women in STEM

By | Educational, General Interest, Happenings, News

The Clare Boothe Luce Program of the Henry Luce Foundation has awarded a $270,000 grant to the University of Michigan. The funding, administered through the Michigan Institute for Computational Discovery and Engineering (MICDE), will support women PhD students who make use of computational science in their research. The program aims to encourage women “to enter, study, graduate and teach” in science.

“We’re very excited to be able to promote women in scientific computing,” said Mariana Carrasco-Teja, manager of the grant and Associate Director of MICDE. “These resources generously provided by the Clare Boothe Luce program will make a huge difference in the careers of women pursuing computational science at U-M.”

For details on applying, and fellowship requirements, see the fellowship page at micde.umich.edu/academic-programs/cbl/.

The fellowships carry a $35,000 annual stipend and tuition, among other benefits. They will be awarded to students applying for PhD programs in fall 2019 in the College of Engineering, or several programs in the College of Literature, Science and the Arts (Applied and Interdisciplinary Mathematics, Applied Physics, Astronomy, Chemistry, Earth & Environmental Sciences, Mathematics, Physics, and Statistics).

The CBL program at U-M is funded by the Clare Boothe Luce Program of the Henry Luce Foundation, with additional support from the Rackham School of Graduate Studies, the College of Engineering, the College of Literature, Science, and the Arts, and MICDE.

Winter HPC maintenance scheduled for Jan. 6-9

By | Beta, Flux, General Interest, Happenings, HPC, News

To accommodate updates to software, hardware, and operating systems, Flux, Beta, Armis, Cavium, and ConFlux, along with their storage systems (/home and /scratch), will be unavailable starting at 6 a.m. Sunday, January 6, and returning to service on Wednesday, January 9. These updates will improve the performance and stability of ARC-TS services. We try to encapsulate the required changes into two maintenance periods per year and work to complete these tasks quickly, as we understand the impact of the maintenance on your research.

During this time, the following maintenance tasks are planned:

  • Preventative maintenance at the Modular Data Center (MDC) which requires a full power outage
  • InfiniBand networking updates (firmware and software)
  • Ethernet networking updates (datacenter distribution layer switches)
  • Operating system and software updates
  • Potential updates to job scheduling software
  • Migration of Turbo networking to new switches (affects /home and /sw)
  • Consistency checks on the Lustre file systems that provide /scratch
  • Firmware and software updates for the GPFS file systems (ConFlux, starting 9 a.m., Monday, Jan. 7)
  • Consistency checks on the GPFS file systems that provide /gpfs (ConFlux, starting 9 a.m., Monday, Jan. 7)

You can use the command “maxwalltime” to discover the amount of time remaining until the beginning of the maintenance. Jobs requesting more walltime than remains before the maintenance will be queued and started after the maintenance is completed.
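
For example, a quick pre-maintenance check on a Flux login node might look like the sketch below; the 24-hour request is only an illustration, and the output of maxwalltime is not reproduced here.

    # Show how much walltime remains before the maintenance window begins.
    maxwalltime

    # Submit a Torque job whose walltime fits within the remaining window;
    # longer requests will be held until after the maintenance completes.
    qsub -l walltime=24:00:00 job.pbs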

All filesystems will be unavailable during the maintenance. We encourage you to copy any data that might be needed during that time from Flux prior to the start of the maintenance.

We will post status updates on our Twitter feed ( https://twitter.com/arcts_um ) throughout the course of the maintenance and send an email to all users when the maintenance has been completed.  Please contact hpc-support@umich.edu if you have any questions.

U-M participates in SC18 conference in Dallas

By | General Interest, Happenings, News

University of Michigan researchers and IT staff wrapped up a successful Supercomputing ‘18 (SC18) in Dallas from Nov. 11-16, 2018, taking part in a number of different aspects of the conference.

SC “Perennial” Quentin Stout, U-M professor of Electrical Engineering and Computer Science and one of only 19 people who have been to every Supercomputing conference, co-presented a tutorial titled Parallel Computing 101.

And with the recent announcement of a new HPC cluster on campus called Great Lakes, IT staff from Advanced Research Computing – Technology Services (ARC-TS) made presentations around the conference on the details of the new supercomputer.

U-M once again shared a booth with Michigan State University, highlighting our computational and data-intensive research as well as the comprehensive set of tools and services we provide to our researchers. Representatives from all ARC units were at the booth: ARC-TS, the Michigan Institute for Data Science (MIDAS), the Michigan Institute for Computational Discovery and Engineering (MICDE), and Consulting for Statistics, Computing and Analytics Research (CSCAR).

The booth also featured two demonstrations: one on the Open Storage Research Infrastructure (OSiRIS), a multi-institutional software-defined data storage system, and the Services Layer At The Edge (SLATE) project, both of which are supported by the NSF; the other tested conference-goers’ ability to detect “fake news” stories compared to an artificial intelligence system created by researchers supported by MIDAS.


U-M Activities

  • Tutorial: Parallel Computing 101: Prof. Stout and Associate Professor Christiane Jablonowski of the U-M Department of Climate and Space Sciences and Engineering provided a comprehensive overview of parallel computing.
  • Introduction to Kubernetes. Presented by Bob Killen, Research Cloud Administrator, and Scott Paschke, Research Cloud Solutions Designer, both from ARC-TS. Containers have shifted the way applications are packaged and delivered, and their use in data science and machine learning is skyrocketing, with the beneficial side effect of enabling reproducible research. This rise in use has created a need to explore and adopt better container-centric orchestration tools. Of these tools, Kubernetes, an open-source container platform born within Google, has become the de facto standard. This half-day tutorial introduced researchers and sysadmins who may already be familiar with container concepts to the architecture and fundamental concepts of Kubernetes. Attendees explored these concepts through a series of hands-on exercises, leaving with a leg up in continuing their container education and a better understanding of how Kubernetes may be used for research applications.
  • Brock Palen, Director of ARC-TS, spoke about the new Great Lakes HPC cluster:
    • DDN booth (3123)
    • Mellanox booth (3207)
    • Dell booth (3218)
    • SLURM booth (1242)
  • Todd Raeker, Research Technology Consultant for ARC-TS, went to the Globus booth (4201) to talk about U-M researchers’ use of the service.
  • Birds of a Feather: Meeting HPC Container Challenges as a Community. Bob Killen, Research Cloud Administrator at ARC-TS, gave a lightning talk as part of this session that presented, prioritized, and gathered input on top issues and budding solutions around containerization of HPC applications.
  • Sharon Broude Geva, Director of ARC, was live on the SC18 News Desk discussing ARC HPC services, Women in HPC, and the Coalition for Scientific Academic Computation (CASC). The stream was available from the Supercomputing Twitter account: https://twitter.com/Supercomputing
  • Birds of a Feather: Ceph Applications in HPC Environments: Ben Meekhof, HPC Storage Administrator at ARC-TS, gave a lightning talk on Ceph and OSiRIS as part of this session. More details at https://www.msi.umn.edu/ceph-hpc-environments-sc18
  • ARC was a sponsor of the Women in HPC Reception; see the event description for more details. Sharon Broude Geva, Director of ARC, gave a presentation.
  • Birds of a Feather: Cloud Infrastructure Solutions to Run HPC Workloads: Bob Killen, Research Cloud Administrator at ARC-TS, presented at this session aimed at architects, administrators, software engineers, and scientists interested in designing and deploying cloud infrastructure solutions such as OpenStack, Docker, Charliecloud, Singularity, Kubernetes, and Mesos.
  • Jing Liu of the Michigan Institute for Data Science participated in a panel discussion at the Purdue University booth.

Follow ARC on Twitter at https://twitter.com/ARC_UM for updates.

Beta cluster available for learning Slurm; new scheduler to be part of upcoming cluster updates

By | Flux, General Interest, Happenings, HPC, News

New HPC resources to replace Flux, along with updates to Armis, are coming. They will run a new scheduling system, Slurm. You will need to learn its commands and update your batch files to run jobs successfully. Read on for the details, including how to get training and how to adapt your files.

In anticipation of these changes, ARC-TS has created the test cluster “Beta,” which will provide a testing environment for the transition to Slurm. Slurm will be used on Great Lakes; the Armis HIPAA-aligned cluster; and a new cluster called “Lighthouse” which will succeed the Flux Operating Environment in early 2019.

Currently, Flux and Armis use the Torque (PBS) resource manager and the Moab scheduling system; when completed, Great Lakes and Lighthouse will use the Slurm scheduler and resource manager, which will enhance the performance and reliability of the new resources. Armis will transition from Torque to Slurm in early 2019.

The Beta test cluster is available to all Flux users, who can log in via SSH at ‘beta.arc-ts.umich.edu’. Beta has its own /home directory, so users will need to create or transfer any files they need, via scp/sftp or Globus.

Slurm commands will be needed to submit jobs. For a comparison of Slurm and Torque commands, see our Torque to Slurm migration page. For more information, see the Beta home page.
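
A minimal first session on Beta might look like the following sketch; “uniqname” stands in for your U-M login, and the file and job names are placeholders.

    # Log in to the Beta test cluster
    ssh uniqname@beta.arc-ts.umich.edu

    # Copy a job script and input data into your new Beta home directory
    scp job.slurm input.dat uniqname@beta.arc-ts.umich.edu:~/

    # Submit the job, check its status, and cancel it if necessary
    sbatch job.slurm
    squeue -u uniqname
    scancel <jobid>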

Support staff from ARC-TS and individual academic units will conduct several in-person and online training sessions to help users become familiar with Slurm. We have been testing Slurm for several months, and believe the performance gains, user communications, and increased reliability will significantly improve the efficiency and effectiveness of the HPC environment at U-M.

The tentative time frame for replacing or transitioning current ARC-TS resources is:

  • Flux to Great Lakes, first half of 2019
  • Armis from Torque to Slurm, January 2019
  • Flux Operating Environment to Lighthouse, first half of 2019
  • Open OnDemand on Beta, which replaces ARC Connect for web-based job submissions, Jupyter Notebooks, Matlab, and additional software packages, fall 2018