Explore ARC

The Cavium ThunderX Cluster


The Cavium ThunderX Hadoop cluster is a next-generation Hadoop cluster available to researchers at the University of Michigan. It is an on-campus resource that currently provides 3 PB of storage for researchers to tackle data science problems.

The cluster consists of 40 servers, each containing 96 ARMv8 cores and 512 GB of RAM. It is made possible through a partnership with Cavium, in cooperation with Hortonworks, a leading open-source company and provider of the Hortonworks Data Platform.

The Cavium ThunderX platform is currently available to researchers as a pilot, with no associated charges, and with a goal of general availability in the spring of 2018. The cluster provides a different service level than most cloud-based Hadoop offerings, including:

  • high-bandwidth data transfer to and from other campus data storage locations with no data transfer costs
  • very high-speed inter-node connections using 40Gb/s Ethernet.

The cluster provides 3 PB of total disk space, 40GbE inter-node networking, and Hortonworks Data Platform version 2.6.2, which includes Hive, Kafka, Spark, and Sqoop.
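
As a brief illustration of the kind of work the cluster supports, the sketch below uses PySpark to count word frequencies in a text file stored in HDFS. The HDFS path and application name are placeholders, and the exact submission procedure depends on your account configuration.

# Minimal PySpark sketch: word count over a text file in HDFS.
# The HDFS path below is a placeholder; substitute your own data location.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-example").getOrCreate()

# Read the file from HDFS into an RDD of lines.
lines = spark.sparkContext.textFile("hdfs:///user/<uniqname>/example/input.txt")

# Split lines into words, pair each word with a count of 1, and sum the counts.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Print the ten most frequent words.
for word, count in counts.takeOrdered(10, key=lambda pair: -pair[1]):
    print(word, count)

spark.stop()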

For more information, contact hpc-support@umich.edu.

Order Service

To request an account, send an email to hpc-support@umich.edu. Users must have a U-M uniqname and Duo two-factor authentication to access this service.

When requesting an account, please supply your name, uniqname, and a short summary of what you’d like to do on the cluster.

Yottabyte Research Cloud


The Yottabyte Research Cloud is a partnership between ARC and Yottabyte that provides U-M researchers with high-performance, secure, and flexible computing environments, enabling the analysis of sensitive data sets restricted by federal privacy laws, proprietary access agreements, or confidentiality requirements.

The system is built on Yottabyte’s composable, software-defined infrastructure platform, called Cloud Composer, and represents U-M’s first use of software-defined infrastructure for research, allowing on-the-fly, personalized configuration of computing resources at any scale.

Cloud Composer software inventories the physical CPU, RAM and storage components of Cloud Blox appliances into definable and configurable virtual resource groups that may be used to build multi-tenant, multi-site infrastructure as a service.

See the Sept. 2016 press release for more information.

The YBRC platform can accommodate restricted data. Please see this recent announcement for details.

Capabilities

The Yottabyte Research Cloud supports several existing and planned platforms for researchers at the University of Michigan:

  • Data Pipeline Tools, which include databases, message buses, and data processing and storage solutions. This platform is suitable for restricted and unrestricted data; the tools are currently available to users with unrestricted data.
  • Research Database Hosting, an environment that can house research-focused data stored in a number of different database engines.
  • Glovebox, a virtual desktop service for researchers who have restricted data and require higher security. (planned)
  • Virtual desktops for research. This service is similar to Glovebox but is suitable for unrestricted data. (planned)
  • Docker Container Service, which can host any research application that can be containerized for deployment. This service will be suitable for restricted and unrestricted data. (planned)

Researchers who need to use Hadoop or Spark for data-intensive work should explore ARC-TS’s separate Hadoop cluster.

Contact arcts-support@umich.edu for more information.

Hardware

The system deploys 40 high-performance hyperconverged YottaBlox compute nodes (H2400i-E5) and 20 YottaBlox storage nodes (S2400i-E5-HDD):

  • Each compute node contains two Intel Xeon E5-2680v4 CPUs (1,120 cores total), 512 GB of DDR4-2400 RAM (20,480 GB total), dual-port 40GbE network adapters (80 ports total), and two 800 GB Intel DC P3700 NVMe SSDs (64 TB total).
  • Each storage node contains two Intel Xeon E5-2620v4 CPUs (320 cores total), 128 GB of DDR4-2133 RAM (2,560 GB total), quad-port 10GbE network adapters (80 ports total), two 800 GB Intel DC S3610 SSDs (32 TB total), and twelve 6 TB 7,200 RPM hard drives (1,440 TB total).

Access

These tools are offered to all researchers at the University of Michigan free of charge, provided that certain usage restrictions are not exceeded. Large-scale users who outgrow the no-cost allotment may purchase additional YBRC resources. All interested parties should contact arcts-support@umich.edu.

Sensitive Data

The U-M Research Ethics and Compliance webpage on Controlled Unclassified Information provides details on handling this type of data. The U-M Sensitive Data Guide to IT Services is a comprehensive guide to sensitive data.

Order Service

The Yottabyte Research Cloud is a pilot program available to all U-M researchers.

Access to Yottabyte Research Cloud resources starts with a single email to arcts-support@umich.edu. Please include:

  • Your name or your advisor’s name
  • Your unit
  • What you would like to use YBRC for
  • Whether you plan to use restricted data.

A member of your unit’s IT staff or an ARC-TS staff member will contact you to work out the details and determine the best way to accommodate your request within the Yottabyte Research Cloud environment.

General Questions

What is the Yottabyte Research Cloud?

The Yottabyte Research Cloud (YBRC) is the University’s private cloud environment for research. It is a collection of processors, memory, storage, and networking that can be subdivided into smaller units and allocated to research projects on an as-needed basis, accessed through virtual machines and containers.

How do I get access to Yottabyte Research Cloud Resources?

Access to Yottabyte Research Cloud resources starts with a single email to arcts-support@umich.edu. Please include:

  • Your name or your advisor’s name
  • Your unit
  • What you would like to use YBRC for
  • Whether you plan to use restricted data.

A member of your unit’s IT staff or an ARC-TS staff member will contact you to work out the details and determine the best way to accommodate your request within the Yottabyte Research Cloud environment.

What class of problems is Yottabyte Research Cloud designed to solve?

Yottabyte Research Cloud resources are aimed squarely at research and at the teaching and training of students involved in research. Primarily, Yottabyte resources are for sponsored research. The Yottabyte Research Cloud is not for administrative or clinical use (the business of the university or the hospital). Clinical research is acceptable as long as it is sponsored research.

How large is the Yottabyte Research Cloud?

In total, the Yottabyte Research Cloud (YBRC) provides 960 processing cores, 7.5 TB of RAM, and roughly 330 TB of scratch storage in each of the Maize and Blue clusters.

What do Maize Yottabyte Research Cloud and Blue Yottabyte Research Cloud stand for?

Yottabyte resources are divided between two clusters of computing and storage: Maize YBRC is for restricted data analysis and storage, and Blue YBRC is for unrestricted data analysis and storage.

What can I do with the Yottabyte Research Cloud?

The initial offering of YBRC is focused on a few different types of use cases:  

  1. Database hosting and ingestion of streaming data from an external source into a database. We can host many types of databases within Yottabyte, including most structured and unstructured databases; examples include MariaDB, PostgreSQL, and MongoDB (see the sketch after this list).
  2. Hosting for applications that you cannot host locally in your lab or that you would like to connect to our HPC and data science clusters, such as Materials Studio, Galaxy, and SAS Studio.
  3. Hosting of virtual desktops and servers for restricted data use cases, such as statistical analysis of health data or an analytical project involving Controlled Unclassified Information (CUI). Researchers in this situation typically need a powerful workstation for SAS, Stata, or R analyses, or for some other application.
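
As a rough sketch of the first use case, the snippet below writes one record of streaming data into a PostgreSQL database hosted in YBRC. The hostname, database name, and credentials are placeholders, not real YBRC endpoints; actual connection details are provided when your database is provisioned.

# Hypothetical sketch: inserting one record of streaming data into a
# PostgreSQL database hosted in YBRC. The hostname, database name, and
# credentials below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="your-ybrc-db.example.umich.edu",  # placeholder hostname
    dbname="sensor_data",                   # placeholder database name
    user="researcher",
    password="********",
)

with conn, conn.cursor() as cur:
    # Create a simple table for incoming readings if it does not exist yet.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS readings (
            recorded_at TIMESTAMP,
            sensor_id   TEXT,
            value       DOUBLE PRECISION
        )
    """)
    # Insert one reading; a real ingestion pipeline would batch these.
    cur.execute(
        "INSERT INTO readings VALUES (NOW(), %s, %s)",
        ("sensor-42", 3.14),
    )

conn.close()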

Are these the only things I can do with resources in the Yottabyte Research Cloud?

No!  Contact us at arcts-support@umich.edu if you want to learn whether or not your idea can be done within YBRC!  

How do I get help if I have an issue with something in Yottabyte?

The best way to get help is to send an email to arcts-support@umich.edu with a brief description of the issues that you are seeing.  

What are the support hours for the Yottabyte Research Cloud?

Yottabyte is supported from 9 a.m. to 5 p.m., Monday through Friday. Response times for support requests outside of these hours will be longer.

Usage Questions

What’s the biggest machine I can build within Yottabyte Research Cloud?

Because of the way that YBRC divides up resources, the largest virtual machine within the cluster has 16 processing cores and 128 GB of RAM.

How many Yottabyte Research Cloud resources am I able to access at no cost?

ARC-TS policy is to limit no-cost individual allocations to 100 cores, so that access is always open to multiple research groups.

What if I need more than the no-cost maximum?

If you need more than 100 cores of YBRC, we recommend that you purchase YBRC physical infrastructure of your own and add it to the cluster. Physical infrastructure can be purchased in blocks of 96 physical cores, which can be oversubscribed as memory allows. For every block purchased, the researcher also receives four years of hardware and OS support for that block in the case of failure. For a cost estimate of buying your own blocks of infrastructure and adding them to the cluster, please email arcts-support@umich.edu.

What is ‘scratch’ storage?

Scratch storage for the Yottabyte Research Cloud is the storage area network that holds OS storage and active data for the local virtual machines; it is not actively backed up or replicated to separate infrastructure. As with the scratch storage on Flux, we do not recommend storing any data solely on the local disk of any machine. Make sure that you keep backups on other services, such as Turbo, Locker, or another storage service.

HIPAA Compliance Questions

What can I do inside of a HIPAA network enclave?

For researchers with restricted data with a HIPAA classification, we provide a small menu of Linux and Windows workstations to be installed within your enclave.  We do not delegate administrative rights for those workstations to researchers or research staff.  We may delegate administrative rights for workstations and services in your enclaves to IT staff in your unit who have successfully completed the HIPAA IT training coursework given by ITS or HITS, and are familiar with desktop and virtual machine environments.  

Machines in the HIPAA network enclaves are protected by a deny-first firewall that prevents most traffic from entering the enclaves. Researchers can still visit external-to-campus websites from within a HIPAA network enclave. Researchers within a HIPAA network enclave can use storage services such as Turbo and MiStorage Silver (via CIFS) to host data for longer-term storage.

What are a researcher’s and research group’s responsibilities when they have HIPAA data within YBRC?

All researchers, staff, and students that use YBRC when analyzing restricted data have a shared responsibility in keeping their restricted data secure.

  • Researchers need to be aware of the personnel in their labs who have access to the data in their enclaves.  
    • Each lab should have a process for adding and removing users from enclaves that includes removing departed lab members from access to restricted data as soon as possible after they have left the lab.
    • Each lab should review who has access to their data and enclaves twice a year by checking the memberships of their MCommunity and Active Directory groups to ensure that people have been removed as requested.
  • Each lab user must store their restricted data in a specific directory, as discussed during their introductory meeting with YBRC staff.  They must keep the data only in this directory over the life of the data on the system.  

CUI Compliance Questions

What can I do inside of a Secure Enclave Service CUI enclave?

Staff will work with researchers using CUI-classified data to determine the types of analysis that can be conducted on YBRC resources that comply with relevant regulations.

What are a researcher’s and research group’s responsibilities when they have CUI data within YBRC?

All researchers, staff, and students that use YBRC when analyzing restricted data have a shared responsibility in keeping their restricted data secure.

  • Researchers need to be aware of the personnel in their labs who have access to the data in their enclaves.  
    • Each lab should have a process for adding and removing users from enclaves that includes removing departed lab members from access to restricted data as soon as possible after they have left the lab.
    • Each lab should review who has access to their data and enclaves twice a year by checking the memberships of their MCommunity and Active Directory groups to ensure that people have been removed as requested.
  • Each lab user must store their restricted data in a specific directory, as discussed during their introductory meeting with YBRC staff.  They must keep the data only in this directory over the life of the data on the system.  

ConFlux


ConFlux is a cluster that seamlessly combines the computing power of HPC with the analytical power of data science. The next generation of computational physics requires HPC applications (running on external clusters) to interconnect with large data sets at run time. ConFlux provides low-latency communications for in- and out-of-core data, cross-platform storage, as well as high-throughput interconnects and massive memory allocations. The file system and scheduler natively handle extreme-scale machine learning and traditional HPC modules in a tightly integrated workflow, rather than in segregated operations, leading to significantly lower latencies, fewer algorithmic barriers, and less data movement.

The ConFlux cluster is built with approximately 58 two-socket IBM Power8 “Firestone” S822LC compute nodes, each providing 20 cores. Seventeen two-socket Power8 “Garrison” S822LC compute nodes provide an additional 20 cores each and host four NVIDIA Pascal GPUs connected via NVIDIA’s NVLink technology to the Power8 system bus. Each GPU-based node has local high-speed NVMe flash storage for random access.

All compute and storage nodes are connected via a 100 Gb/s InfiniBand fabric. This connectivity, combined with NVLink and IBM CAPI technology, provides the unprecedented data transfer throughput required for the data-driven computational physics research that will be conducted on the cluster.

ConFlux is funded by a National Science Foundation grant; the Principal Investigator is Karthik Duraisamy, Assistant Professor of Aerospace Engineering and Director of the Center for Data-Driven Computational Physics (CDDCP). ConFlux and the CDDCP are under the auspices of the Michigan Institute for Computational Discovery and Engineering.

Order Service

A portion of the cycles on ConFlux will be available through a competitive application process. More information will be posted as it becomes available.


Flux Hadoop Cluster


The Flux Hadoop Cluster is an upgraded Hadoop cluster currently available as a technology preview, with no associated charges, to U-M researchers. The cluster is an on-campus resource that provides a different service level than most cloud-based Hadoop offerings, including:

  • high-bandwidth data transfer to and from other campus data storage locations with no data transfer costs
  • very high-speed inter-node connections using 40Gb/s Ethernet.

The cluster provides 112TB of total usable disk space, 40GbE inter-node networking, Hadoop version 2.6.0, and several additional data science tools.

Aside from Hadoop and its Distributed File System, the ARC-TS data science service includes:

  • Pig, a high-level language that enables substantial parallelization, allowing the analysis of very large data sets.
  • Hive, data warehouse software that facilitates querying and managing large datasets residing in distributed storage using a SQL-like language called HiveQL.
  • Sqoop, a tool for transferring data between SQL databases and the Hadoop Distributed File System.
  • rmr, an extension of the R statistical language that supports distributed processing of large datasets stored in the Hadoop Distributed File System.
  • Spark, a general processing engine compatible with Hadoop data.
  • mrjob, which allows MapReduce jobs written in Python to run on Hadoop (see the sketch after this list).
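
As a brief sketch of the mrjob workflow, the following word-count job runs locally by default and, with the appropriate runner options, on the Hadoop cluster. The script and input file names are placeholders.

# Minimal mrjob sketch: a MapReduce word count.
# Run locally with, for example:  python wordcount.py input.txt
# (the script and input file names here are placeholders).
from mrjob.job import MRJob


class MRWordCount(MRJob):

    def mapper(self, _, line):
        # Emit (word, 1) for every word in the input line.
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        # Sum the counts emitted for each word.
        yield word, sum(counts)


if __name__ == "__main__":
    MRWordCount.run()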

The software versions are as follows:

Title          Version
Hadoop         2.6.0
Hive           1.1.0
Sqoop          1.4.6
Pig            0.12.0
R/rhdfs/rmr    3.0.3
Spark          1.6.0
mrjob          0.4.3-dev (commit 226a741548cf125ecfb549b7c50d52cda932d045)

Order Service

Using the Flux Hadoop environment requires a Flux user account (available at no cost), but currently does not require a Flux allocation.

To order:

Email hpc-support@umich.edu.

For more information: data-science-support@umich.edu.
