Consulting

Advanced Research Computing – Technology Services (ARC-TS), a division of ITS, is pleased to offer a pilot Data Science Consulting service to help researchers implement data analytics and workflows within their research projects. This includes navigating technical resources like high-performance computing and storage.

The ARC-TS Data Science Consulting team will be your guide to navigating the complex technical world: from implementing data-intensive projects, to teaching you how the technical systems work, to identifying the proper tools, to guiding you on how to hire a programmer.

Areas of expertise:

  • Data Science
    • Data Workflows
    • Data Analytics
    • Machine Learning
    • Programming
  • Grant Proposals
    • Compute Technologies
    • Data Storage and Management
    • Budgeting costs for computing and storage
  • Scientific Computing/Programming
    • Getting started with advanced computing
    • Code optimization
    • Parallel computing
    • GPU/Accelerator Programming
  • Additional Resources
    • Facilitating Collaborations/User Communities
    • Workshops and Training

Who can use this service?

  • All researchers and their collaborators from any of the university’s three campuses, including faculty, staff, and students
  • Units that want help including technical information when preparing grants
  • Anyone who needs HPC services and wants help navigating resources

How much does it cost?

  • Initial consultation, grant pre-work, and short-term general guidance/feedback on methods and code are available at no cost.
  • For longer engagements, research teams will be asked to contribute to the cost of providing the service.

Partnership

The ARC-TS Data Science Consulting Service works in partnership with the Consulting for Statistics, Computing, and Analytics Research team (CSCAR), Biomedical Research Core Facilities, and others. ARC-TS may refer researchers to, or engage, complementary groups as required by the project.

Get started

Send an email to arcts-consulting@umich.edu with the following information:

  • Research topic and goal
  • What you would like ARC-TS to help you with
  • Any current or future data types and sources
  • Current technical resources
  • Current tools (programs, software)
  • Timeline – when do you need the help or information?

Get help

If you have any questions or wish to set up a consultation, please contact us at arcts-consulting@umich.edu. Be sure to include as much information as possible from the “Get started” section above.

Data Science

Data Science Consulting Details

Data Workflows

We are available to assist researchers along the entire lifecycle of the data workflow, from the conceptual stage to ingest, preprocessing, cleansing, and storage solutions. We can advise in the following areas:

  • Establishing and troubleshooting dataflows between systems
  • Selecting the appropriate systems for short-term and long-term storage
  • Transformation of raw data into structured formats
  • Data deduplication and cleansing
  • Conversion of data between different formats to aid in analysis (see the sketch after this list)
  • Automation of dataflow tasks
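
As an illustration of the dataflow tasks listed above, here is a minimal sketch that uses pandas (one of many possible tools) to deduplicate a raw CSV file and convert it to a columnar format for analysis. The file and column names are placeholders, not part of any specific ARC-TS workflow.

    # Minimal pandas sketch: deduplicate a raw CSV and convert it to Parquet.
    # File and column names are illustrative placeholders.
    import pandas as pd

    # Read the raw data (placeholder path).
    df = pd.read_csv("raw_measurements.csv")

    # Drop exact duplicate rows, then rows sharing the same sample ID.
    df = df.drop_duplicates()
    df = df.drop_duplicates(subset=["sample_id"], keep="first")

    # Normalize a timestamp column into a proper datetime type.
    df["collected_at"] = pd.to_datetime(df["collected_at"], errors="coerce")

    # Write a columnar copy that downstream analysis tools can read efficiently
    # (requires a Parquet engine such as pyarrow).
    df.to_parquet("measurements_clean.parquet", index=False)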

Analytics

The data science consulting team can assist with data analytics to support research:

  • Choosing the appropriate tools and techniques for performing analysis
  • Development of data analytics in a variety of frameworks
  • Cloud-based (Hadoop) analytic development

Machine Learning

Machine learning is an application of artificial intelligence (AI) that focuses on developing computer programs that learn from data.

We are available to consult on the topics below, from a general overview of concepts, to discussion of which tools and architectures best fit your needs, to technical support during implementation.

  • Languages: Python, C++, Java, Matlab
  • Tools/Architectures: Python data tools (scikit, numpy, etc.), TensorFlow, Jupyter notebooks
  • Models: Neural networks, decision trees, support vector machines
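
As a small, hedged illustration of how the tools and models above fit together, the sketch below trains a decision tree classifier with scikit-learn on one of its bundled example datasets; it is a generic example rather than a prescribed ARC-TS workflow.

    # Minimal scikit-learn sketch: train and evaluate a decision tree classifier
    # on the bundled iris dataset. Generic illustration only.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Load example data and hold out a test set.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    # Fit a small decision tree and report held-out accuracy.
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    model.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))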

Programming

We also provide consulting on programming in a variety of programming languages (including but not limited to: C++, Java, and Python) to support your data science needs. We can assist in algorithm design and implementation, as well as optimizing and parallelizing code to efficiently utilize high performance computing (HPC) resources where possible/necessary. We can help identify available commercial and open-source software packages to simplify your data analysis.
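
For instance, a common first step in parallelizing code is to spread an embarrassingly parallel loop across the cores of a single node. The sketch below does this with Python's standard-library multiprocessing module; the work function and core count are placeholders, and on an HPC cluster the core count would normally come from the job scheduler.

    # Minimal sketch: run independent tasks in parallel across CPU cores using
    # only the Python standard library. The work function is a stand-in for a
    # real per-item computation.
    from multiprocessing import Pool

    def simulate(param):
        # Placeholder for an expensive, independent computation.
        return sum(i * i for i in range(param))

    if __name__ == "__main__":
        params = range(10_000, 10_016)
        # Placeholder core count; on a cluster, read this from the scheduler.
        with Pool(processes=4) as pool:
            results = pool.map(simulate, params)
        print(results[:3])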

If you have any questions or wish to set up a consultation, please contact us at arcts-consulting@umich.edu.

The ThunderX Cluster

ThunderX is a next-generation Hadoop cluster available to researchers at the University of Michigan. It is an on-campus resource that currently holds 3 PB of storage for researchers to use in analyzing data science problems.

The cluster consists of 40 servers each containing 96 ARMv8 cores and 512GB of RAM per server. It is made possible through a partnership with Marvell.

ThunderX is currently available to researchers as a pilot platform, with no associated charges. The cluster provides a different service level than most cloud-based Hadoop offerings, including:

  • high-bandwidth data transfer to and from other campus data storage locations with no data transfer costs
  • very high-speed inter-node connections using 40Gb/s Ethernet.

The cluster provides 1PB of total disk space, 40GbE inter-node networking, and Hadoop 2.7.2 with Spark 2 and Hive 2. For more information, contact arcts-support@umich.edu.
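
As a hedged sketch of what a job on a cluster like this might look like, the PySpark snippet below (written against the Spark 2.x DataFrame API) reads a CSV file from HDFS and computes a simple grouped count. The HDFS path and column names are placeholders; actual cluster settings (queues, resource options) would come from ARC-TS documentation.

    # Minimal Spark 2.x sketch: read a CSV from HDFS and compute a grouped count.
    # The HDFS path and column names are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("example-aggregation").getOrCreate()

    # Read the input data from HDFS with a header row and inferred column types.
    df = spark.read.csv("hdfs:///user/example/events.csv",
                        header=True, inferSchema=True)

    # Count events per type and show the most common ones.
    (df.groupBy("event_type")
       .agg(F.count("*").alias("n_events"))
       .orderBy(F.desc("n_events"))
       .show(20))

    spark.stop()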

Order Service

To request an account, please fill out this form completely, making sure to read and accept the terms of usage.

Yottabyte Research Cloud

The Yottabyte Research Cloud is a partnership between ARC and Yottabyte that provides U-M researchers with high-performance, secure, and flexible computing environments, enabling the analysis of sensitive data sets restricted by federal privacy laws, proprietary access agreements, or confidentiality requirements.

The system is built on Yottabyte’s composable, software-defined infrastructure platform, Cloud Composer, and represents U-M’s first use of software-defined infrastructure for research, allowing on-the-fly, personalized configuration of computing resources at any scale.

Cloud Composer software inventories the physical CPU, RAM and storage components of Cloud Blox appliances into definable and configurable virtual resource groups that may be used to build multi-tenant, multi-site infrastructure as a service.

See the September 2016 press release for more information.

The YBRC platform can accommodate sensitive institutional data classified up to High — including CUI — as identified in the Sensitive Data Guide.

Capabilities

The Yottabyte Research Cloud supports several existing and planned platforms for researchers at the University of Michigan:

  • Data Pipeline Tools, which include databases, message buses, data processing and storage solutions. This platform is suitable for sensitive institutional data classified up to High — including CUI, and data that is not classified as sensitive.
  • Research Database Hosting, an environment that can house research-focused data stored in a number of different database engines.
  • Glovebox, a virtual desktop service for researchers who have sensitive institutional data classified up to High — including CUI — and require higher security. (planned)
  • Virtual desktops for research. This service is similar to Glovebox but is suitable for data that is not classified as sensitive. (planned)
  • Docker Container Service. This service can take any research application that can be containerized for deployment. This service will be suitable for sensitive institutional data classified up to High — including CUI, and data that is not classified as sensitive. (planned)

Researchers who need to use Hadoop or Spark for data-intensive work should explore ARC-TS’s separate Hadoop cluster.

Contact arcts-support@umich.edu for more information.

Hardware

The system deploys:

  • 40 high-performance hyperconverged YottaBlox nodes (H2400i-E5), each with two Intel Xeon E5-2680v4 CPUs (1,120 cores total), 512 GB of DDR4-2400 RAM (20,480 GB total), dual-port 40GbE network adapters (80 ports total), and two 800 GB Intel DC P3700 NVMe SSDs (64 TB total)
  • 20 storage YottaBlox nodes (S2400i-E5-HDD), each with two Intel Xeon E5-2620v4 CPUs (320 cores total), 128 GB of DDR4-2133 RAM (2,560 GB total), quad-port 10GbE network adapters (80 ports total), two 800 GB Intel DC S3610 SSDs (32 TB total), and twelve 6 TB 7,200 RPM hard drives (1,440 TB total)

Access

These tools are offered to all researchers at the University of Michigan free of charge, provided that certain usage restrictions are not exceeded. Large-scale users who outgrow the no-cost allotment may purchase additional YBRC resources. All interested parties should contact arcts-support@umich.edu.

Sensitive Data

The U-M Research Ethics and Compliance webpage on Controlled Unclassified Information provides details on handling this type of data. The U-M Sensitive Data Guide to IT Services is a comprehensive guide to sensitive data.

Order Service

The Yottabyte Research Cloud is a pilot program available to all U-M researchers.

Access to Yottabyte Research Cloud resources involves a single email to us at arcts-support@umich.edu. Please include:

  • Your name or your advisor’s name
  • Your unit
  • What you would like to use YBRC for
  • Whether you plan to use restricted data.

An IT staff member from your unit or from ARC-TS will contact you to work out the details and determine the best way to meet your needs within the Yottabyte Research Cloud environment.

General Questions

What is the Yottabyte Research Cloud?

The Yottabyte Research Cloud (YBRC) is the University’s private cloud environment for research. It is a collection of processors, memory, storage, and networking that can be subdivided into smaller units and allocated to research projects on an as-needed basis, accessed through virtual machines and containers.

How do I get access to Yottabyte Research Cloud Resources?

Access to Yottabyte Research Cloud resources involves a single email to us at arcts-support@umich.edu. Please include:

  • Your name or your advisor’s name
  • Your unit
  • What you would like to use YBRC for
  • Whether you plan to use restricted data.

An IT staff member from your unit or from ARC-TS will contact you to work out the details and determine the best way to meet your needs within the Yottabyte Research Cloud environment.

What class of problems is Yottabyte Research Cloud designed to solve?

Yottabyte Research Cloud resources are aimed squarely at research and the teaching and training of students involved in research. Primarily, Yottabyte resources are for sponsored research. Yottabyte Research Cloud is not for administrative or clinical use (business of the university or the hospital). Clinical research is acceptable as long as it is sponsored research.  

How large is the Yottabyte Research Cloud?

Each of the two Yottabyte Research Cloud (YBRC) clusters, Maize and Blue, has 960 processing cores, 7.5 TB of RAM, and roughly 330 TB of scratch storage.

What do Maize Yottabyte Research Cloud and Blue Yottabyte Research Cloud stand for?

Yottabyte resources are divided between two clusters of computing and storage. Maize YBRC is for restricted data analyses and storage, and Blue YBRC is for unrestricted data analyses and storage.

What can I do with the Yottabyte Research Cloud?

The initial offering of YBRC is focused on a few different types of use cases:  

  1. Database hosting and ingestion of streaming data from an external source into a database. We can host many types of databases within Yottabyte, including most structured and unstructured databases. Examples include MariaDB, PostgreSQL, and MongoDB (see the connection sketch after this list).
  2. Hosting for applications that you can’t host locally in your lab or you would like to connect to our HPC and data science clusters, such as Material Studio, Galaxy, and SAS Studio.
  3. Hosting of virtual desktops and servers for restricted data use cases, such as statistical analysis of health data or an analytical project involving Controlled Unclassified Information (CUI). Most researchers in this case need a powerful workstation for SAS, Stata, or R analyses, for example, or some other application.
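
For the database-hosting use case in item 1, analysis code connects to a hosted database the same way it would connect to any other database server. The sketch below uses psycopg2 against PostgreSQL; the hostname, database name, credentials, and table are hypothetical placeholders, purely for illustration.

    # Minimal sketch: query a PostgreSQL database hosted in YBRC from Python.
    # Hostname, database, credentials, and table name are hypothetical.
    import psycopg2

    conn = psycopg2.connect(
        host="example-db.host.umich.edu",  # hypothetical host
        dbname="research_db",
        user="researcher",
        password="********",
    )

    # The connection context manager commits (or rolls back) the transaction.
    with conn, conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM observations;")
        print("rows:", cur.fetchone()[0])

    conn.close()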

Are these the only things I can do with resources in the Yottabyte Research Cloud?

No!  Contact us at arcts-support@umich.edu if you want to learn whether or not your idea can be done within YBRC!  

How do I get help if I have an issue with something in Yottabyte?

The best way to get help is to send an email to arcts-support@umich.edu with a brief description of the issues that you are seeing.  

What are the support hours for the Yottabyte Research Cloud?

Yottabyte is supported from 9 a.m. to 5 p.m., Monday through Friday. Response times for support requests outside of these hours will be longer.

Usage Questions

What’s the biggest machine I can build within Yottabyte Research Cloud?

Because of the way that YBRC divides up resources, the largest virtual machine within the cluster has 12 processing cores and 96 GB of RAM.

How many Yottabyte Research Cloud resources am I able to access at no cost?

ARC-TS policy is to limit no-cost individual allocations to 100 cores, so that access is always open to multiple research groups.

What if I need more than the no-cost maximum?

If you need to use more than 100 cores of YBRC, we recommend that you purchase YBRC physical infrastructure of your own and add it to the cluster. Physical infrastructure can be purchased in blocks of 96 physical cores, which can be oversubscribed as memory allows. For every block purchased, the researcher also receives four years of hardware and OS support for that block in the case of failure. For a cost estimate of buying your own blocks of infrastructure and adding them to the cluster, please email arcts-support@umich.edu.

What is ‘scratch’ storage?

Scratch storage for the Yottabyte Research Cloud is the storage area network that holds OS storage and active data for the local virtual machines; it is not backed up or replicated to separate infrastructure. As with scratch storage on Flux, we do not recommend storing any data solely on the local disk of any machine. Make sure that you have backups elsewhere, such as on Turbo, Locker, or another storage service.

HIPAA Compliance Questions

What can I do inside a HIPAA network enclave?

For researchers with restricted data that carries a HIPAA classification, we provide a small menu of Linux and Windows workstations that can be installed within your enclave. We do not delegate administrative rights for those workstations to researchers or research staff. We may delegate administrative rights for workstations and services in your enclaves to IT staff in your unit who have successfully completed the HIPAA IT training coursework given by ITS or HITS and are familiar with desktop and virtual machine environments.

Machines in the HIPAA network enclaves sit behind a deny-first firewall that prevents most traffic from entering the enclaves. Researchers can still visit websites outside the campus network from within a HIPAA network enclave. Researchers within a HIPAA network enclave can use storage services such as Turbo and MiStorage Silver (via CIFS) to host data for longer-term storage.

What are a researcher’s and research group’s responsibilities when they have HIPAA data within YBRC?

All researchers, staff, and students that use YBRC when analyzing restricted data have a shared responsibility in keeping their restricted data secure.

  • Researchers need to be aware of the personnel in their labs who have access to the data in their enclaves.  
    • Each lab should have a process for adding and removing users from enclaves that includes removing departed lab members from access to restricted data as soon as possible after they have left the lab.
    • Each lab should review who has access to their data and enclaves twice a year by checking the membership of their MCommunity and Active Directory groups to ensure that people have been removed as requested.
  • Each lab user must store their restricted data in a specific directory, as discussed during their introductory meeting with YBRC staff.  They must keep the data only in this directory over the life of the data on the system.  

CUI Compliance Questions

What can I do inside of a Secure Enclave Service CUI enclave?

Staff will work with researchers using CUI-classified data to determine the types of analysis that can be conducted on YBRC resources that comply with relevant regulations.

What are a researcher’s and research group’s responsibilities when they have CUI data within YBRC?

All researchers, staff, and students that use YBRC when analyzing restricted data have a shared responsibility in keeping their restricted data secure.

  • Researchers need to be aware of the personnel in their labs who have access to the data in their enclaves.  
    • Each lab should have a process for adding and removing users from enclaves that includes removing departed lab members from access to restricted data as soon as possible after they have left the lab.
    • Each lab should review who has access to their data and enclaves twice a year by checking the membership of their MCommunity and Active Directory groups to ensure that people have been removed as requested.
  • Each lab user must store their restricted data in a specific directory, as discussed during their introductory meeting with YBRC staff.  They must keep the data only in this directory over the life of the data on the system.  

ConFlux

ConFlux is a cluster that seamlessly combines the computing power of HPC with the analytical power of data science. The next generation of computational physics requires HPC applications (running on external clusters) to interconnect with large data sets at run time. ConFlux provides low-latency communications for in- and out-of-core data, cross-platform storage, as well as high-throughput interconnects and massive memory allocations. The file system and scheduler natively handle extreme-scale machine learning and traditional HPC modules in a tightly integrated workflow, rather than in segregated operations, leading to significantly lower latencies, fewer algorithmic barriers, and less data movement.

The ConFlux cluster is built with approximately 58 two-socket IBM Power8 “Firestone” S822LC compute nodes, each providing 20 cores. Seventeen two-socket Power8 “Garrison” S822LC compute nodes provide an additional 20 cores each and host four NVIDIA Pascal GPUs connected to the Power8 system bus via NVIDIA’s NVLink technology. Each GPU-based node has local high-speed NVMe flash memory for random access.

All compute and storage nodes are connected via a 100 Gb/s InfiniBand fabric. The InfiniBand and NVLink connectivity, combined with IBM CAPI technology, provides the unprecedented data-transfer throughput required for the data-driven computational physics research that will be conducted on the system.

ConFlux is funded by a National Science Foundation grant; the Principal Investigator is Karthik Duraisamy, Assistant Professor of Aerospace Engineering and Director of the Center for Data-Driven Computational Physics (CDDCP). ConFlux and the CDDCP are under the auspices of the Michigan Institute for Computational Discovery and Engineering.

Order Service

A portion of the cycles on ConFlux will be available through a competitive application process. More information will be posted as it becomes available.