
U-M partners with Cavium on Big Data computing platform

By | Feature, General Interest, Happenings, HPC, News

A new partnership between the University of Michigan and Cavium Inc., a San Jose-based provider of semiconductor products, will create a powerful new Big Data computing cluster available to all U-M researchers.

The $3.5 million ThunderX computing cluster will enable U-M researchers to, for example, process massive amounts of data generated by remote sensors in distributed manufacturing environments, or by test fleets of automated and connected vehicles.

The cluster will run the Hortonworks Data Platform, which provides Spark, Hadoop MapReduce and other tools for large-scale data processing.

“U-M scientists are conducting groundbreaking research in Big Data already, in areas like connected and automated transportation, learning analytics, precision medicine and social science. This partnership with Cavium will accelerate the pace of data-driven research and open up new avenues of inquiry,” said Eric Michielssen, U-M associate vice president for advanced research computing and the Louise Ganiard Johnson Professor of Engineering in the Department of Electrical Engineering and Computer Science.

“I know from experience that U-M researchers are capable of amazing discoveries. Cavium is honored to help break new ground in Big Data research at one of the top universities in the world,” said Cavium founder and CEO Syed Ali, who received a master of science in electrical engineering from U-M in 1981.

Cavium Inc. is a leading provider of semiconductor products that enable secure and intelligent processing for enterprise, data center, wired and wireless networking. The new U-M system will use dual-socket servers powered by Cavium’s ThunderX ARMv8-A workload-optimized processors.

The ThunderX product family is Cavium’s 64-bit ARMv8-A server processor for next generation Data Center and Cloud applications, and features high performance custom cores, single and dual socket configurations, high memory bandwidth and large memory capacity.

Alec Gallimore, the Robert J. Vlasic Dean of Engineering at U-M, said the Cavium partnership represents a milestone in the development of the College of Engineering and the university.

“It is clear that the ability to rapidly gain insights into vast amounts of data is key to the next wave of engineering and science breakthroughs. Without a doubt, the Cavium platform will allow our faculty and researchers to harness the power of Big Data, both in the classroom and in their research,” said Gallimore, who is also the Richard F. and Eleanor A. Towner Professor, an Arthur F. Thurnau Professor, and a professor both of aerospace engineering and of applied physics.

Along with applications in fields like manufacturing and transportation, the platform will enable researchers in the social, health and information sciences to more easily mine large structured and unstructured datasets. This will eventually allow researchers to, for example, correlate health outcomes and disease outbreaks with information derived from socioeconomic, geospatial and environmental data streams.

U-M and Cavium chose to run the cluster on Hortonworks Data Platform, which is based on open source Apache Hadoop. The ThunderX cluster will deliver high-performance computing services for Hadoop analytics and, ultimately, a total of three petabytes of storage space.

“Hortonworks is excited to be a part of forward-leading research at the University of Michigan exploring low-powered, high-performance computing,” said Nadeem Asghar, vice president and global head of technical alliances at Hortonworks. “We see this as a great opportunity to further expand the platform and segment enablement for Hortonworks and the ARM community.”

Building a Community of Social Scientists with Big Data Skills: The ICOS Big Data Summer Camp

By | Educational, Feature, General Interest, News

As the use of data science techniques continues to grow across disciplines, a group of University of Michigan researchers is working to build a community of social scientists with skills in Big Data through a week-long summer camp for faculty and graduate students.

Having recently completed its fourth annual session, the Big Data Summer Camp held by the Interdisciplinary Committee for Organizational Studies (ICOS) trains approximately 50 people each spring in skills and methods such as Python, SQL, and social media APIs. Participants split into several groups, each of which tries to answer a research question using these newly acquired skills.

Working with researchers from other fields is a key component of the camp, and of creating a Big Data social science community, said co-coordinator Todd Schifeling, a Research Fellow at the Erb Institute in the School of Natural Resources and Environment.

“Students meet from across social science disciplines who wouldn’t meet otherwise,” said Schifeling. “And every year we bring back more and more past campers to present on what they’ve been doing.”

Schifeling himself participated in the camp as a student before taking on the role of coordinator this year.

Teddy DeWitt, the other co-coordinator of the camp and a doctoral student at the Ross School of Business, added that the camp presents its curriculum in a way that is unique on campus.

“This set of material does not seem to be available in other parts of the university, at least … with an applied perspective in mind,” he said. “So we’re glad we have this set of resources that is both accessible and well-received by students.”

Participants range in skill from beginning to advanced, but even a relatively advanced student like Jeff Lockhart, a doctoral student in sociology and population studies who describes himself as “super-committed to computational social science,” said that it’s hard to find classes in computational methods in social science departments.

“[The ICOS camp] doesn’t expect a lot of prior knowledge, which I think is critical,” Lockhart said.

Lockhart, DeWitt, and Dylan Nelson, also a sociology doctoral student, are working on setting up a series of workshops in Computational Social Science for fall 2016 (contact Lockhart at jwlock@umich.edu for more information). Lockhart said it’s critical that social scientists learn Big Data skills.

“If we don’t have skills like this, there’s no way for us to enter into these fields of research that are going to be more and more important,” he said.

“A lot of the skills we’ve learned are sort of the on-ramp for doing data science,” DeWitt added.

The camp is co-sponsored by Advanced Research Computing (ARC).

New on-campus data-science and computational research services available

By | Feature, General Interest, News

Researchers across campus now have access to several new services to help them navigate the new tools and methodologies emerging for data-intensive and computational research.

As part of the U-M Data Science Initiative announced in fall 2015, Consulting for Statistics, Computing and Analytics Research (CSCAR) is offering new and expanded services, including guidance on:

  • Research methodology for data science.
  • Large scale data processing using high performance computing systems.
  • Optimization of code and use of Flux and other advanced computing systems.
  • Advanced data management.
  • Geospatial data analyses.
  • Exploratory analysis and data visualization.
  • Obtaining licensed data from commercial sources.
  • Scraping, aggregating and integrating data from public sources.
  • Analysis of restricted data.

“With Big Data and computational simulations playing an ever-larger role in research in a variety of fields, it’s increasingly important to provide researchers with a comprehensive ecosystem of support and services that address those methodologies,” said CSCAR Director Kerby Shedden.

As part of this significant expansion of its scope, the campuswide statistical consulting service CSCAR has been renamed Consulting for Statistics, Computing and Analytics Research. It was formerly known as the Center for Statistical Consultation and Research.

For more information, see the University Record article.