The Flux Hadoop Cluster is an upgraded Hadoop cluster, currently available as a technology preview at no charge to U-M researchers. It is an on-campus resource that offers a different service level than most cloud-based Hadoop offerings, including:

  • high-bandwidth data transfer to and from other campus data storage locations, with no data transfer costs
  • very high-speed inter-node connections using 40Gb/s Ethernet

The cluster provides 112TB of total usable disk space, 40GbE inter-node networking, Hadoop version 2.6.0, and several additional data science tools.

Aside from Hadoop and its Distributed File System, the ARC-TS data science service includes:

  • Pig, a high-level language that enables substantial parallelization, allowing the analysis of very large datasets.
  • Hive, data warehouse software that facilitates querying and managing large datasets residing in distributed storage using a SQL-like language called HiveQL.
  • Sqoop, a tool for transferring data between SQL databases and the Hadoop Distributed File System.
  • rmr, an extension of the R statistical language that supports distributed processing of large datasets stored in the Hadoop Distributed File System.
  • Spark, a general processing engine compatible with Hadoop data.
  • mrjob, a Python library that allows MapReduce jobs written in Python to run on Hadoop.
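
The MapReduce model that tools such as mrjob and Pig build on can be sketched in plain Python. The word count below is an illustration only: it runs in memory on a single machine and does not use the Hadoop, mrjob, or Spark APIs, and the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical. On the cluster, the framework would distribute the map and reduce steps across nodes and perform the grouping itself.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group emitted values by key (done by the framework on a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an in-memory list of lines."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; real jobs would read from HDFS instead.
    sample = ["Hadoop stores data", "Spark reads Hadoop data"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, both steps can be split across many nodes, which is what makes the pattern suitable for very large datasets.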

The software versions are as follows:

Software      Version
Hadoop        2.6.0
Hive          1.1.0
Sqoop         1.4.6
Pig           0.12.0
R/rhdfs/rmr   3.0.3
Spark         1.6.0
mrjob         0.4.3-dev, commit 226a741548cf125ecfb549b7c50d52cda932d045

Order Service

Using the Flux Hadoop environment requires a Flux user account (available at no cost), but currently does not require a Flux allocation.

To order, email hpc-support@umich.edu.

For more information, contact data-science-support@umich.edu.
