Hadoop and Spark Workshop

October 31, 2017 @ 1:00 pm - 5:00 pm

East Hall B254

Overview

Learn how to process large amounts of data (up to terabytes) using SQL and/or simple programming models available in Python, Scala, and Java. Computers will be provided for following along with the hands-on examples; attendees may also bring their own laptops.

Prerequisites

Intro to the Linux Command Line or equivalent. This course assumes familiarity with the Linux command line.

A user account on Flux. If you do not have a Flux user account, apply for one at https://arc-ts.umich.edu/fluxform/

Duo authentication.

Duo two-factor authentication is required to log in to the cluster. When logging in, you will need to type your UMICH password as well as authenticate through Duo in order to access Flux.

If you need to enroll in Duo, follow the instructions at Getting Started: How to Enroll in Duo.

Hadoop queue membership. If you did not ask to be put on the training Hadoop queue when applying for a Flux user account, send an email to hpc-support@umich.edu asking to be put on the training queue.

Instructor

Brock Palen
Associate Director
ARC-TS

Brock has over 10 years of high-performance computing and data-intensive computing experience in an academic environment. He currently works with the team at ARC-TS to provide HPC, Data Science, storage, and other research computing services to the University. Brock is also the NSF XSEDE project's Campus Champion, representing the schools to this and other national computing infrastructures and organizations.

Materials

Course Preparation

To participate successfully in the class exercises, you must have a Flux user account, be enrolled in Duo two-factor authentication, and be a member of a Hadoop queue. The user account allows you to log in to the cluster, create, compile, and test applications, and transfer data into Hadoop’s filesystem for processing. The Hadoop queue allows you to submit jobs that execute those applications in parallel on the cluster.

Flux user account

A single Flux user account can be used to prepare and submit jobs using various allocations. If you already have a user account, you can use it for this course and skip to “Hadoop queue” below. If not, please visit https://arc-ts.umich.edu/fluxform to obtain one. A user account is free to members of the University community. Please note that obtaining an account requires human processing, so be sure to apply at least two business days before class begins.

Hadoop queue

We’ll add you to the training queue so you can run jobs on the cluster during the course. If you already have access to another Hadoop queue, you can use that instead if you like.

More help

Please email hpc-support@umich.edu for questions, comments, or to seek further assistance.

Details

Date:
October 31, 2017
Time:
1:00 pm - 5:00 pm

Venue

East Hall B254
530 Church St.
Ann Arbor, MI 48109 United States
