
Intro to Data Analytics with PySpark on the Cavium-ThunderX Cluster

February 25, 2021 @ 1:00 pm - 3:30 pm

Your Desktop

OVERVIEW

This course will cover four areas:

– Overview of the Hadoop Distributed Filesystem (HDFS)
– PySpark vs Pandas (similarities and differences)
– Working with input/output (HDFS vs NFS) in PySpark
– Example analytic workflows (exercises)

INSTRUCTOR

Armand Burks
Research Data Scientist Intermediate
Information and Technology Services – Advanced Research Computing – Technology Services

Armand Burks, Ph.D., is a research data scientist intermediate for Advanced Research Computing – Technology Services (ARC-TS), a division of Information and Technology Services (ITS). Armand helps researchers establish data workflows, transform data between formats, optimize and parallelize code, use Hadoop for cloud computing, and develop custom code (C++, Java, Python); he also provides general programming support. He earned a B.S. in computer science from Alabama State University in 2008, an M.S. in computer science and engineering from Michigan State University in 2010, and a Ph.D. in computer science from Michigan State University in 2017.

MATERIALS

Prerequisites: Workshop participants should have taken the “Introduction to the Linux Command Line” workshop and should have basic programming experience in Python.

Click here for more information on the Cavium ThunderX Cluster

Click here to fill out an account request form
Note: Account creation takes 3 business days.
Students should enter “Workshop” in the “Advisor” field.


Campus VPN access is required when connecting from off campus, but not from on campus. An SSH client and Duo will be required during the workshop.
If you do not already have it, please download and install the Cisco AnyConnect VPN software by following these instructions:
https://its.umich.edu/enterprise/wifi-networks/vpn/getting-started
You will need the VPN to use the SSH client from off campus, and you must select the ‘Campus All traffic’ profile in the Cisco client.

A Zoom link will be provided to the participants the day before the class. Registration is required.

Please note that this session will be recorded.

If you have questions about this workshop, please send an email to the instructor at arburks@umich.edu

Details

Date: February 25, 2021
Time: 1:00 pm - 3:30 pm

Venue

Your Desktop

Other

Presenter(s)
Armand Burks (arburks)