Using tweets to understand climate change sentiment

By | HPC, News, Research, Systems and Services

A team from the Urban Sustainability Research Group at the School for Environment and Sustainability (UM-SEAS) has been studying public tweets to understand attitudes toward climate change and global warming in the U.S.

Dimitris Gounaridis is a fellow on the study. The team is mentored by Joshua Newell and combines work on perceptions of climate change by Jianxun Yang with property-level vulnerability assessment by Wanja Waweru.

“This research is timely and urgent. It helps us identify hazards, and elevated risks of flooding and heat, for socially vulnerable communities across the U.S. This risk is exacerbated especially for populations that do not believe climate change is happening,” Dimitris stated. 

The research team used a deep learning algorithm that analyzes the text of a tweet and predicts whether the person tweeting believes in climate change. The algorithm analyzed a total of 7 million public tweets drawn from two sources: the U-M Twitter Decahose and the George Washington University Libraries Dataverse. The Decahose dataset consists of a historical archive of Decahose tweets and an ongoing collection from the Decahose. The current deep learning model has an 85% accuracy rate and has been validated at multiple levels.

The map below shows predictions for individual users who believe in, or are skeptical of, climate change and global warming. To create the map, Dimitris used geospatial modeling techniques to identify clusters of skepticism and belief across the U.S.

A map of the United States with blue and red dots indicating climate change acceptance.

(Image courtesy Dimitris Gounaridis.)
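The article does not say which geospatial technique was used to find the clusters. As a simplified, hypothetical illustration of a first step in that kind of analysis, the Python sketch below bins per-user predictions into square latitude/longitude cells and reports the share of predicted believers in each cell. The input format (`lat`, `lon`, `believes` tuples), the function name `share_of_believers`, and the 1-degree default cell size are assumptions for illustration; an actual cluster analysis would go further, for example by testing whether neighboring cells differ significantly.

```python
from collections import defaultdict
from math import floor


def share_of_believers(predictions, cell_deg=1.0):
    """Aggregate per-user predictions into square lat/lon grid cells.

    Hypothetical input: an iterable of (lat, lon, believes) tuples, where
    `believes` is the classifier's True/False prediction for one user.
    Returns {(row, col): (n_users, share_predicted_believers)}.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [n_users, n_believers]
    for lat, lon, believes in predictions:
        # floor() keeps negative longitudes in the correct cell (e.g. -83.7 -> -84).
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        counts[cell][0] += 1
        counts[cell][1] += int(believes)
    return {cell: (n, b / n) for cell, (n, b) in counts.items()}
```

Smaller cells reveal finer local patterns but leave fewer users per cell, so the per-cell shares become noisier.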

The tweet stream is sampled in real time. Armand Burks, a research data scientist with ARC, wrote the Python code that continuously collects the data and stores it in Turbo Research Storage. He says that many researchers across the university use this data for research projects as well as classes.
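The article does not describe how the collection code is structured. The sketch below is a hypothetical, minimal version of such a collector. It assumes records arrive as one JSON string per line from some iterable (a stand-in for the live Decahose connection, whose API is not shown here) and appends them to gzip files partitioned by UTC hour. The function name `store_stream`, the `<root>/YYYY/MM/DD/HH.json.gz` layout, and the policy of skipping malformed records are all illustrative assumptions.

```python
import gzip
import json
from datetime import datetime, timezone
from pathlib import Path


def store_stream(lines, root, now=None):
    """Append each JSON-encoded record to an hourly gzip file under `root`.

    lines: iterable of JSON strings (stand-in for the live stream).
    root:  directory standing in for the research storage mount.
    now:   optional clock function, injectable for testing.
    Returns the number of records written.
    """
    now = now or (lambda: datetime.now(timezone.utc))
    root = Path(root)
    written = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # streaming connections often send blank keep-alive lines
        try:
            json.loads(line)  # validate only; skip malformed records instead of crashing
        except json.JSONDecodeError:
            continue
        # Hypothetical layout: <root>/YYYY/MM/DD/HH.json.gz
        path = root / now().strftime("%Y/%m/%d/%H.json.gz")
        path.parent.mkdir(parents=True, exist_ok=True)
        with gzip.open(path, "at", encoding="utf-8") as f:
            f.write(line + "\n")
        written += 1
    return written
```

Opening a gzip file in append mode adds a new gzip member on each call; Python's `gzip` module reads such multi-member files transparently, but a long-running collector would more likely keep the current hour's file open rather than reopen it for every record.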

“We are seeing an increased demand for shared community data sets like the Decahose. ARC’s platforms, such as Turbo, ThunderX, and Great Lakes, hold and process that data, and our data scientists are available, in partnership with CSCAR, to assist in deriving meaning from such large data.

“This is proving to be an effective way to combine compute services, methodology, and campus research mission leaders to make an impact quickly,” said Brock Palen, director of ARC.

In the future, Dimitris plans to refine the model to increase its accuracy and then combine its predictions with assessments of climate change vulnerability to flooding and heat stress.

“MIDAS is pleased that so many U-M faculty members are interested in using the Twitter Decahose. We currently have over 40 projects with faculty in the Schools of Information, Kinesiology, Social Work, and Public Health, as well as at Michigan Ross, the Ford School, LSA and more,” said H.V. Jagadish, MIDAS director and professor of Electrical Engineering and Computer Science.

The Twitter Decahose is co-managed and supported by MIDAS, CSCAR, and ARC, and is available to all researchers without any additional charge. For questions about the Decahose, email Kristin Burgard, MIDAS outreach and partnership manager.

Understanding How the Brain Processes Music Through the Bach Trio Sonatas

This event is open to the public.

Daniel Forger, Professor of Mathematics and Computational Medicine and Bioinformatics
James Kibbie, Professor of Music and Chair of the Organ Department, University Organist
Caleb Mayer, Graduate Student Research Assistant (Mathematics)
Sarah Simko, Graduate Student Research Assistant (Organ Performance)

With support from the Data Science for Music Challenge Initiative through MIDAS, the team is taking a big data approach to understanding the patterns and principles of music. The project is developing and analyzing a library of digitized performances of the Trio Sonatas for organ by Johann Sebastian Bach, applying novel algorithms to study musical structure from a data science perspective. Organ students from the School of Music, Theatre & Dance will demonstrate how the Frieze Memorial Organ in Hill Auditorium is used to create big data files of live performances. The team will discuss how its analysis compares different performances to determine the features that make performances artistic, as well as the common mistakes performers make. The digitized performances will be shared with researchers and will enable research and pedagogy in many disciplines, including data science, music performance, mathematics and music psychology.

ARC-TS joins Cloud Native Computing Foundation

By | General Interest, Happenings, News

Advanced Research Computing – Technology Services (ARC-TS) at the University of Michigan has become the first U.S. academic institution to join the Cloud Native Computing Foundation (CNCF), a foundation that advances the development and use of cloud native applications and services. Founded in 2015, CNCF is part of the Linux Foundation.

CNCF announced ARC-TS’s membership at the KubeCon and CloudNativeCon event in Copenhagen. A video of the opening remarks by CNCF Executive Director Dan Kohn can be viewed on the event website.

“Our membership in the CNCF signals our commitment to bringing cloud computing and containers technology to researchers across campus,” said Brock Palen, Director of ARC-TS. “Kubernetes and other CNCF platforms are becoming crucial tools for advanced machine learning, pipelining, and other research methods. We also look forward to bringing an academic perspective to the foundation.”

ARC-TS’s membership and participation in the group signal its adoption of, and commitment to, cloud-native technologies and practices. Users of containers and other CNCF services will have access to experts in the field.

Membership gives the U-M research community input into the continuing development of cloud-native applications and into CNCF-managed and ancillary projects. U-M is the second academic institution to join the foundation and the only one in the U.S.

U-M launches Data Science Master’s Program

By | Educational, General Interest, Happenings, News

The University of Michigan’s new, interdisciplinary Data Science Master’s Program is taking applications for its first group of students. The program is aimed at teaching participants how to extract useful knowledge from massive datasets using computational and statistical techniques.

The program is a collaboration between the College of Engineering (EECS), the College of Literature, Science, and the Arts (Statistics), the School of Public Health (Biostatistics), the School of Information, and the Michigan Institute for Data Science.

“We are very excited to be offering this unique collaborative program, which brings together expertise from four key disciplines at the University in a curriculum that is at the forefront of data science,” said H.V. Jagadish, Bernard A. Galler Collegiate Professor of Electrical Engineering and Computer Science, who chairs the program committee.

“MIDAS was a catalyst in bringing faculty from multiple disciplines together to work towards the development of this new degree program,” he added.

MIDAS will provide students in this program with interdisciplinary collaborations, intellectual stimulation, exposure to a broad range of practice, networking opportunities, and space on Central Campus to meet for formal and informal gatherings.

For more information, see the program website at https://lsa.umich.edu/stats/masters_students/mastersprograms/data-science-masters-program.html, and the program guide (PDF) at https://lsa.umich.edu/content/dam/stats-assets/StatsPDF/MSDS-Program-Guide.pdf.

Applications are due March 15.

Hadoop and Spark Workshop

Overview

Learn how to process large amounts of data (up to terabytes) using SQL and/or simple programming models available in Python, R, Scala, and Java. Computers will be provided for following along with the hands-on examples; users can also bring their own laptops.
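One such simple programming model is MapReduce, which Hadoop implements. As a framework-free illustration only (it does not use Hadoop or Spark and runs on a single machine), the Python sketch below splits a word count into the three phases those frameworks distribute across a cluster: map, shuffle (group by key), and reduce. The function names are our own, chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key; a cluster framework does
    # this step between the map and reduce phases, moving data across nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values that share one key.
    return key, sum(values)


def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["the quick fox", "The lazy dog"]))
```

Because each mapper call sees only one line and each reducer call sees only one key, both phases can run in parallel on many machines. In Spark's Python API, the same job is commonly written with the RDD methods `flatMap`, `map`, and `reduceByKey`, with the framework performing the shuffle.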

Prerequisites

Intro to the Linux Command Line or equivalent. This course assumes familiarity with the Linux command line.

A user account on Flux. If you do not have a Flux user account, apply for one at https://arc-ts.umich.edu/fluxform/.

Duo authentication.

Duo two-factor authentication is required to log in to the cluster. When logging in, you will need to type your UMICH password as well as authenticate through Duo in order to access Flux.

If you need to enroll in Duo, follow the instructions at Getting Started: How to Enroll in Duo.

Instructor

Brock Palen
Director
ARC-TS

Brock has over 10 years of high-performance and data-intensive computing experience in an academic environment. He currently works with the team at ARC-TS to provide HPC, data science, storage, and other research computing services to the University. Brock is also the University's Campus Champion for the NSF XSEDE project, representing the University to XSEDE and other national computing infrastructures and organizations.

Materials

Course Preparation

In order to participate successfully in the class exercises, you must have a Flux user account. The user account allows you to log in to the cluster, create, compile, and test applications, and transfer data into Hadoop’s filesystem for processing.

Flux user account

A single Flux user account can be used to prepare and submit jobs using various allocations. If you already have a user account, you can use it for this course. If not, please visit https://arc-ts.umich.edu/fluxform to obtain one. A user account is free to members of the University community. Please note that obtaining an account requires human processing, so be sure to apply at least two business days before class begins.

More help

Please email hpc-support@umich.edu for questions, comments, or to seek further assistance.

Data Science Certificate Info Session

Data Science Certificate program info session on 2/16, from 5:30pm to 6:30pm, in Room 1180 of the Duderstadt Building.

Come learn about the Graduate Certificate in Data Science:

The certificate is focused on developing core proficiencies in data analytics:
1) Modeling — Understanding of core data science principles, assumptions and applications;
2) Technology — Knowledge of basic protocols for data management, processing, computation, information extraction, and visualization;
3) Practice — Hands-on experience with real data, modeling tools, and technology resources.

Prof. Alfred Hero Distinguished Professor Lecture

This lecture is presented by Alfred O. Hero in honor of his being named the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science.

“Locating the nodes: from sensor arrays to genomic networks”

Reception following

Abstract

Spatially distributed measurements have been used for hundreds of years to perform geolocation, geodesy, and triangulation. In World War I, acoustic sensor arrays were used to locate the direction of cannon fire based on correlation between sensor readings. Sensors in the Internet of Things (IoT) auto-locate their nodes based on correlation between received pilot signals. In genomics, influential nodes are located in transcriptional or lineage networks based on correlation between omic profiles. Whether the node is a target, a sensor, or a nucleotide sequence, the problem of node localization is of central interest in many disciplines of science and technology. In this talk, I will provide perspectives on the general node localization problem, discuss solutions and algorithms, and address future opportunities and challenges.
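The abstract's World War I example, locating cannon fire from correlation between sensor readings, can be illustrated with a two-sensor sketch. This is our own simplified illustration, not material from the talk. It finds the lag that maximizes the cross-correlation of two sampled signals, then converts that time difference of arrival into a bearing, assuming a distant (far-field) source, a speed of sound of 343 m/s, and a known sensor spacing. The function names and test values are hypothetical.

```python
import math


def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) at which signal y best matches signal x.

    Brute-force cross-correlation: for each candidate lag k, sum x[n] * y[n + k]
    over the overlapping samples and keep the k with the largest sum.
    A positive result means the sound reached sensor y later than sensor x.
    """
    best_lag, best_score = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n + k] for n in range(len(x)) if 0 <= n + k < len(y))
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag


def bearing_deg(lag, fs, spacing_m, c=343.0):
    """Far-field angle of arrival, in degrees from broadside, for one sensor pair."""
    tau = lag / fs  # time difference of arrival, in seconds
    ratio = max(-1.0, min(1.0, c * tau / spacing_m))  # clamp so noise cannot break asin
    return math.degrees(math.asin(ratio))
```

For example, a pulse that reaches the second sensor 2 samples later at a 1 kHz sample rate, with the sensors 1.372 m apart, gives a bearing of about 30 degrees from broadside. A single pair only yields a direction; combining bearings or delays from several pairs is what lets an array fix the source's position.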

Bio

Alfred O. Hero III is the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science and the R. Jamison and Betty Williams Professor of Engineering. He is also Co-Director of the University’s Michigan Institute for Data Science (MIDAS) and a professor of Biomedical Engineering and Statistics.

Hero’s recent research interests are in high dimensional spatio-temporal data, multi-modal data integration, statistical signal processing, and machine learning. Of particular interest are applications to social networks, network security and forensics, computer vision, and personalized health.

Hero received a B.S. (summa cum laude) from Boston University (1980) and a Ph.D. from Princeton University (1984), both in Electrical Engineering. He joined the faculty of the University of Michigan in 1984. He received the University of Michigan Distinguished Faculty Achievement Award (2011), the Stephen S. Attwood Excellence in Engineering Award (2017), the IEEE Signal Processing Society Meritorious Service Award (1998), the IEEE Third Millennium Medal (2000), and the IEEE Signal Processing Society Technical Achievement Award (2014). In 2015 he received the IEEE Signal Processing Society Award, which is the highest career award bestowed by this Society. Hero was President of the IEEE Signal Processing Society (2006-2008) and was on the Board of Directors of the IEEE (2009-2011), where he served as Director of Division IX (Signals and Applications). He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), and is chair of the Committee on Applied and Theoretical Statistics (CATS) of the US National Academies of Science.

Peers Health and U-M begin research partnership using disability and workers’ comp healthcare data

By | General Interest, Happenings, News, Research

Peers Health and the University of Michigan are starting a two-year research project that will apply advanced learning technologies to a proprietary global database of millions of de-identified disability and workers’ compensation cases. The goals of the project include developing a prescriptive modeling framework to facilitate development of optimal return-to-work plans for injured or ill patients.

Public policy experts have begun to connect patients’ ability to carry out productive activities, such as their jobs, to their general health and well-being. By helping define when someone has objectively returned to health, the findings from this project could inform decision-making in virtually every healthcare episode.

The principal investigators in the project, Dr. Brian Denton and Dr. Jenna Wiens, are both renowned experts in medical machine learning. Dr. Denton, a professor of Industrial and Operations Engineering and Urology, and Dr. Wiens, an assistant professor of Computer Science and Engineering, are both affiliated with the Michigan Institute for Data Science (MIDAS) at U-M.

Peers Health recently announced an expanded partnership with ODG, an MCG company and part of the Hearst Health Network, to aggressively acquire new data to enhance ODG functionality and to fuel this research. Jon Seymour, MD, CEO of Peers, said, “This is a new phase in medical publishing where raw data collection is the editorial function and cutting-edge machine learning is the technology factor. We turned to the University of Michigan due to its impressive data science programs spanning multiple departments, as well as the specific experience of Dr. Denton and Dr. Wiens in medical applications. We’re confident this initiative will attract many new data contributors along the way.”

“The collaboration with Peers Health is exciting because it provides data that can help build a model that will reduce the time — from both a safety and productivity perspective — for people to return to work following sickness or injury,” Denton said. “Streaming data in from existing patients will allow our model to adapt and improve over time.”

Wiens added: “These data contain a particularly interesting training label: days away from work. We hypothesize that this will be a strong signal for the type, timing, and effectiveness of the treatments and therapies.”

The U-M partnership with Peers was established by MIDAS and the university’s Business Engagement Center (BEC).

“This partnership illustrates the power of combining data from the healthcare industry with the data science expertise of U-M faculty,” said Dr. Alfred Hero, professor of Engineering and co-director of MIDAS.

“It is energizing for the BEC to be part of these innovative collaborative relationships that create real impact in the world,” added BEC Director Amy Klinke.

Video available from MIDAS Research Forum

By | General Interest, Happenings, News, Research

Video from the MIDAS Research Forum, held Dec. 1 in the Michigan League, is now available at http://myumi.ch/6vA3V.

The forum featured U-M students and faculty showcasing their data science research; a workshop on how to work with industry; presentations from student groups; and a summary of the data science consulting and infrastructure services available to the U-M research community.

NOTE: The keynote presentation from Christopher Rozell of the Georgia Institute of Technology will be available in the near future.

2017 U-M Data Science Research Forum

Forum Highlights

  • Oral and poster presentations on
    • Theoretical foundations of data science
    • Data science methodology
    • Data science applications in any research domain
    • Social impact of data science research
  • Networking Reception

All presentations will be selected from submissions to our call for abstracts.
Abstract Submission Deadline: October 23, 2017
We welcome submissions from all U-M data science researchers (faculty, staff, and trainees).

Please register for this event. Please also see the call for abstracts for instructions, and submit through the Abstract Submission Form.

Preliminary Schedule