
Yottabyte Research Cloud certified for CUI data

By | Data, General Interest, Happenings, News

Advanced Research Computing – Technology Services (ARC-TS) is pleased to announce that the Yottabyte Research Cloud (YBRC) computing platform is now certified to accept data designated as Controlled Unclassified Information (CUI). This includes certification for YBRC and its associated services, enabling secure data analysis on Windows and Linux virtual desktops as well as secure hosting of databases and data ingestion.

For more information on CUI, see the U-M Research Ethics and Compliance CUI webpage and Sensitive Data Guide: Controlled Unclassified Information (CUI). CUI regulations apply to federal non-classified information requiring security controls; an example of CUI data often used in research is data from the Centers for Medicare and Medicaid Services.

The new capability ensures the security of CUI data through the creation of firewalled network enclaves, allowing CUI data to be analyzed safely and securely in YBRC’s flexible, robust and scalable environment.  Within each network enclave, researchers have access to Windows and Linux virtual desktops that can contain any software required for their analysis pipeline.

This capability also extends to our database and ingestion services:

  • Structured databases:  MySQL/MariaDB, and PostgreSQL.
  • Unstructured databases: Cassandra, MongoDB, InfluxDB, Grafana, and ElasticSearch.
  • Data ingestion: Redis, Kafka, RabbitMQ.
  • Data processing: Apache Flink, Apache Storm, Node.js and Apache NiFi.
  • Other data services are available upon request.

The CUI certification extends YBRC’s existing capabilities for handling sensitive data; the service can also accept HIPAA data, Export Controlled Research (ITAR, EAR), Personally Identifiable Information, and more. Please see Sensitive Data Guide: Yottabyte Research Cloud for more information.

YBRC is supported by U-M’s Data Science Initiative, launched in 2015, and was created through a partnership between Yottabyte and ARC-TS. These tools are offered to all researchers at the University of Michigan free of charge, provided that certain usage limits are not exceeded. Large-scale users who outgrow the no-cost allotment may purchase additional YBRC resources. All interested parties should contact hpc-support@umich.edu.

MIDAS Data Science for Music Challenge Initiative announces funded projects

By | Data, General Interest, Happenings, News, Research

From digital analysis of Bach sonatas to mining data from crowdsourced compositions, researchers at the University of Michigan are using modern big data techniques to transform how we understand, create and interact with music.

Four U-M research teams will receive support for projects that apply data science tools like machine learning and data mining to the study of music theory, performance, social media-based music making, and the connection between words and music. The funding is provided under the Data Science for Music Challenge Initiative through the Michigan Institute for Data Science (MIDAS).

“MIDAS is excited to catalyze innovative, interdisciplinary research at the intersection of data science and music,” said Alfred Hero, co-director of MIDAS and the John H. Holland Distinguished University Professor of Electrical Engineering and Computer Science. “The four proposals selected will apply and demonstrate some of the most powerful state-of-the-art machine learning and data mining methods to empirical music theory, automated musical accompaniment of text and data-driven analysis of music performance.”

Jason Corey, associate dean for graduate studies and research at the School of Music, Theatre & Dance, added: “These new collaborations between our music faculty and engineers, mathematicians and computer scientists will help broaden and deepen our understanding of the complexities of music composition and performance.”

The four projects represent the beginning of MIDAS’ support for the emerging Data Science for Music research. The long-term goal is to build a critical mass of interdisciplinary researchers for sustained development of this research area, which demonstrates the power of data science to transform traditional research disciplines.

Each project will receive $75,000 over a year. The projects are:

Understanding and Mining Patterns of Audience Engagement and Creative Collaboration in Large-Scale Crowdsourced Music Performances

Investigators: Danai Koutra and Walter Lasecki, both assistant professors of computer science and engineering

Summary: The project will develop a platform for crowdsourced music making and performance, and use data mining techniques to discover patterns in audience engagement and participation. The results can be applied to other interactive settings as well, including developing new educational tools.

Understanding How the Brain Processes Music Through the Bach Trio Sonatas

Investigators: Daniel Forger, professor of mathematics and computational medicine and bioinformatics; James Kibbie, professor and chair of organ and university organist

Summary: The project will develop and analyze a library of digitized performances of Bach’s Trio Sonatas, applying novel algorithms to study the music structure from a data science perspective. The team’s analysis will compare different performances to determine features that make performances artistic, as well as the common mistakes performers make. Findings will be integrated into courses both on organ performance and on data science.

The Sound of Text

Investigators: Rada Mihalcea, professor of electrical engineering and computer science; Anıl Çamcı, assistant professor of performing arts technology

Summary: The project will develop a data science framework that will connect language and music, developing tools that can produce musical interpretations of texts based on content and emotion. The resulting tool will be able to translate any text—poetry, prose, or even research papers—into music.

A Computational Study of Patterned Melodic Structures Across Musical Cultures

Investigators: Somangshu Mukherji, assistant professor of music theory; Xuanlong Nguyen, associate professor of statistics

Summary: This project will combine music theory and computational analysis to compare the melodies of music across six cultures—including Indian and Irish songs, as well as Bach and Mozart—to identify commonalities in how music is structured cross-culturally.

The Data Science for Music program is the fifth challenge initiative funded by MIDAS to promote innovation in data science and cross-disciplinary collaboration, while building on existing expertise of U-M researchers. The other four are focused on transportation, health sciences, social sciences and learning analytics.

Hero said the confluence of music and data science was a natural extension.

“The University of Michigan’s combined strengths in data science methodology and music makes us an ideal crucible for discovery and innovation at this intersection,” he said.

Contact: Dan Meisler, Communications Manager, Advanced Research Computing
734-764-7414, dmeisler@umich.edu

Interdisciplinary Committee on Organizational Studies (ICOS) Big Data Summer Camp, May 14-18

By | Data, Educational, General Interest, Happenings, News

Social and organizational life are increasingly conducted online through electronic media, from emails to Twitter feeds to dating sites to GPS phone tracking. The traces these activities leave behind have acquired the (misleading) title of “big data.” Within a few years, a standard part of graduate training in the social sciences will include a hefty dose of “using big data,” and we will all be using terms like API and Python.

This year ICOS, MIDAS, and ARC are again offering a one-week “big data summer camp” for doctoral students interested in organizational research, with a combination of detailed examples from researchers; hands-on instruction in Python, SQL, and APIs; and group work to apply these ideas to organizational questions. Enrollment is free, but students must commit to attending all day for each day of camp, and be willing to work in interdisciplinary groups.

The camp runs all day, May 14-18.

https://ttc.iss.lsa.umich.edu/ttc/sessions/interdisciplinary-committee-on-organizational-studies-icos-big-data-summer-camp-3/ 

CSCAR provides walk-in support for new Flux users

By | Data, Educational, Flux, General Interest, HPC, News

CSCAR now provides walk-in support during business hours for students, faculty, and staff seeking assistance in getting started with the Flux computing environment. CSCAR consultants can walk a researcher through the steps of applying for a Flux account, installing and configuring a terminal client, connecting to Flux, using basic SSH and the Unix command line, and obtaining or accessing allocations.

In addition to walk-in support, CSCAR has several staff consultants with expertise in advanced and high performance computing who can work with clients on a variety of topics such as installing, optimizing, and profiling code.  

Email support is also available at hpc-support@umich.edu.

CSCAR is located in room 3550 of the Rackham Building (915 E. Washington St.). Walk-in hours are from 9 a.m. – 5 p.m., Monday through Friday, except for noon – 1 p.m. on Tuesdays.

See the CSCAR web site (cscar.research.umich.edu) for more information.

Info session: Consulting and computing resources for data science — Nov. 8

By | Data, Educational, Events, General Interest, Happenings, HPC

Advanced Research Computing at U-M (ARC) will host an information session for graduate students in all disciplines who are interested in new computing and data science resources and services available to U-M researchers.

Brief presentations from members of ARC Technology Services (ARC-TS) on computing infrastructure, and from Consulting for Statistics, Computing, and Analytics Research (CSCAR) on statistics, data science, and computing training and consulting will be followed by a Q&A session, and opportunities to interact individually with ARC and CSCAR staff.

ARC and CSCAR are interested in connecting with graduate students whose research would benefit from customized or innovative computational or analytic approaches, and can provide guidance for students aiming to do this. ARC and CSCAR are also interested in developing training and documentation materials for a diverse range of application areas, and would welcome input from student researchers on opportunities to tailor our training offerings to new areas.

Speakers:

  • Kerby Shedden, Director, CSCAR
  • Brock Palen, Director, ARC-TS

Date/Time/Location:

Wednesday, Nov. 8, 2017, 2 – 4 p.m., West Conference Room, 4th Floor, Rackham Building (915 E. Washington St.)


Real estate dataset available to researchers

By | Data, Data sets, Educational, General Interest, Happenings, News

The University of Michigan Library system and the Data Acquisition for Data Sciences program (DADS) of the U-M Data Science Initiative (DSI) have recently joined forces to license a major data resource capturing parcel-level information about the property market in the United States.  

The data were licensed from CoreLogic, which has compiled deed, tax and foreclosure information on nearly all properties in the United States. Coverage dates vary by county; some county records go back fifty years. Coverage is more comprehensive from the 1990s to the present.

These data will support a variety of research efforts into regional economies, economic disparities, trends in land-use, housing market dynamics, and urban ecology, among many other areas.

The data are available on the Turbo Research Storage system for users of the U-M High Performance Computing infrastructure, and via the University of Michigan Library.

To access the data, researchers must first sign an MOU; contact Senior Associate Librarian Catherine Morse (cmorse@umich.edu) for more information, or visit https://www.lib.umich.edu/database/corelogic-parcel-level-real-estate-data.

Mini-course: Introduction to Python — Sept. 11-14

By | Data, Educational, Events, General Interest, News

Asst. Prof. Emanuel Gull, Physics, is offering a mini-course introducing the Python programming language in a four-lecture series. Beginners without any programming experience as well as programmers who usually use other languages (C, C++, Fortran, Java, …) are encouraged to come; no prior knowledge of programming languages is required!

For the first two lectures we will mostly follow the book Learning Python. This book is available at our library. An earlier edition (with small differences, equivalent for all practical purposes) is available as an e-book. The last two lectures will introduce some useful Python libraries: numpy, scipy, and matplotlib.

At the end of the mini-course you will know enough about Python to use it for your grad class homework and your research.

Special meeting place: we will meet in 340 West Hall on Monday September 11 at 5 PM.

Please bring a laptop computer along to follow the exercises!

Syllabus (Dates & Location for Fall 2017)

  1. Monday September 11 5:00 – 6:30 PM: Welcome & Getting Started (hello.py). Location: 340 West Hall
  2. Tuesday September 12 5:00 – 6:30 PM: Numbers, Strings, Lists, Dictionaries, Tuples, Functions, Modules, Control flow. Location: 335 West Hall
  3. Wednesday September 13 5:00 – 6:30 PM: Useful Python libraries (part I): numpy, scipy, matplotlib. Location: 335 West Hall
  4. Thursday September 14 5:00 – 6:30 PM: Useful Python libraries (part II): 3D plotting in matplotlib and exercises. Location: 335 West Hall

For more information: https://sites.lsa.umich.edu/gull-lab/teaching/physics-514-fall-2017/introduction-to-python/

 

Flux HPC Blog: Querying data with SparkSQL

By | Data, General Interest, HPC, News

SparkSQL lets people query their data in a SQL-like language while taking advantage of the speed of Spark, a fast, general engine for data processing that runs over Hadoop. I wanted to test this out on a dataset I found from Walmart containing their stores’ weekly sales numbers. I put the CSV into our cluster’s HDFS (in /var/walmart), making it accessible to all Flux Hadoop users.
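
To give a feel for the approach, here is a minimal PySpark sketch of what such a query might look like. It is an illustration under stated assumptions, not the code from the full blog post: only the HDFS path /var/walmart comes from the text above. The column names (Store, Weekly_Sales), the view name, the presence of a header row in the CSV, and the query itself are all hypothetical.

```python
# Hypothetical column names: the post does not list the CSV's columns.
STORE_COL = "Store"
SALES_COL = "Weekly_Sales"

# The path comes from the post; every other name here is illustrative.
HDFS_PATH = "/var/walmart"
VIEW_NAME = "walmart_sales"

# Total sales per store, largest first (an assumed example question).
QUERY = f"""
SELECT {STORE_COL} AS store, SUM({SALES_COL}) AS total_sales
FROM {VIEW_NAME}
GROUP BY {STORE_COL}
ORDER BY total_sales DESC
LIMIT 10
"""


def top_stores(spark, path=HDFS_PATH):
    """Register the CSV as a temporary view and run QUERY against it.

    `spark` is expected to be a pyspark.sql.SparkSession.
    Assumes the CSV has a header row.
    """
    df = spark.read.csv(path, header=True, inferSchema=True)
    df.createOrReplaceTempView(VIEW_NAME)
    return spark.sql(QUERY)


def run_on_cluster():
    """Create a session and print the result; requires PySpark."""
    # Imported here so the rest of the sketch loads without PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("walmart-sales-sketch").getOrCreate()
    try:
        top_stores(spark).show()
    finally:
        spark.stop()
```

On a machine with PySpark and access to the cluster's HDFS, calling run_on_cluster() would print the ten stores with the highest total sales, provided the assumed column names match the actual file.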

U-M, SJTU research teams share $1 million for data science projects

By | Data, General Interest, Happenings, News, Research

Five research teams from the University of Michigan and Shanghai Jiao Tong University in China are sharing $1 million to study data science and its impact on air quality, galaxy clusters, lightweight metals, financial trading and renewable energy.

Since 2009, the two universities have collaborated on a number of research projects that address challenges and opportunities in energy, biomedicine, nanotechnology and data science.

In the latest round of annual grants, the winning projects focus on data science and how it can be applied to chemistry and physics of the universe, as well as finance and economics.

For more, read the University Record article.

For descriptions of the research projects, see the MIDAS/SJTU partnership page.