SparkSQL lets people query their data in a SQL-like language while taking advantage of the speed of Spark, a fast, general-purpose engine for data processing that runs on top of Hadoop. I wanted to test this out on a dataset I found from Walmart containing their stores’ weekly sales numbers. I put the CSV into our cluster’s HDFS (in /var/walmart), making it accessible to all Flux Hadoop users.
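To give a feel for what such a query might look like, here is a minimal PySpark sketch. It rests on several assumptions that are not confirmed by the post: Spark 2.0 or later (for `SparkSession`), a header row in the CSV, and columns named `Store` and `Weekly_Sales`. The function names and the `sales` view name are made up for illustration. Only the HDFS directory /var/walmart comes from the text above.

```python
# Assumed column names (hypothetical; check the CSV header before relying on them).
STORE_COL = "Store"
SALES_COL = "Weekly_Sales"


def build_query(table: str = "sales") -> str:
    """Return a SQL string that totals weekly sales per store."""
    return (
        f"SELECT {STORE_COL}, SUM({SALES_COL}) AS total_sales "
        f"FROM {table} GROUP BY {STORE_COL} ORDER BY total_sales DESC"
    )


def run_on_cluster() -> None:
    """Load the CSV from HDFS, register it as a view, and run the query."""
    # Imported here so the module still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("walmart-weekly-sales").getOrCreate()

    # Read every CSV file under the HDFS directory mentioned in the post.
    # header=True and inferSchema=True are assumptions about the file layout.
    df = spark.read.csv("hdfs:///var/walmart", header=True, inferSchema=True)

    # Expose the DataFrame to SQL under a temporary view name.
    df.createOrReplaceTempView("sales")

    # Show the ten stores with the highest total sales.
    spark.sql(build_query("sales")).show(10)
    spark.stop()
```

Calling `run_on_cluster()` from a node with PySpark and HDFS access would run the aggregation. If the real column names differ, only the two constants at the top need to change.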
Five research teams from the University of Michigan and Shanghai Jiao Tong University in China are sharing $1 million to study data science and its impact on air quality, galaxy clusters, lightweight metals, financial trading and renewable energy.
Since 2009, the two universities have collaborated on a number of research projects that address challenges and opportunities in energy, biomedicine, nanotechnology and data science.
In the latest round of annual grants, the winning projects focus on data science and how it can be applied to the chemistry and physics of the universe, as well as to finance and economics.
For more, read the University Record article.
For descriptions of the research projects, see the MIDAS/SJTU partnership page.