The Yottabyte Research Cloud (YBRC) environment can host many components of scalable, on-demand data pipelines for research labs at U-M.  These capabilities allow researchers to build custom data pipelines that ingest data from a variety of sources, process it using a message bus service, and store it in a variety of databases for later analysis.  Researchers can use any or all of our pipeline services, depending on their workflows.

For example, researchers have used our data pipeline services to stream remote sensor data into a Redis service in YBRC, filter and process the data with Node.js, and store the results in a MongoDB service for later analysis and retrieval.
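
The ingest, process, and store stages of a pipeline like this can be sketched in a few lines. The following is a minimal illustration only, not the configuration used on YBRC: it uses Python's standard library with in-memory stand-ins (a queue in place of the Redis service, a list in place of the MongoDB service), and the field name temp_c, the accepted temperature range, and all function names are hypothetical.

```python
import json
import queue
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def ingest(readings: Iterable[Record], bus: "queue.Queue[str]") -> None:
    """Ingestion stage: serialize each reading and place it on the bus.

    The in-memory queue is a stand-in for a message bus service.
    """
    for reading in readings:
        bus.put(json.dumps(reading))


def process(bus: "queue.Queue[str]",
            keep: Callable[[Record], bool],
            store: List[Record]) -> int:
    """Processing stage: drain the bus, filter records, store the rest.

    The list is a stand-in for a database. Returns the number stored.
    """
    stored = 0
    while True:
        try:
            raw = bus.get_nowait()
        except queue.Empty:
            break  # nothing left to process
        record = json.loads(raw)
        if keep(record):
            store.append(record)
            stored += 1
    return stored


def is_valid_temperature(record: Record) -> bool:
    """Hypothetical filter: keep numeric readings in a plausible range."""
    value = record.get("temp_c")
    return isinstance(value, (int, float)) and -50.0 <= value <= 60.0


def run_demo() -> List[Record]:
    """Push a few made-up sensor readings through all three stages."""
    readings = [
        {"sensor": "a1", "temp_c": 21.5},
        {"sensor": "a2", "temp_c": 999.0},  # out of range, filtered out
        {"sensor": "a3", "temp_c": None},   # missing value, filtered out
        {"sensor": "a4", "temp_c": 19.0},
    ]
    bus: "queue.Queue[str]" = queue.Queue()
    store: List[Record] = []
    ingest(readings, bus)
    process(bus, is_valid_temperature, store)
    return store


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

Keeping each stage behind a small function means the stand-ins could later be replaced by clients for real services without changing the filtering logic.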

We can utilize most common software tools at each step, or we can work with you to configure a tool of your choice. To explore options, contact hpc-support@umich.edu. Capabilities available include:

  • Data ingestion components:  Redis, Kafka, RabbitMQ.
  • Data processing engines:  Apache Flink, Apache Storm, and Apache NiFi.
  • A variety of data stores and databases:
    • Structured databases: MySQL/MariaDB and PostgreSQL
    • NoSQL databases: Cassandra, MongoDB, InfluxDB, and Elasticsearch (with Grafana available for dashboards and visualization)

This diagram shows how data pipeline functions fit into the ARC data science offerings.

[Diagram: ybrc-pipeline]