Boston
May 8-11, 2017

Event Details


Big Data as a Service at Mass Open Cloud

We describe the Massachusetts Open Cloud (MOC) Big Data as a Service (BDaaS) solution that we built on top of OpenStack. BDaaS allows users to access public datasets and stand up Hadoop and Spark environments on demand to work on them. We use Cloud Dataverse, an open-source framework that can store data in Ceph, as our data repository. Ceph’s RADOS Gateway (RGW) serves as the gateway between the Big Data analysis tasks and the Ceph storage service. To improve the performance of the Big Data environments, we modified RGW to cache data on SSDs attached to a server local to each rack. All requests for data are automatically directed by the network to the nearest RGW. Users can browse, investigate, and download datasets at MOC Dataverse, and can run analytics on any dataset by clicking a button to provision a Big Data processing environment. BDaaS then prefetches the data from Ceph into the caches and invokes OpenStack Sahara to create the on-demand environment.
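The prefetch-then-serve behavior described above can be illustrated with a minimal read-through cache sketch. This is not the modified RGW code: the class `RackCache`, the function `read_backend`, and the object keys are hypothetical names chosen for illustration, an in-memory dictionary stands in for the per-rack SSDs, and a plain function stands in for a read from Ceph.

```python
from typing import Callable, Dict, Iterable, List


class RackCache:
    """Toy read-through cache standing in for a per-rack SSD cache (hypothetical)."""

    def __init__(self, fetch_from_backend: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_backend    # stand-in for a read from backend storage
        self._store: Dict[str, bytes] = {}  # stand-in for SSD-backed storage
        self.hits = 0
        self.misses = 0

    def prefetch(self, keys: Iterable[str]) -> None:
        """Warm the cache before the analytics environment is created."""
        for key in keys:
            if key not in self._store:
                self._store[key] = self._fetch(key)

    def get(self, key: str) -> bytes:
        """Serve from the cache; on a miss, read the backend and keep a copy."""
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        data = self._fetch(key)
        self._store[key] = data
        return data


# Hypothetical backend contents and a log of which objects were read from it.
backend = {"dataset/part-0": b"a,b\n1,2\n", "dataset/part-1": b"a,b\n3,4\n"}
backend_reads: List[str] = []


def read_backend(key: str) -> bytes:
    backend_reads.append(key)
    return backend[key]


cache = RackCache(read_backend)
cache.prefetch(["dataset/part-0"])    # warmed ahead of time
first = cache.get("dataset/part-0")   # hit: already prefetched
second = cache.get("dataset/part-1")  # miss: read from the backend once
third = cache.get("dataset/part-1")   # hit: served from the cache
```

In this sketch each object is read from the backend at most once; every later request for it is served from the cache, which is the effect the per-rack caching aims for.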


What can I expect to learn?

We describe the high-performance Big Data as a Service (BDaaS) framework we built on top of OpenStack for use in the Massachusetts Open Cloud. Our BDaaS framework enables Hadoop and Spark jobs to compute on large datasets on demand without first downloading them to local storage. Users can browse datasets, select relevant ones, and run analytics on them at the touch of a button. The framework avoids slow accesses to remote storage by caching frequently used datasets (or portions of datasets) in per-rack SSDs and redirecting requests to the closest cache. It uses Cloud Dataverse, an open-source framework that stores data in Ceph, as its data repository, and implements the caching tier within Ceph’s RADOS Gateway.

Wednesday, May 10, 11:50am-12:30pm
Difficulty Level: Beginner
MOC Research Scientist
Dr. Ata Turk is a research scientist in the Massachusetts Open Cloud initiative and in the Electrical and Computer Engineering Department at Boston University. His research areas include cloud computing systems, big data analytics, energy efficiency, information retrieval, and mobile computing. Prior to joining MOC, Dr. Turk worked at Yahoo Labs, Barcelona, as a member of the Web retrieval...
Boston University, Visiting Scientist
Raja is a visiting scientist at Boston University, working at the Mass Open Cloud. His work generally involves understanding how to automate operational tasks within and among clouds.
Intel Corporation