Boston
May 8-11, 2017

Event Details


When Dataverse Meets OpenStack...

Cloud Dataverse is a new service for accessing and processing public datasets in an OpenStack cloud. It is based on Dataverse, a popular framework for sharing, preserving, and analyzing research data. Cloud Dataverse extends Dataverse to replicate datasets from per-institution repositories to a cloud-based repository and store the data in Swift, so that applications running in the cloud can access the data in situ. We use OpenStack Sahara to launch on-demand Big Data applications that use Swift as a data source for analytics jobs running on Hadoop, Spark, or Pig.

We follow the user's journey through Cloud Dataverse: browsing datasets, the harvesting/replication process, viewing files in the object store, and using the compute provided by Sahara. To improve the user experience in Sahara, we plan to generate default cluster templates automatically through a new UI, giving users an option to bypass the complexity of Horizon.


What can I expect to learn?
  • The features of the existing Dataverse project
  • The relevant new functionality that allows the integration of Dataverse with OpenStack
  • The basics of OpenStack Sahara
Wednesday, May 10, 11:00am-11:40am
Difficulty Level: Intermediate
Technical Lead / Architect
Gustavo Durand works at Harvard University's Institute for Quantitative Social Science as the Technical Lead and Architect of the Dataverse application, an open source web application for publishing, citing, analyzing, and preserving data. He began his Java programming career at Cambridge Technology Partners in 1997 and has more than 20 years of total experience as a software developer and...
Red Hat
Jeremy is an undergraduate student at Boston University. He represents a new wave of upstream OpenStack contributors, as a core contributor and reviewer for the Sahara project. He also works on the development of "Mix and Match", a service underpinning the Open Cloud eXchange model pioneered by the Massachusetts Open Cloud.
Senior Software Developer
Leonid Andreev works at Harvard University's Institute for Quantitative Social Science as the Senior Software Developer of Dataverse, an open source web application for publishing, citing, analyzing, and preserving data. He has more than 20 years of experience as an application developer and systems programmer. He has worked on all layers of application development, from front to back end, to...