Event Details

Please note: All times listed below are in Central Time Zone


When Dataverse Meets OpenStack...

Big Data

Cloud Dataverse is a new service for accessing and processing public datasets in an OpenStack cloud. It is based on Dataverse, a popular framework for sharing, preserving, and analyzing research data. Cloud Dataverse extends Dataverse to replicate datasets from per-institution repositories to a cloud-based repository and store the data in Swift, so that applications running in the cloud can access the data in situ. We use OpenStack Sahara to launch on-demand Big Data applications that use Swift as a data source for analytics jobs running on Hadoop, Spark, or Pig.
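As a rough illustration of what "Swift as a data source" can look like from the job's side, the sketch below builds the kind of swift:// URL that Hadoop-style jobs launched through Sahara typically read from. The function name, the container name, and the object path are hypothetical and do not come from the Cloud Dataverse code; the "sahara" provider suffix is an assumption based on Sahara's usual Swift URL convention.

```python
def swift_data_source_url(container: str, object_path: str,
                          provider: str = "sahara") -> str:
    """Build a URL of the form swift://<container>.<provider>/<object>.

    Hypothetical helper for illustration only; the default provider
    suffix "sahara" is an assumption, not taken from the talk.
    """
    if not container or "/" in container:
        raise ValueError("container must be a non-empty name without '/'")
    # Object names in Swift may contain '/', so only the leading one is dropped.
    return f"swift://{container}.{provider}/{object_path.lstrip('/')}"


# Example with made-up names: a replicated dataset file in a container.
url = swift_data_source_url("dataverse-demo", "/census/2010.csv")
print(url)
```

A job template would then take a URL like this as its input location, which is what lets the analytics job read the replicated dataset directly from the object store rather than from a copy.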

We follow the user's journey through Cloud Dataverse: browsing datasets, the harvesting/replication process, viewing files in the object store, and using compute provided by Sahara. To improve the user experience in Sahara, we plan to generate default cluster templates automatically through a new UI, giving users an option to bypass the complexity of Horizon.

What can I expect to learn?

The features of the existing Dataverse project
The relevant new functionality that allows the integration of Dataverse with OpenStack
The basics of OpenStack Sahara

Wednesday, May 10, 11:00am-11:40am (3:00pm - 3:40pm UTC)

Hynes Convention Center - Level Two - MR 207


Difficulty Level: Intermediate

Tags: Sahara Swift Community Scientific UX Public Cloud

Gustavo Durand

Technical Lead / Architect

Gustavo Durand works at Harvard University's Institute for Quantitative Social Science as the Technical Lead and Architect of the Dataverse application, an open source web application for publishing, citing, analyzing, and preserving data. He began his Java programming career at Cambridge Technology Partners in 1997 and has more than 20 years of total experience as a software developer and...

Jeremy Freudberg

Software Engineer

Jeremy is currently a software engineer at Red Hat, focused on OpenStack. He is the PTL of Sahara for the Train and Ussuri release cycles and has been a core contributor to that project since Pike. He gained his knowledge of OpenStack and cloud computing during many years at the Massachusetts Open Cloud.

Leonid Andreev

Senior Software Developer

Leonid Andreev works at Harvard University's Institute for Quantitative Social Science as the Senior Software Developer of Dataverse, an open source web application for publishing, citing, analyzing, and preserving data. He has more than 20 years of experience as an application developer and systems programmer. He has worked on all layers of application development, from front to back end, to...