OpenStack Presentation Voting

Help this presentation get to the OpenStack Summit!

OpenStack community members are voting on presentations for the OpenStack Summit, held November 3-7 in Paris, France. We received hundreds of high-quality submissions, and your votes can help us determine which ones to include in the schedule.

"Targeting OpenStack Clouds"

Data Locality for Hadoop on OpenStack

Apache Hadoop is an open source data processing framework that is usually deployed on bare-metal commodity servers. Recently, however, more Hadoop clusters are being deployed in cloud environments using virtual machines, for a multitude of reasons, among which ease of deployment and scalability are the most prominent. Cloud environments offer several advantages over bare-metal ones but introduce their own set of challenges when dealing with Hadoop clusters. The main challenge is data locality: in a cloud environment, a virtual machine might be created on one physical host while its corresponding disks (volumes) are created on a different physical host. This separation between the compute and storage components of a virtual machine introduces delays and network congestion when the virtual machine accesses its non-local disks over the network. In this work we propose a solution for data locality for Hadoop clusters deployed on OpenStack. Our solution uses the extensible scheduling frameworks in OpenStack Nova and OpenStack Cinder to select the best physical host for a virtual machine based on its storage requirements, and to ensure that any disks attached to the virtual machine are local disks. We will also present how we used this solution within the OpenStack Sahara project, which makes provisioning Hadoop clusters on OpenStack easier and more efficient. This solution is not limited to Hadoop clusters: any cluster of machines that needs local disk access and performance could benefit from it, such as Elasticsearch, Cassandra, etc.
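
The host selection idea in the abstract can be illustrated with a small sketch. This is not the speakers' implementation and it does not use the real Nova or Cinder classes. The names Host, Request, host_passes, weigh and select_host are hypothetical, and the weighing policy (prefer the host with the most local disk left over) is an assumption made only for illustration. It shows a filter step that keeps hosts able to hold both the virtual machine and all of its volumes locally, followed by a weigh step that picks one of them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical view of one physical host (not a real Nova/Cinder object)."""
    name: str
    free_vcpus: int
    free_ram_mb: int
    free_local_disk_gb: int


@dataclass
class Request:
    """Hypothetical request: one Hadoop worker VM plus its data volumes."""
    vcpus: int
    ram_mb: int
    volume_sizes_gb: List[int]


def host_passes(host: Host, req: Request) -> bool:
    """Filter step: the host must fit the VM and every volume on local disk."""
    return (
        host.free_vcpus >= req.vcpus
        and host.free_ram_mb >= req.ram_mb
        and host.free_local_disk_gb >= sum(req.volume_sizes_gb)
    )


def weigh(host: Host, req: Request) -> int:
    """Weigh step (assumed policy): local disk left after placing the volumes."""
    return host.free_local_disk_gb - sum(req.volume_sizes_gb)


def select_host(hosts: List[Host], req: Request) -> Optional[Host]:
    """Return the best host that keeps compute and storage together, or None."""
    candidates = [h for h in hosts if host_passes(h, req)]
    if not candidates:
        return None
    return max(candidates, key=lambda h: weigh(h, req))


if __name__ == "__main__":
    hosts = [
        Host("node-a", free_vcpus=8, free_ram_mb=32768, free_local_disk_gb=500),
        Host("node-b", free_vcpus=16, free_ram_mb=65536, free_local_disk_gb=2000),
        Host("node-c", free_vcpus=2, free_ram_mb=4096, free_local_disk_gb=4000),
    ]
    req = Request(vcpus=4, ram_mb=8192, volume_sizes_gb=[500, 500])
    chosen = select_host(hosts, req)
    print(chosen.name if chosen else "no host can keep the disks local")
```

In this example node-a lacks local disk for the two volumes and node-c lacks vCPUs, so only node-b passes the filter. When no host passes, the sketch returns None rather than falling back to a remote volume, which mirrors the goal of keeping all attached disks local.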

Speaker Bios

Yann Degat

After several years building web-based solutions in various French companies, Yann Degat now works at Numergy, a French public cloud based on OpenStack, where he contributes to work on Big Data and PaaS in cloud computing environments.

Adrien Vergé

Adrien Vergé is an engineer who graduated from the École Polytechnique (France) in 2012. He has done research on tracing optimization on ARM systems at École Polytechnique Montréal (Canada), in the lab where the Linux Trace Toolkit (LTTng) was created. He has a patent pending for optimizing the Tor privacy-preserving network, based on work with Technicolor in 2012. He has published on ARM code disassembly. He now works at Numergy, a French public cloud based on OpenStack, where he contributes to various subjects around cloud computing.

Serge Alexandre

Over his career, Serge has developed a high level of expertise in building complex platforms and software, from telecom card drivers to Java software distribution platforms. He has also held senior management positions at Softay, Devnet, ISDnet, ICT Software and BroadSoftware. He is one of the French UNIX/Java experts. For the past two years, Serge has been developing a Big Data as a service platform at VirtualScale, bringing virtualization expertise to Hadoop environments.

Abbass Marouni

Abbass was one of the first lead members of the Data Channel network at Alcatel-Lucent and an R&D developer at Internet Memory Research. He joined VirtualScale in 2013, where he developed the Chef code and architecture for Hadoop in OpenStack environments.




Serge Alexandre, Engineering Director

Abbass Marouni, Software lead developer
