Boston
May 8-11, 2017

Event Details

Please note: All times listed below are in the Eastern Time Zone


Introducing Swifta: A Performant Hadoop File System Driver for OpenStack Swift

The Hadoop file system driver for Swift is increasingly important as Hadoop ecosystem deployments on OpenStack grow in scale and demand ever-increasing performance. After several months of experience using the OpenStack Sahara-extra file system driver with Hive, Spark, and Presto, we realized that a clean-slate redesign would allow us to address threading and data management considerations that significantly impact performance.

We developed a new Swift file system driver, “Swifta”, featuring thread pools, lazy seeks, caching of identical requests, object listing improvements, and more. We tested our implementation against on-premises Ceph object storage, successfully running large queries that previously failed outright and running other queries with substantial performance improvements.
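To make the "lazy seeks" idea concrete, here is a minimal sketch in Python. It is not Swifta's code, which is not shown in this abstract; the class name `LazySeekReader`, the `fetch_range` callback (a stand-in for a ranged GET against an object store), and the tiny chunk size are all assumptions chosen for illustration. The point is only that `seek()` records a position and issues no request, so repeated seeks cost nothing until data is actually read.

```python
from typing import Callable, List, Tuple


class LazySeekReader:
    """Reads an object through ranged fetches, deferring all work on seek().

    `fetch_range(start, end)` is a hypothetical callback standing in for a
    ranged GET; it must return the object's bytes in [start, end).
    """

    def __init__(self, fetch_range: Callable[[int, int], bytes], size: int,
                 chunk_size: int = 8) -> None:
        self._fetch_range = fetch_range
        self._size = size
        self._chunk_size = chunk_size
        self._pos = 0        # logical position requested by the caller
        self._buf = b""      # bytes currently buffered
        self._buf_start = 0  # object offset of self._buf[0]

    def seek(self, offset: int) -> None:
        # Lazy: only record the target position; no request is issued here.
        if not 0 <= offset <= self._size:
            raise ValueError("offset out of range")
        self._pos = offset

    def read(self, n: int) -> bytes:
        out = bytearray()
        while n > 0 and self._pos < self._size:
            buf_end = self._buf_start + len(self._buf)
            if not (self._buf_start <= self._pos < buf_end):
                # Position falls outside the buffer: fetch one chunk from here.
                end = min(self._pos + self._chunk_size, self._size)
                self._buf = self._fetch_range(self._pos, end)
                self._buf_start = self._pos
            lo = self._pos - self._buf_start
            piece = self._buf[lo:lo + n]
            if not piece:
                break  # backing store returned no data; stop instead of looping
            out += piece
            self._pos += len(piece)
            n -= len(piece)
        return bytes(out)


# Usage: an in-memory "object" and a fetch function that logs each request.
data = bytes(range(32))
calls: List[Tuple[int, int]] = []


def fetch_range(start: int, end: int) -> bytes:
    calls.append((start, end))
    return data[start:end]


reader = LazySeekReader(fetch_range, len(data))
reader.seek(20)
reader.seek(4)
reader.seek(10)          # three seeks so far, and no request has been made
first = reader.read(4)   # triggers a single ranged fetch starting at offset 10
second = reader.read(2)  # served from the buffered chunk, no new request
```

An eager design would issue or reposition a request on every `seek()`. Columnar readers often seek several times before reading, so deferring the request is one plausible way a driver could reduce round trips to the object store.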

This presentation will discuss the design, implementation, deployment, and performance characteristics of Swifta, which we plan to open source this year, and its use in our cloud-based big data environment.


What can I expect to learn?

Attendees will learn about our months-long experience using the currently available Swift driver, our motivations for revisiting the overall architecture rather than continuing to make incremental changes to the existing implementation, and the performance characteristics of our implementation.

Tuesday, May 9, 4:40pm-5:20pm (8:40pm - 9:20pm UTC)
Difficulty Level: Intermediate
Tags: Sahara Swift
Senior Technical Product Manager
On the Walmart Global eCommerce big data team, Andy guides the evaluation, development, and internal rollout of new technology supporting data scientists and engineers across the company. He received a B.S. in computer networks and information systems from Wentworth Institute of Technology.
Senior Manager, Big Fast Data Technology at Walmartlabs
Mengmeng Liu works in the area of big data systems at Walmartlabs. Over the past several years, Mengmeng has worked on a number of open source projects, including Hive, Spark, and Hadoop, and most recently on supporting big data initiatives in the cloud. Together with colleagues at Walmartlabs, Mengmeng is passionate about making big data systems more efficient and easier to use,...
Staff Software Engineer
I am a staff engineer on the BFD team. I have worked on big data technologies for a few years, including cloud-based big data system development and performance tuning, the design and implementation of the Swifta driver for Ceph object storage, the design and implementation of boo for OneOps, Hadoop performance tuning, NameNode metadata redesign, and work with Spark, Presto, and Hive.