Event Details

Please note: All times listed below are in Central Time Zone


Unlock bigdata analytic efficiency with Ceph data lake

Private & Hybrid Cloud

Data volume is growing at an unprecedented rate, and disaggregation of compute and storage is now commonplace. Ceph is one of the most popular object-based storage systems: it provides block, file, and object storage in a single platform, is widely deployed in OpenStack-based public and private clouds, and supports the Amazon S3 API. This session will explore the motivations for and benefits of running big data analytics on the Ceph object store, and will present an end-to-end solution for big data analytics on the Ceph object store developed jointly by Intel, Red Hat, and QCT. We will present the architecture of big data analytics on a Ceph data lake with different real workloads. We will also share tunings and optimizations on the compute side, in the s3a file adaptor, and on the Ceph object storage side that improve TPC-DS batch query performance by 3.42x and eventually make it competitive with remote HDFS solutions.

What can I expect to learn?

You will learn how to run big data applications (MR, Spark, Presto) directly on the Ceph object store, how to evaluate the performance of big data workloads on the Ceph object store, what the performance characteristics and challenges are in an architecture that separates compute and storage, which deployment architectures and optimizations meet your requirements, and how to leverage the cost and performance benefits.
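As a rough illustration of what running Spark directly against a Ceph object store can involve, the sketch below builds the Spark properties that point the Hadoop s3a connector at an S3-compatible gateway. It is not taken from the session material. The `fs.s3a.*` keys are standard Hadoop s3a settings and `spark.hadoop.` is Spark's prefix for passing Hadoop options, but the endpoint URL, the credentials, and the helper names `s3a_spark_conf` and `to_submit_args` are hypothetical placeholders.

```python
def s3a_spark_conf(endpoint, access_key, secret_key, use_ssl=False):
    """Return Spark properties that direct the s3a connector to an S3-compatible endpoint."""
    hadoop_props = {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Assumption: the gateway is addressed by path (host/bucket/key),
        # not by a per-bucket DNS name.
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "true" if use_ssl else "false",
    }
    # Spark forwards any "spark.hadoop.*" property to the Hadoop configuration.
    return {"spark.hadoop." + key: value for key, value in hadoop_props.items()}


def to_submit_args(conf):
    """Render the properties as spark-submit arguments, in sorted key order."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Hypothetical gateway address and placeholder credentials.
    conf = s3a_spark_conf("http://rgw.example.internal:8080", "ACCESS_KEY", "SECRET_KEY")
    print(" ".join(to_submit_args(conf)))
```

With properties like these in place, a job would typically read and write `s3a://bucket/path` URIs in place of `hdfs://` paths. Any tuning beyond these basic connection settings depends on the workload and is not shown here.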

Monday, May 21, 4:20pm-5:00pm (11:20pm - 12:00am UTC)

Vancouver Convention Centre West - Level One - Room 118-120

Slides: Unlock bigdata analytic efficiency with Ceph data lake


Difficulty Level: Advanced

Tags: Arch / Ops Ceph Spark Swift

Yong Fu

Senior Software Engineer

Yong Fu is a software engineer in the Cloud Storage Engineer group at Intel Asia Pacific Research & Development Ltd. He works closely with ISVs and the open source software community to ensure their software is optimized for Intel platforms, which may involve any layer of the solution stack, such as storage, virtualization, and containers. He is also responsible for selecting the workloads, tuning and...

Jian Zhang

Software Engineer Manager

Jian Zhang is a senior software engineer manager at Intel. He and his team focus primarily on open source storage development and optimization on Intel platforms, and build reference solutions for customers. He has 10 years of experience in performance analysis and optimization for many open source projects like Xen, KVM, Swift, Ceph, and HDFS, and benchmarking workloads like SPEC-*,...

Yuan Zhou

INTC

Yuan Zhou is a Senior Software Development Engineer in the Software and Service Group at Intel Corporation, working in the System Technology Optimization team, which focuses primarily on cloud storage software. He has worked on databases, virtualization, and cloud computing for most of his 7+ year career at Intel.

Kyle Bader

Red Hat, Inc

Kyle Bader is a Senior Solution Architect working in the Storage Solutions Team at Red Hat, lending his design and operational skills with Ceph to help develop tested solutions that ensure repeatable success when deploying distributed, fault-tolerant, multi-petabyte storage systems. Prior to Red Hat, Kyle had architectural roles at both Inktank and DreamHost. Kyle was part of the team that...