Data volume is growing at an unprecedented rate, and disaggregation of compute and storage is now commonplace. Ceph is one of the most popular object-based storage systems: it provides block, file, and object storage in a single platform, is widely deployed in OpenStack-based public and private clouds, and supports the Amazon S3 API. This session will explore the motivations for and benefits of running big data analytics on the Ceph object store, and will present an end-to-end solution for big data analytics on Ceph object storage, developed jointly by Intel, Red Hat, and QCT. We will present the architecture of big data analytics on a Ceph data lake with different real workloads. We will also share tunings and optimizations on the compute side, in the S3A file system adaptor, and on the Ceph object storage side that improved TPC-DS batch query performance by 3.42x, making it competitive with remote HDFS solutions.
You will learn how to run big data applications (MapReduce, Spark, Presto) directly on the Ceph object store, how to evaluate the performance of big data workloads on the Ceph object store, what the performance characteristics and challenges are in a disaggregated compute and storage architecture, which deployment architecture and optimizations meet your requirements, and how to leverage the cost and performance benefits.
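
As a rough illustration of what running Spark directly against a Ceph object store can look like, the sketch below builds the Spark properties that point the Hadoop S3A connector at an S3-compatible endpoint. It is only a minimal sketch, not the configuration used in the session: the helper name `s3a_spark_conf`, the gateway URL, and the credentials are hypothetical placeholders, and path-style access is assumed because that is a common setup for a Ceph RADOS Gateway without wildcard DNS.

```python
def s3a_spark_conf(endpoint, access_key, secret_key, use_ssl=False):
    """Build Spark properties that point the Hadoop S3A connector
    at an S3-compatible endpoint such as a Ceph RADOS Gateway.

    Hypothetical helper for illustration; the property names are
    standard Hadoop S3A settings, the values are placeholders.
    """
    hadoop_props = {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Assumption: buckets are addressed as endpoint/bucket rather
        # than bucket.endpoint, which avoids needing wildcard DNS.
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "true" if use_ssl else "false",
    }
    # Spark forwards any "spark.hadoop.*" property to the Hadoop configuration.
    return {f"spark.hadoop.{key}": value for key, value in hadoop_props.items()}


# Placeholder endpoint and credentials; replace with your own gateway details.
conf = s3a_spark_conf("http://rgw.example.internal:7480", "ACCESS_KEY", "SECRET_KEY")

# Print the properties in the form accepted by spark-submit.
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

With properties like these in place, a job would address data through `s3a://bucket/path` URIs instead of `hdfs://` paths; the tunings discussed in the session would be layered on top of such a baseline.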