Event Details

Please note: All times listed below are in the Central Time Zone.


Lessons Learned running Open Infrastructure on Baremetal Kubernetes Clusters in Production

Kubernetes is rapidly becoming the standard orchestration tool for declaratively managing open infrastructure. Over the last two years, we have been running baremetal Kubernetes clusters in production that host challenging containerized workloads, including OpenStack itself. We have upgraded these workloads and the Kubernetes infrastructure itself while maintaining the mission-critical environments that power our 5G infrastructure. In this talk we will revisit some of the lessons learned in dealing with the challenges along the way: upgrading Kubernetes and the unexpected fallout that can occur when running complex workloads; Docker stability and upgrades; CPU time stealing issues with real-time workloads; CNI upgrades in running environments; debugging containerized Neutron agents; and issues that arise when workloads like OpenStack tap into functionality such as hugepages, CPU pinning, and other features that Kubernetes may not account for cleanly from release to release.


What can I expect to learn?

In this talk, you will learn:


- How Kubernetes has changed the way we think about open infrastructure.

- The challenges of running a complex open infrastructure workload like OpenStack on Kubernetes in production.

- The reality of Kubernetes upgrades when workloads use features like hugepages and CPU pinning.

- How we try to avoid cascading failures.

- How a containerized OpenStack changes the way you debug OpenStack in production.

- The pros and cons of containerizing everything.

Monday, April 29, 11:10am-11:50am (5:10pm - 5:50pm UTC)
Difficulty Level: Advanced
Alan Meadows works as a Cloud Platform Architect at AT&T, responsible for designing, maintaining, and scaling cloud infrastructure that spans hundreds of datacenters with mission-critical telecom requirements.
AT&T: Lead Member of Technical Staff
Pete leads multiple areas of the Cloud Platforms organization, including serving as Lead Engineer for the Cloud Services of Network Cloud Product and providing strategic implementation decisions to ease the Public Cloud transition. He leads the Site Reliability Engineering, Infrastructure Virtualization, and Software-Defined Storage solutions. Additionally, he is also the lead implementation...