Event Details


Lessons Learned Running Open Infrastructure on Baremetal Kubernetes Clusters in Production

Kubernetes is rapidly becoming the standard orchestration tool for declaratively managing open infrastructure. Over the last two years, we have operated baremetal Kubernetes clusters in production that run challenging containerized workloads, including OpenStack itself. We have upgraded these workloads and the Kubernetes infrastructure itself while maintaining the mission-critical environments that power our 5G infrastructure. In this talk we will revisit some of the lessons learned in dealing with various challenges along the way: upgrading Kubernetes and the unexpected fallout that can occur when running complex workloads; Docker stability and upgrades; CPU time stealing issues with real-time workloads; CNI upgrades in running environments; debugging containerized Neutron agents; and issues that arise when workloads like OpenStack tap into functionality such as hugepages, CPU pinning, and other features that Kubernetes may not account for cleanly from release to release.


What can I expect to learn?

In this talk, you will learn:


- How Kubernetes has changed the way we think about open infrastructure.

- What the challenges are in running a complex Open Infrastructure workload like OpenStack on Kubernetes in production.

- The reality of Kubernetes upgrades when workloads use features like hugepages and CPU pinning.

- How we try to avoid cascading failures.

- How a containerized OpenStack changes the way you debug OpenStack in production.

- The pros and cons of a containerized everything.

Monday, April 29, 11:10am-11:50am
Difficulty Level: Advanced
AT&T, LEAD-SYSTEM ARCHITECT
Alan Meadows works as a Cloud Platform Architect at AT&T, responsible for designing, maintaining, and scaling Cloud infrastructure that spans hundreds of datacenters with mission-critical telecom requirements.
AT&T, Principal Member of Technical Staff
Pete Birley is an Enterprise and Public Cloud Architect with a demonstrated history of large-scale DevOps, ALM and SRE project success. Currently he is the application lifecycle development lead for AT&T's Network Cloud and Project Team Lead for OpenStack-Helm, the project to manage the lifecycle of OpenStack Components and Open Infrastructure on Kubernetes, as well as a core reviewer...