Event Details

Please note: All times listed below are in Central Time Zone

RabbitMQ at Scale, Lessons Learned

Operations War Stories

Operating RabbitMQ at large scale comes with its own set of challenges. This talk follows Cisco's journey operating a large (800+ node) environment inside a single RabbitMQ cluster. We will share the pains, lessons learned, and best practices for stabilizing and improving messaging performance and reliability.

This talk includes:

OpenStack service configurations related to messaging
Kombu driver enhancements
Considerations when virtualizing the control plane, and how default network buffer settings can be insufficient
RabbitMQ Erlang arguments related to TCP_USER_TIMEOUT and their impact
The overhead of Queue Mirroring
Kernel-level network settings to improve RabbitMQ failover and provide faster service reconnection
Alerting and Monitoring RabbitMQ
Recovering from a cluster partition
Architectural decisions

What can I expect to learn?

Attendees will walk away with best practices and configuration changes they can make to improve the reliability and performance of messaging in OpenStack.

Wednesday, October 26, 12:15pm-12:55pm (10:15am - 10:55am UTC)

CCIB - Centre de Convencions Internacional de Barcelona - P1 - Room 112

Difficulty Level: Intermediate

Tags: Enterprise Ops Operator Neutron oslo

Matthew Popow

Cisco

Matt is a Senior Engineer working for Cisco Cloud Services. Matt has worked on OpenStack since the Grizzly release, and focuses on quality engineering, operations, and release management.

Wei Tie

Cisco Systems

Wei is a senior platform engineer at Cisco Cloud Services, working on the build, operation, and optimization of a global OpenStack-based cloud. Wei started his journey with OpenStack with the Essex release and is actively working on stabilizing Icehouse and Liberty platforms.

Weiguo Sun

Technical Lead

Weiguo has been working in the IT industry for over two decades. His past experience includes large-scale database support on various Unix platforms and high-performance web farms. Since the Grizzly and Havana releases, Weiguo has been a tech lead for Cisco Cloud Services, focusing on the stability and scalability of Neutron, OVS, RabbitMQ, and other backend services.
