Operating RabbitMQ at large scale comes with its own set of challenges. This talk will take you through the journey Cisco went through operating a large (800+ node) environment on a single RabbitMQ cluster. We will share the pain points, lessons learned, and best practices for stabilizing and improving messaging performance and reliability.
This talk includes:
- OpenStack service configurations related to messaging
- Kombu driver enhancements
- Considerations when virtualizing the control plane, including why default network buffer settings can be insufficient
- RabbitMQ Erlang arguments related to TCP_USER_TIMEOUT and their impact
- The overhead of Queue Mirroring
- Kernel-level network settings that improve RabbitMQ failover and speed up service reconnection
- Alerting and Monitoring RabbitMQ
- Recovering from a cluster partition
- Architectural decisions
Attendees will walk away with best practices and configuration changes they can apply to improve the reliability and performance of messaging in OpenStack.