Barcelona, Spain
October 25-28, 2016

Event Details

Please note: All times listed below are in Central European Summer Time (CEST, UTC+2)

Sleep Better at Night: OpenStack Cloud Auto-Healing

Software-defined everything is a new trend. How about software-defined outage prevention and remediation?

You have your cloud up and running. You monitor it through StackLight, Zabbix, Nagios or some other tool. But what happens when one of the services is unresponsive or your free disk space is low? How quickly will you be able to resolve the issue? Do you have any debugging information or logs gathered before you actually start digging into the issue?

We will introduce a “robo-sysadmin” for our production OpenStack cloud that reacts to alerts and outages and helps us reduce mean time to repair by gathering debug information and trying to fix issues automatically using predefined workflows. It’s a kind of Tier 0 support: it troubleshoots, fixes known problems, escalates to humans when necessary, and provides detailed information on what it has discovered.
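As a rough illustration of the Tier 0 idea described above, here is a minimal Python sketch of an alert handler that gathers debug information, tries a predefined workflow, and escalates when nothing fixes the problem. It is not the speakers' implementation and does not use any StackStorm API; every name in it (`Alert`, `Report`, `handle_alert`, `clean_tmp_files`, the `disk_space_low` alert) is hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    name: str  # hypothetical alert identifier, e.g. "disk_space_low"
    host: str  # hypothetical host name the alert fired on


@dataclass
class Report:
    alert: Alert
    debug_info: List[str] = field(default_factory=list)
    fixed: bool = False
    escalated: bool = False


# A workflow takes an alert and returns True if it believes the issue is fixed.
Workflow = Callable[[Alert], bool]


def gather_debug_info(alert: Alert) -> List[str]:
    # Placeholder: a real system would collect logs, service status, disk usage, etc.
    return [f"collected diagnostics for {alert.name} on {alert.host}"]


def handle_alert(alert: Alert, workflows: Dict[str, Workflow]) -> Report:
    # Always gather debug information first, so humans have it if we escalate.
    report = Report(alert=alert, debug_info=gather_debug_info(alert))
    workflow = workflows.get(alert.name)
    if workflow is not None:
        try:
            report.fixed = workflow(alert)
        except Exception as exc:
            report.debug_info.append(f"workflow failed: {exc}")
    # Unknown alerts and failed workflows are handed over to a human.
    if not report.fixed:
        report.escalated = True
    return report


def clean_tmp_files(alert: Alert) -> bool:
    # Hypothetical remediation; this stub simply pretends it succeeded.
    return True


WORKFLOWS: Dict[str, Workflow] = {"disk_space_low": clean_tmp_files}

known = handle_alert(Alert("disk_space_low", "node-1"), WORKFLOWS)
unknown = handle_alert(Alert("mystery_failure", "node-2"), WORKFLOWS)
```

In this sketch a known alert is handled by its registered workflow, while an alert with no matching workflow is escalated together with whatever diagnostics were collected.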

What can I expect to learn?

Attendees will learn about:

- How we monitor our multi-DC production cloud at Symantec
- How we approached the problem of cloud auto-healing
- StackStorm and alternatives for automating prevention and remediation of outages
- OpenStack auto-healing workflows we created

Thursday, October 27, 9:00am-9:40am (7:00am - 7:40am UTC)
Difficulty Level: Intermediate
Mykyta Gubenko is an Infrastructure Engineer at Mirantis working for the Services department. As an experienced system engineer, he helps Mirantis customers to be successful with OpenStack. Mykyta is focused on deployment automation and large-scale OpenStack projects.