In most cloud deployments, a mix of cloud-native (cattle) and traditional (pet) workloads exist. Traditional, non-cloud workloads require infrastructure support to manage availability. Openstack Masakari project recovers pet workloads from hardware failures by automatically re-locating VMs to healthy nodes. It relies on infrastructure monitoring to detect failures e.g. pacemaker. Although workloads from a failed node are recovered, distributed consensus cluster needs reconfiguration. Cloud admin is expected to reconfigure node monitoring software and re-establish the consensus. Platform9 implemented a zero touch node monitoring solution with consul. This solution integrates with Masakari API to provides a completely automated HA for Openstack. In this talk, we present a deep dive on Masakari and automated cluster management for HA clusters.
Role of Virtual Machine HA in protecting workloads from failures. Common HA terminology such as MTBF, recovery time, etc. Architecture of Masakari and its features. Consul as distributed monitoring solution and automated zero touch reconfiguration solution with consul based distributed consensus.