Although more and more applications are moving to the cloud, some users' legacy services have not yet been re-architected and still need HA capabilities to ensure their reliability. So how can we build a reliable, flexible VM HA solution on OpenStack?
Two main issues need to be addressed for VM HA:
- How to prevent the split-brain problem?
- How to detect which network plane has failed on each host, so that a more appropriate recovery operation can be performed?
Based on the above concerns, we've developed a complete VM HA solution:
A Sanlock-based distributed lock manager was developed to solve the first problem. To solve the second, Etcd is used to monitor three physical networks (management, storage, and service) on each host. Once any of these networks is interrupted, the HA-manager detects the failure and triggers a recovery operation based on the configured policy.
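To make the per-plane detection idea concrete, here is a minimal Python sketch of how failed network planes might be mapped to a recovery action. It is an illustration under stated assumptions, not the solution's actual code: the heartbeat timestamps stand in for whatever liveness data is kept in Etcd, and the names `HostHeartbeats`, `choose_action`, and `EXAMPLE_POLICY`, as well as the action strings, are hypothetical.

```python
from dataclasses import dataclass, field

# The three physical network planes named in the abstract.
PLANES = ("management", "storage", "service")


@dataclass
class HostHeartbeats:
    """Last heartbeat time seen on each network plane of one host.

    Assumption: in a real deployment these timestamps would come from
    a store such as Etcd; here they are kept in a plain dict.
    """
    last_seen: dict = field(default_factory=dict)

    def beat(self, plane, now):
        """Record a heartbeat received on `plane` at time `now`."""
        self.last_seen[plane] = now

    def failed_planes(self, now, ttl):
        """Return the planes whose last heartbeat is older than `ttl`.

        A plane that never reported a heartbeat counts as failed.
        """
        return frozenset(
            p for p in PLANES
            if now - self.last_seen.get(p, float("-inf")) > ttl
        )


# Hypothetical policy table: set of failed planes -> recovery action.
EXAMPLE_POLICY = {
    frozenset({"management"}): "alert_only",
    frozenset({"storage"}): "evacuate",
    frozenset({"service"}): "migrate",
    frozenset(PLANES): "fence_and_evacuate",
}


def choose_action(failed, policy, default="evacuate"):
    """Pick a recovery action for the given set of failed planes."""
    if not failed:
        return "none"
    return policy.get(frozenset(failed), default)


# Usage: storage heartbeats stop while the other two planes keep reporting.
hb = HostHeartbeats()
for plane in PLANES:
    hb.beat(plane, now=100.0)
hb.beat("management", now=108.0)
hb.beat("service", now=108.0)
failed = hb.failed_planes(now=110.0, ttl=5.0)
action = choose_action(failed, EXAMPLE_POLICY)
```

Keying the policy on the set of failed planes, rather than on a single "host down" flag, is what lets the manager react differently to, say, a management-only outage and a storage outage.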
In addition, the solution includes some HA-related functions, such as queuing and retrying.
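As a rough illustration of what queuing and retrying can mean for HA recovery, the Python sketch below keeps recovery tasks in a time-ordered queue and re-queues failed ones with exponential backoff. The class `HAQueue`, its parameters, and the `recover` callback are hypothetical names chosen for this example; the abstract does not describe the real mechanism in this detail.

```python
import heapq


class HAQueue:
    """Time-ordered queue of VM recovery tasks with bounded retries."""

    def __init__(self, max_attempts=3, base_delay=10.0):
        self.max_attempts = max_attempts  # total tries allowed per VM
        self.base_delay = base_delay      # seconds before the first retry
        self._heap = []                   # (due_time, seq, vm_id, attempts_made)
        self._seq = 0                     # tie-breaker that keeps FIFO order

    def submit(self, vm_id, now, attempts_made=0):
        """Schedule a recovery task for `vm_id` to run at time `now`."""
        heapq.heappush(self._heap, (now, self._seq, vm_id, attempts_made))
        self._seq += 1

    def run_due(self, now, recover):
        """Run every task that is due at `now`.

        `recover(vm_id)` returns True on success. A failed task is
        re-queued with a delay that doubles on each attempt, until
        `max_attempts` is reached. Returns (recovered, gave_up).
        """
        recovered, gave_up = [], []
        while self._heap and self._heap[0][0] <= now:
            _, _, vm_id, attempts_made = heapq.heappop(self._heap)
            if recover(vm_id):
                recovered.append(vm_id)
            elif attempts_made + 1 >= self.max_attempts:
                gave_up.append(vm_id)
            else:
                delay = self.base_delay * (2 ** attempts_made)
                self.submit(vm_id, now + delay, attempts_made + 1)
        return recovered, gave_up


# Usage: vm-1 recovers on the first try, vm-2 never recovers.
def recover(vm_id):
    return vm_id == "vm-1"


queue = HAQueue(max_attempts=3, base_delay=10.0)
queue.submit("vm-1", now=0.0)
queue.submit("vm-2", now=0.0)
first = queue.run_due(0.0, recover)    # vm-1 recovered, vm-2 retried at t=10
early = queue.run_due(5.0, recover)    # nothing is due yet
second = queue.run_due(10.0, recover)  # vm-2 fails again, retried at t=30
third = queue.run_due(30.0, recover)   # third failure, vm-2 is given up
```

Queuing spreads recovery work out when a host with many VMs fails, and the bounded backoff keeps a VM that cannot be recovered from being retried forever.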
You can learn more about this solution during the session. This includes:
- How to implement a lock manager based on Sanlock;
- How to implement host network fault awareness;
- How to implement HA queuing, the HA retry mechanism, etc.;
- Problem-solving cases from development;
- Performance tuning of the entire solution.