Monitoring and its applications are becoming key factors in service lifecycle management for systems such as NFV (5G/MEC) and cloud-native platforms. Distributed Monitoring and Analysis (DMA), which we proposed at the Boston Summit, is a framework that enables flexible and scalable monitoring and works with the current OpenStack telemetry and monitoring frameworks.
In this presentation, we will show service lifecycle management, extending the scope of our framework to use monitoring data for failure recovery. The advantages of the framework are (1) fast fault detection through shorter monitoring intervals and (2) silent failure detection through machine learning over various types of metrics. The presentation also includes a demo showing a quick service recovery workflow that interworks across OpenStack (including Congress, Ceilometer, and Aodh), collectd, scikit-learn, and related OPNFV projects.
We will also cover our activities in the OpenStack and OPNFV communities.
- Updates to the work we presented at the Boston Summit
- Operation workflow for service lifecycle management: collecting -> detecting -> notifying -> recovering
- Architecture and advantages of DMA: distributed monitoring with fast detection
- Demo: 0.1-second fault detection, lightweight machine learning, and the recovery process
- Collaboration with OpenStack and OPNFV
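
The collecting -> detecting -> notifying -> recovering workflow in the outline above can be illustrated with a minimal Python sketch. This is not the DMA implementation: the function names `detect` and `run_pipeline`, the `recover` callback, and the window and threshold values are all hypothetical, and a simple standard-deviation check stands in for the machine-learning detector. In the setup the abstract describes, collection would presumably come from collectd, and notification and recovery would go through components such as Aodh and Congress.

```python
from collections import deque
from statistics import mean, stdev


def detect(history, value, threshold=3.0):
    """Return True when value deviates from recent history by more than
    `threshold` standard deviations (a stand-in for an ML detector)."""
    if len(history) < 5:
        return False  # not enough samples to judge yet
    sigma = stdev(history)
    if sigma == 0:
        return value != history[-1]
    return abs(value - mean(history)) / sigma > threshold


def run_pipeline(samples, recover, window=20):
    """Walk the four workflow stages over a stream of metric samples."""
    history = deque(maxlen=window)
    events = []
    for index, value in enumerate(samples):   # collecting
        if detect(history, value):            # detecting
            events.append((index, value))     # notifying
            recover(index, value)             # recovering (hypothetical callback)
        else:
            history.append(value)  # only normal samples update the baseline
    return events
```

Keeping anomalous samples out of the baseline is a design choice in this sketch, so that a single fault does not widen the window and mask the next one. A shorter sampling interval simply means the loop sees each sample sooner, which is what makes fast detection possible.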