Additional statistics facilitate more resilient and performant telco/NFV clouds.
It is vital to monitor systems for malfunctions that could disrupt users' application services, and to react promptly to these fault events in order to improve overall system performance.
Providing OpenStack with system statistics from collectd makes more data available for monitoring, performance analysis, fault detection, etc. Using OPNFV Doctor-prescribed enhancements to OpenStack, action can then be taken to negate the effects of any faults in the deployment.
Gaps have been identified, and work to improve OpenStack to enable a more fault-tolerant cloud environment is well underway. A key part of this work is expanding the amount of data available about the system (e.g. DPDK and Open vSwitch statistics) and improving alarming functionality in OpenStack Aodh. This presentation is a follow-up to the last OpenStack Summit in Austin.
The audience is anyone interested in learning about:
- Service Assurance in OpenStack, leveraging extensions to DPDK, collectd and OpenStack to support monitoring of resources in the NFV infrastructure including:
  - Exposing the KPIs, monitoring, and packet processing paths in DPDK to OpenStack.
  - Unlocking additional statistics by exposing collectd metrics to Ceilometer through collectd plug-ins, and enhancing the path between these two applications (leveraging Gnocchi and Aodh).
  - Benchmarking of these features to improve latency.
- Notification of underlying network failures to Neutron, updates to tenants' resource states, and failover action enhancements undertaken by the Doctor project to build an NFVI fault management and maintenance framework that supports high availability of Network Services on top of the virtualized infrastructure.
- Continuing efforts, new use cases, and how the community is committed to improving OpenStack.