After OpenStack is deployed via Airship, there is no mechanism to validate the resiliency of the deployed services. This is an even bigger challenge in more complex deployments that are bound by SLAs (five-nines, i.e. 99.999%, uptime). Validating resiliency once is hard enough; repeating those checks continuously is harder still.
What we need is tooling that can drive traffic to the Airship services while simultaneously launching a chaos agent that induces failures: failures of specific target services, random service failures, or powering off one of the controller nodes.
The proposal is to develop a standalone, stateless automation utility that tests various components using a configurable test client driven by a YAML template. A Torpedo metacontroller processes this template and generates an Argo DAG template, which initiates the traffic, chaos, and test-analyser jobs against a target Airship environment.
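The flow described above, from a test-case template to an Argo DAG, can be sketched roughly as follows. This is only an illustration and not Torpedo's actual implementation: the test-case schema, the image names, the `build_workflow` helper, and the choice to have the analyser wait for the traffic and chaos jobs are all assumptions made for the example. Only the overall Argo Workflow layout (`apiVersion`, `kind`, `entrypoint`, `dag.tasks`, `dependencies`) follows the Argo Workflows manifest format.

```python
import json


def build_workflow(test_case):
    """Translate a (hypothetical) test-case dict into an Argo Workflow manifest.

    Traffic and chaos have no dependencies, so Argo may start them in
    parallel. The analyser depends on both; that ordering is an assumption.
    """
    jobs = ("traffic", "chaos", "analyser")

    # One DAG task per job, each pointing at a template of the same name.
    dag_tasks = []
    for job in jobs:
        task = {"name": job, "template": job}
        if job == "analyser":
            task["dependencies"] = ["traffic", "chaos"]
        dag_tasks.append(task)

    # The entrypoint template holds the DAG; the rest are container templates.
    templates = [{"name": "resiliency-test", "dag": {"tasks": dag_tasks}}]
    for job in jobs:
        spec = test_case[job]
        templates.append({
            "name": job,
            "container": {
                "image": spec["image"],
                "args": list(spec.get("args", [])),
            },
        })

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": test_case["name"] + "-"},
        "spec": {"entrypoint": "resiliency-test", "templates": templates},
    }


# Hypothetical test case: in practice this would be loaded from a YAML file.
# All image names and arguments below are placeholders.
EXAMPLE_TEST_CASE = {
    "name": "keystone-resiliency",
    "traffic": {"image": "example/traffic:latest", "args": ["--service", "keystone"]},
    "chaos": {"image": "example/chaos:latest", "args": ["--kill-pod", "keystone-api"]},
    "analyser": {"image": "example/analyser:latest"},
}

if __name__ == "__main__":
    print(json.dumps(build_workflow(EXAMPLE_TEST_CASE), indent=2))
```

Running the script prints a manifest that could then be submitted to an Argo installation. Keeping the translation a pure function of the test case matches the stateless design described in the proposal: the same template always yields the same DAG.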
This talk will:
- Introduce the Torpedo framework
- Describe the different components of the framework
- Show how to add test cases to Torpedo
- Show how to run resiliency tests using Torpedo