Goblin - Automated Resiliency Testing
Submitted by Shailesh Hegde (@shlsh) on Monday, 18 January 2016
To discuss the challenges of resiliency testing in large-scale cloud deployments and how to automate it (think Chaos Monkey, but with a few key differences).
This talk will cover the following:
- What is resiliency in a large-scale distributed system?
- Challenges in resiliency testing of a large-scale distributed system that uses third-party applications and protocols such as RabbitMQ/AMQP, caching/NoSQL stores (Couchbase, Cassandra), service discovery (ZooKeeper), and media (SIP, RTP, H.323, PSTN, audio/video)
- Gotcha! Things you think won’t fail, but do
- The Goblin framework (which we are working to open-source in Q1 2016): it induces faults, runs tests, verifies results, and recovers the system, all in a controlled manner
- How to use Goblin for live group testing as well as nightly automated runs
- Extending Goblin to other systems
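The controlled cycle described above (induce a fault, run tests, verify results, recover) can be sketched roughly as follows. This is not Goblin's code or API: `Fault`, `run_fault_cycle`, and the toy cache "system" are hypothetical names for illustration only. The main design point shown is that recovery sits in a `finally` block, so the system is restored even if injection or a test blows up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Fault:
    """Hypothetical fault description: how to induce it and how to undo it."""
    name: str
    inject: Callable[[], None]
    recover: Callable[[], None]


@dataclass
class CycleResult:
    fault: str
    passed: bool = True
    recovered: bool = False
    errors: List[str] = field(default_factory=list)


def run_fault_cycle(fault: Fault,
                    tests: List[Callable[[], bool]],
                    health_check: Callable[[], bool]) -> CycleResult:
    """Induce a fault, run tests against the degraded system, record the
    results, then recover the system and confirm it is healthy again."""
    result = CycleResult(fault=fault.name)
    try:
        fault.inject()
        for test in tests:
            try:
                ok = bool(test())
                detail = "returned False"
            except Exception as exc:  # a crashing test counts as a failure
                ok = False
                detail = f"raised {exc!r}"
            if not ok:
                result.passed = False
                result.errors.append(f"{test.__name__} {detail} under {fault.name}")
    finally:
        # Recovery runs even if injection or a test raised, so the
        # system is not left in a faulted state.
        fault.recover()
        result.recovered = health_check()
    return result


# Toy in-memory "system" standing in for a real cluster (assumption).
state = {"cache_up": True}


def stop_cache() -> None:
    state["cache_up"] = False


def start_cache() -> None:
    state["cache_up"] = True


def cache_is_up() -> bool:
    return state["cache_up"]


def api_answers_without_cache() -> bool:
    # Placeholder check: a real test would call the service and assert
    # that it degrades gracefully while the cache is down.
    return True


cache_down = Fault("cache-down", inject=stop_cache, recover=start_cache)
report = run_fault_cycle(cache_down, [api_answers_without_cache], cache_is_up)
print(report.passed, report.recovered)  # True True
```

In a real deployment, `inject` and `recover` would wrap actions such as stopping a RabbitMQ node or partitioning a Cassandra replica, and the same cycle could be driven interactively during group testing or from a nightly scheduler.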
Working in Linux-based cloud environments
Currently working as a Lead QA engineer at BlueJeans Network. Part of the core team that built Goblin.