Rootconf 2016

Rootconf is India's principal conference where systems and operations engineers share real-world knowledge about building resilient and scalable systems.

Shailesh Hegde

@shlsh

Goblin - Automated Resiliency Testing

Submitted Jan 18, 2016

This talk discusses the challenges of resiliency testing in large-scale cloud deployments and how to automate that testing (think Chaos Monkey, but with a few key differences).

Outline

This talk will cover the following:

  • What is resiliency in a large-scale distributed system?
  • Challenges in resiliency testing of a large-scale distributed system that uses third-party applications and protocols such as RabbitMQ/AMQP, caching and NoSQL stores (Couchbase, Cassandra), service discovery (ZooKeeper), and media (SIP, RTP, H.323, PSTN, audio/video)
  • Gotcha! The things you think won't fail, but do
  • The Goblin framework (which we are working to open-source in Q1 2016): it induces faults, runs tests, verifies results, and recovers the system, all in a controlled manner
  • How to use Goblin for live group testing as well as nightly automated runs
  • Extending Goblin to other systems
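The controlled cycle listed above (induce a fault, run tests, verify results, recover the system) can be illustrated with a minimal Python sketch. This is not Goblin's actual code or API; `Fault`, `run_experiment`, and the in-memory two-replica "cache" are hypothetical stand-ins, assumed here in place of real fault injectors (for example, stopping a RabbitMQ or Couchbase node) and real test suites.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Fault:
    """Hypothetical fault: how to inject it and how to undo it."""
    name: str
    inject: Callable[[], None]
    recover: Callable[[], None]


@dataclass
class RunResult:
    fault: str
    passed: bool      # did every test pass while the fault was active?
    recovered: bool   # was the system healthy again afterwards?


def run_experiment(fault: Fault,
                   tests: List[Callable[[], bool]],
                   healthy: Callable[[], bool]) -> RunResult:
    """Inject one fault, run all tests, verify, then always recover."""
    # Assumed precondition: only break a system that starts out healthy.
    if not healthy():
        raise RuntimeError("system unhealthy before injecting " + fault.name)
    fault.inject()
    try:
        # Run every test (no short-circuit) so each one executes under the fault.
        outcomes = [test() for test in tests]
    finally:
        # Recover even if a test raised, so one experiment cannot
        # leave the system broken for the next one.
        fault.recover()
    return RunResult(fault.name, all(outcomes), healthy())


# Hypothetical toy system: reads succeed while at least one replica is up.
replicas: Dict[str, bool] = {"cache-1": True, "cache-2": True}


def read_works() -> bool:
    return any(replicas.values())


def all_up() -> bool:
    return all(replicas.values())


faults = [
    Fault("kill cache-1",
          lambda: replicas.update({"cache-1": False}),
          lambda: replicas.update({"cache-1": True})),
    Fault("kill all replicas",
          lambda: replicas.update({k: False for k in replicas}),
          lambda: replicas.update({k: True for k in replicas})),
]

results = [run_experiment(f, [read_works], all_up) for f in faults]
for r in results:
    print(r)
```

Losing one replica should leave reads working, while losing both should fail the test; in both cases the system is restored before the next fault. The `finally` clause is the important design choice: recovery runs even when a test raises, so a failing experiment in a nightly run does not leave the environment degraded for later tests.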

Requirements

Experience working in Linux-based cloud environments

Speaker bio

Currently a Lead QA Engineer at BlueJeans Network and part of the core team that built Goblin.

Slides

https://goo.gl/VdGhnR
