Rootconf 2017

On service reliability

How to make sure your Ops Engineer is not killed by overwork!

Submitted by Vachan D A (@vachanda) on Monday, 30 January 2017


Technical level



Crisp talk of 15 mins duration





Modern-day SRE and DevOps teams have far more interesting problems to tackle than repeating the same mundane chores. Automating these routine tasks frees internal stakeholders from their dependence on an already understaffed DevOps team, and frees that team to tend to more challenging problems. This is a brief talk about how we tackled this internally.


  • Common challenges haunting DevOps
    • Service provisioning dependence on DevOps.
    • Ad-hoc code being run to fix data issues.
    • Credential and identity creation for Active Directory-based apps.
    • Slow MySQL queries frequently being pushed to production in your apps.
    • Metric monitoring alerts being off the charts.
    • Scaling a service to brace for traffic from marketing campaigns.
  • How can you tackle these:
    • Devops-dashboard: a Django-powered application backed by a work queue (for asynchronous execution), which handles the tasks that require access to production instances for data.
    • A comprehensive and robust CI/CD setup removes the DevOps dependency for new service provisioning.
      • CI/CD setup with Jenkins pipelines, CloudFormation (for provisioning the necessary infrastructure), Docker, ECS and etcd/Vault (for secrets management).
    • Getting premonitions about an impending DDoS, or notification of any other aberrant behaviour, using trend-based alerting with ELK and Python.
      • Real-time flow and analysis of server/application logs through an ELK setup, along with Python programs that probe the data to provide real-time alerts based on request trends (a barrage of requests for a single endpoint from a specific IP), unwanted intrusions (based on web server responses), non-standard requests, etc.
  • Q and A
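
The request-trend alerting in the outline above (a barrage of requests for a single endpoint from a specific IP) can be illustrated with a small sliding-window counter. This is only a minimal sketch, not the setup described in the talk: the class name BurstDetector, the window and threshold values, and the sample IP and endpoint are all hypothetical, and it assumes the log records have already been pulled out of the ELK stack and arrive in timestamp order.

```python
from collections import Counter, deque
from datetime import datetime, timedelta


class BurstDetector:
    """Flag (ip, endpoint) pairs that exceed a request threshold in a sliding window.

    Hypothetical helper for illustration; assumes timestamps are non-decreasing.
    """

    def __init__(self, window_seconds=60, threshold=100):
        self.window = timedelta(seconds=window_seconds)
        self.threshold = threshold
        self.events = deque()    # (timestamp, ip, endpoint), oldest first
        self.counts = Counter()  # requests per (ip, endpoint) inside the window

    def observe(self, timestamp, ip, endpoint):
        """Record one request; return True if this pair is now over the threshold."""
        key = (ip, endpoint)
        self.events.append((timestamp, ip, endpoint))
        self.counts[key] += 1

        # Evict events that are older than the window relative to this request.
        while self.events and timestamp - self.events[0][0] > self.window:
            _, old_ip, old_endpoint = self.events.popleft()
            old_key = (old_ip, old_endpoint)
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

        return self.counts[key] > self.threshold


# Usage: five requests from one IP to one endpoint, one second apart,
# with an assumed threshold of three requests per ten seconds.
detector = BurstDetector(window_seconds=10, threshold=3)
start = datetime(2017, 1, 30, 12, 0, 0)
alerts = [
    detector.observe(start + timedelta(seconds=i), "203.0.113.7", "/login")
    for i in range(5)
]
```

In this example the first three requests stay at or under the threshold and the fourth and fifth trip it. A real deployment would feed observe() from log records and send the alert to whatever notification channel is in use; both of those pieces are left out here.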


Some knowledge of any web programming language and MVC frameworks.
Basic awareness of Ops and Business pain points in an org.

Speaker bio

Software Developer Engineer, Urban Ladder


  • 1
    Philip Paeps (@trouble) Reviewer a year ago

    In order to evaluate your proposal for a full 40-minute slot, the editorial panel will need a much more detailed outline than this. It is not at all clear from your outline what you intend to talk about.

    How are you automating these things? What technologies are you using?

    Please tell us more.

    • 1
      Vachan D A (@vachanda) Proposer a year ago

      Hey @philip, have updated the outline based on your comments. Check and let me know.

      • 1
        Philip Paeps (@trouble) Reviewer a year ago

        Thank you. This looks a lot more complete now.

        It’s safe to assume that Rootconf attendees will be familiar with common challenges haunting DevOps so they will be more interested in hearing about how you tackle them. Can you tell us more about this comprehensive CI/CD setup you are using and how you get these “premonitions”?

        • 1
          Vachan D A (@vachanda) Proposer a year ago

          @philip, have provided more information on the requested topics.

  • 1
    saurabh hirani (@saurabh-hirani) a year ago

    Given that you are targeting specific tools for these solutions - it would help if you have a github repo which the audience can reuse to try out your demo setups. They may be watered down versions of production setups, but it would be useful.
