Rootconf 2017

On service reliability


How to make sure your Ops Engineer is not killed by overwork!

Submitted by Vachan D A (@vachanda) on Monday, 30 January 2017

Section: Crisp talk of 15 mins duration
Technical level: Intermediate


Modern SRE and DevOps teams have far more interesting problems to tackle than going through the same routine of mundane chores. Automating these routine tasks frees internal stakeholders from their dependence on an already understaffed DevOps team, and frees that team to tend to more challenging problems. This is a brief talk about how we tackled this internally.


  • Common challenges haunting DevOps
    • Service provisioning depending on DevOps.
    • Ad hoc code being run to fix data issues.
    • Credential and identity creation for Active Directory based apps.
    • Slow MySQL queries frequently being pushed to production in your apps.
    • Metric monitoring and alerting being off the charts.
    • Scaling a service to brace for traffic from marketing campaigns.
  • How can you tackle these:
    • DevOps dashboard - A Django-powered application backed by a work queue (for asynchronous execution), which handles the issues that require accessing production instances for data.
    • A comprehensive and robust CI/CD setup can remove the dependence on DevOps for new service provisioning.
      • CI/CD setup with Jenkins pipelines, CloudFormation (for provisioning the necessary infrastructure), Docker, ECS and etcd/Vault (for secrets management).
    • Getting premonitions about an impending DDoS, or notification of any aberrant behaviour, using trend-based alerting with ELK and Python.
      • Real-time flow and analysis of server/application logs using an ELK setup, along with Python programs that probe the data to provide real-time alerting based on request trends (a barrage of requests for a single endpoint from a specific IP), unwanted intrusions (based on web server responses), non-standard requests, etc.
  • Q and A
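
As a rough illustration of the trend-based alerting idea in the outline (a barrage of requests for a single endpoint from a specific IP), here is a minimal Python sketch. It is not the setup described in the talk: the record fields `timestamp`, `client_ip` and `path`, the class name `RequestBurstDetector`, and the window and threshold values are all assumptions made for the example, and it expects already-parsed log records in timestamp order rather than querying ELK directly.

```python
from collections import defaultdict, deque


class RequestBurstDetector:
    """Flags (client IP, endpoint) pairs that exceed a request threshold
    inside a sliding time window. Hypothetical helper for illustration."""

    def __init__(self, window_seconds=60.0, threshold=100):
        self.window_seconds = window_seconds
        self.threshold = threshold
        # (client_ip, path) -> timestamps of recent requests, oldest first
        self._recent = defaultdict(deque)

    def observe(self, record):
        """Record one request; return an alert dict if the pair is now
        over the threshold, otherwise None."""
        key = (record["client_ip"], record["path"])
        now = record["timestamp"]
        times = self._recent[key]
        times.append(now)
        # Drop timestamps that have fallen out of the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) > self.threshold:
            return {"client_ip": key[0], "path": key[1], "count": len(times)}
        return None


def scan(records, window_seconds=60.0, threshold=100):
    """Run the detector over an iterable of records (assumed sorted by
    timestamp) and collect every alert raised."""
    detector = RequestBurstDetector(window_seconds, threshold)
    alerts = []
    for record in records:
        alert = detector.observe(record)
        if alert is not None:
            alerts.append(alert)
    return alerts


if __name__ == "__main__":
    # Five requests from one IP to one endpoint within five seconds.
    burst = [
        {"timestamp": float(i), "client_ip": "10.0.0.1", "path": "/login"}
        for i in range(5)
    ]
    for alert in scan(burst, window_seconds=10.0, threshold=3):
        print(alert)
```

In a real pipeline the records would come from the log store, and the threshold would likely be derived from historical request rates instead of being a fixed number.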


Some knowledge of any web programming language and MVC frameworks.
Basic awareness of Ops and Business pain points in an org.

Speaker bio

Software Developer Engineer, Urban Ladder


  • Philip Paeps (@trouble) 2 years ago

    In order to evaluate your proposal for a full 40-minute slot, the editorial panel will need a much more detailed outline than this. It is not at all clear from your outline what you intend to talk about.

    How are you automating these things? What technologies are you using?

    Please tell us more.

    • Vachan D A (@vachanda) Proposer 2 years ago

      Hey @philip, have updated the outline based on your comments. Check and let me know.

      • Philip Paeps (@trouble) 2 years ago

        Thank you. This looks a lot more complete now.

        It’s safe to assume that Rootconf attendees will be familiar with common challenges haunting DevOps so they will be more interested in hearing about how you tackle them. Can you tell us more about this comprehensive CI/CD setup you are using and how you get these “premonitions”?

        • Vachan D A (@vachanda) Proposer 2 years ago

          @philip, have provided more information on the requested topics.

  • saurabh hirani (@saurabh-hirani) 2 years ago

    Given that you are targeting specific tools for these solutions - it would help if you have a github repo which the audience can reuse to try out your demo setups. They may be watered down versions of production setups, but it would be useful.

  • asian fanfics (@ducklife) a month ago

    You should work according to the schedule, that will not make you lose your temper.
