How to make sure your Ops Engineer is not killed by overwork!
Modern SRE and DevOps teams have far more interesting problems to tackle than repeating the same mundane chores. Automating these routine tasks frees internal stakeholders from their dependence on an already understaffed DevOps team, and frees that team to tend to more challenging problems. This is a brief talk about how we tackled this internally.
- Common challenges haunting DevOps:
- Service provisioning that depends on DevOps.
- Ad-hoc code being run to fix data issues.
- Credential and identity creation for Active Directory based apps.
- Slow MySQL queries frequently being shipped to production in your apps.
- Metric-monitoring alerts being off the charts.
- Scaling a service to brace for traffic from marketing campaigns.
- How can you tackle these:
- Devops-dashboard - A Django-powered application backed by a work queue (for asynchronous execution), which handles the tasks that require access to production instances for data.
- A comprehensive and robust CI/CD setup can remove the DevOps bottleneck in provisioning new services.
- CI/CD setup with Jenkins pipelines, CloudFormation (to provision the necessary infrastructure), Docker, ECS and etcd/Vault (for secrets management).
- Getting early warning of an impending DDoS or other aberrant behaviour using trend-based alerting with ELK and Python.
- Real-time flow and analysis of server/application logs using an ELK setup, along with Python programs that probe the data to provide real-time alerting based on request trends (a barrage of requests for a single endpoint from a specific IP), unwanted intrusions (based on web server responses), non-standard requests, etc.
- Q and A
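As a rough illustration of the trend-based alerting item above (a barrage of requests for a single endpoint from a specific IP), here is a minimal Python sketch. It is not the implementation covered in the talk: the `BurstDetector` class, its `threshold` and `window_seconds` parameters, and the sample events are all hypothetical. It also assumes that log events have already been parsed (for example, out of the ELK pipeline) into an IP, an endpoint and a numeric timestamp, and that they arrive in time order.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Hypothetical sliding-window counter keyed by (ip, endpoint).

    Flags a pair once it has made more than `threshold` requests
    within the last `window_seconds` seconds.
    """

    def __init__(self, threshold=100, window_seconds=60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # (ip, endpoint) -> timestamps of recent requests, oldest first
        self._hits = defaultdict(deque)

    def observe(self, ip, endpoint, timestamp):
        """Record one request; return True if the pair is over the threshold.

        Assumes timestamps are passed in non-decreasing order.
        """
        hits = self._hits[(ip, endpoint)]
        hits.append(timestamp)
        # Drop requests that have fallen out of the time window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.threshold


if __name__ == "__main__":
    # Made-up events: (ip, endpoint, seconds since start).
    events = [("10.0.0.7", "/login", t) for t in range(5)]
    detector = BurstDetector(threshold=3, window_seconds=10)
    for ip, endpoint, ts in events:
        if detector.observe(ip, endpoint, ts):
            print(f"ALERT: {ip} is hammering {endpoint} at t={ts}")
```

In a real pipeline the `events` list would be replaced by whatever feeds parsed log lines to the program, and the `print` call by a notification hook. Keeping one deque per (IP, endpoint) pair keeps the check cheap per request, at the cost of memory that grows with the number of distinct pairs seen.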
Some knowledge of any web programming language and MVC frameworks.
Basic awareness of Ops and Business pain points in an org.
Software Developer Engineer, Urban Ladder