Lessons from scaling operations to everyone @indix
At Indix we collect and process a lot of data. As our data grew, so did the operational difficulties surrounding it. When we were a small team of developers with a single ops person, using a centralised configuration management system made a lot of sense. Every change went through him, which kept overhead very low. As the team scaled, that single person became a bottleneck for the different teams. He could not keep up with the new things individual teams wanted to try for their respective challenges. This led to individual teams stepping up to manage their own parts of the infrastructure on AWS.
That’s when we realised we needed a more decentralised, team-owned approach to configuration management across the organization. While most teams welcomed the change, some teams with no prior operational experience found the transition very hard. After many attempts, we started using self-service, resource-based scheduling of individual services for these teams.
Self-managed infrastructure is a dream for any operations team in an organization, but it comes with its fair share of challenges.
In this talk, I’ll cover our problems, mistakes and learnings over the years in scaling operations to everyone at Indix, and how some of our technology choices were influenced by them.
- Problems, mistakes and learnings
- When we were a 5+ member team
- Focus on keeping infrastructure sane
- When we were a 30+ member team
- Focus on scaling the infra knowledge and not the ops team
- When we were a 60+ member team
- Focus on abstracting away the infra knowledge
- Tech Radar(s)
- Operability Checklist
- Current unsolved challenges
Ashwanth Kumar is a Principal Software Engineer working in the Data Ingestion Team @indix. When he’s not fiddling with distributed systems and data, he contributes to open source and helps organise meetups and tech events in the city. He writes Scala at work and Go at home.
- List of all the talks - https://github.com/ashwanthkumar/talks
- OSS - https://github.com/ashwanthkumar/