At BigBinary, we’ve been building neetoDeploy for the past year.
We were running PR review apps for around 25-30 projects on Heroku. When Heroku announced last year that they were getting rid of their free plans, we started by building a platform to deploy PR review apps. We kept the date Heroku planned to remove their free plans as our deadline and quickly put the platform together on top of Kubernetes so that we could migrate all our apps off Heroku. We finished well before the deadline and spent the rest of the time fixing bugs and stabilizing the platform. We architected an idle mechanism for the apps based on the network requests each service receives: if an app is not accessed for 5 minutes, it is scaled down and only brought back up when it’s accessed again. This brought our costs down substantially.
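The core of the idle check described above can be sketched roughly like this. This is an illustrative sketch only: the function name, the per-app last-request timestamp, and the way traffic is tracked are assumptions, not neetoDeploy's actual implementation; only the 5-minute threshold comes from the description above.

```python
from datetime import datetime, timedelta

# Threshold from the description above: scale down after 5 idle minutes.
IDLE_TIMEOUT = timedelta(minutes=5)

def should_scale_down(last_request_at: datetime, now: datetime,
                      idle_timeout: timedelta = IDLE_TIMEOUT) -> bool:
    """Return True when the app has received no traffic for the idle window.

    In a real system, `last_request_at` would be updated by the ingress
    proxy on every request, and a periodic job would call this check and
    scale the app's deployment to zero replicas when it returns True.
    The first request to a sleeping app would scale it back up.
    """
    return now - last_request_at >= idle_timeout
```

A periodic reconciler would run this over every app and issue the actual scale-to-zero call (e.g. via the Kubernetes API); the wake-up path works in reverse, scaling the deployment back up before forwarding the buffered request.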
Once we had nailed PR review apps, we started experimenting with staging and production app deployments. Since the basic functionality was already there, we were able to bring it together easily. With this, we moved all of BigBinary’s internal staging deployments to neetoDeploy. One of the major uses of the staging deployments was running Cypress tests against them every day.
After we started using the platform to deploy staging apps, we ran into a lot of stability issues with existing features. We had to rebuild and re-architect several features we had already implemented, a bit like rebuilding the Ship of Theseus. We went back to the drawing board and designed a new, more efficient way of streaming logs faster. We set up the cluster autoscaler to handle load and overprovisioned the cluster ever so slightly based on the existing deployments, so that new deployments never have to wait for a node to come up, resulting in fast, seamless deployments. We also moved from an external Docker registry to our own registry hosted inside our Kubernetes cluster to bring down network costs and latency.
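One common way to get the "slight overprovisioning" behavior described above is to run low-priority placeholder pods that the cluster autoscaler keeps capacity for; real workloads preempt them immediately, and the autoscaler then replaces the evicted headroom with a new node. The talk doesn't specify how neetoDeploy implements this, so the manifest below is a hedged sketch of that general pattern, with illustrative names and resource sizes:

```yaml
# Sketch of cluster overprovisioning via low-priority "pause" pods.
# All names, replica counts, and resource requests are illustrative.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10            # lower than any real workload, so these pods are preempted first
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2         # tune to the amount of headroom you want to keep warm
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # does nothing; only reserves resources
          resources:
            requests:
              cpu: "500m"
              memory: 512Mi
```

When a new app deployment is scheduled, it preempts a pause pod and starts immediately on the warm node; the displaced pause pod goes pending, which triggers the autoscaler to add a node in the background.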
The last year has been a rollercoaster in terms of learning and experimenting. Working on and maintaining neetoDeploy over the past year taught me a lot of lessons the hard way, and I’ve come to understand what SRE means in a project of this scale. We wrote a bunch of blog posts about it too.
This is the neetoDeploy story - how we built a cloud deployment platform as a service from scratch and took it to production in a year.
Sreeram Venkitesh
@sreeramvenkitesh Submitter
Slides - https://docs.google.com/presentation/d/1qZd-V2FWqaQ61HB_k58NJAxj9Wowyn6C60LY7ZwHdr8/edit?usp=sharing
Zainab Bawa
@zainabbawa Editor & Promoter
Hi Sreeram, summarizing the feedback from the editors:
"neetoDeploy seems to be a commercial or an internal tool.
Understanding what it does or how it works is not useful to the audience. The submission links to blog posts which do talk about some challenges faced in K8s autoscaling.
But the post is designed top-down - it starts with neetoDeploy, then goes to its architecture, and then to the technology challenges.
The first 2 parts of the journey do not have takeaways. Restructuring the talk to be bottom-up - a thin layer of neetoDeploy and a thick covering of db challenges, k8s challenges, incidents, etc. - will require buy-in from the speaker.
In the current state, turn this into a lightning talk, which is more tech-focused."