Designing an Autonomous Workbench for Data Science on AWS
In the wake of the COVID-19 pandemic and the consequent shift to remote work, we, the Engineering team at Episource, set out to build a hosted, self-service platform that would let our Data Science counterparts access the compute and data they needed for their experiments on the fly.
Given the iterative nature of the ML development cycle, where ideas go from hypothesis to feature-ready within days, it was imperative that the platform scale instantly to meet the heavy demands of modern ML workloads. Another aspect to consider was that sensitive training datasets had to remain in encrypted, secure environments at all times.
This talk will be a quick overview of the thought process, experiments and lessons learnt during our journey of building our own data science workbench on AWS.
During this talk, participants can expect to learn the following:
- The AWS architecture we designed to host the open-source JupyterLab project, and how we adapted it on Kubernetes to accommodate our specific requirements.
- Best practices for integrating organised dataset storage, closed-door access control and autoscaling capabilities into the architecture.
- First-hand insights into how the Workbench has improved Episource’s ML development cycles.
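As a flavour of the autoscaling capabilities mentioned above, the sketch below shows a minimal Kubernetes HorizontalPodAutoscaler for a JupyterLab deployment. This is an illustrative fragment only, not the talk's actual configuration; all names (`jupyterlab`, `workbench`) and thresholds are assumptions.

```yaml
# Hypothetical autoscaling config for a JupyterLab deployment on Kubernetes.
# Names and numbers are illustrative, not taken from Episource's setup.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: jupyterlab-hpa
  namespace: workbench
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jupyterlab
  minReplicas: 1        # keep one pod warm for instant access
  maxReplicas: 10       # cap burst capacity for heavy experiments
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
```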
Who is this talk for?
- ML teams of any size looking to introduce autonomy and promote rapid experimentation within their data science ranks.