Zero-downtime ML model swap using Docker and Kubernetes

Jul 2017

27 Thu 08:15 AM – 10:00 PM IST

28 Fri 08:15 AM – 06:25 PM IST

MLR Convention Centre, Whitefield, Bengaluru

Submitted Jun 10, 2017

Section: Full talk for data engineering track

Technical level: Beginner

At Gojek, we needed to improve how drivers are allocated to customers. Driver behaviour differs from region to region, and models went stale around festivals and whenever a large number of new drivers joined the system. The data science team also lacked a safe environment in which to experiment with models.

We solved this problem by running multiple models for different regions and time periods, and by changing them frequently without downtime. This session covers how we built a system with Docker and Kubernetes that handles multiple models and dynamically allots traffic among them based on region or time.
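
The abstract does not say how traffic was matched to a model. The following is a minimal Python sketch, assuming an in-process rule table keyed on region and time of day with a fallback model. `Route`, `pick_model`, and every region and model name are hypothetical illustrations, not the system described in the talk.

```python
from dataclasses import dataclass
from datetime import time
from typing import Sequence


@dataclass(frozen=True)
class Route:
    """Hypothetical rule: requests from `region` between `start` and `end` go to `model`."""
    region: str
    start: time
    end: time
    model: str


def pick_model(routes: Sequence[Route], region: str, at: time, default: str) -> str:
    """Return the model of the first rule matching the region and time of day, else `default`."""
    for route in routes:
        if route.region != region:
            continue
        if route.start <= route.end:
            in_window = route.start <= at < route.end
        else:
            # The window wraps past midnight, e.g. 22:00 to 06:00.
            in_window = at >= route.start or at < route.end
        if in_window:
            return route.model
    return default


# Illustrative routing table; region and model names are made up.
ROUTES = [
    Route("jakarta", time(7, 0), time(10, 0), "jakarta-peak-v3"),
    Route("jakarta", time(22, 0), time(6, 0), "jakarta-night-v1"),
]

print(pick_model(ROUTES, "jakarta", time(8, 30), default="global-v1"))   # jakarta-peak-v3
print(pick_model(ROUTES, "jakarta", time(23, 0), default="global-v1"))   # jakarta-night-v1
print(pick_model(ROUTES, "surabaya", time(8, 30), default="global-v1"))  # global-v1
```

Keeping the rules as plain data means that, in a design like this, pointing a region or time slot at a different model is a data change rather than a code change.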

This solution gave us the flexibility to run multiple models and redirect traffic as needed. It also let us scale when a single model had to handle multiple regions.
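
One way to redirect traffic without downtime is to build a complete replacement routing table and publish it in a single step, so no request ever sees a half-updated mapping. The sketch below assumes routing happens in application code. The abstract does not say where routing lived, so `RoutingTable` and the model names are hypothetical. Mapping several regions to the same model name is how one model would serve multiple regions.

```python
import threading
from types import MappingProxyType
from typing import Mapping


class RoutingTable:
    """Hypothetical region -> model mapping that can be replaced atomically.

    Readers see either the complete old mapping or the complete new one,
    never a partially updated table.
    """

    def __init__(self, mapping: Mapping[str, str], default: str) -> None:
        self._write_lock = threading.Lock()  # serialises concurrent swaps
        self._table = MappingProxyType(dict(mapping))  # read-only view of a private copy
        self._default = default

    def model_for(self, region: str) -> str:
        # Read the reference once so a concurrent swap cannot affect this lookup.
        table = self._table
        return table.get(region, self._default)

    def swap(self, new_mapping: Mapping[str, str]) -> None:
        # Build the replacement completely, then publish it with one assignment.
        new_table = MappingProxyType(dict(new_mapping))
        with self._write_lock:
            self._table = new_table


routes = RoutingTable({"jakarta": "model-a", "bandung": "model-b"}, default="model-a")
print(routes.model_for("bandung"))  # model-b

# Redirect bandung to model-a: one model now serves both regions.
routes.swap({"jakarta": "model-a", "bandung": "model-a"})
print(routes.model_for("bandung"))  # model-a
```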

Outline

### Introduction (3-5 mins)
A brief introduction to the topics covered. This section introduces the business problems Gojek faced in driver allocation and shares insights into driver behaviour that are unique to Indonesia.

### Problem statement (2-5 mins)
This section restates the problem in technical terms and introduces the other technical constraints that mattered during the project's inception phase.

### A walkthrough of the solution (20 mins)
This section covers the technical solution we implemented: how we deployed multiple models and scaled them with Docker and Kubernetes, and how we dynamically allotted traffic across those models with zero downtime. It also explains the monitoring and process checks we put in place to make sure each model was working as expected, and the deployment strategies we typically used for the models.
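
Shifting traffic gradually from an old model to a new one is a common way to swap models with zero downtime. The sketch below shows one such technique, a weighted split driven by a stable hash of a request key. It is an assumption, not the talk's documented method, and `bucket`, `choose_model`, and the `"old"`/`"new"` labels are hypothetical.

```python
import hashlib
from typing import Mapping


def bucket(request_key: str, buckets: int = 100) -> int:
    """Map a request key to a stable bucket in [0, buckets)."""
    # sha256 is stable across processes, unlike Python's salted built-in hash().
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def choose_model(request_key: str, weights: Mapping[str, int]) -> str:
    """Route a request so each model receives roughly weights[model] percent of traffic.

    For a fixed weight table the same key always maps to the same model.
    """
    if any(w < 0 for w in weights.values()) or sum(weights.values()) != 100:
        raise ValueError("weights must be non-negative and sum to 100")
    b = bucket(request_key)
    cumulative = 0
    for model, weight in weights.items():
        cumulative += weight
        if b < cumulative:
            return model
    raise AssertionError("unreachable: weights sum to 100 and b < 100")


# Hypothetical canary step: about 10% of requests go to the new model.
weights = {"old": 90, "new": 10}
print(choose_model("order-42", weights))
```

Under this approach a rollout would step the weights from `{"old": 90, "new": 10}` towards `{"old": 0, "new": 100}`, checking monitoring between steps, and a rollback would restore the previous weights. How this maps onto Kubernetes objects (for example, one Deployment per model version) is likewise an assumption.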

### Why did we use this solution? (4-5 mins)
This section covers why we chose this solution and how it affected our work.

### Conclusion (2-5 mins)
This section covers the issues we faced throughout the process and what could have been done better.

Speaker bio

I am a Data Engineer at Gojek, part of the team that works on all things related to data. My work revolves around data pipelines, handling data at scale, and building applications on top of data. Before Gojek, I worked extensively in the Ruby world.

Slides

https://www.slideshare.net/anugrahnayar/fifth-draft-77113919

The Fifth Elephant 2017