Machine Learning models begin to lose accuracy as soon as they are put into production. At DoorDash, we implemented a robust monitoring system to diagnose this issue and maintain the accuracy of our forecasts.
DoorDash is a last-mile logistics platform. We use Machine Learning to improve our customers' experience. For example, reliable estimates of how long a restaurant takes to prepare a food order ensure that the food is ready when a Dasher (our term for a delivery driver) arrives at the restaurant. This (a) reduces queueing at the restaurant by Dashers, which is a better experience for both restaurants and Dashers, (b) maximizes Dashers' hourly earning potential, and (c) minimizes delivery time to the consumer.
However, once an ML model is trained, validated, and deployed to production, it immediately begins degrading, a process called model drift. This degradation negatively impacts the accuracy of our time estimates and other ML model outputs. Because ML models are derived from data patterns, their inputs and outputs need to be closely monitored in order to diagnose and prevent model drift. Systematically measuring performance against real-world data lets us gauge the extent of model drift.
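As a rough illustration of measuring performance against real-world data, the sketch below compares a model's recent mean absolute error with the error recorded at validation time. The function names, the 1.25 threshold, and the choice of mean absolute error are assumptions made for this example, not a description of DoorDash's monitoring system.

```python
from statistics import mean


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predictions and observed values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def drift_ratio(baseline_mae, recent_predicted, recent_actual):
    """Recent error divided by the validation-time error.

    A ratio near 1.0 means the model performs as it did at validation;
    larger values suggest performance has degraded.
    """
    if baseline_mae <= 0:
        raise ValueError("baseline_mae must be positive")
    return mean_absolute_error(recent_predicted, recent_actual) / baseline_mae


def has_drifted(baseline_mae, recent_predicted, recent_actual, threshold=1.25):
    """Flag drift when recent error exceeds the baseline by the threshold factor.

    The 1.25 default is an arbitrary illustrative value, not a recommendation.
    """
    return drift_ratio(baseline_mae, recent_predicted, recent_actual) > threshold


# Hypothetical prep-time estimates (minutes) versus what was actually observed.
baseline_mae = 2.0
recent_predicted = [10, 12, 15]
recent_actual = [13, 16, 20]
print(has_drifted(baseline_mae, recent_predicted, recent_actual))
```

In practice a check like this would run on a schedule over a sliding window of recent predictions, and only once the true outcomes (here, actual prep times) have been recorded.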
In the past, we’ve seen instances where our models became out-of-date and began making incorrect predictions. These problems hurt the business and the customer experience, and forced the engineering team to spend significant effort investigating and fixing them. Finding this kind of model drift took a long time because we had no way to monitor for it.
This experience inspired us to build a solution on top of our ML platform. We set out to solve this model drift problem more generally and avoid issues like it in the future for all of the ML use cases on our platform. Ultimately, our goal was to create a solution that would protect all the different ML models DoorDash had in production.
In this talk, we will walk through how we surveyed our data scientists, applied systems thinking to the problem, came up with an approach, and implemented a platform-level solution for preventing model drift.
We hope this story will be useful for other Machine Learning Platform teams who want to prevent issues such as data drift and model drift.
Full article: https://doordash.engineering/2021/05/20/monitor-machine-learning-model-drift/
Slides: https://swaroopch.com/ml-model-monitoring