Model Health Assurance at Scale at LinkedIn
At LinkedIn, AI models drive numerous use cases across ranking (e.g., Feed items, jobs relevance) and classification (e.g., content quality, account protection), with hundreds of experiments run every week. At this scale, it is important to standardize and platformize how these models are monitored, so that teams do not reinvest in the same things and changes can be rolled out across all use cases easily. This lets our engineers move faster, with the confidence that unexpected behavior can be acted upon.
We have built a platform named Pro-ML, which provides standardized components across the AI lifecycle that engineers use to build their AI pipelines. Pro-ML comes with Health Assurance built in: a set of components that provide monitoring and alerting on AI models at different stages of the model’s lifecycle, including real-time latency and distribution graphs, data drift detection, time-series anomaly detection on model scores and feature values, dark-canary-based model validation, and more. In this talk, we will take you through what we are doing as part of Health Assurance and how we are building it. For a glimpse, see https://engineering.linkedin.com/blog/2021/model-health-assurance-at-linkedin
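To make the idea of a data-drift check concrete, here is a minimal sketch in Python. It is not Pro-ML code and does not reflect how Health Assurance is implemented; it assumes a Population Stability Index (PSI) over equal-width bins as the drift measure, and the function name, parameters, and sample data are all hypothetical.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Hypothetical drift score: PSI between a baseline and a current sample.

    Bins are equal-width over the baseline's range. This is an illustrative
    choice, not the method used by Pro-ML.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def bucket_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the baseline range
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    p = bucket_fractions(baseline)
    q = bucket_fractions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Made-up feature values: a training-time sample and a shifted serving-time sample.
training_sample = [i / 100 for i in range(100)]
serving_sample = [0.5 + i / 200 for i in range(100)]

drift_score = population_stability_index(training_sample, serving_sample)
print(f"PSI = {drift_score:.3f}")
```

A PSI of zero means the two binned distributions match; an often-quoted rule of thumb treats values above roughly 0.25 as a significant shift, though any alerting threshold would need tuning per feature.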