Anti-patterns in designing machine learning systems
Submitted by Suchana Seth (@suchana) on Monday, 2 May 2016
Technical level: Advanced
The talk will focus on ML-specific challenges in designing data science systems, how such systems accumulate technical debt, and what we can do at the design level to mitigate some of the risks.
Learn how to foresee these pitfalls and design your pipelines and systems to avoid them.
This talk is intended for an audience already familiar with applying machine learning algorithms.
In this talk, we’ll cover these sources of risk to ML systems:
Data drift - how to handle feature distributions that shift with time
Post-model heuristics - when and how to add heuristics to model output
Hidden downstream consumers - how to identify and plan for these
Unacknowledged data dependencies - how to identify and plan for these
Feedback loops - the good and the bad
Decision thresholds & action limits - how to keep them sane
Reproducibility - how to ensure it
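To make the data drift item more concrete, here is a minimal Python sketch of one common way to flag a shifted feature distribution: the population stability index (PSI). It compares a feature's training-time values against recent production values. This is not taken from the talk. The function names, the equal-width binning on the training range, the `eps` floor for empty bins and the 0.2 alert threshold (a widely quoted rule of thumb) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample (e.g. training data) and a new sample.

    Sketch only: uses equal-width bins over the reference range.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for a constant feature

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        total = len(values)
        # Floor empty bins at eps so the log ratio below stays finite.
        return [max(c / total, eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    # PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def has_drifted(
    expected: Sequence[float], actual: Sequence[float], threshold: float = 0.2
) -> bool:
    """Flag drift when PSI exceeds an assumed rule-of-thumb threshold."""
    return population_stability_index(expected, actual) > threshold


if __name__ == "__main__":
    training = [i / 100 for i in range(1000)]
    production = [x + 5 for x in training]  # simulated shift in the feature
    print(population_stability_index(training, training))
    print(has_drifted(training, production))
```

In practice a check like this would run per feature on a schedule, with the threshold tuned to how much shift the model tolerates; a two-sample statistical test is a reasonable alternative to PSI.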
Suchana is a physicist turned data scientist with 8 years of experience across research, startups and product labs. She volunteers with DataKind in her free time and mentors data-for-good projects.