Automated workflows built on machine learning models often exhibit biases that stem from erroneous assumptions made in the learning process. Examples include societal biases such as gender, racial, and age bias.
In this 15-minute talk, we hope to cover the prominent sources of such biases that lead ML models to produce unwanted outcomes. We will also look at ways to detect and measure these biases in production workflows.
Introduction: Why do we care about biases? (1 min)
Sources of biases (3 min)
→ Specification Bias
→ Sampling Bias
→ Measurement Bias
→ Annotator/Label Bias
→ Inherited Biases from other ML models
Metrics to measure biases (5 min)
(Covers the classification use case only, since it would be hard to do more in the time available)
→ True Positive Rate (TPR) across Groups
→ False Positive Rate (FPR) across Groups
→ Accuracy across Groups
→ Demographic Parity
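The four metrics listed above can be illustrated with a minimal sketch for binary classification. This is not code from the talk: the function names `group_metrics` and `max_gap`, the toy labels, and the group names "a" and "b" are all assumptions made for illustration. Demographic parity is approximated here by comparing the rate of positive predictions per group.

```python
from collections import defaultdict


def group_metrics(y_true, y_pred, groups):
    """Per-group TPR, FPR, accuracy and positive-prediction rate.

    Assumes binary labels and predictions encoded as 0/1 (illustrative helper).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1 and p == 1:
            counts[g]["tp"] += 1
        elif t == 0 and p == 1:
            counts[g]["fp"] += 1
        elif t == 0 and p == 0:
            counts[g]["tn"] += 1
        else:  # t == 1 and p == 0
            counts[g]["fn"] += 1

    def ratio(num, den):
        # Undefined rates (e.g. a group with no positives) are reported as NaN.
        return num / den if den else float("nan")

    report = {}
    for g, c in counts.items():
        n = sum(c.values())
        report[g] = {
            "tpr": ratio(c["tp"], c["tp"] + c["fn"]),
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),
            "accuracy": ratio(c["tp"] + c["tn"], n),
            # Demographic parity compares this rate across groups.
            "positive_rate": ratio(c["tp"] + c["fp"], n),
        }
    return report


def max_gap(report, metric):
    """Largest difference in a metric between any two groups."""
    values = [m[metric] for m in report.values()]
    return max(values) - min(values)


# Made-up toy data: two groups of four examples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = group_metrics(y_true, y_pred, groups)
for name, metrics in sorted(report.items()):
    print(name, metrics)
print("demographic parity gap:", max_gap(report, "positive_rate"))
```

In this toy data both groups have the same accuracy, yet their TPR, FPR, and positive-prediction rates differ, which is why a single aggregate metric can hide group-level disparities.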
How to avoid biased ML workflows (3 min)
→ Techniques for Debiasing the Data Going into the Model
→ Techniques for Post-Processing Model Outcomes
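As a rough illustration of the two families above, the sketch below shows one commonly used example of each. The outline does not say which techniques the talk covers, so both choices are assumptions: instance reweighing for the data side (each example is weighted by P(group) · P(label) / P(group, label), so that group and label become independent in the weighted data), and per-group decision thresholds for the post-processing side. The function names, toy data, and threshold values are hypothetical.

```python
from collections import Counter


def reweighing_weights(labels, groups):
    """Pre-processing sketch: weight = P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def predict_with_group_thresholds(scores, groups, thresholds):
    """Post-processing sketch: apply a separate score cut-off per group."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


# Made-up data: group "a" is mostly labelled 1, group "b" mostly labelled 0.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
weights = reweighing_weights(labels, groups)
print(weights)  # under-represented (group, label) pairs get larger weights

# Hypothetical model scores and hand-picked thresholds, for illustration only.
scores = [0.9, 0.6, 0.4, 0.7]
score_groups = ["a", "a", "b", "b"]
decisions = predict_with_group_thresholds(
    scores, score_groups, {"a": 0.7, "b": 0.5}
)
print(decisions)
```

The weights would typically be passed to a training routine that accepts per-sample weights; in practice the thresholds would be chosen on held-out data against a fairness metric rather than set by hand.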
If time permits (3 min):
Biases in word embeddings: a case study with examples from word2vec embeddings
What is fair and what is not is contextual.
Importance of human input and judgement in designing debiasing techniques for ML workflows
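For the word-embedding case study mentioned above, one common way to probe bias is to project word vectors onto a direction such as "he" minus "she". The sketch below uses tiny hand-made 3-dimensional vectors, not real word2vec values, and the helper names are assumptions; it only shows the mechanics of the measurement.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy vectors for illustration only (real embeddings have
# hundreds of dimensions and would be loaded from a trained model).
toy_vectors = {
    "he": [1.0, 0.0, 0.2],
    "she": [-1.0, 0.0, 0.2],
    "engineer": [0.6, 0.8, 0.1],
    "nurse": [-0.6, 0.8, 0.1],
    "table": [0.0, 1.0, 0.0],
}

# Direction along which the two gendered words differ.
gender_direction = [
    a - b for a, b in zip(toy_vectors["he"], toy_vectors["she"])
]


def gender_score(word):
    """Positive leans towards "he", negative towards "she", in this toy setup."""
    return cosine(toy_vectors[word], gender_direction)


for word in ("engineer", "nurse", "table"):
    print(word, round(gender_score(word), 3))
```

Whether a non-zero score for an occupation word counts as unwanted bias depends on the application, which is where human judgement comes in.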
Slides in Progress
Here is a more detailed article with the slide deck that was presented on this topic: https://machinelearninginterview.com/topics/machine-learning/bias-and-fairness-in-ml-pipelines/