Reality of Data Modelling: Many analysts, one dataset: Multiple Results
Submitted by Lakshman Prasad (@becomingguru) on Wednesday, 31 May 2017
Full talk for data engineering track
One study gave the same data set to many teams of competent analysts and asked them all the same question: are soccer referees more likely to give red cards to dark-skin-toned players than to light-skin-toned players? The paper: http://home.uchicago.edu/~npope/crowdsourcing_paper.pdf
The plurality of analytical methods, and of distributions used to fit the data and make predictions, across the 61 participating analysts in 29 teams is very interesting.
We explore the paper, the data set and the models, go through the rationale for why each modelling choice may have made sense, and compare the 29 results.
The paper makes the case that crowdsourcing data research and then collaborating to reach a conclusion helps avoid inherent biases and modelling errors.
- The Project
- The Data Set
- The Results
- Modelling choices
- Possible rationale
- Choice of variables
- Differences in models
- Prediction differences
- Benefits of crowdsourcing
- Demo of the simple solution
- Discussion of how the simple solution compares to many complex ones
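To make the "Choice of variables" item concrete, here is a minimal sketch of how two analysts can reach different answers from identical data. The counts are invented for illustration and are not taken from the study; the covariate (playing position) and both estimators are assumptions chosen for simplicity, not the methods of any participating team. One analyst pools everything into a single 2x2 table, the other adjusts for the covariate with the Mantel-Haenszel estimator.

```python
# Synthetic counts, invented for illustration only; NOT taken from the study.
# Each stratum is one level of a hypothetical covariate (playing position).
# Tuple layout: (dark_red, dark_no_red, light_red, light_no_red)
strata = {
    "defender": (30, 270, 20, 180),
    "forward": (5, 95, 20, 380),
}


def pooled_odds_ratio(strata):
    """Analyst A: ignore the covariate and pool all counts into one 2x2 table."""
    a = sum(s[0] for s in strata.values())  # dark-skinned, red card
    b = sum(s[1] for s in strata.values())  # dark-skinned, no red card
    c = sum(s[2] for s in strata.values())  # light-skinned, red card
    d = sum(s[3] for s in strata.values())  # light-skinned, no red card
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Analyst B: adjust for the covariate with the Mantel-Haenszel estimator."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d  # total observations in this stratum
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


pooled = pooled_odds_ratio(strata)
adjusted = mantel_haenszel_odds_ratio(strata)
print(f"pooled odds ratio:   {pooled:.2f}")
print(f"adjusted odds ratio: {adjusted:.2f}")
```

With these made-up counts the pooled odds ratio is about 1.34 while the adjusted one is 1.00: the apparent effect comes entirely from the covariate that one analyst included and the other left out.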
Lakshman Prasad has been interested in data analytics and data science for a long time.
He was fascinated by this UChicago paper, which explicitly collects the results obtained from the same data set and compares the approaches.
He currently works for a management consulting firm, developing technology solutions that may involve data analytics.