Reality of Data Modelling: Many analysts, one dataset: Multiple Results

Thu, 27 Jul 2017, 08:15 AM – 10:00 PM IST

Fri, 28 Jul 2017, 08:15 AM – 06:25 PM IST

MLR Convention Centre, Whitefield, Bengaluru

Submitted May 31, 2017

Section: Full talk for data engineering track

Technical level: Intermediate

A study gave the same data set to many teams competent to analyse it and asked them all the same question: “Are soccer referees more likely to give red cards to dark-skin-toned players than to light-skin-toned players?” The paper: http://home.uchicago.edu/~npope/crowdsourcing_paper.pdf

The variety of analytical methods, and of distributions used to fit the data and make predictions, among the 61 participating analysts working in 29 teams is very interesting.

We explore the paper, the data set and the models, go through the rationale for why each modelling choice may have made sense, and look at how the 29 results compare.

The paper makes a case that crowdsourcing data research, and then collaborating to reach a conclusion, helps avoid inherent biases and modelling errors.

Outline

The Project
The Data Set
The Results
Modelling choices
- Possible rationale
- Choice of variables
- Similarities
- Differences in models
Prediction differences
Benefits of crowdsourcing
Demo of the simple solution
Discussion of how the simple solution compares to many complex ones
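
To give a flavour of what a "simple solution" could look like, here is a minimal sketch, not the method from the paper or from the talk demo. It assumes the question is reduced to a 2x2 table of players (dark or light skin tone, red card or no red card) and summarised as an odds ratio with an approximate 95% confidence interval. The function names and all counts are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: dark-skin-toned players with a red card
    b: dark-skin-toned players without a red card
    c: light-skin-toned players with a red card
    d: light-skin-toned players without a red card
    """
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval using the standard error of log(OR)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, NOT taken from the study's data set.
a, b, c, d = 30, 970, 20, 980

estimate = odds_ratio(a, b, c, d)
low, high = odds_ratio_ci(a, b, c, d)
print(f"odds ratio = {estimate:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

An odds ratio above 1 would suggest that red cards are more common for dark-skin-toned players in the table, but an interval that includes 1 means the table alone does not settle the question. A sketch like this ignores covariates such as player position or league, which is one place where more complex models can differ.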

Speaker bio

Lakshman Prasad has been interested in data analytics and data science for a long time.

He was fascinated by this UChicago paper, which explicitly collects the results obtained by different teams from the same data set and compares their approaches.

He currently works for a management consulting firm, developing technology solutions that may involve data analytics.

The Fifth Elephant 2017