The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Reality of Data Modelling: Many analysts, one dataset: Multiple Results

Submitted by Lakshman Prasad (@becomingguru) on Wednesday, 31 May 2017

Technical level

Intermediate

Section

Full talk for data engineering track

Status

Submitted

Abstract

A study gave the same data set to many teams competent to analyse it and asked them all the same question: whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players. The paper is available at http://home.uchicago.edu/~npope/crowdsourcing_paper.pdf

The 61 participating analysts, working in 29 teams, used a striking variety of analytical methods and of distributions to fit the data and make their predictions.

We explore the paper, the data set and the models, go through the rationale for why each modelling choice may have made sense, and compare the 29 results.

The paper makes the case that crowdsourcing data research and then collaborating to reach a conclusion helps avoid inherent biases and modelling errors.
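
To make the point about modelling choices concrete, here is a minimal Python sketch of how two reasonable analyses of the same counts can give different odds ratios. The counts below are invented for illustration and are not taken from the paper's data set; the stratum labels and all function names are hypothetical. The stratified estimator used is the standard Mantel-Haenszel odds ratio, chosen here only as an example and not necessarily a method any of the 29 teams used.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Counts for one hypothetical stratum (for example, one league)."""
    name: str
    dark_red: int    # dark-skin-toned players who received a red card
    dark_none: int   # dark-skin-toned players who did not
    light_red: int   # light-skin-toned players who received a red card
    light_none: int  # light-skin-toned players who did not

    @property
    def total(self) -> int:
        return self.dark_red + self.dark_none + self.light_red + self.light_none


def crude_odds_ratio(strata):
    """Pool all strata into one 2x2 table and compute a single odds ratio."""
    a = sum(s.dark_red for s in strata)
    b = sum(s.dark_none for s in strata)
    c = sum(s.light_red for s in strata)
    d = sum(s.light_none for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Combine per-stratum tables, weighting each by its sample size."""
    numerator = sum(s.dark_red * s.light_none / s.total for s in strata)
    denominator = sum(s.dark_none * s.light_red / s.total for s in strata)
    return numerator / denominator


# Invented counts, for illustration only.
strata = [
    Stratum("league A", dark_red=10, dark_none=90, light_red=20, light_none=380),
    Stratum("league B", dark_red=30, dark_none=370, light_red=5, light_none=95),
]

crude = crude_odds_ratio(strata)
mh = mantel_haenszel_odds_ratio(strata)

print(f"Crude (pooled) odds ratio:      {crude:.3f}")
print(f"Mantel-Haenszel (stratified):   {mh:.3f}")
```

Both numbers answer the same question from the same counts; the only difference is whether the analyst chose to adjust for the stratum variable. The study reports the same kind of divergence across teams, at much larger scale and with far richer models.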

Outline

  • The Project
  • The Data Set
  • The Results
  • Modelling choices
    • Possible rationale
    • Choice of variables
    • Similarities
    • Differences in models
  • Prediction differences
  • Benefits of crowdsourcing
  • Demo of the simple solution
  • Discussion of how the simple solution compares to many complex ones

Speaker bio

Lakshman Prasad has been interested in data analytics and data science for a long time.

He was fascinated by this UChicago paper, which explicitly collects results from many teams for the same data set and compares their approaches.

He currently works for a management consulting firm, developing technology solutions that may involve data analytics.
