The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Lakshman Prasad

@becomingguru

Reality of Data Modelling: Many analysts, one dataset: Multiple Results

Submitted May 31, 2017

In one study, the same data set was given to many teams competent to analyse it, and all of them were asked the same question: “Are soccer referees more likely to give red cards to dark-skin-toned players than to light-skin-toned players?” The paper: http://home.uchicago.edu/~npope/crowdsourcing_paper.pdf

The 61 participating analysts, working in 29 teams, used a striking variety of analytical methods and distributions to fit the data and make their predictions.

We explore the paper, the data set, and the models; go through the rationale for why each modelling choice may have made sense; and compare the 29 results.
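To make the idea concrete, here is a minimal sketch of how one defensible modelling choice can change the answer. It is not taken from the paper: the counts are invented, the "position" covariate is a hypothetical stand-in for any variable an analyst might adjust for, and the two estimators (a crude odds ratio and a Mantel-Haenszel stratified odds ratio) are just two of many approaches a team could pick.

```python
# Synthetic illustration only: these counts are invented, NOT taken from the study.
# Each stratum holds (dark_red, dark_no_red, light_red, light_no_red) counts.
# "defender" / "forward" are hypothetical strata for a covariate one might adjust for.
strata = {
    "defender": (60, 1940, 20, 980),
    "forward": (10, 990, 20, 2980),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table: (a/b) / (c/d)."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables):
    """Analyst A: pool all strata into one table and ignore the covariate."""
    a, b, c, d = (sum(column) for column in zip(*tables))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Analyst B: adjust for the covariate with the Mantel-Haenszel estimator."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


tables = list(strata.values())
crude = crude_odds_ratio(tables)
adjusted = mantel_haenszel_odds_ratio(tables)

print(f"Crude odds ratio (no adjustment): {crude:.2f}")
print(f"Mantel-Haenszel odds ratio (adjusted): {adjusted:.2f}")
```

Both analysts start from the same rows and ask the same question, yet the pooled estimate comes out noticeably larger than the adjusted one, because in this made-up data the covariate is related to both skin tone and red-card rate. Neither choice is obviously wrong, which is the kind of divergence the talk examines.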

The paper makes the case that crowdsourcing data research, and then collaborating to reach a conclusion, helps avoid inherent biases and modelling errors.

Outline

  • The Project
  • The Data Set
  • The Results
  • Modelling choices
    • Possible rationale
    • Choice of variables
    • Similarities
    • Differences in models
  • Prediction differences
  • Crowdsourcing, benefits
  • Demo of the simple solution
  • Discussion of how the simple solution compares to many complex ones

Speaker bio

Lakshman Prasad has been interested in data analytics and data science for a long time.

He was fascinated by this UChicago paper, which gathers the results that different teams obtained from the same data set and explicitly compares their approaches.

He currently works for a management consulting firm, developing technology solutions that may involve data analytics.

