The Fifth Elephant 2018

The seventh edition of India's best data conference

Design for Data

Submitted by Paul Meinshausen (@pmeins) on Thursday, 12 July 2018

Technical level

Beginner

Section

Full talk

Status

Confirmed & Scheduled

Abstract

When evaluating the quality and likelihood of success of AI/ML projects, I have found it helpful to think in terms of three core components: Workflow, Data, and Algorithms. In media and public discussion, algorithms tend to receive the most attention, and for young data scientists they often seem the most exciting. This talk will focus on the two other, underrated components: workflow and data. In the majority of cases I’ve seen, as both a data scientist and an investor, these are what determine whether a project makes a real difference and succeeds in practice. High-quality data comes from the work of design, and that work is fascinating, challenging, and rewarding; it deserves every data scientist’s and engineer’s attention and practice. I will present a few key steps of designing for data, with many practical, real-world examples and illustrations from my work and study as a data scientist.

Outline

  • Introduction: the framework of Workflow, Data, Algorithm for AI/ML projects.
  • What is data? A representation of a part of the world that we care about.
  • The Data Generating Process
    • The data collection process (the technology and operations by which data reaches a database)
    • The statistical model
    • The probabilistic model
  • Data Quality as a function of data use - availability and visibility
    • Knowing the past readily - before predicting the future
  • The Complexity of Taking Action on the World - Learning from Machine Learning
    • Tracking and storing models, predictions, and results
  • Conclusion and Takeaways

Requirements

Past experience with real-world data science projects will be helpful. The talk will aim to provide something for beginners as well as advanced professionals.

Speaker bio

Paul Meinshausen is a Data Scientist in Residence at Montane Ventures, an early-stage venture capital fund. Previously he was Co-founder and Chief Data Scientist at PaySense, a mobile fintech startup in Mumbai. Earlier roles include Vice President of Data Science at Housing.com and Principal Data Scientist at Teradata. He has a research background in behavioral and cognitive science, first worked on big and unstructured data for the U.S. Department of Defense in Afghanistan, and was a Data Science for Social Good Fellow at the University of Chicago’s Computation Institute.

Links

Slides

https://drive.google.com/open?id=1g3RQxBciwhWmdNmVfTAyMeuf_ftzwf2g
