Design for Data

This submission has been added to the schedule

Submitted Jul 12, 2018

Section: Full talk
Technical level: Beginner

When evaluating the quality and likelihood of success of AI/ML projects, I have found it helpful to think in terms of three core components: Workflow, Data, and Algorithms. In the media and in public discussion, algorithms tend to receive the most attention, and to young data scientists they often seem the most exciting. This talk will focus on the two other, underrated components: workflow and data. In the majority of cases I've seen, as both a data scientist and an investor, these are what determine whether a project really makes a difference and achieves practical success. Good, high-quality data comes from the work of design, and that work is fascinating, challenging, and rewarding; it deserves the attention and practice of every data scientist and engineer. I will present a few key steps in designing for data, along with many practical, real-world examples and illustrations from my work and study as a data scientist.

Outline

Introduction: the framework of Workflow, Data, and Algorithms for AI/ML projects.
What is data? A representation of a part of the world that we care about.
The Data Generating Process
- The data collection process (the technology and operations by which data reaches a database)
- The statistical model
- The probabilistic model
Data Quality as a function of data use: availability and visibility
- Knowing the past readily before predicting the future
The Complexity of Taking Action on the World: Learning from Machine Learning
- Tracking and storing models, predictions, and results
Conclusion and Takeaways

Requirements

Past experience with real-world data science projects will be helpful. The talk aims to offer something to beginners and advanced professionals alike.

Speaker bio

Paul Meinshausen is Data Scientist in Residence at Montane Ventures, an early-stage venture capital fund. Previously he was Co-founder and Chief Data Scientist at PaySense, a mobile fintech startup in Mumbai. Earlier roles include Vice President of Data Science at Housing.com and Principal Data Scientist at Teradata. He has a research background in behavioral and cognitive science, began working on big and unstructured data for the U.S. Department of Defense in Afghanistan, and was a Data Science for Social Good Fellow at the University of Chicago's Computation Institute.

Links

Slides

https://drive.google.com/open?id=1g3RQxBciwhWmdNmVfTAyMeuf_ftzwf2g

The Fifth Elephant 2018