The Fifth Elephant 2020 edition

On data governance, engineering for data privacy and data science

Imaad Mohamed Khan

@imaadmkhan1

Finding high propensity users for Delivery Jobs

Submitted May 31, 2020

At Vahan, we’re helping 300M+ low-skilled workers in India find jobs using WhatsApp. Delivery jobs are one of the major job categories for which we help find people. For our marketing campaigns, we wanted to identify the users who have a high propensity to take up delivery jobs. We will talk about how we iterated on feature engineering and model building to produce successive versions of our model in a limited period of time.
This talk will benefit data analysts, data scientists and other professionals looking to build propensity models.
Key learnings for participants: the end-to-end flow of a data science project, what propensity modeling is, whether propensity modeling is the same as causality, feature engineering techniques, model selection, model evaluation, and how to convert a business problem into a data problem.

Outline

Introduction
Company Introduction - Talk about what we do as a company here
Speaker Introduction - Talk a little about what I have done so far
Problem we’re solving - Propensity of users - Why is it important for us to solve?
Introduction to Propensity Modeling - Introduce the propensity modeling techniques found in the literature
Propensity Modeling and Causality, are they related? - Compare propensity modeling with causality

Data Collection
Understanding the data - Describe the process of collecting our data
Label Selection - Describe how we arrived at the label we selected
Missing data imputation - Talk about techniques used to impute missing data
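
As a rough illustration of the imputation step above, here is a minimal sketch of median imputation with a missing-value flag. The proposal does not say which imputation techniques were used, so this is only one common option, and the `age` field and the user records are hypothetical.

```python
from statistics import median


def impute_median(rows, column):
    """Return copies of `rows` with None in `column` replaced by the median.

    A `<column>_missing` flag is added so that a downstream model can still
    see that the value was absent in the raw data.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    imputed = []
    for r in rows:
        new_row = dict(r)  # copy, so the raw records stay untouched
        new_row[column + "_missing"] = r[column] is None
        if r[column] is None:
            new_row[column] = fill
        imputed.append(new_row)
    return imputed


# Hypothetical user records; the field names are assumptions for illustration.
users = [
    {"user_id": 1, "age": 22},
    {"user_id": 2, "age": None},
    {"user_id": 3, "age": 30},
    {"user_id": 4, "age": 26},
]
filled = impute_median(users, "age")
print(filled[1])
```

Keeping the flag alongside the filled value is a design choice: if values are missing for a reason related to the label, the flag lets the model use that signal.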

Modeling
Iteration
First version - Clustering with some features
Second version - Better Feature Engineering (explain different techniques of feature engineering here) with a Supervised Learning approach
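
To make the supervised approach above concrete, here is a minimal sketch of a propensity score: a classifier is trained on a binary label and its predicted probability is used to rank users. The proposal does not name the model or features used, so logistic regression, the two features and the toy data below are all assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def propensity(x, w, b):
    """Predicted probability that a user takes up the job (the propensity score)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [owns_two_wheeler, past_delivery_job_enquiries]
X = [[1, 3], [1, 2], [0, 0], [0, 1], [1, 0], [0, 2], [1, 4], [0, 0]]
# Hypothetical label: 1 if the user took up a delivery job, else 0
y = [1, 1, 0, 0, 1, 0, 1, 0]

w, b = fit_logistic(X, y)
p_high = propensity([1, 3], w, b)
p_low = propensity([0, 0], w, b)

# Rank users by score; the top of this list would form the high-propensity cohort.
ranked = sorted(range(len(X)), key=lambda i: propensity(X[i], w, b), reverse=True)
print(ranked[:3], round(p_high, 3), round(p_low, 3))
```

The score only says how likely a user with these features is to take up the job. It does not by itself say that changing a feature would cause the outcome.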

Evaluation
Metrics for each approach - A custom metric to evaluate the clustering; F1-score, precision and recall for the supervised learning approach
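
For reference, the supervised metrics named above can be computed from true and predicted labels as in the sketch below. The label vectors are made up for illustration, and the custom clustering metric is not shown because the proposal does not define it.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up labels: 1 = user took up a delivery job, 0 = did not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)  # 3 TP, 1 FP, 1 FN -> 0.75 0.75 0.75
```

For a marketing cohort, precision measures how many targeted users actually take up the job, while recall measures how many of the likely takers were reached.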

Deployment
Release of the cohorts to the marketing teams

Requirements

A basic understanding of Machine Learning concepts - Supervised, Unsupervised Learning, Evaluation Metrics

Speaker bio

Imaad Mohamed Khan is a Data Scientist, Content Creator and Educator. He graduated with a Master's in Internet Technologies and Information Systems from TU Braunschweig, Germany, and holds a Bachelor's in Electronics and Communication from M S Ramaiah Institute of Technology, Bangalore. He currently works as a Data Scientist at Vahan. Earlier, he was the co-founder and CTO of Recreate.ai. Prior to that, he was a Data Scientist at Indegene, where he worked on multiple Data Science and NLP projects. He is also a Content Creator on LinkedIn and writes on topics related to Data Science, Machine Learning and Artificial Intelligence.

Slides

https://docs.google.com/presentation/d/1vie2EQfZFSQVxY40nk73jw-Uwqph0_EhYRXALWMkFC4/edit?usp=sharing
