Anthill Inside 2019

On infrastructure for AI and ML: from managing training data to data storage, cloud strategy and costs of developing ML models

Large scale Machine Learning and data storage for CDP: transforming Digital Marketing

Submitted by Kunal Kishore (@kunalkishore) on Tuesday, 30 April 2019

Technical level: Intermediate
Session type: Lecture
Section: Full talk


We will talk about why we need a single end-to-end customer data platform (CDP) to enable truly personalised digital marketing. We will also explain which pain points, such as the cold-start problem, we solve for marketers by collating and utilising data from first-, second- and third-party sources rather than relying on first-party data alone. We will then focus on the motivation for using Machine Learning to create behavioral audience segments instead of relying on plain historical numbers.

The major end goals are:
- cost-effective customer acquisition
- higher purchases, subscriptions, engagement etc.
- better return on marketing spend
- higher user retention

Then we shall cover the underlying tasks. The first is the acquisition and ingestion of data from multiple sources into the platform in a way that complies with data laws, is secure, and allows the data to be used in combination. These data sources include telecom operators, location service providers, ad exchanges, push notification service providers, social media platforms etc.
Secondly, we shall discuss how this data is enriched and engineered into a single, unified, secure and anonymised user profile database. Examples of data enrichment include location-based enrichment such as polygon mapping, mobile-device-based enrichment, app-usage-based enrichment etc.
Then we discuss a series of user behavioral scores built using Machine Learning, such as lifetime value (LTV), churn rate, propensity to transact, lookalike etc. Here we cover feature engineering, model accuracy, choice of algorithm, scalability etc. Scalability is a big challenge, as certain scores are predicted for hundreds of millions of users. Next comes the business proposition: how these scores can be used to create audience segments in order to target users according to their life-cycle stage.
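
As one concrete illustration of the location-based enrichment mentioned above, polygon mapping can be sketched as a point-in-polygon test that tags a user's coordinates with a named region. This is only a minimal sketch, not the platform's actual implementation: the region names, coordinates and the helper functions `point_in_polygon` and `tag_location` are hypothetical, and the code treats longitude and latitude as planar coordinates, which is a reasonable approximation only for small regions.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) treated as planar


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a horizontal ray from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def tag_location(point: Point, regions: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first region containing the point, or None."""
    for name, polygon in regions.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical regions of interest (made-up names and coordinates).
REGIONS: Dict[str, List[Point]] = {
    "mall_a": [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    "airport_b": [(10.0, 10.0), (14.0, 10.0), (12.0, 14.0)],
}

print(tag_location((2.0, 1.5), REGIONS))
print(tag_location((12.0, 11.0), REGIONS))
print(tag_location((6.0, 6.0), REGIONS))
```

At the scale of hundreds of millions of users, a linear scan over regions like this would likely be replaced by a spatial index or a geospatial library; the sketch only shows the underlying idea.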


  • Introduction to CDP
  • Pain points of marketers
  • How Machine Learning and CDP work in tandem
  • Data acquisition, collection, ingestion, enrichment and management
  • Machine Learning based user behavioral scores
  • Audience and persona creation


  • Nothing specific. People should have a basic understanding of Machine Learning.

Speaker bio

Kunal Kishore completed his Bachelor of Technology degree in Electronics and Communication Engineering at IIT Kharagpur. He currently works as a Research Scientist at InMobi, where he leads the data science efforts on InMobi’s CDP offering. He has previously worked on data science areas such as large-scale content recommendation systems, ad response prediction for a display advertising bidder, and e-commerce product recommendation.



  • Abhishek Balaji (@booleanbalaji) Reviewer a month ago

    Hello Kunal,

    Thank you for submitting a proposal. To proceed with evaluation, we need to see detailed slides and a preview video to supplement your proposal. Your slides must cover the following:

    • Problem statement/context, which the audience can relate to and understand. The problem statement has to be a problem (based on this context) that can be generalized for all.
    • What were the tools/options available in the market to solve this problem? How did you evaluate alternatives, and what metrics did you use for the evaluation?
    • Why did you pick the option that you did?
    • Explain what the situation was before the solution you picked/built and how it changed after you implemented it. Show before-after scenario comparisons & metrics.
    • What compromises/trade-offs did you have to make in this process?
    • What is the one takeaway that you want participants to go back with at the end of this talk? What is it that participants should learn/be cautious about when solving similar problems?
    • What are the privacy, security, regulatory and ethical considerations which were taken when designing the product?
    • Is the tool free/open-source? If not, what can the audience take away from the talk?

    We need to see the updated slides on or before 21 May in order to close the decision on your proposal. If we do not receive an update by 21 May we’ll move the proposal for consideration at a future event.
