The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

ML For Personalization At Scale @ Nearbuy


Ankit Kohli


In this talk I will explain how we use ML to give personalized recommendations to our customers.
I will also explain how we have set up our Big Data pipeline using Kafka, Spark and HBase.
I will cover the amount of data we process daily, how we handle anomalies, and our learnings so far.
I will also discuss the various ML algorithms that we are using and how to use them in Spark.
This includes an understanding of Collaborative Filtering (and its use cases) and how to use it in Spark.
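
As a rough illustration of the collaborative filtering idea mentioned above, here is a minimal item-based sketch in plain Python. It is not the Nearbuy implementation and it does not use Spark; Spark MLlib ships its own ALS-based collaborative filtering, which works differently. The toy `ratings` data and the function names `item_vectors`, `cosine` and `recommend` are hypothetical and exist only for this example.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data (not from the talk): user -> {deal: rating}
ratings = {
    "u1": {"spa": 5.0, "pizza": 3.0},
    "u2": {"spa": 4.0, "pizza": 2.0, "movie": 5.0},
    "u3": {"pizza": 5.0, "movie": 1.0},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = defaultdict(dict)
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            items[item][user] = score
    return items


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by a similarity-weighted
    average of the ratings the user gave to other items."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        weighted, total_sim = 0.0, 0.0
        for item, score in seen.items():
            sim = cosine(vec, items[item])
            weighted += sim * score
            total_sim += sim
        if total_sim > 0:
            scores[candidate] = weighted / total_sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("u1", ratings))
```

In this toy data the only deal `u1` has not rated is `movie`, so that is the single candidate returned. At production scale the same idea would normally run on a distributed engine rather than in-memory dictionaries.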


Data Pipeline Discussion
Data Modeling - Avro / Parquet
Discussion of how data from various sources (real time and batch) is ingested using Kafka,
transformed using Spark and stored in HBase
Then how the data is modeled and fed into ML pipelines using Spark
And then the various ML algorithms that we run to generate personalizations, and how they are used in Nearbuy's world.
Finally, ways to evaluate your ML algorithms.
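
The outline does not say which evaluation methods the talk covers. As an assumption, two metrics commonly used for recommenders are RMSE on predicted ratings and precision@k on ranked lists; the sketch below shows both in plain Python. The function names `rmse` and `precision_at_k` are hypothetical and chosen only for this example.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared / len(predicted))


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical example values, not results from the talk
print(rmse([1.0, 3.0], [2.0, 2.0]))
print(precision_at_k(["a", "b", "c", "d"], {"a", "c"}, 2))
```

RMSE suits models that predict explicit ratings, while precision@k is closer to what a customer sees, since it only scores the top of the ranked list.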

Various Data Sources -> Kafka -> Spark -> HBase
MLlib Algos - Collaborative Filtering

Common problems that come up in each step
A brief look at Kafka and HBase, and an in-depth look at Spark

Speaker bio

Currently, I am employed as a Software Engineer at Nearbuy. In the past I have worked at Practo and Make My Trip.
Currently my interest is in Big Data, and I am actively involved in building projects to improve the customer experience.
I am working on Machine Learning to develop personalization at Nearbuy.
I have 7 years of overall experience in technology.