The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

ML For Personalization At Scale @ Nearbuy

Submitted by ankit kohli (@ankitko) on Wednesday, 12 April 2017

Section: Full talk for data engineering track
Technical level: Advanced
Status: Rejected


In this talk I will explain how we use ML to give personalized recommendations to customers.
I will also explain how we have set up our big data pipeline using Kafka, Spark and HBase,
the amount of data we process daily, how we handle anomalies, and what we have learned along the way.
I will also discuss the various ML algorithms that we use and how to use them in Spark,
including collaborative filtering (and its use cases) and how to use it in Spark.
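
To make the idea of collaborative filtering concrete, here is a minimal sketch in plain Python. It is not the Spark implementation the talk covers: it uses item-to-item cosine similarity on a tiny made-up score table, and the user names, deal names and the `recommend` function are all hypothetical. At real scale one would more likely reach for a distributed implementation such as the ALS model in Spark's ML library.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: user -> {item: score}. Not real Nearbuy data.
ratings = {
    "u1": {"spa": 5.0, "pizza": 3.0},
    "u2": {"spa": 4.0, "pizza": 2.0, "movie": 5.0},
    "u3": {"pizza": 4.0, "movie": 4.0},
}


def item_vectors(ratings):
    """Invert user -> item scores into item -> user scores."""
    vectors = defaultdict(dict)
    for user, items in ratings.items():
        for item, score in items.items():
            vectors[item][user] = score
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not seen by similarity to items they have."""
    vectors = item_vectors(ratings)
    seen = ratings.get(user, {})
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        # Weight each seen item's score by its similarity to the candidate.
        score = sum(
            cosine(vectors[candidate], vectors[item]) * s
            for item, s in seen.items()
        )
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Calling `recommend("u1", ratings)` ranks the items that `u1` has not interacted with; a user with no history gets an empty list, which is the usual cold-start limitation of this family of methods.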


Data pipeline discussion
Data modeling - Avro / Parquet
How data from various sources (real-time and batch) is ingested using Kafka,
transformed using Spark and stored in HBase
How data is modeled and fed into ML pipelines using Spark
The various ML algorithms that we run to generate personalizations, and how they are used at Nearbuy
Finally, ways to evaluate your ML algorithms.
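
As an illustration of the evaluation step, the sketch below shows two commonly used offline metrics in plain Python: RMSE for predicted scores and precision@k for ranked recommendation lists. The proposal does not say which metrics are used at Nearbuy, so the choice of metrics and the function names `rmse` and `precision_at_k` are assumptions made for this example.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and held-out scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return sqrt(squared_error / len(predicted))


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user found relevant.

    Divides by k even when fewer than k items were recommended, so a short
    list is penalized rather than rewarded.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k
```

RMSE fits models that predict an explicit score, while precision@k fits the case where only the order of the recommended items matters.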

Various data sources -> Kafka -> Spark -> HBase
| MLlib algorithms - collaborative filtering

Common problems that come up in each step
A brief look at Kafka and HBase, and an in-depth look at Spark

Speaker bio

Currently, I am employed as a Software Engineer at Nearbuy. In the past I have worked at Practo and Make My Trip.
My current interest is in big data, and I am actively involved in building projects to improve the customer experience.
I am working on machine learning to develop personalization at Nearbuy.
I have 7 years of overall experience in technology.




  • Zainab Bawa (@zainabbawa) 3 years ago

    Hi Ankit, why is Nearbuy’s case interesting for other participants? Also, this talk is about Nearbuy’s architecture and architectural decisions. Why is this interesting to other attendees? What is the insight you intend to present to the audience through Nearbuy’s experiences?

    • ankit kohli (@ankitko) Proposer 3 years ago

      Hi Team, I am interested in providing insight to any participants who are in the field of machine learning or intend to enter it. Through my talk I want to present how I have implemented customer personalization at Nearbuy. I will talk about technologies such as Spark, Kafka and HBase and elaborate on how to set up an ML pipeline end to end. I will present slides detailing how to give recommendations and predictions for customers.
      I will talk about:
      1. Spark MLlib
      2. ALS model
      3. Logistic regression
      4. Feature extraction
      5. K-means clustering
      6. Common problems (scale, data)
      7. ML tools and libraries
      8. How to use Kafka to gather clickstream data and sink it into HBase.
      This is interesting for people
      1. who are thinking of doing ML at their workplace and need some insights, experience and a starting point,
      2. who are already doing some ML, and can learn more and see how the same thing is implemented elsewhere
      For Nearbuy (me):
      I will get to know more about ML practices followed elsewhere, and I expect to learn more and gain knowledge to do better

  • laxmi singh (@pvacorner19) 5 months ago

    I wanted to thank you for this great read!!
