The Fifth Elephant 2019

Gathering of 1000+ practitioners from the data ecosystem

Turning Data into Actionable Insights in Real Time

Submitted by archit agarwal (@arcagarwal) on Friday, 14 June 2019

Session type: Short talk of 20 mins

Abstract

This talk shares our learnings and best practices from building a data pipeline that handles billions of events per day with single-digit latency (in seconds). It covers how we moved from Spring microservices to the Akka framework and how that move reduced our VM footprint by 85%. Data volume has grown enormously in recent years, and our Spring-based services did not scale with it. I will share how the PayPal analytics pipeline processes billions of events, the tech stack we use to achieve this, and how we process the data so that it is useful for making decisions. The talk goes from how we acquire the data, through how we process it, to how we visualize it, and describes how we use Kafka, Spark, and Druid (an open source stack) in our ecosystem. It should help anyone who is new to building data processing pipelines in their organization.

Outline

The flow would look like this:
1- About PayPal
2- Introduction to the Real-Time Analytics Pipeline
3- How we acquire data from the PayPal site (clickstream analytics) using the Akka framework.
4- Messaging layer -> How we use Kafka.
5- How we process the data in real time using Spark Streaming, and the storage file format we use to save space.
6- Visualization of the data so that analysts can draw meaningful insights from it, and how Druid helps us load dashboards within seconds.
7- Connecting all the dots
8- Takeaways

Speaker bio

I have been working as a Senior Data Engineer at PayPal for one year. I have a passion for building scalable systems software (billions of requests with single-digit latency). In my spare time I read the tech blogs of different companies and play badminton.

Links

Slides

https://www.slideshare.net/secret/BHUFD3HIRUMQqZ

Comments

  • Abhishek Balaji (@booleanbalaji) Reviewer 4 months ago

    Hi Archit,

    Thank you for submitting a proposal. We need to see detailed slides and a preview video to evaluate your proposal. Your slides must cover the following:

    • Problem statement/context, which the audience can relate to and understand. The problem statement has to be a problem (based on this context) that can be generalized for all.
    • What were the tools/frameworks available in the market to solve this problem? How did you evaluate these, and what metrics did you use for the evaluation? Why did you pick the option that you did?
    • Explain how the situation was before the solution you picked/built and how it changed after implementing the solution you picked and built? Show before-after scenario comparisons & metrics.
    • What compromises/trade-offs did you have to make in this process?
    • What is the one takeaway that you want participants to go back with at the end of this talk? What is it that participants should learn/be cautious about when solving similar problems?

    We need your updated slides and preview video by Jun 27, 2019 to evaluate your proposal. If we do not receive an update, we’d be moving your proposal for evaluation under a future event.

  • archit agarwal (@arcagarwal) Proposer 4 months ago

    Slides have been uploaded on SlideShare.

    • Abhishek Balaji (@booleanbalaji) Reviewer 4 months ago

      Thanks Archit. We cannot accept this talk for The Fifth Elephant in its current form. The proposal reads like a showcase of PayPal’s infrastructure and would not be useful for the audience. Further, the feedback and points to cover as mentioned in the previous comment have not been incorporated. Since we’re on a tight deadline for The Fifth Elephant, we cannot reconsider the presentation either. You may choose to incorporate the feedback and get back to us. We’ll evaluate this for a future event.
