Turning Data into Actionable Insights in Real Time
Submitted by Archit Agarwal (@arcagarwal) on Friday, 14 June 2019
Session type: Short talk of 20 mins
This talk will share our learnings and best practices from building a data pipeline that handles billions of events per day with single-digit-second latency. I will cover how we moved from Spring microservices to the Akka framework, reducing our VM footprint by 85% in the process: we have seen huge growth in data in recent years, and the Spring-based design was no longer scalable. I will walk through how the PayPal analytics pipeline processes billions of events, the tech stack we use to achieve this, and how we process the data to make it useful for decision-making. The talk will go from how we acquire the data to how we process and visualize it, and will explain how we use Kafka, Spark, and Druid (all open source) in our ecosystem. It should help anyone new to building data processing pipelines in their organization.
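The move from thread-per-request Spring services to Akka's actor model is what drives the footprint reduction: instead of one blocked thread per in-flight request, many lightweight actors share a small thread pool and drain mailboxes of messages. Akka itself is a JVM framework; the following is only a rough, framework-free sketch of the mailbox idea in plain Python (all names here are illustrative, not PayPal's code):

```python
import queue
import threading

class ClickstreamActor:
    """Toy actor: a single worker thread drains a mailbox of events.

    Illustrative only -- the real pipeline uses Akka on the JVM. The
    point is that senders never block: they just enqueue a message,
    so N concurrent requests do not need N dedicated threads.
    """

    def __init__(self):
        self.mailbox = queue.Queue()
        self.processed = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def tell(self, event):
        # Fire-and-forget send, analogous to Akka's `actorRef ! event`.
        self.mailbox.put(event)

    def _run(self):
        while True:
            event = self.mailbox.get()
            if event is None:  # poison pill: stop the actor
                break
            self.processed.append({"page": event["page"], "ok": True})

    def stop(self):
        self.mailbox.put(None)
        self._worker.join()

actor = ClickstreamActor()
for i in range(3):
    actor.tell({"page": f"/checkout/{i}"})  # non-blocking sends
actor.stop()
print(len(actor.processed))  # 3 events handled by one worker thread
```

The design choice the sketch illustrates: throughput scales with the number of queued messages, not the number of threads, which is why an actor-based service can serve the same load with a much smaller VM footprint.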
The flow would look like this:
1- About PayPal
2- Introduction to the real-time analytics pipeline
3- How we acquire data from the PayPal site (clickstream analytics) using the Akka framework.
4- Messaging layer -> how we use Kafka.
5- How we process data in real time using Spark Streaming, plus the storage file format we use to save space.
6- Visualization of the data so that analysts can draw meaningful insights from it, and how Druid lets us load dashboards within seconds.
7- Connecting all the dots
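Connecting the dots above: in production the flow is Kafka (messaging) -> Spark Streaming (micro-batch processing) -> Druid (pre-aggregated rollups behind the dashboards). The shape of that computation can be sketched in a few lines of plain Python, with a queue standing in for the Kafka topic, a fixed-size batch loop standing in for the streaming job, and a small rollup dict standing in for Druid; every name here is an illustrative stand-in, not the actual stack:

```python
from collections import defaultdict, deque

# Broker stand-in: in production this is a Kafka topic.
topic = deque()
for i in range(6):
    topic.append({"page": "/home" if i % 2 == 0 else "/pay", "count": 1})

# Streaming stand-in: drain the topic in fixed-size micro-batches,
# the way Spark Streaming processes one batch per interval.
rollup = defaultdict(int)  # Druid-style rollup, keyed by page
BATCH_SIZE = 2
while topic:
    batch = [topic.popleft() for _ in range(min(BATCH_SIZE, len(topic)))]
    for event in batch:
        rollup[event["page"]] += event["count"]

# Dashboard stand-in: queries hit the small pre-aggregated rollup,
# not the raw events, which is why they can return within seconds.
print(dict(rollup))  # {'/home': 3, '/pay': 3}
```

The key property the sketch shows is that the dashboard query cost depends on the size of the rollup (number of distinct pages), not on the billions of raw events that produced it.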
I have been working as a Senior Data Engineer at PayPal for the past year. I am passionate about building SCALABLE systems (billions of requests with single-digit-second latency). In my spare time I read engineering blogs from different companies and play badminton.