Jul 2019
Thu 25: 09:15 AM – 05:45 PM IST
Fri 26: 09:20 AM – 05:30 PM IST
Archit Agarwal
This talk will share our learnings and best practices from building a data pipeline that handles billions of events per day with single-digit-second latency. I will cover how we moved from Spring microservices to the Akka framework, and how that move reduced our VM footprint by 85%. We have seen huge growth in data in recent years, and the Spring-based approach was not scalable. I will walk through how the PayPal analytics pipeline processes billions of events, the tech stack we use to achieve this, and how we process the data to make decisions from it. The talk covers the full flow, from how we acquire the data to how we process and visualize it, and explains how we use Kafka, Spark, and Druid (all open source) in our ecosystem. It should help anyone new to building data processing pipelines in their organization.
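To put the scale in perspective, a back-of-the-envelope sketch of the sustained event rate for one billion events per day (illustrative numbers, not PayPal's actual traffic profile):

```python
# Back-of-the-envelope throughput for a pipeline handling
# one billion events per day (illustrative only).
EVENTS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

sustained_rate = EVENTS_PER_DAY / SECONDS_PER_DAY
print(f"Sustained rate: ~{sustained_rate:,.0f} events/sec")  # ~11,574 events/sec

# Real clickstream traffic is bursty, so capacity is usually
# provisioned well above the sustained average; the 4x factor
# here is an assumption for illustration, not PayPal's number.
peak_rate = sustained_rate * 4
print(f"Peak provisioning: ~{peak_rate:,.0f} events/sec")
```

Even the sustained average is five-figure events per second, which is why every layer of the pipeline (acquisition, messaging, processing, serving) has to scale horizontally.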
The flow would look like this:
1- About PayPal
2- Introduction to the real-time analytics pipeline
3- How we acquire the data from the PayPal site (clickstream analytics) using the Akka framework
4- Messaging layer -> How we use Kafka
5- How we process the data in real time using Spark Streaming, including the storage file format we use to save space
6- Visualization of the data so that analysts can draw meaningful insights from it, and how Druid helps us load dashboards within seconds
7- Connecting all the dots
8- Takeaways
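The stages in the outline above can be sketched as a toy, in-memory pipeline: acquire events, buffer them in a queue (a Kafka stand-in), aggregate them in micro-batches (a Spark Streaming stand-in), and update a rollup store for fast dashboard queries (a Druid stand-in). All names and the event shape here are hypothetical, not PayPal's actual schema or code:

```python
from collections import Counter, deque

queue = deque()  # stands in for a Kafka topic

def acquire(events):
    """Acquisition layer: push raw clickstream events onto the queue."""
    for event in events:
        queue.append(event)

def process_batch(batch_size):
    """Processing layer: drain one micro-batch and count page views."""
    counts = Counter()
    for _ in range(min(batch_size, len(queue))):
        event = queue.popleft()
        counts[event["page"]] += 1
    return counts

# Serving layer: a pre-aggregated rollup keyed by page, which is
# roughly what makes Druid dashboard queries fast.
store = Counter()

acquire([{"page": "/checkout"}, {"page": "/cart"}, {"page": "/checkout"}])
store.update(process_batch(batch_size=100))
print(store.most_common(1))  # [('/checkout', 2)]
```

The design point the sketch illustrates: because the serving store holds aggregates rather than raw events, dashboard queries touch far less data than the pipeline ingests, which is how sub-second dashboard loads coexist with billions of daily events.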
I have been working as a Senior Data Engineer at PayPal for the past year. I am passionate about building scalable systems (billions of requests with single-digit latency). In my spare time, I read tech blogs from different companies and play badminton.