The Fifth Elephant 2018

The seventh edition of India's best data conference

Abhishek Agarwal

@abhishek_appd

Building big data pipelines on Kafka and Kubernetes

Submitted Mar 31, 2018

At AppDynamics, we have been pushing the limits of how far we can scale metric ingestion. Toward this goal, we have been taking logical pieces out of our monolithic application and re-architecting them to handle large scale.

Initially, we decided to adopt a stream processing platform and port these new pieces to it, but we later realized that not all of them are typical real-time streaming applications. Some are regular web services, yet they share common concerns such as orchestration, fault tolerance, resiliency, and scaling up and down. In fact, most organizations have this same problem but usually end up deploying different infrastructure for each category of application. However, operational simplicity and a lean infrastructure were important concerns for us and motivated us to take a different route.

In this talk, I will describe how modelling our real-time data pipelines as asynchronous microservices has allowed us to use the same Kubernetes infrastructure for both data pipelines and regular web services. I will also explain how this unification immensely simplifies our deployment and operations work and keeps our services lean.

Outline

  1. Problem statement - Why we started on this path
  2. Initial vision - Re-architecting the application with stream processing platforms
  3. Separation of concerns - application concerns (at-least-once processing) vs. infra concerns (scaling)
  4. Course correction - Moving from stream processing platform to an orchestration platform
  5. Ingestion pipeline - Ingestion is the first service we scaled using Kafka and Kubernetes
  6. Lessons - What were the challenges and how we overcame them
  7. Q/A
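
The separation in item 3 can be illustrated with a minimal sketch. It is not taken from the talk: `InMemoryPartition` and `run_worker` are hypothetical names, and the in-memory log only stands in for a Kafka partition. The worker owns the application concern (commit an offset only after the batch is processed, which gives at-least-once processing), while restarting and scaling the worker are assumed to be left to the orchestration platform.

```python
from typing import Callable, List


class InMemoryPartition:
    """Hypothetical stand-in for one Kafka partition: a log plus a committed offset."""

    def __init__(self, records: List[str]) -> None:
        self.records = list(records)
        self.committed = 0  # offset a restarted worker resumes from

    def poll(self, position: int, max_records: int) -> List[str]:
        # Return up to max_records records starting at the given offset.
        return self.records[position:position + max_records]

    def commit(self, offset: int) -> None:
        self.committed = offset


def run_worker(partition: InMemoryPartition,
               handle: Callable[[str], None],
               batch_size: int = 2) -> None:
    """Process records batch by batch, committing only after a batch succeeds.

    If handle() raises, the exception propagates and the offset stays where it
    was, so a restarted worker re-reads the whole uncommitted batch. Records
    may therefore be handled more than once, but none are skipped.
    """
    position = partition.committed
    while True:
        batch = partition.poll(position, batch_size)
        if not batch:
            return  # log drained; a real service would keep polling
        for record in batch:
            handle(record)
        position += len(batch)
        partition.commit(position)
```

In this sketch a crash in the middle of a batch means the restarted worker repeats the records of that batch that had already been handled, so handlers need to tolerate duplicates. Nothing in the worker knows how many replicas exist or who restarts it, which is the part that would be delegated to the infrastructure.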

Speaker bio

Abhishek is a Staff Software Engineer at AppDynamics India Pvt Ltd, where he works on real-time stream processing infrastructure. He is also a core member of the team responsible for transitioning the AppDynamics product from a monolith to an architecture driven by asynchronous microservices. Previously, he worked on the user/data platform team at InMobi. He is also a PMC member of Apache Storm.

Slides

https://docs.google.com/presentation/d/1Ao1ae14k0qQq8LKWZXCv0UOeL-Ki_JeqE94udFNiFTM/edit?usp=sharing
