The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Large scale business stats aggregation using Kafka

Submitted by Vinothkumar Raman (@vinothkumarraman) on Thursday, 30 March 2017

Technical level

Intermediate

Section

Full talk of 40 mins duration

Status

Submitted

Total votes:  +22

Abstract

At Indix we collect and process a lot of data. We monitor the correct behaviour of our systems by collecting business metrics. Over time, we moved most of our system from batch MapReduce jobs to Kafka stream tasks, so the stats had to become more real time as well. To do that we built a system called Abel, which aggregates the millions of events it receives and computes stats from them.

Outline

  1. Stats as separate MR jobs
  2. Pros and Cons for MR jobs
  3. Trying to use Riemann for stat collection
  4. Pros and Cons for Riemann
  5. Generalizing the Stats abstraction with Semigroups
  6. Semigroup properties
  7. Semigroup examples
  8. Emission of stats with Abel
  9. Explosion of keys
  10. Performance
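
To make the semigroup items in the outline concrete, here is a minimal sketch of stats modelled as semigroups: each stat type has an associative combine operation, so partial aggregates can be merged in any grouping. The names Count, Max, Average and aggregate are hypothetical and chosen only for illustration; this is not Abel's actual API.

```python
from dataclasses import dataclass
from functools import reduce


# Hypothetical stat types for illustration only (not Abel's API).
# Each one is a semigroup: combine() is associative, so partial
# results can be merged in any grouping and give the same answer.

@dataclass(frozen=True)
class Count:
    n: int

    def combine(self, other: "Count") -> "Count":
        return Count(self.n + other.n)


@dataclass(frozen=True)
class Max:
    value: float

    def combine(self, other: "Max") -> "Max":
        return Max(max(self.value, other.value))


@dataclass(frozen=True)
class Average:
    # A mean is not associative on its own, but (total, count) pairs are.
    total: float
    count: int

    def combine(self, other: "Average") -> "Average":
        return Average(self.total + other.total, self.count + other.count)

    @property
    def mean(self) -> float:
        return self.total / self.count


def aggregate(stats):
    """Fold a non-empty sequence of same-typed stats into one value."""
    return reduce(lambda a, b: a.combine(b), stats)


# Three events carrying a price each, aggregated as an average.
prices = [10.0, 20.0, 60.0]
events = [Average(p, 1) for p in prices]

whole = aggregate(events)
# Aggregating two partial batches gives the same result as one pass.
left = aggregate([aggregate(events[:2]), events[2]])
right = aggregate([events[0], aggregate(events[1:])])
```

Because the grouping does not matter, a stream task could pre-aggregate events locally and merge the partial results later, which is one reason a semigroup is a convenient abstraction for this kind of system.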

Speaker bio

I have been working at Indix for the last 4.5 years, and before that I was part of ThoughtWorks. I have worked on almost every part of Indix and have been through the journey of how it evolved.

Slides

https://speakerdeck.com/vinothkr/evolution-of-stats-at-indix

Preview video

https://youtu.be/WaK_UdWPdq4

Comments

  • 1
    Zainab Bawa (@zainabbawa) Reviewer a year ago

    Hello Vinoth, please share a link to a two-minute preview video where you explain what this talk is about and what the takeaway is for the audience.

    • 1
      Vinothkumar Raman (@vinothkumarraman) Proposer a year ago

      Hello Zainab, I’ve updated the submission with a preview video. Let me know if you need anything more.
