The Fifth Elephant 2017

On data engineering and application of ML in diverse domains

Vinothkumar Raman

@vinothkumarraman

Large scale business stats aggregation using Kafka

Submitted Mar 30, 2017

At Indix we collect and process a lot of data. We monitor the correct behaviour of our systems by collecting business metrics. Over time, we moved most of our system from batch map-reduce jobs to Kafka stream tasks, so the stats also had to become closer to real time. To do this we built a system called Abel, which aggregates the millions of events it receives and computes stats from them.

Outline

  1. Stats as separate MR jobs
  2. Pros and Cons for MR jobs
  3. Trying to use Riemann for stat collection
  4. Pros and Cons for Riemann
  5. Generalizing the Stats abstraction with Semigroups
  6. Semigroup properties
  7. Semigroup examples
  8. Emission of stats with Abel
  9. Explosion of keys
  10. Performance
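
As an illustration of the semigroup idea in the outline, here is a minimal sketch in Python. It assumes that a stat is any value with an associative combine operation, so partial aggregates can be merged in any grouping. The class names (Count, Max, Average) and the aggregate function are hypothetical and are not taken from Abel itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


# Hypothetical stat types for illustration only; not Abel's actual API.
# Each one is a semigroup: combine() is associative, so
# a.combine(b).combine(c) == a.combine(b.combine(c)).

@dataclass(frozen=True)
class Count:
    n: int

    def combine(self, other: "Count") -> "Count":
        return Count(self.n + other.n)


@dataclass(frozen=True)
class Max:
    value: float

    def combine(self, other: "Max") -> "Max":
        return Max(max(self.value, other.value))


@dataclass(frozen=True)
class Average:
    # An average is not associative on its own, but the (total, count)
    # pair is, and the mean can be derived from the pair at read time.
    total: float
    count: int

    def combine(self, other: "Average") -> "Average":
        return Average(self.total + other.total, self.count + other.count)

    @property
    def mean(self) -> float:
        return self.total / self.count


def aggregate(events: Iterable[Tuple[str, object]]) -> Dict[str, object]:
    """Fold a stream of (key, stat) events into one combined stat per key."""
    result: Dict[str, object] = {}
    for key, stat in events:
        result[key] = result[key].combine(stat) if key in result else stat
    return result


events = [
    ("crawled", Count(1)),
    ("price", Average(10.0, 1)),
    ("crawled", Count(1)),
    ("price", Average(20.0, 1)),
    ("crawled", Count(1)),
    ("max_price", Max(10.0)),
    ("max_price", Max(20.0)),
]
result = aggregate(events)
```

Because combine is associative, the same events could be split across several workers, aggregated separately, and merged afterwards with the same result. That property is what makes this kind of abstraction a natural fit for streaming aggregation.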

Speaker bio

I have been working at Indix for the last 4.5 years, and before that I was part of Thoughtworks. I have worked on almost every part of Indix and have been through the journey of how it evolved.

Slides

https://speakerdeck.com/vinothkr/evolution-of-stats-at-indix

