Instrumenting your Kafka & Storm pipeline
Bhasker Kode
@bhaskerkode
Tips for designing your stream processing setup.
What can go wrong, and how to instrument it.
Outline
An introduction to a production setup that handles billions of events per week: events are published through our home-grown Apache Kafka client, the stream is processed with Storm, and the results are aggregated into Postgres. I will also share the pros and cons of building our own Kafka client as opposed to reusing an existing one.
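As a rough illustration of what instrumenting such a pipeline can mean, here is a minimal sketch that counts events at each stage and reports how many have not yet reached the next one. The stage names (`produced`, `processed`, `aggregated`) and the `PipelineCounters` class are hypothetical, chosen only to mirror the Kafka, Storm and Postgres steps described above; this is not the setup presented in the talk.

```python
from collections import Counter


class PipelineCounters:
    """Counts events seen at each stage of a hypothetical three-stage pipeline."""

    # Assumed stage names: published to Kafka, processed by Storm, written to Postgres.
    STAGES = ("produced", "processed", "aggregated")

    def __init__(self):
        self.counts = Counter()

    def record(self, stage, n=1):
        """Add n events to the counter for the given stage."""
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.counts[stage] += n

    def backlog(self):
        """Events counted at one stage but not yet counted at the next."""
        return {
            f"{a}->{b}": self.counts[a] - self.counts[b]
            for a, b in zip(self.STAGES, self.STAGES[1:])
        }


counters = PipelineCounters()
counters.record("produced", 1000)
counters.record("processed", 990)
counters.record("aggregated", 990)
print(counters.backlog())
# → {'produced->processed': 10, 'processed->aggregated': 0}
```

A gap between two adjacent stages that keeps growing suggests events are being dropped or delayed at that hop, which is usually the first thing worth alerting on.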
Speaker bio
Bosky (@bhaskerkode) leads a product engineering team at Helpshift and works in Erlang, Clojure and Go.
Has been building distributed systems in those languages since '06 across edtech, adtech and mobile.
=> http://in.linkedin.com/in/bhaskerkode & http://slideshare.net/bosky101
Built a Kafka producer/micro-service used in production at Helpshift, Layer, and several other companies.
=> http://github.com/helpshift/ekaf
(Recommended by Apache Kafka: https://cwiki.apache.org/confluence/display/KAFKA/Clients)
Using Storm in production for sentiment analysis, topic extraction, naive Bayes classification, etc.
Eager to learn more about best practices in Storm deployment and management. Incidentally, another part that feeds into this system is written in Go and uses the Shopify Kafka producer.
Links
- http://github.com/helpshift/ekaf
- https://coderwall.com/p/1lyfxg/parsing-the-kafka-protocol-with-erlang-pattern-matching-ftw
Slides
https://www.dropbox.com/s/c4rwq1tiu7nxvpd/fifthel-15-kafka-storm-at-helpshift-bhaskerkode.png?dl=0