PyCon is the gathering for the community that uses and develops the open-source Python programming language. This is the second year of PyCon Pune, where the community will meet for two days of talks and two days of dev sprints working on upstream projects. The CFP ends on 15th September AoE.

Website: https://pune.pycon.org/2018/

Note to submitters: The talks will be selected by a team. We will not count the public votes.

Hosted by

PyCon Pune 2017

Srinivasan Rangarajan

@cnu

Real-time log analytics using Probabilistic Data Structures in Redis

Submitted Sep 15, 2017

There are two ways to solve any problem: accurately or approximately. Accurate data structures have their disadvantages: they use too much memory and do not scale to the real-time nature of the data. In this talk we will see how to take advantage of the newly released Redis 4.0 and its pluggable modules to build a data pipeline that uses probabilistic data structures to get real-time insights.

Outline

Different insights and metrics can be obtained from log event data. Processing the data in real time and getting accurate results is possible in theory. In practice, it is not so easy.

Not all results and metrics need to be accurate. There are places where the tradeoff between accuracy and memory usage/scalability is worth it. That is where probabilistic data structures (PDS) come in. In this talk I will explain different PDSs and how they work. I will also talk about how to use Redis and its pluggable module system to use these data structures much more efficiently.
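
To make the accuracy versus memory tradeoff concrete, here is a minimal pure-Python sketch of a Bloom filter. It is not code from the talk and does not use Redis; the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the sample IP addresses are all hypothetical choices for illustration. The filter uses a fixed 128 bytes no matter how many items are added, at the cost of occasional false positives (it never reports a false negative).

```python
import hashlib


class BloomFilter:
    """Fixed-size set membership: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Illustrative sizes only; real deployments size these from the
        # expected item count and the acceptable false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Every position must be set; one clear bit proves the item is absent.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical log data: client IPs seen so far.
seen_ips = BloomFilter()
for ip in ["10.0.0.1", "10.0.0.2", "192.168.1.7"]:
    seen_ips.add(ip)
```

A check such as `"10.0.0.1" in seen_ips` is always true for an added item. For an item that was never added the answer is usually false, but it can occasionally be true; that error rate is the price paid for the constant memory footprint.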

  1. Introduction
  2. Log Analysis
    1. Problem: Parsing high volume & velocity log event data.
    2. Various metrics to be measured.
  3. Redis 4.0
    1. New Features in Redis 4.0
    2. Using the new modules system for accessing these data structures
  4. Difference between accurate data structures and probabilistic data structures
    1. HyperLogLog - Cardinality of sets
    2. Top-K - Getting the top k items from a data set
    3. Count-Min Sketch - Get item counts
    4. Bloom Filters - Check for membership
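
As an illustration of the "get item counts" idea in the outline, the following is a minimal pure-Python Count-Min Sketch. It is a sketch under stated assumptions rather than the implementation used in the talk or in any Redis module: the class name `CountMinSketch`, the `width` and `depth` values, and the sample URL paths are hypothetical.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter: may overestimate, never underestimates."""

    def __init__(self, width=256, depth=4):
        # Illustrative sizes only; a larger width lowers the overcount error,
        # a larger depth lowers the chance of a bad estimate.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One salted hash per row selects a single counter in that row.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only ever inflate counters, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(self.table[row][col] for row, col in self._columns(item))


# Hypothetical log data: request paths from an access log.
sketch = CountMinSketch()
for path in ["/home", "/cart", "/home", "/checkout", "/home"]:
    sketch.add(path)
```

Calling `sketch.estimate("/home")` returns a value that is at least the true count of 3. Memory stays fixed at `width * depth` counters regardless of how many distinct paths appear in the log.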

Speaker bio

I have been using Python professionally for more than 10 years and have worked with numerous startups, building their engineering platforms to solve problems at a large scale. Currently I manage the entire engineering team at Mad Street Den and am responsible for building and scaling the platform on which different Computer Vision based Retail Automation products are built. The products we have built are used by millions of users every day all over the globe.

I am a regular speaker at PyCon India and have given talks in 2009, 2013 and 2016. Apart from speaking at other local meetups, I have also been on the editorial board for the Fifth Elephant Conference 2017, identifying speakers and helping them fine-tune their talks. I also occasionally contribute to a few open source projects and maintain a few of my own.

Slides

https://speakerdeck.com/cnu/probabilistic-data-structures
