Using Probabilistic Data Structures to Build Real-Time Monitoring Dashboards

Thu 27 Jul 2017, 08:15 AM to 10:00 PM IST
Fri 28 Jul 2017, 08:15 AM to 06:25 PM IST

MLR Convention Centre, Whitefield, Bengaluru

Submitted Jun 9, 2017

Section: Crisp talk for data engineering track
Technical level: Beginner

Performing basic operations such as checking whether an element is in a set, or counting the distinct elements in it (its cardinality), is child’s play for a few thousand data points. However, these operations become complex and prohibitively expensive as the data set grows into the millions and spans multiple dimensions.

One way to address this problem is to first index the data in a database and then query it for its cardinality or for the presence of an element. However, this approach is not suited to streaming data. Is it possible to perform these operations in a fixed amount of time and memory, accepting a small, bounded loss of accuracy?
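For the membership half of that question, a Bloom filter is the classic answer. The following is a minimal Python sketch under our own assumptions, not DataWeave's implementation: the class name `BloomFilter`, the SHA-256 double-hashing scheme and the example URL are all hypothetical. It sizes the bit array with the standard formulas m = -n ln(p) / (ln 2)^2 bits and k = (m/n) ln 2 hash functions, so memory is fixed up front and every lookup costs k bit probes regardless of how many items have been added.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Fixed-size set-membership sketch: no false negatives, tunable false positives."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Standard sizing for n = capacity items and target false-positive rate p:
        #   m = -n ln(p) / (ln 2)^2 bits,  k = (m / n) ln 2 hash functions.
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set all k bits for this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably added"; False means "definitely never added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Example: remembering crawled URLs (the URL is hypothetical).
seen = BloomFilter(capacity=1_000_000, error_rate=0.01)
seen.add("https://example.com/product/1")
print("https://example.com/product/1" in seen)  # True
```

With `capacity=1_000_000` and `error_rate=0.01`, this filter uses about 9.6 million bits (roughly 1.2 MB) and 7 hash functions. As long as no more than `capacity` items are added, a lookup may return a false positive about 1% of the time but never a false negative, a trade-off that can be acceptable for a question such as "has this URL already been seen today?".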

At DataWeave, we crawl millions of URLs every day and analyze a large number of data points in real time, with low error rates.

This talk presents an innovative way to build a monitoring dashboard using two probabilistic data structures: Bloom filters and HyperLogLog.
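To make the HyperLogLog half concrete, here is a minimal, hypothetical Python sketch; the class and method names are ours and do not correspond to a particular library's API. It hashes each item to 64 bits, uses the top p bits to choose one of 2^p registers, keeps the largest "leading-zeros rank" seen in each register, and combines the registers with the standard harmonic-mean estimator, switching to linear counting while many registers are still empty. With p = 14 that is 16,384 registers and a standard error of roughly 1.04 / sqrt(16,384), about 0.8%, no matter how many items are streamed in.

```python
import hashlib
import math


class HyperLogLog:
    """Estimates the number of distinct items using 2**p small registers."""

    def __init__(self, p: int = 14) -> None:
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m, valid for m >= 128 (hence p >= 7).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash: the top p bits choose a register, the remaining bits give the rank.
        x = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")
        idx = x >> (64 - self.p)
        rest = x & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1-bit within the remaining 64 - p bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> int:
        # Raw estimate: alpha * m^2 / sum(2^-register), a normalized harmonic mean.
        estimate = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: use linear counting while many registers are still empty.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)

    def merge(self, other: "HyperLogLog") -> None:
        # The sketch of a union is the register-wise maximum of the two sketches.
        if other.p != self.p:
            raise ValueError("can only merge sketches with the same p")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


# Example: counting distinct product URLs in a stream with repeats (hypothetical data).
hll = HyperLogLog(p=14)
for i in range(50_000):
    hll.add(f"https://example.com/product/{i % 10_000}")
print(hll.count())  # an estimate close to 10,000
```

Adding an item that was already seen never raises a register, so duplicates do not inflate the count. The `merge` method suggests why HyperLogLog can suit dashboards: sketches kept per shard or per time window can be combined by taking register-wise maxima, without revisiting the raw data.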

Outline

Sketching
Bloom Filters
HyperLogLog
Practical Use Cases
Real-time dashboard using Bloom Filters and HLL

Speaker bio

I work as a Software Engineer in the data platforms team at DataWeave, a provider of Competitive Intelligence as a Service for retailers and consumer brands. I design and manage data flows into the various datastores the company maintains. I also ensure that all datastores run at optimal capacity and that data stays consistent across them.

I have more than 10 years of experience in the software industry, including extensive work building core networks in the telecommunications domain. I hold a Master’s degree from IIIT-Bangalore.

Slides

https://drive.google.com/file/d/0ByAaSdfBUHSVbnBDZmhZVEtISFU/view?usp=sharing

The Fifth Elephant 2017