Approximate algorithms for summarizing streaming data

16 to 18 Jul 2015: Thu 16, 08:30 AM to 06:35 PM IST; Fri 17, 08:30 AM to 06:30 PM IST; Sat 18, 09:00 AM to 06:30 PM IST

NIMHANS Convention Center


This submission has been added to the schedule


Submitted Jun 15, 2015

Section: Full Talk
Technical level: Intermediate

This talk introduces two approximate algorithms that are considered cornerstones of big data infrastructure. They can be used to obtain a first-hand summary of a massive dataset in a streaming manner.

Outline

Approximate algorithms can process huge streams of incoming data in a single pass, using a small, bounded amount of memory and CPU time. They let us maintain summaries that are sufficient to answer the queries we expect to run on the data.

Two such algorithms that find many applications in industry today are:

Count-Min Sketch (CMS)
HyperLogLog
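As a rough illustration of the first algorithm, here is a minimal Count-Min Sketch in Python. It is not the implementation used at Sumo Logic. The class and method names are hypothetical. The per-row hash functions are salted BLAKE2b from `hashlib`, which is a practical stand-in for the pairwise-independent hash family the formal analysis assumes. The sizing follows the standard CMS bounds: width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉ give, with probability at least 1 − δ, an estimate that exceeds the true count by at most ε·N. Here N is the total of all counts added.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate per-item counts in depth x width counters (hypothetical API)."""

    def __init__(self, epsilon: float, delta: float) -> None:
        # Standard sizing: with probability >= 1 - delta, an estimate exceeds
        # the true count by at most epsilon * N (N = total of all counts added).
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash per row: BLAKE2b salted with the row number (an assumption;
        # the analysis calls for pairwise-independent hash functions).
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # With non-negative counts, collisions can only inflate a counter,
        # so the smallest counter across the rows is the tightest estimate.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


cms = CountMinSketch(epsilon=0.001, delta=0.01)  # 5 rows x 2719 counters
for level in ["error", "warn", "error", "info", "error"]:
    cms.add(level)
print(cms.estimate("error"))  # never less than the true count, 3
```

Memory depends only on ε and δ, never on the number of distinct items. That is what lets a sketch like this summarize an unbounded log stream in one pass.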

This talk aims to:

Provide a brief introduction to the theoretical aspects behind these algorithms.
Explain how they can be leveraged to summarize unstructured data for practical purposes.
Explain how to choose the tuning parameters that fit your needs.
Demonstrate how we have used them in the Sumo Logic service.
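For HyperLogLog, the main tuning parameter is the precision p. It sets the number of registers m = 2^p, and the relative standard error is roughly 1.04/√m. The following is a minimal Python sketch of the estimator from the original HyperLogLog paper. It uses harmonic-mean registers and a linear-counting fallback for small cardinalities. The class name, the restriction to 7 ≤ p ≤ 16 and the choice of a 64-bit BLAKE2b hash are our own assumptions, not a description of the Sumo Logic implementation.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counting with 2**p registers (hypothetical API)."""

    def __init__(self, p: int = 14) -> None:
        # Restricted to p >= 7 (m >= 128) so the closed-form alpha below applies.
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash: the top p bits choose a register, the rest give the rank.
        x = int.from_bytes(
            hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest(), "big"
        )
        idx = x >> (64 - self.p)
        rest = x & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        # Raw estimate: alpha * m^2 divided by the sum of 2^-register.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: switch to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog(p=14)  # 16384 registers, standard error about 1.04 / 128, i.e. 0.8%
for i in range(200_000):
    hll.add(f"user-{i % 100_000}")  # 100,000 distinct values, each seen twice
print(round(hll.count()))  # should land within a few percent of 100000
```

Raising p by one doubles memory and cuts the error by a factor of about √2. At p = 14, registers packed at 6 bits each would take about 12 KB. This Python list of ints is larger, but it keeps the sketch simple.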

Requirements

Interest in approximate and streaming algorithms.

Speaker bio

Himadri Sarkar is a Software Engineer at Sumo Logic India, where he currently works in the area of search performance. Sumo Logic is a cloud-based log management and analytics service that leverages machine-generated big data to deliver real-time IT insights. The search performance team is responsible for delivering all of the system's search-related capabilities.