Approximate Query Processing

Thu, 26 Jul 2018, 07:45 AM – 06:15 PM IST

Fri, 27 Jul 2018, 07:45 AM – 05:35 PM IST

NIMHANS Convention Centre, Bengaluru

Submitted Mar 31, 2018

Section: Crisp talk

Technical level: Beginner

Data analysts constantly explore data in many forms, searching for new insights that help their businesses make better decisions. The email marketing team at Walmart relies heavily on Customer Segmenter, an in-house tool that identifies which customers are best suited for an email advertisement based on various attributes. Running these analyses was very costly in both time and cluster resources: even a simple query could take minutes to hours to complete. Most of the time, analysts cannot tell whether a query will give them the information they need until they run it and see the results, and they often have to modify the query several times. The good news is that they don’t need exact results.

To spare the marketing team this painstaking experience, we use Verdict, a next-generation query processor that can reduce the computational cost of queries on your existing cluster by 100x-200x. Verdict provides an immediate answer that is 99.9% accurate, whereas our analysts were comfortable with an error bound of five percent. An immediate result helps our analysts decide whether to go ahead and run the full query or modify it to better fit an email campaign. Verdict is compatible with all existing SQL-based databases and big data engines such as Hive and Spark.
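To make the idea of an approximate answer with an error bound concrete, here is a minimal sketch of sample-based aggregation. It is a generic illustration of the approximate query processing idea, not Verdict's implementation or API. The function name `approximate_mean`, the synthetic `order_amounts` data, and the 5% sample fraction are all assumptions made for the example.

```python
import math
import random


def approximate_mean(values, sample_fraction=0.05, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin), where margin is the half-width of an
    approximate 95% confidence interval (normal approximation, ignoring
    the finite-population correction).
    """
    rng = random.Random(seed)
    n = max(2, int(len(values) * sample_fraction))
    sample = rng.sample(values, n)  # uniform sample without replacement
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    margin = z * math.sqrt(variance / n)
    return mean, margin


# Hypothetical data standing in for a large fact table column.
data_rng = random.Random(42)
order_amounts = [data_rng.uniform(5.0, 500.0) for _ in range(200_000)]

# Exact answer: scans every row.
exact = sum(order_amounts) / len(order_amounts)

# Approximate answer: touches only about 5% of the rows.
estimate, margin = approximate_mean(order_amounts)
relative_error = abs(estimate - exact) / exact

print(f"exact mean       : {exact:.2f}")
print(f"approximate mean : {estimate:.2f} +/- {margin:.2f}")
print(f"relative error   : {relative_error:.4%}")
```

The estimate comes with its own margin of error, so an analyst can judge from the sample alone whether the query is heading in a useful direction before paying for a full scan.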

Outline

Intro Slide
About me.
Data Analytics
Email Marketing at Walmart
Problem Statement: Conducting data analytics is a time- and resource-consuming job. Most of the time, analysts don’t know whether a query will give them the information they need until they run it and see the results, and they often have to modify it several times. An approximate result would help them make an early decision.
Introduction to VerdictDB
AQP-as-a-Middleware
Compatibility with existing SQL engines
Features of Verdict
References
QnA

Speaker bio

I’m sharing my experience at Walmart solving the problem of long-running, resource-consuming and often futile Customer Segmentation queries faced by our Email Marketing team. Currently, I’m working on an in-house distributed database/streaming platform built on top of Kafka Streams for Walmart.

https://www.linkedin.com/in/deepak-iiit/

Slides

https://www.slideshare.net/DeepakGoyal25/approximate-query-processing

The Fifth Elephant 2018