The Fifth Elephant 2017
On data engineering and application of ML in diverse domains
Jul 2017
24 Mon
25 Tue
26 Wed
27 Thu 08:15 AM – 10:00 PM IST
28 Fri 08:15 AM – 06:25 PM IST
29 Sat
30 Sun
Rahul Ramesh
Basic operations such as finding an element in a set or calculating the set’s cardinality are child’s play for a few thousand data points. However, they become complex and prohibitively expensive as the dataset grows into the millions and spans multiple dimensions.
One way of addressing this problem is to first index the data in a database and then query it for cardinality or for the presence of an element. However, this approach is not optimized for streaming data. Is it possible to perform these operations in a fixed amount of time with an acceptable trade-off in accuracy?
At DataWeave, we crawl millions of URLs every day and analyze a large number of data points in real time, with low error rates.
This talk presents a way to build a monitoring dashboard using two probabilistic data structures: Bloom filters and HyperLogLog.
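For readers unfamiliar with these two structures, the following is a minimal sketch of each, written for illustration only and not taken from the talk or from DataWeave's systems. The class names, parameters, example URLs, and the use of SHA-256 as the hash function are all assumptions. The Bloom filter answers "have I seen this element?" with no false negatives and a tunable false-positive rate; the HyperLogLog estimates the number of distinct elements with a typical error of roughly 1.04/sqrt(m) for m registers. Both use a fixed amount of memory and do a fixed amount of work per item, regardless of how many items have been streamed through them.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: fixed-size, approximate set membership."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class HyperLogLog:
    """Illustrative HyperLogLog: fixed-size, approximate distinct count."""

    def __init__(self, precision: int = 12) -> None:
        self.precision = precision
        self.num_registers = 1 << precision            # m = 2^precision
        self.registers = bytearray(self.num_registers)
        # Bias-correction constant; this closed form is valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.num_registers)

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")
        tail_bits = 64 - self.precision
        index = h >> tail_bits                         # first bits pick a register
        tail = h & ((1 << tail_bits) - 1)
        rank = tail_bits - tail.bit_length() + 1       # position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self) -> float:
        m = self.num_registers
        raw = self.alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)             # small-range (linear counting) correction
        return raw


# Hypothetical usage: track which URLs were crawled and how many were distinct.
seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
distinct = HyperLogLog(precision=12)
for url in ["https://example.com/a", "https://example.com/b", "https://example.com/a"]:
    seen.add(url)
    distinct.add(url)
```

In a monitoring setting, one such pair of structures per metric (for example, per crawl source per hour) would keep memory use constant while still answering membership and cardinality queries approximately; whether the talk's dashboard is organized this way is not stated in the abstract.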
I work as a Software Engineer in the data platforms team at DataWeave, a provider of Competitive Intelligence as a Service for retailers and consumer brands. I design and manage dataflows to the various datastores maintained by the company. I also ensure that all datastores are working at optimum capacity and that data consistency is maintained across them.
I have more than 10 years of experience in the software industry, including extensive work building core networks in the telecommunications domain. I hold a Master’s degree from IIIT-Bangalore.
https://drive.google.com/file/d/0ByAaSdfBUHSVbnBDZmhZVEtISFU/view?usp=sharing