The Fifth Elephant 2016

India's most renowned data science conference

Ranganathan B

@ranganathan

Big Data Structures

Submitted Apr 24, 2016

Analysing terabyte-scale data sets with heavy data processing is a common task these days. A data structure is a particular way of organizing data in a computer so that it can be used efficiently. For Big Data, the computer becomes a cluster, and the data is organized in a distributed way. Usage patterns are shifting from precise answers to probabilistic ones. False positive matches are acceptable (at small error rates); false negatives are not. For speed, approximations are acceptable, provided the loss of precision is a small percentage.
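The "false positives are acceptable, false negatives are not" property can be illustrated with a Bloom filter. The abstract does not name this structure, so treat the following as a minimal sketch under that assumption; the class name `BloomFilter`, its parameters, and the SHA-256 based double hashing are illustrative choices, not the speaker's implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Illustrative Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array, all zero

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions from one SHA-256 digest
        # (double hashing; an assumed scheme chosen for brevity).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item maps to.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Every item that was added sets all of its bits, so a later lookup for it can never return False. An item that was never added may still find all of its bits set by other items, which is the small false positive rate the abstract accepts.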

A few data structures give practical results for specific use cases, with parameters chosen from the expected data volume and the required error probability. I call these “The Big Data Structures”. This talk highlights use-case-based examples of these Big Data Structures.
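As one example of choosing parameters from expected data volume and required error probability, the standard Bloom filter sizing formulas are m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The abstract does not say which structure or formulas the talk uses, so the sketch below is an assumption, and the function name `bloom_parameters` is hypothetical.

```python
import math
from typing import Tuple


def bloom_parameters(expected_items: int, error_rate: float) -> Tuple[int, int]:
    """Return (num_bits, num_hashes) for a Bloom filter.

    expected_items: number of distinct items to be stored (n).
    error_rate: acceptable false positive probability (p), 0 < p < 1.
    """
    if expected_items <= 0 or not 0.0 < error_rate < 1.0:
        raise ValueError("need expected_items > 0 and 0 < error_rate < 1")
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole bit
    num_bits = math.ceil(
        -expected_items * math.log(error_rate) / (math.log(2) ** 2)
    )
    # k = (m / n) * ln 2, rounded to the nearest whole hash function
    num_hashes = max(1, round(num_bits / expected_items * math.log(2)))
    return num_bits, num_hashes
```

For one million items at a 1% error rate this gives roughly 9.6 million bits (about 1.2 MB) and 7 hash functions, which shows how a small tolerated error buys a large reduction in memory.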

Outline

All are use cases with existing solutions and improved solutions

  • Use case 1: Cardinality
  • Use case 2: Frequency
  • Use case 3: Membership
  • Use case 4: Verification
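The outline does not name the data structure behind each use case. For the frequency use case, one commonly used probabilistic structure is the Count-Min Sketch, shown here as a minimal sketch under that assumption; the class name, the `width` and `depth` parameters, and the SHA-256 based row hashing are illustrative.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch: estimates may overcount, never undercount."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width    # counters per row; wider means fewer collisions
        self.depth = depth    # number of rows, each with its own hash
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salt the hash with the row number to get one hash function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )
```

Memory use is fixed at `width * depth` counters regardless of how many distinct items are seen, at the cost of estimates that can be somewhat higher than the true frequency.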

Speaker bio

Ranganathan has nearly eleven years of experience developing awesome products and loves to work on the full stack, from front end to back end and scale. Though he graduated as a civil engineer, he has worked with a few software companies, tried two startups, and at present works for ThoughtWorks as a Technology Lead, where he contributes to open source products. He runs one of the top technology meetups in Hyderabad, the Hyderabad Scalability Meetup. He is very interested in exploring Big Data technologies and is a regular speaker. He has recently spoken at Apache Big Data Europe 2015, Apache Big Data North America 2016, GIDS 2015, GIDS 2016, and many other meetups and conferences.

Hosted by

Jump starting better data engineering and AI futures