Big Data Structures
Submitted by Ranganathan B (@ranganathan) on Sunday, 24 April 2016
Analyzing terabyte-scale data sets with heavy data processing is a common task these days. A data structure is a particular way of organizing data in a computer so that it can be used efficiently. For Big Data, the computer becomes a cluster, and the way the data is organized becomes distributed. Usage patterns are shifting from exact answers to probabilistic ones. False positive matches are acceptable (with small error rates); false negatives are not. For speed, approximations are acceptable, at the cost of a small percentage of precision.
There are a few data structures that give practical results for specific use cases, with parameters chosen from the expected data volume and the required error probability. I call these "The Big Data Structures". This talk highlights use-case-based examples of these Big Data Structures.
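The abstract does not name specific structures. As an illustration only, a Bloom filter is one well-known structure with the properties described above: it can report false positives but never false negatives, and its size is derived from the expected data volume and the required error probability. The sketch below is a minimal version under those assumptions; the class name `BloomFilter`, its method names, and the SHA-256-based double hashing are choices made for this example, not something taken from the talk.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership: false positives possible, false negatives not."""

    def __init__(self, expected_items: int, error_rate: float):
        # Standard sizing formulas (assumed here, not quoted from the talk):
        #   bits   m = -n * ln(p) / (ln 2)^2
        #   hashes k = (m / n) * ln 2
        self.num_bits = max(
            1, math.ceil(-expected_items * math.log(error_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

With `BloomFilter(expected_items=1000, error_rate=0.01)`, every item passed to `add` is always reported by `might_contain`, while an item that was never added is occasionally reported as present, at roughly the configured error rate.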
Each use case is presented with its existing solution and an improved solution:
- Use case 1: Cardinality
- Use case 2: Frequency
- Use case 3: Membership
- Use case 4: Verification
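The frequency use case is listed without a named structure. One commonly used option, assumed here purely for illustration, is a Count-Min Sketch: a fixed-size grid of counters that may overestimate how often an item was seen but never underestimates it. The class name `CountMinSketch`, the `width` and `depth` parameters, and the per-row SHA-256 hashing are hypothetical choices for this sketch.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counts in fixed memory; estimates never undercount."""

    def __init__(self, width: int, depth: int):
        self.width = width  # counters per row; wider rows mean fewer collisions
        self.depth = depth  # number of independent rows (one hash per row)
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Salt the hash with the row number to get a different function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum is the best estimate.
        return min(
            self.rows[row][self._index(item, row)] for row in range(self.depth)
        )
```

Memory use depends only on `width * depth`, not on the number of distinct items, which is the trade-off that makes this kind of structure attractive for large streams.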
Ranganathan has nearly eleven years of experience developing awesome products and loves to work on the full stack, from front end to back end and scale. Though he graduated as a civil engineer, he has worked with a few software companies, tried two startups, and at present works for ThoughtWorks as a Technology Lead, where he contributes to open source products. He runs one of the top technology meetups in Hyderabad, the Hyderabad Scalability Meetup. He is very interested in exploring Big Data technologies and is a regular speaker. He has recently spoken at Apache Big Data Europe 2015, Apache Big Data North America 2016, GIDS 2015, GIDS 2016, and many other meetups and conferences.