Workshop: Learning ElasticSearch and using it to analyze Aadhaar's Public Datasets
Submitted by Anurag (@anurag) on Wednesday, 5 June 2013
You have a large dataset, commercial off-the-shelf hardware, and a looming project deadline. How do you manipulate the data and extract useful information? And how steep is the learning curve?
I would like to demonstrate that it is not too difficult to handle "big data" and build your queries. Together, we will learn how to do just that.
ElasticSearch is a flexible and powerful open-source, distributed, real-time search and analytics engine. This workshop is planned for beginners and consists of two sections:
- Learning ElasticSearch
This section covers the basics of ElasticSearch. Topics include installation, common configuration options, indexes, documents, type mappings, aliases, and querying using curl, tire and pyelasticsearch.
- Implementing all that into something useful
The Aadhaar project provides some publicly available data, and the set is large enough to begin manipulating. We will import the dataset and try to find out whether the oldest Indian is really 179 years old, along with any other interesting queries we can come up with.
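As a rough illustration of the "oldest Indian" question, here is a minimal sketch of how such a search request might be put together in Python using only the standard library. The index name `aadhaar`, the field name `age`, and the node address `http://localhost:9200` are assumptions made for this example; the actual dataset may use different names, and the workshop itself may use tire or pyelasticsearch instead. The sketch sorts all documents by the assumed age field in descending order and asks for only the top hit.

```python
import json
import urllib.request


def oldest_resident_query(age_field="age"):
    """Build a search body that returns the one document with the highest age.

    The field name is an assumption; adjust it to match the imported data.
    """
    return {
        "size": 1,                                 # only the top hit is needed
        "query": {"match_all": {}},                # consider every document
        "sort": [{age_field: {"order": "desc"}}],  # highest age first
    }


def build_search_request(base_url, index, body):
    """Prepare (but do not send) a POST request to the index's _search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical node address and index name, used for illustration only.
    request = build_search_request(
        "http://localhost:9200", "aadhaar", oldest_resident_query()
    )
    print(request.full_url)
    print(request.data.decode("utf-8"))
    # To run against a live node, pass the request to urllib.request.urlopen().
```

The same JSON body can be sent with curl, which makes it easy to compare the raw HTTP interface with the client libraries.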
The intended take-away from the workshop is a deeper understanding of ElasticSearch and an appreciation of the simple underlying technologies that power the "Big Data" aspects. Plus, you'll get to learn about queries: their cost, and how to plan for the cheapest or fastest maps.
- A laptop (or VM) running Linux.
- Ability to write simple programs in either Python or Ruby.
- Preinstall ElasticSearch from http://www.elasticsearch.org/download to get started quickly.
Anurag works with Red Hat at their Pune office. He is part of the Engineering Services group and loves to play with APIs.