Workshop: Learning ElasticSearch and using it to analyze Aadhaar's Public Datasets
You have a large dataset, commodity off-the-shelf hardware, and a looming project deadline. How do you manipulate the data and extract useful information? And how steep is the learning curve?
I would like to demonstrate that it is not too difficult to handle “big data” and build your own queries. Together, we will learn how to do just that.
ElasticSearch is a flexible and powerful open-source, distributed, real-time search and analytics engine. This workshop is aimed at beginners and consists of two sections:
- Learning ElasticSearch
This section covers the basics of ElasticSearch: installation, common configuration options, indexes, documents, type mappings, aliases, and querying using curl, Tire, and pyelasticsearch.
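As a taste of what the basics look like, here is a minimal sketch of an index mapping and a document. The index name (`people`), the field names, and the mapping syntax are assumptions for illustration only; the exact mapping format varies across ElasticSearch versions.

```python
import json

# A hypothetical mapping: an index of people with a text name and an
# integer age. Field names and structure are illustrative assumptions.
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "age": {"type": "integer"},
        }
    }
}

# A document that could live in that index.
doc = {"name": "A. Sample", "age": 42}

# Against a local node, these JSON bodies would be sent with curl, e.g.:
#   curl -XPUT  'http://localhost:9200/people'      -d '<mapping JSON>'
#   curl -XPOST 'http://localhost:9200/people/_doc' -d '<document JSON>'
print(json.dumps(mapping, indent=2))
print(json.dumps(doc, indent=2))
```

The same bodies can be sent from Python or Ruby clients; curl is just the most direct way to see what goes over the wire.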
- Applying all of that to something useful
The Aadhaar project provides some publicly available data, and the dataset is large enough to be worth manipulating. We will import the dataset and try to find out whether the oldest Indian is really 179 years old, or answer any other interesting queries we can come up with.
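A question like “how old is the oldest person in the data?” maps naturally onto an aggregation. The sketch below builds such a query body; the index name (`aadhaar`) and the `age` field are assumptions, and the real dataset’s field names may differ.

```python
import json

# A max aggregation over an assumed "age" field. "size": 0 asks for the
# aggregation result only, without returning matching documents.
query = {
    "size": 0,
    "aggs": {
        "oldest": {"max": {"field": "age"}}
    }
}

# Against a local node, this body would be sent with, for example:
#   curl -XPOST 'http://localhost:9200/aadhaar/_search' -d '<query JSON>'
print(json.dumps(query, indent=2))
```

The response would carry the maximum age under the `oldest` aggregation key, which is exactly the kind of query we will build and refine during the workshop.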
The intended take-away from the workshop is a deeper understanding of ElasticSearch and an appreciation of the simple underlying technologies that power the “Big Data” aspects. Plus, you’ll get to learn about queries: their cost, and how to plan for the cheapest or fastest ones.
Prerequisites:
- A laptop (or VM) running Linux.
- Knowing how to write simple programs in either Python or Ruby.
- Preinstall ElasticSearch from http://www.elasticsearch.org/download to get started quickly.
Anurag works with Red Hat at their Pune office. He’s part of the Engineering Services group and loves to play with APIs.