HOWTO run a hadoop cluster on a laptop

This submission has been added to the schedule

Submitted Apr 28, 2013

Section: Storage and Databases
Technical level: Beginner

Most Hadoop tutorials are written for consumers of Hadoop rather than developers of the core technology. And even for developers, working without a company or sponsor to foot the bill for cluster hardware is a problem that most FAQs do not address.

This is a problem worth attacking, both to broaden the developer community and to let hobbyists pick the simple bugs off the TODO list.

Outline

Hadoop has nearly become synonymous with Big Data, thanks to the large community that stands behind the project. But it is a tough project to contribute to for someone who wants to work on Hadoop in their spare time.

In a short half-hour session, I want to share some of the convenience scripts I have accumulated over the last 6 months, which help me work with Hadoop at dev scale.

To get a clean build environment, we will use an Ubuntu LXC container to isolate the Hadoop install from the rest of the system.

This gives you a base container for the install, which can later be cloned to create the other nodes of the cluster.
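A minimal Python sketch of the container steps, assuming the LXC command-line tools of that era: `lxc-create` with the `ubuntu` template, and `lxc-clone` with its `-o`/`-n` options (newer LXC releases replace `lxc-clone` with `lxc-copy`). The container names such as `hadoop-base` are hypothetical, and the script only prints the commands rather than running them:

```python
"""Plan the LXC commands for a base Hadoop container and its clones.

Assumes the classic LXC command-line tools (lxc-create, lxc-clone,
lxc-start). Container names used here are hypothetical.
"""
import shlex
from typing import List


def base_container_cmd(name: str, template: str = "ubuntu") -> List[str]:
    # Create a fresh container from the distribution template.
    return ["lxc-create", "-t", template, "-n", name]


def clone_cmds(base: str, count: int) -> List[List[str]]:
    # One clone per extra cluster node: <base>-node1, <base>-node2, ...
    return [
        ["lxc-clone", "-o", base, "-n", f"{base}-node{i}"]
        for i in range(1, count + 1)
    ]


def start_cmd(name: str) -> List[str]:
    # -d starts the container in the background.
    return ["lxc-start", "-n", name, "-d"]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    cmds = [base_container_cmd("hadoop-base")] + clone_cmds("hadoop-base", 2)
    for argv in cmds:
        print(shlex.join(argv))
```

Each argv list can be handed to `subprocess.run(argv, check=True)` once it looks right; privileged containers need to be created as root.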

With a few convenience scripts and pre-packaged config files, you can then download Hadoop, build it, and set up a single-node cluster without much trouble.
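The pre-packaged config files are not reproduced here, but a single-node setup needs very little. A sketch, assuming a Hadoop 2 layout where `core-site.xml` and `hdfs-site.xml` live under `etc/hadoop/`; `fs.defaultFS` and `dfs.replication` are standard Hadoop keys, while `localhost:9000` is just a local choice. Replication drops to 1 because the default of 3 cannot be satisfied by a single DataNode:

```python
"""Render minimal Hadoop *-site.xml files for a single-node cluster.

A sketch, not the actual pre-packaged config files. The host and port
in CORE_SITE are assumptions for a local setup.
"""
import xml.etree.ElementTree as ET
from typing import Dict


def render_site_xml(props: Dict[str, str]) -> str:
    # Hadoop config files are a <configuration> root holding
    # <property><name>..</name><value>..</value></property> entries.
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body + "\n"


# NameNode address clients and daemons connect to (local assumption).
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}
# One DataNode, so only one replica of each block can be stored.
HDFS_SITE = {"dfs.replication": "1"}


if __name__ == "__main__":
    print(render_site_xml(CORE_SITE))
    print(render_site_xml(HDFS_SITE))
```

After writing the two files into `etc/hadoop/`, running `bin/hdfs namenode -format` followed by `sbin/start-dfs.sh` brings up HDFS on the single node.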

This brings up a dead-simple, non-secure Hadoop cluster that is easy to extend to a few nodes.

The multi-node setup is only really useful for debugging node locality and schedulers; for most HDFS/Hadoop development, the single-node cluster works wonders.
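When going past one node, two settings change together: the list of worker hosts the start scripts read, and the replication factor, which is worth raising so that blocks actually spread across nodes and locality decisions become visible. A sketch with hypothetical host names, assuming Hadoop 2.x, where the worker list lives in `etc/hadoop/slaves` (Hadoop 3 renamed it `workers`):

```python
"""Derive the settings that change when a cluster grows past one node.

Assumes Hadoop 2.x, whose start scripts read worker hostnames from
etc/hadoop/slaves. Host names passed in are hypothetical.
"""
from dataclasses import dataclass
from typing import List

# Stock default for dfs.replication.
HDFS_DEFAULT_REPLICATION = 3


@dataclass
class ClusterPlan:
    replication: int  # value for dfs.replication in hdfs-site.xml
    slaves_file: str  # contents for etc/hadoop/slaves


def plan_cluster(worker_hosts: List[str]) -> ClusterPlan:
    if not worker_hosts:
        raise ValueError("need at least one worker host")
    # HDFS cannot place more replicas of a block than there are DataNodes,
    # so cap the stock default at the cluster size.
    replication = min(HDFS_DEFAULT_REPLICATION, len(worker_hosts))
    # One hostname per line, as the start scripts expect.
    return ClusterPlan(replication, "\n".join(worker_hosts) + "\n")


if __name__ == "__main__":
    plan = plan_cluster(["node1", "node2"])
    print("dfs.replication =", plan.replication)
    print(plan.slaves_file, end="")
```

Capping at the cluster size keeps a two-node setup from reporting every block as under-replicated, while still giving the scheduler a choice between node-local and remote reads.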

All of this takes you from a clean laptop to a private Hadoop instance that you can recompile and redeploy in seconds.
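The fast recompile loop mostly comes down to not rebuilding everything. A sketch of the two Maven invocations, assuming a Hadoop 2 source checkout: the full-distribution flags are the ones documented in Hadoop's `BUILDING.txt`, `-pl`/`-am` are standard Maven options, and `hadoop-hdfs-project/hadoop-hdfs` is only an example module path:

```python
"""Choose between a full distribution build and a single-module rebuild.

Sketch only; the module path in the example is illustrative.
"""
import shlex
from typing import List, Optional


def mvn_build_cmd(module: Optional[str] = None) -> List[str]:
    if module is None:
        # Full build that packages a distribution tarball (slow, done once).
        return ["mvn", "package", "-Pdist", "-DskipTests", "-Dtar"]
    # Rebuild one module (-pl) plus the modules it depends on (-am),
    # skipping tests to keep the edit-compile-deploy loop short.
    return ["mvn", "package", "-pl", module, "-am", "-DskipTests"]


if __name__ == "__main__":
    print(shlex.join(mvn_build_cmd()))
    print(shlex.join(mvn_build_cmd("hadoop-hdfs-project/hadoop-hdfs")))
```

Dropping `-am` makes the single-module build faster still, provided the module's dependencies were already installed into the local Maven repository by an earlier `mvn install`. Redeploying is then a matter of copying the rebuilt jar over the one in the running install and restarting the affected daemons; the exact jar paths depend on your layout.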

The next step is contributing patches, which is left as an exercise to the reader.

Requirements

Basic Java skills
Familiarity with Ubuntu

Speaker bio

Gopal Vijayaraghavan is a late entry into the Hadoop game, having started working on it last year. Working on Hadoop as part of the Stinger/Tez initiatives, he has gathered a lot of what used to be tribal knowledge in the Hadoop community and discovered that most of it has never been written down. Having been exposed to some of the secrets behind working on Hadoop for profit, he wants to share them with people who want to do it for fun.

Links

https://github.com/t3rmin4t0r/hadoop-autobuild

The Fifth Elephant 2013