HOWTO run a hadoop cluster on a laptop

Jul 2013

11 Thu 09:30 AM – 04:30 PM IST

12 Fri 10:15 AM – 05:30 PM IST

13 Sat 10:15 AM – 05:30 PM IST

Nimhans Convention Centre

Event format, themes and submission guidelines

The Fifth Elephant 2013 invites proposals on use cases and real-life examples. Tell us what specific problem you faced, which technology/tools worked for your use case and why, how you have developed business intelligence on the data you are collecting, and which analytics tools and techniques you employ. Our preference is for showcasing original work with clear take-aways for the audience. Please emphasize these in your proposal.

The conference will have two parallel tracks on 12th and 13th July:

Storage: OLTP, messaging and notifications, databases and big data, NoSQL

Analytics: Metrics and tools, cloud computing, mathematical modelling and statistical analysis, visualization

This year we are adding a preliminary day of workshops, on 11th July, to give attendees more in-depth, hands-on training on open source frameworks and tools (Pig, Hadoop, Hive, etc.), commercial solutions (sponsored), programming languages such as R, and visualization techniques and tricks, among others.

We have a demo track for startups and companies who want to showcase their product to customers at The Fifth Elephant 2013 and get feedback. Slots are also open for 4-6 sponsored sessions for companies who want to talk about their technologies and reach out to developers, CTOs, CIOs and product managers at The Fifth Elephant. For more information on demo and sponsored session proposals, write to info@hasgeek.com.

HasGeek believes in open source as the foundation of the internet. Our aim is to strengthen these foundations for future generations. If your talk describes a codebase for developers to work with, we require that it is available under a license that does not impose itself on subsequent work. This is typically a permissive open source license (almost anything that is listed at opensource.org/licenses and is not GPL or AGPL), but restrictive and commercial licenses are also considered depending on how they affect the developer’s relationship with the user.

If you’d like to showcase commercial work that makes money for you, please consider supporting the event with a sponsorship.

Proposal selection process

Voting is open to attendees who have purchased event tickets. If there is a proposal you find notable, please vote for it and leave a comment to initiate discussions. Your vote will be reflected immediately, but will be counted towards selections only if you purchase a ticket. Proposals will also be evaluated by a program committee.

Emphasis will be placed on original work and talks which present new insights to the audience.

The program committee will interview proposers who have received the most votes from attendees and the committee. Proposers must submit presentation drafts as part of the selection process to ensure the talk is in line with the original proposal and to help the program committee build a coherent line-up for the event.

There is only one speaker per session. Attendance is free for selected speakers. HasGeek will cover your travel to and accommodation in Bangalore from anywhere in the world. As our budget is limited, we will prefer speakers from locations closer home, but will do our best to cover for anyone exceptional. If you are able to raise support for your trip, we will count that towards an event sponsorship.

If your proposal is not accepted, you can buy a ticket at the same rate as was available on the day you proposed. We’ll send you a code.

The program committee will announce the first round of selected proposals by end of April, a second round by end-May, and will finalize the schedule by 20th June. The funnel will close on 5th June. The event is on 11th-13th July 2013.

HOWTO run a hadoop cluster on a laptop

Submitted Apr 28, 2013

Section: Storage and Databases

Technical level: Beginner

Most hadoop tutorials are aimed at consumers of hadoop rather than developers of the core technology. Even then, working on it without the backing of a company, or someone else to foot the bill for your cluster hardware, is a problem that most FAQs leave unaddressed.

This is a problem worth attacking, to broaden the dev community and to let the hobbyist pick up all the simple bugs off the TODO list.

Outline

Hadoop has nearly become synonymous with Big Data, largely because of the large community that stands behind the project. But it is a tough project to contribute to for someone who wants to work on hadoop in their spare time.

In a short half-hour session, I want to share some of the convenience scripts I have accumulated over the last 6 months, which help me work with hadoop at dev scale.

For the purposes of a clean build env, we will use an Ubuntu LXC container to isolate the hadoop install from the rest of the system.

This provides the base container for your install, which we can clone later for multiple nodes of the cluster.
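The container setup described above can be sketched roughly as follows. This is not the speaker's script: the container names (`hadoop-base`, `hadoop-node1`, ...) are made up for illustration, and the sketch only prints the `lxc-create` and `lxc-clone` command lines unless told to execute them, since running them needs root and an LXC install. Newer LXC releases replaced `lxc-clone` with `lxc-copy`, so treat the clone command as an assumption about the LXC version in use.

```python
import shlex
import subprocess

BASE = "hadoop-base"  # hypothetical name for the base container


def create_base_cmd(name=BASE):
    """Command line that creates a container from the stock 'ubuntu' template."""
    return ["lxc-create", "-t", "ubuntu", "-n", name]


def clone_cmd(orig, new):
    """Command line that clones an existing container under a new name."""
    return ["lxc-clone", "-o", orig, "-n", new]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(shlex.quote(part) for part in cmd))
    if not dry_run:
        subprocess.check_call(cmd)


if __name__ == "__main__":
    run(create_base_cmd())
    for i in (1, 2):
        run(clone_cmd(BASE, "hadoop-node%d" % i))
```

Keeping the base container untouched and doing all experiments in clones means a broken install can be thrown away and re-cloned instead of rebuilt.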

Now, with the help of a few convenience scripts & pre-packaged config files, you can download hadoop, build it and set up a single node cluster without much trouble.
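The pre-packaged config files are not shown in this proposal, so here is a minimal sketch of what generating them could look like for a single-node setup. It assumes Hadoop 2.x property names (`fs.defaultFS`, `dfs.replication`; Hadoop 1.x used `fs.default.name` for the former), and the `localhost:9000` address is just a common choice, not something taken from the talk.

```python
import xml.etree.ElementTree as ET


def hadoop_conf_xml(properties):
    """Render a dict of Hadoop properties as a *-site.xml document string."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")


# Assumed values for a one-node cluster: NameNode on localhost, and a
# replication factor of 1 because there is only a single DataNode.
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}
HDFS_SITE = {"dfs.replication": 1}

if __name__ == "__main__":
    print(hadoop_conf_xml(CORE_SITE))  # would go into core-site.xml
    print(hadoop_conf_xml(HDFS_SITE))  # would go into hdfs-site.xml
```

Generating the files from a dict keeps the per-node differences (mostly hostnames) in one place when the single node later grows into a few clones.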

This brings up a brain-dead simple, non-secure hadoop cluster that is easily extensible to a few nodes.

The multi-node setup is only useful for debugging node locality and schedulers; for most HDFS/Hadoop development, the single node cluster works wonders.
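For the multi-node case, one piece that has to change is the list of worker hosts. As a rough sketch, assuming Hadoop 2.x (where the start scripts read worker hostnames from a `slaves` file in the config directory; Hadoop 3 renamed it to `workers`) and assuming the cloned containers are reachable by hostname, the file contents could be generated like this. The `hadoop-node` prefix is a hypothetical naming scheme, not one from the talk.

```python
def node_names(count, prefix="hadoop-node"):
    """Hostnames for `count` cloned containers, numbered from 1."""
    return ["%s%d" % (prefix, i) for i in range(1, count + 1)]


def slaves_file(hosts):
    """Contents of the worker list: one hostname per line."""
    return "".join(host + "\n" for host in hosts)


if __name__ == "__main__":
    # Three worker containers cloned from the base image (assumed count).
    print(slaves_file(node_names(3)), end="")
```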

All of that takes you from a clean laptop to a private hadoop instance that you can recompile & redeploy in seconds.

The next step is contributing patches, which is left as an exercise to the reader.

Requirements

Basic java skills
Familiarity with Ubuntu

Speaker bio

Gopal Vijayaraghavan is a late entry into the hadoop game, having started working on it last year. Working with hadoop as part of the Stinger/Tez initiatives, he has gathered a lot of what used to be tribal knowledge in the hadoop community & discovered that most of it has never been written down. Having been exposed to some of the secrets behind working on hadoop for profit, he wants to share that with people who want to do that for fun.
