The Fifth Elephant 2012

Finding the elephant in the data.

What are your users doing on your website or in your store? How do you turn the piles of data your organization generates into actionable information? Where do you get complementary data to make yours more comprehensive? What tech, and what techniques?

The Fifth Elephant is a two-day conference on big data.

Early Geek tickets are available from fifthelephant.doattend.com.

The proposal funnel below lets you submit a session and vote on proposed sessions. When proposing a session, it is good practice to introduce yourself and share details about your work as well as the subject of your talk.

Each community member can vote for or against a talk. A vote from each member of the Editorial Panel is equivalent to two community votes. Both types of votes will be considered for final speaker selection.
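As a rough illustration of the weighting described above, the sketch below tallies a net score for a single proposal. It assumes a vote for counts as +1 and a vote against as -1, with Editorial Panel votes doubled. The function name, the +1/-1 convention, and the sample votes are assumptions for illustration; the actual selection process may combine the two kinds of votes differently.

```python
# Hypothetical tally: the +1/-1 convention and all names here are assumptions,
# not the proposal funnel's actual implementation.
PANEL_WEIGHT = 2  # one Editorial Panel vote counts as two community votes


def weighted_score(community_votes, panel_votes):
    """Return a net score from lists of +1 (for) / -1 (against) votes."""
    return sum(community_votes) + PANEL_WEIGHT * sum(panel_votes)


# Made-up example: five community votes for, two against, one panel vote for.
score = weighted_score([1, 1, 1, 1, 1, -1, -1], [1])
print(score)
```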

It’s useful to keep a few guidelines in mind while submitting proposals:

  1. Describe how to use something that is available under a liberal open source license. Participants can use this without having to pay you anything.

  2. Tell a story of how you did something. If it involves commercial tools, please explain why they made sense.

  3. Buy a slot to pitch whatever commercial tool you are backing.

Speakers will get a free ticket to both days of the event. Proposers whose talks are not on the final schedule will be able to purchase tickets at the Early Geek price of Rs. 1800.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.

Vijay Rayapati

@amnigos

Lessons Learned: Scaling Hadoop and BigData in Cloud (Amazon EMR)

Submitted Mar 20, 2012

A high-level technology and business perspective on BigData, including how and why to leverage cloud-based platforms like Amazon EMR along with MapReduce for data analysis.

Expect to learn concepts and insights, along with the challenges of problem solving, scaling, dealing with data, and performance tuning.

Outline

I will be talking about Hadoop and MapReduce in general, and about how to leverage cloud-based platforms like Amazon EMR for Hadoop MapReduce jobs. I will share lessons learned from managing hyperscale production Hadoop clusters and tuning them for performance. Think 68400 GB RAM, 26000 CPUs and 1700000 GB of disk :)

This talk aims to share insights into the challenges of building large-scale data analysis platforms using Hadoop, and the technical challenges of scaling everything from algorithms to data storage, JSON parsers, in-memory data stores, and the management of hundreds of jobs.
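For readers new to the model, the map and reduce phases mentioned in this outline can be sketched in plain Python. This is a single-process word count, not the speaker's code: it does not use Hadoop or Amazon EMR, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Made-up sample input, for illustration only.
counts = word_count(["big data big clusters", "data jobs"])
print(counts)
```

In a real cluster the map and reduce functions run on many nodes in parallel and the shuffle moves data across the network, which is where much of the scaling and tuning effort tends to go.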

Requirements

A basic understanding of Hadoop will be helpful.

Speaker bio

An Engineer (aka CTO) working at Kuliza on Platforms, Cloud and BigData. Previously at #Startups #GizaPage #Trilogy #eFoodlet #Michelin.

I have worked on large-scale web platforms and distributed systems for enterprises and the consumer web. At Kuliza, we work on petabyte-scale EMR clusters with hundreds of nodes running Hadoop for data analysis. Our cloud team runs 500+ production servers with every possible #technical #stack #configuration out there!

