The Fifth Elephant 2012

Finding the elephant in the data.

Building the Infrastructure to Handle Big Data

Submitted by Prashant Kumar on Thursday, 28 June 2012

Technical level

Intermediate

Section

Big Data Infrastructure & Processing

Session type

Lecture

Status

Confirmed

Total votes:  +53

Objective

Understanding the pieces involved in building a generic big data infrastructure.

Description

I’ll be sharing our experience at PromptCloud, where we deal with terabytes of data daily. This talk will cover how to build a multi-purpose technology stack to manage large-scale data and the numerous processes that operate on it. We’ll cover the following topics in varying depth, depending on audience inclination:

  1. Automated installs for clusters (Vagrant, Chef)
  2. Job scheduling (Resque/Celery)
  3. Process and cluster monitoring (God, Monit, Sensu), with a peripheral mention of Nagios
  4. Queue, the perennial suspect (RabbitMQ)
  5. Data storage (HBase, MongoDB, Voldemort, Riak)
  6. Dashboard (Sensu)

If there is time, we can take up code integration and review (Git, Gerrit, Jenkins), which are not directly needed but become crucial as the team working on the above infrastructure grows.

Speaker bio

Prashant graduated from IIT Kanpur and then joined the Yahoo! data team, where he worked for about a year. Among other things, he worked on the analytics for the then-new version of the Yahoo! front page.

Currently, Prashant runs PromptCloud (http://promptcloud.com), a leading big data and cloud computing startup whose flagship product is large-scale data crawling and extraction, with hosted indexing of the data. PromptCloud also does big data analytics, though at a limited scale for now. This talk is based on lessons learned at PromptCloud.

PS: I have submitted another talk on Riak. Depending on which one (if any :) ) gets enough interest, we’ll pick one.

Link for the Riak proposal - http://funnel.hasgeek.com/5el/394-riak-robust-and-featureful-distributed-key-value-store

Comments

  • 2
    Ranjan Piyush (@piyushranjan1) 6 years ago

    must have talk!

  • 1
    shanu suhail (@tinystep) 6 years ago

    I look forward to this talk

  • 1
    Prashant Kumar (@prashantkr) 6 years ago

    Here is the link to the slides: http://www.slideshare.net/promptcloud/building-infrastructure-for-big-data

    Also shared in our company updates on LinkedIn.
