The Fifth Elephant 2012

Finding the elephant in the data.

The Elephant in the Cloud

Submitted by Joydeep Sen Sarma (@jsensarma) on Wednesday, 23 May 2012

Technical level

Intermediate

Section

Big Data Infrastructure & Processing

Session type

Lecture

Status

Confirmed

Total votes:  +24

Objective

How do you build a big data service in the Cloud? How can we make queries against relatively slow Cloud Storage Systems fast? How can we take real advantage of the elasticity available in the Cloud? How do you make the Cloud dead easy to use for big data processing?

At Qubole we have been searching for answers to these questions and would love to share what we have discovered and built.

Description

Hadoop and the frameworks built on top of it, such as Hive, are popular applications to run in the Cloud. The Cloud architecture, though, differs significantly from a regular data center in its elasticity, its latency characteristics, and its pricing models. It can also be daunting for a lay user to understand and set up. In this talk we will describe how Qubole Data Service has adapted Hadoop and Hive to uniquely fit and exploit the Cloud architecture and make big data processing easy and accessible to all. The agenda will be roughly as follows:

  1. Start by covering some key characteristics of the Cloud.
  2. Describe the current state of the art of running the Big Data stack in the Cloud, along with its problems and opportunities for improvement.
  3. Describe the Qubole architecture and how we have attempted to tackle some of these problems.
  4. Demonstrate some of the usability enhancements and go over some performance comparisons.

Speaker bio

Joydeep is a co-founder at Qubole and heads their India development team. Prior to starting Qubole, Joydeep worked at Facebook, where he bootstrapped the data processing ecosystem based on Hadoop, started the Apache Hive project and led the Data Infrastructure team. Joydeep was a key contributor on the Facebook Messages architecture team that brought Apache HBase to Facebook and to the transactional and reporting backends for Facebook Credits. He has been a driver for other important sub-projects in the Hadoop ecosystem, such as the FairScheduler and RCFile. Joydeep studied Computer Science at IIT-Delhi and the University of Pittsburgh and started his career working on Oracle's database kernel and building highly available and scalable file systems at NetApp. In between, he has played founding roles in storage and advertising startups. He cut his teeth building data-driven applications as the lead engineer on Yahoo's in-house Recommendation Platform.

Joydeep holds numerous patents, has published many papers and has been both a speaker and a panelist at Hadoop summits and at other Silicon Valley conferences.

Comments

  • 1
    Raghav Kumar Gautam (@raghavgautam) 6 years ago

    Looking forward to it.

  • 1
    Govind Kanshi (@govindsk) 6 years ago

    This will be a great talk. Thanks Joydeep for doing it.
