Internals of Hadoop, Hive and HBase, and how we made them scalable and highly available

Apr 2016

14 Thu 08:30 AM – 10:05 PM IST

15 Fri 08:30 AM – 05:30 PM IST

16 Sat 08:00 AM – 05:30 PM IST

17 Sun 08:30 AM – 01:00 PM IST

MLR Convention Centre, J P Nagar, Bangalore

Submitted Jan 20, 2016

Section: Full talk
Technical level: Beginner

This talk covers the internal architecture of Hadoop (HDFS), HBase and Hive. I will also discuss how we designed our data in Hive and HBase around our needs, what problems we faced in our production cluster, and how we made it scalable and highly available.

Outline

In this talk, I will share my experience with Hadoop, Hive and HBase. I will first talk about the HDFS architecture and its internals (HDFS blocks, the I/O operations involved in reading and writing, etc.). I will then cover the basic MapReduce workflow on the YARN architecture.
For Hive, I will discuss the Hive workflow, how the execution engine executes a DAG of stages (MapReduce jobs), and our use case for Hive.
Then I will discuss HBase: its complete architecture (block cache, MemStore, etc.), its internal flow, and how it ties into HDFS. I will also talk about how we designed our HBase row key for our use case. Finally, I will discuss the scaling and high availability of our production cluster.

Requirements

Basic knowledge of the internals of any file system.

Speaker bio

I am Harsh. I worked for 1.6 years at Directi (Media.net) as a DevOps Engineer, and I currently work at LinkedIn as a Site Reliability Engineer. At Directi I worked mainly on Hadoop technologies and learned their internals.
https://www.linkedin.com/in/sharmaharsh1

Rootconf 2016