Rootconf 2016

Rootconf is India's principal conference where systems and operations engineers share real-world knowledge about building resilient and scalable systems.

Harsh

@mindfuck

Internals of Hadoop, Hive and HBase, and how we made it scalable and highly available.

Submitted Jan 20, 2016

I will discuss the internal architecture of Hadoop (HDFS), HBase and Hive. I will also cover how we designed our data in Hive and HBase based on our needs, what problems we faced in our production cluster, and how we made it scalable and highly available.

Outline

In this talk, I will share my experience with Hadoop, Hive and HBase. I will first walk through the HDFS architecture and its internals (HDFS blocks, the I/O path for reads and writes, and so on), and then cover the basic MapReduce workflow on the YARN architecture.
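
To make the HDFS part concrete, here is a minimal sketch (not part of the original proposal) of writing and reading a file through the Hadoop FileSystem API and then asking the NameNode where the file's blocks live. The NameNode address and path are placeholders; adjust them for a real cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsBlockDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; replace with your cluster's fs.defaultFS.
            conf.set("fs.defaultFS", "hdfs://namenode:8020");
            FileSystem fs = FileSystem.get(conf);

            Path path = new Path("/tmp/demo.txt");

            // Write: the client streams data block by block to a pipeline of DataNodes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.writeUTF("hello hdfs");
            }

            // Read: the client asks the NameNode for block locations, then reads from DataNodes.
            try (FSDataInputStream in = fs.open(path)) {
                System.out.println(in.readUTF());
            }

            // Show which DataNodes hold each block of the file.
            FileStatus status = fs.getFileStatus(path);
            for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println(loc);
            }

            fs.close();
        }
    }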
For Hive, I will discuss the query workflow, how the execution engine runs a DAG of stages (MapReduce jobs), and our use case for Hive.
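
One easy way to see the DAG of stages Hive compiles is to run EXPLAIN on a query. Below is a hedged sketch using the standard Hive JDBC driver against HiveServer2; the host, credentials and the pageviews table are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveExplainDemo {
        public static void main(String[] args) throws Exception {
            // Standard HiveServer2 JDBC driver; host, port, user and table are placeholders.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "user", "");
                 Statement stmt = conn.createStatement();
                 // EXPLAIN prints the compiled plan: a DAG of stages, each stage
                 // typically a MapReduce job on a classic Hadoop cluster.
                 ResultSet rs = stmt.executeQuery(
                     "EXPLAIN SELECT page, COUNT(*) FROM pageviews GROUP BY page")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }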
Then I will move on to HBase: its complete architecture (block cache, MemStore, etc.), its internal read/write flow, and how it sits on top of HDFS. I will also talk about how we designed HBase row keys for our use case. Finally, I will discuss how we scaled our production cluster and made it highly available.
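
As a small illustration of row key design (a common pattern, not necessarily the exact scheme we used in production), the sketch below builds a composite key with a salt prefix to spread writes across regions and a reversed timestamp so the newest rows sort first, then writes a cell through the HBase client API. The table and column family names are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseRowKeyDemo {
        // Composite row key: a one-byte salt spreads writes across regions,
        // followed by the natural key and a reversed timestamp (newest first).
        static byte[] rowKey(String userId, long ts) {
            byte salt = (byte) ((userId.hashCode() & 0x7fffffff) % 16);
            return Bytes.add(new byte[] { salt },
                             Bytes.toBytes(userId),
                             Bytes.toBytes(Long.MAX_VALUE - ts));
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("events"))) {
                Put put = new Put(rowKey("user42", System.currentTimeMillis()));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("action"), Bytes.toBytes("login"));
                table.put(put); // lands in the MemStore first, later flushed to HFiles on HDFS
            }
        }
    }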

Requirements

Basic knowledge of file system internals.

Speaker bio

I am Harsh. I worked for 1.6 years at Directi (Media.net) as a DevOps Engineer, and I am currently working at LinkedIn as a Site Reliability Engineer. At Directi I worked mainly on Hadoop technologies and learned their internals.
https://www.linkedin.com/in/sharmaharsh1
