Internals of Hadoop, Hive and HBase, and how we made them scalable and highly available
Submitted by Harsh (@mindfuck) on Wednesday, 20 January 2016
This talk covers the internal architecture of Hadoop (HDFS), HBase and Hive. I will also discuss how we designed our data in Hive and HBase based on our needs, the problems we faced in our production cluster, and how we made it scalable and highly available.
In this talk, I will discuss my experience with Hadoop, Hive and HBase. I will first talk about the HDFS architecture and its internals (such as HDFS blocks and the I/O operations involved in reading and writing). I will then cover the basic workflow of MapReduce on the YARN architecture.
For Hive, I will discuss the Hive workflow, how the execution engine executes a DAG of stages (MapReduce jobs), and our use case for Hive.
Then I will discuss HBase: its complete architecture (block cache, MemStore, etc.), its internal flow, and how it is linked with HDFS. I will also talk about how we designed our HBase row key for our use case. Finally, I will discuss the scaling and high availability of our production cluster.
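As a rough illustration of the HDFS block idea mentioned above, the sketch below splits a file of a given size into fixed-size blocks. The 128 MB block size is an assumption (it is the usual default in Hadoop 2.x but is configurable), and the function name is hypothetical; it is not part of any Hadoop API.

```python
# Minimal sketch: how a file maps onto fixed-size HDFS-style blocks.
# BLOCK_SIZE is an assumed value; real clusters set it in configuration.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block only holds the remaining bytes, not a full block.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    mb = 1024 * 1024
    for offset, length in split_into_blocks(300 * mb):
        print(f"offset={offset // mb} MB, length={length // mb} MB")
```

A 300 MB file under these assumptions yields two full blocks and one 44 MB block; each block would then be replicated independently across DataNodes.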
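To make the "DAG of stages" idea concrete, here is a small sketch that orders stages so that each one runs only after the stages it depends on. The stage names and the dependency map are made up for illustration; this is a generic topological sort, not Hive's actual execution engine code.

```python
from collections import deque


def execution_order(dependencies):
    """Order stages so every stage comes after the stages it depends on.

    `dependencies` maps a stage name to the set of stages it depends on.
    Every stage, including root stages, must appear as a key.
    """
    remaining = {stage: set(deps) for stage, deps in dependencies.items()}
    # Root stages have no dependencies and can run first.
    ready = deque(sorted(s for s, deps in remaining.items() if not deps))
    order = []
    while ready:
        stage = ready.popleft()
        order.append(stage)
        # Mark this stage as finished for everything that was waiting on it.
        for other in sorted(remaining):
            if stage in remaining[other]:
                remaining[other].discard(stage)
                if not remaining[other]:
                    ready.append(other)
    if len(order) != len(remaining):
        raise ValueError("stage graph has a cycle or an unknown dependency")
    return order


if __name__ == "__main__":
    # Hypothetical plan: two MapReduce stages followed by a final stage.
    plan = {
        "Stage-1": set(),
        "Stage-2": {"Stage-1"},
        "Stage-0": {"Stage-2"},
    }
    print(execution_order(plan))
```

Stages with no unmet dependencies are independent of each other, which is what would let an engine run them in parallel.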
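Since HBase stores rows sorted by the raw bytes of the row key, the key layout decides both how load spreads across regions and which rows a scan reads first. The sketch below shows one commonly used pattern (a salt prefix plus a reversed timestamp); it is only an illustration under assumed requirements, not the row key design presented in the talk, and all names in it are hypothetical.

```python
import hashlib
import struct

MAX_TS = 2**63 - 1  # largest signed 64-bit value, used to reverse timestamps


def make_row_key(entity_id, ts_millis, buckets=16):
    """Build a row key of the form: salt | entity_id | 0x00 | reversed timestamp.

    Assumes 1 <= buckets <= 256 and 0 <= ts_millis <= MAX_TS.
    """
    raw_id = entity_id.encode("utf-8")
    # Salt derived from the id: spreads different entities over `buckets`
    # key ranges while keeping all rows of one entity in the same range.
    salt = hashlib.md5(raw_id).digest()[0] % buckets
    # Reversed timestamp: newer events get smaller values, so they sort first.
    reversed_ts = MAX_TS - ts_millis
    return bytes([salt]) + raw_id + b"\x00" + struct.pack(">q", reversed_ts)


if __name__ == "__main__":
    older = make_row_key("user42", 1_453_248_000_000)
    newer = make_row_key("user42", 1_453_248_060_000)
    print(newer < older)  # the newer event sorts ahead of the older one
```

The trade-off in this kind of design is that a salt prefix helps avoid write hotspots, but a range scan across all entities then has to query every bucket.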
Basic knowledge of any file system internals.
I am Harsh. I worked for 1.6 years at Directi (Media.net) as a DevOps Engineer, and I currently work at LinkedIn as a Site Reliability Engineer. At Directi I worked mainly on Hadoop technologies and came to understand their internals.