Dr. Hadoop – Diagnose your Hadoop Jobs
Have you ever run a job or query on Hadoop that ran very slowly, and had no clue why? You look at the job details on the JobTracker and get lost among hundreds of counters and configuration settings, with no idea how to make sense of them. This is a very common challenge for Hadoop beginners, especially analysts and people coming from the RDBMS world. This talk is about the solution that we have built to address this problem.
That solution is Dr. Hadoop, a tool we have developed within Intuit that analyzes your job, identifies areas for improvement, and gives recommendations to improve its performance. It collects all the history logs, counters, and configuration of your job, applies a set of rules, and provides recommendations with suggested values and severity.
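To make the rule-based idea concrete, here is a minimal sketch of how counters and configuration might be fed through a set of rules to produce recommendations with a suggested value and a severity. It is not Dr. Hadoop's actual code or rule set. The `Recommendation` type, the `spill_rule` and `diagnose` functions, and the thresholds are all hypothetical. Only the counter names (`MAP_OUTPUT_RECORDS`, `SPILLED_RECORDS`) and the `io.sort.mb` setting are taken from Hadoop itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Recommendation:
    setting: str          # configuration key the rule suggests changing
    suggested_value: str  # value the rule proposes
    severity: str         # "medium" or "high" in this sketch
    reason: str           # short explanation for the user


# A rule inspects job counters and configuration and may return a recommendation.
Rule = Callable[[Dict[str, int], Dict[str, str]], Optional[Recommendation]]


def spill_rule(counters: Dict[str, int], config: Dict[str, str]) -> Optional[Recommendation]:
    """Hypothetical rule: flag jobs that spill far more records than maps emit."""
    map_out = counters.get("MAP_OUTPUT_RECORDS", 0)
    spilled = counters.get("SPILLED_RECORDS", 0)
    if map_out == 0:
        return None
    ratio = spilled / map_out
    if ratio <= 2.0:  # illustrative threshold, an assumption of this sketch
        return None
    current_mb = int(config.get("io.sort.mb", "100"))
    return Recommendation(
        setting="io.sort.mb",
        suggested_value=str(current_mb * 2),  # doubling is an arbitrary choice here
        severity="high" if ratio > 3.0 else "medium",
        reason=f"spilled/map-output record ratio is {ratio:.1f}",
    )


def diagnose(counters: Dict[str, int], config: Dict[str, str],
             rules: List[Rule]) -> List[Recommendation]:
    """Apply every rule and keep the recommendations that fired."""
    results = []
    for rule in rules:
        rec = rule(counters, config)
        if rec is not None:
            results.append(rec)
    return results


if __name__ == "__main__":
    job_counters = {"MAP_OUTPUT_RECORDS": 1000, "SPILLED_RECORDS": 4000}
    job_config = {"io.sort.mb": "100"}
    for rec in diagnose(job_counters, job_config, [spill_rule]):
        print(rec)
```

Keeping each rule as an independent function means new checks can be added to the list without touching the others, which is one plausible way to organize "a set of rules".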
I am a Hadoop performance engineer at Intuit. I have been working on Hadoop performance for more than 3 years.
De-dup on Hadoop