It takes two to tango! - Is SQL-on-Hadoop the next big step?
This talk explores the trend of SQL-on-Hadoop, focusing on some of the recent attempts (open source and commercial) to get SQL running on Hadoop.
Since its early days, the Hadoop community has made several attempts to stretch Hadoop beyond its role as a distributed programming framework. The key strength that Hadoop brings to the table is its ability to scale linearly. Can we combine this advantage of Hadoop with the efficiency of databases? What does it take to run SQL over Hadoop?
Running SQL-on-Hadoop implies accessing data from "within" Hadoop using SQL as the interface. Accomplishing this demands a significant re-architecture of the storage and compute infrastructures.
SQL-on-Hadoop shifts Hadoop's role from a technology viewed so far as complementary to databases into something that could compete with them. It's perhaps the feature that will help Hadoop find its way into more enterprises without them having to reinvent themselves as MapReduce experts. As a result, we perhaps won't need separate data stores for structured and unstructured data in the future!
Srihari currently heads the technology organization for ThoughtWorks India. He has been a developer and architect for several enterprise applications, with a focus on building large-scale systems based on service-oriented architectures, domain-specific languages, etc. He is passionate about distributed systems and databases and blogs about them at www.systemswemake.com.