The Fifth Elephant 2017
On data engineering and application of ML in diverse domains
27 Jul 2017, Thu 08:15 AM – 10:00 PM IST
28 Jul 2017, Fri 08:15 AM – 06:25 PM IST
Nishant Bangarwa
Druid is an open-source analytics data store designed for business intelligence (OLAP) queries on time-series data. Druid provides low-latency real-time data ingestion, flexible data exploration, and fast data aggregation. Many organizations have deployed Druid to analyze ad-tech, dev-ops, network traffic, website traffic, finance, sensor, and IoT data.
Druid's strong points are very compelling, but it lacks some important features, such as large joins and full SQL support. This talk will present how Druid and Apache Hive can be used together to index large amounts of data, query Druid data sources from Hive using SQL, and execute complex Hive queries on top of Druid data sources. We will walk through the architecture of the solution, which leverages Apache Calcite to overcome the challenge of transparently generating Druid JSON queries from the input Hive SQL queries. We conclude with a demo highlighting the performance and power of integrating these projects.
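To make the idea of generating a Druid JSON query from SQL concrete, here is a minimal hand-written sketch in Python. It is not the Hive or Calcite implementation; it only shows the shape of a Druid native groupBy query that a simple aggregation could map to. The helper `build_groupby_query`, the data source `page_views`, and the columns `country` and `revenue` are hypothetical names chosen for illustration.

```python
import json


def build_groupby_query(datasource, dimensions, sum_columns, interval):
    """Build a Druid native groupBy query as a plain dict (illustrative helper)."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        # "all" buckets the whole interval into a single time bucket
        "granularity": "all",
        "dimensions": list(dimensions),
        "aggregations": [
            {"type": "doubleSum", "name": f"sum_{col}", "fieldName": col}
            for col in sum_columns
        ],
        # ISO-8601 interval, start/end
        "intervals": [interval],
    }


# Roughly the query a SQL statement like the following could translate to
# (hypothetical table and column names):
#   SELECT country, SUM(revenue) FROM page_views
#   WHERE `__time` >= '2017-07-01' AND `__time` < '2017-08-01'
#   GROUP BY country
query = build_groupby_query(
    datasource="page_views",
    dimensions=["country"],
    sum_columns=["revenue"],
    interval="2017-07-01/2017-08-01",
)
print(json.dumps(query, indent=2))
```

In the integration described in the talk, this translation is done by the query planner rather than by hand, so that users write only Hive SQL.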
Introduction to Hive and Druid
Why Hive + Druid
Architecture
Demo
Performance results
Nishant is a Druid PMC member and a Software Engineer at Hortonworks, where he is part of the Business Intelligence team. Prior to that, he was part of the Metamarkets backend team and was responsible for analytics infrastructure, including real-time analytics in Druid. He holds a B.Tech in Computer Science from the National Institute of Technology, Kurukshetra, India.