Scalable Realtime Analytics using Druid
Submitted by Nishant Bangarwa (@nishantbangarwa) on Wednesday, 6 July 2016
Traditional SaaS solutions built on Hadoop data stores such as Hive and HBase, or on classical RDBMSs, work well for storing data, but they are not optimized for ingesting data and making it immediately available for interactive, ad-hoc, low-latency queries at very high scale. Long query latencies make these solutions suboptimal choices for powering interactive applications. This talk will introduce Druid as a complementary solution for scalable real-time ingestion and analytics.
Druid is an open source, distributed data warehouse designed to support OLAP-like queries, and it is used in production at numerous companies. It was inspired by Google's Dremel and PowerDrill and by search frameworks. This talk will cover Druid's architecture, its storage internals, and the common use cases for which Druid is a good fit.
- History and Motivation
- Live Demo
- Druid Architecture
- Storage Internals
- Druid in Practice
- Common Use Cases
Nishant is an active contributor to Druid and a member of its PMC. He is part of the Business Intelligence team at Hortonworks. Before that, he was part of the Metamarkets backend team, where he was responsible for analytics infrastructure, including real-time analytics in Druid. He holds a B.Tech in Computer Science from the National Institute of Technology, Kurukshetra, India.