The Fifth Elephant 2016

India's most renowned data science conference

Nishant Bangarwa

@nishantbangarwa

Scalable Realtime Analytics using Druid

Submitted Jul 6, 2016

Traditional SaaS solutions based on Hadoop data stores such as Hive and HBase, or on a classical RDBMS, work well for storing data, but they are not optimized for ingesting data and making it immediately available for interactive, ad-hoc, low-latency queries at very high scale. Their long query latencies make them suboptimal choices for powering interactive applications. This talk will introduce Druid as a complementary solution for scalable real-time ingestion and analytics.

Druid is an open source distributed data warehouse designed to support OLAP-like queries, and it is used in production at numerous companies. It was inspired by Google’s Dremel, PowerDrill and search framework. This talk will cover Druid's architecture, its storage internals, and the common use cases Druid is a good fit for.

Outline

  • History and Motivation
  • Live Demo
  • Druid Architecture
  • Storage Internals
  • Druid in Practice
  • Common Use Cases

Speaker bio

Nishant is an active contributor to Druid and a member of its PMC. He is part of the Business Intelligence team at Hortonworks. Prior to that, he was part of the Metamarkets backend team, where he was responsible for analytics infrastructure, including real-time analytics in Druid. He holds a B.Tech in Computer Science from the National Institute of Technology, Kurukshetra, India.

Slides

https://speakerdeck.com/nishantneo/fifth-elephant-scalable-realtime-analytics-using-druid

