Interactive Realtime Dashboards on Data Streams using Kafka, Druid and Superset
Submitted by Nishant Bangarwa (@nishantbangarwa) on Tuesday, 6 June 2017
Section: Full talk for data engineering track
Technical level: Intermediate
For a smooth user experience with analytics dashboards, two key requirements are quick response times and data freshness. Organizations that need fast, interactive BI dashboards over streaming data often struggle to select a suitable serving layer.
Cluster computing frameworks such as Hadoop or Spark work well for storing large volumes of data, but they are not optimized for making that data available for queries in realtime. Their long query latencies also make these systems sub-optimal choices for powering interactive dashboards and BI use cases.
This talk presents an open source realtime data analytics stack built on Apache Kafka, Druid and Superset. The stack combines the low-latency streaming and processing capabilities of Kafka with Druid, which enables immediate exploration of the ingested data streams and low-latency queries over them. Superset provides the visualization and dashboarding layer and integrates nicely with Druid. In this talk we will discuss why this architecture is well suited to interactive applications over streaming data, present an end-to-end demo of the complete stack, and cover its key features and performance characteristics from real-world use cases.
Apache Kafka as Streaming and Processing Layer
Druid as Serving Layer
Superset as Visualization Layer
Key features of Analytics Stack
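To make the serving-layer role in this outline concrete, here is a minimal sketch of how a dashboard backend might ask Druid for fresh, per-minute counts over a Kafka-fed datasource. It assumes a Druid broker with the SQL HTTP endpoint enabled at `localhost:8082`; the datasource name `clickstream` and the helper names are hypothetical and are not taken from the talk.

```python
import json
import urllib.request

# Assumption: a Druid broker is reachable here (8082 is the usual default port).
BROKER_SQL_URL = "http://localhost:8082/druid/v2/sql/"


def events_per_minute_sql(datasource, minutes=10):
    """Return SQL for per-minute row counts over the last `minutes` minutes.

    COUNT(*) counts stored Druid rows; if rollup is enabled at ingestion,
    summing a count metric would be the better choice.
    """
    return (
        "SELECT FLOOR(__time TO MINUTE) AS minute_bucket, COUNT(*) AS events "
        f'FROM "{datasource}" '
        f"WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '{int(minutes)}' MINUTE "
        "GROUP BY 1 ORDER BY 1"
    )


def build_sql_request(sql, url=BROKER_SQL_URL):
    """Build (but do not send) an HTTP POST carrying the SQL query as JSON."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, url=BROKER_SQL_URL):
    """Send the query to the broker and return the decoded JSON rows."""
    with urllib.request.urlopen(build_sql_request(sql, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # "clickstream" is a hypothetical datasource ingested from a Kafka topic.
    for row in run_query(events_per_minute_sql("clickstream", minutes=5)):
        print(row)
```

In practice Superset issues queries of this kind on the user's behalf, so a sketch like this is mainly useful for checking that recently ingested events are already queryable.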
Nishant is a Druid PMC member and a Software Engineer at Hortonworks, where he is part of the Business Intelligence team. Prior to that, he was part of the Metamarkets backend team and was responsible for analytics infrastructure, including real-time analytics in Druid. He holds a B.Tech in Computer Science from the National Institute of Technology, Kurukshetra, India.