Jun 2019: Fri 21, 08:45 AM – 05:40 PM IST; Sat 22, 09:00 AM – 05:30 PM IST
Amit Nijhawan
I will explain the SQL access pattern for Kafka and how to get it to work with the new Kafka Hive integration.
Stream processing engines and libraries like Kafka Streams provide a programmatic stream-processing access pattern to Kafka. Application developers love this access pattern, but BI developers have quite different analytics requirements, focused on use cases such as ad hoc analytics, data exploration, and trend discovery. BI persona requirements for Kafka access include the following (an example of the kind of query involved is sketched after the list):
Treat Kafka topics/streams as tables.
Support for ANSI SQL.
Support for complex joins (different join keys, multi-way joins, join predicates on non-key columns, non-equi joins, multiple joins in the same query).
UDF support for extensibility.
JDBC/ODBC support.
Creating views for column masking.
Rich ACL support including column level security.
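For illustration, the kind of ad hoc query this access pattern targets might look like the sketch below. The table and column names (page_views, users, user_id, country, page) are hypothetical; page_views is assumed to be a Kafka topic already exposed as a Hive table, as described next.

-- Hypothetical ad hoc query: join clickstream events from a Kafka-backed
-- table with a static user dimension, then rank pages per country by views.
SELECT country, page, views,
       RANK() OVER (PARTITION BY country ORDER BY views DESC) AS page_rank
FROM (
  SELECT u.country, p.page, COUNT(*) AS views
  FROM page_views p
  JOIN users u ON p.user_id = u.user_id
  GROUP BY u.country, p.page
) t;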
To address these requirements, the HDP 3.1 release adds a new Hive storage handler for Kafka that allows users to view Kafka topics as Hive tables. BI developers can then take full advantage of Hive's analytical capabilities, including complex joins, aggregations, window functions, UDFs, and predicate pushdown.
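A minimal sketch of what this can look like, assuming a JSON-encoded topic named page-views on a broker at kafka-broker-1:9092 (topic name, broker address, and schema are illustrative; the storage handler class and the kafka.* table properties come from the Hive-Kafka integration):

-- Map the Kafka topic to an external Hive table. JSON payloads are assumed
-- here; a different serde can be configured via kafka.serde.class.
CREATE EXTERNAL TABLE page_views (
  `user_id`  string,
  `page`     string,
  `referrer` string
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "page-views",
  "kafka.bootstrap.servers" = "kafka-broker-1:9092"
);

-- The handler also exposes Kafka metadata columns such as __partition,
-- __offset and __timestamp (epoch milliseconds), which Hive can use for
-- predicate pushdown, e.g. scanning only the last 10 minutes of the topic:
SELECT page, COUNT(*) AS views
FROM page_views
WHERE `__timestamp` > 1000 * TO_UNIX_TIMESTAMP(CURRENT_TIMESTAMP - INTERVAL '10' MINUTES)
GROUP BY page;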
Amit Nijhawan - I am a Senior Technical Engineer at Red Hat, working on Java, middleware technologies, and container technology.
https://www.slideshare.net/amitnijhawan/integration-for-realtime-kafka-sql