Speak at The Fifth Elephant 2026 Annual Conference
Share your work with the community
Jul 17 (Fri) and Jul 18 (Sat), 2026, 09:00 AM – 06:00 PM IST each day
Vivek Sinha
@vivekwiki
Submitted Jun 1, 2026
Everyone is putting their data into Apache Iceberg; almost no one is serving sub-second queries directly from it. Once data lands in Iceberg, a familiar question arises: how do you power real-time experiences without duplicating it in yet another serving system? This challenge is especially sharp in observability workloads like RUM, clickstream, and APM, and in customer-facing analytics dashboards, where the cost of low latency often leads to pipeline sprawl. Data teams want strong isolation and fast queries, but not a web of ETL jobs and redundant data copies.
In this talk, I’ll share how we at StarTree built a fast query layer for Apache Iceberg using Apache Pinot, and what it takes to make Iceberg behave like a low-latency analytics store at scale. I’ll walk through a Kafka to Iceberg to Pinot/StarTree architecture where Iceberg stays the single source of truth while Pinot powers production queries. The session covers the key design decisions: Pinot indexing and pruning for selective reads, parallel prefetching of Iceberg blocks over S3, and trading off local vs remote storage for cost and latency. I’ll close with benchmark results on roughly 1 TB across realistic query shapes and real-world examples, both internal customer metrics exploration and external product analytics, all without extra ETL or duplicate data.
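Two of the ideas named above, pruning for selective reads and parallel prefetching of blocks, can be illustrated with a small, generic sketch. This is not StarTree's or Pinot's implementation: `BlockMeta`, `prune_blocks`, `prefetch_blocks`, and `fake_fetch` are hypothetical names introduced only for illustration, the min/max statistics are assumed to be available per block, and the fetch function is a local stand-in for a remote object-store read.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BlockMeta:
    """Hypothetical per-block metadata: a key plus min/max stats for one column."""
    key: str
    min_ts: int
    max_ts: int


def prune_blocks(blocks: List[BlockMeta], lo: int, hi: int) -> List[BlockMeta]:
    """Keep only blocks whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [b for b in blocks if b.max_ts >= lo and b.min_ts <= hi]


def prefetch_blocks(
    blocks: List[BlockMeta],
    fetch: Callable[[str], bytes],
    max_workers: int = 8,
) -> Dict[str, bytes]:
    """Fetch the surviving blocks concurrently and return their bytes by key."""
    if not blocks:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so payloads line up with blocks.
        payloads = list(pool.map(fetch, [b.key for b in blocks]))
    return {b.key: p for b, p in zip(blocks, payloads)}


def fake_fetch(key: str) -> bytes:
    """Stand-in for a remote read; a real system would issue a ranged GET instead."""
    return f"data:{key}".encode()


# Assumed example data: three blocks covering disjoint timestamp ranges.
blocks = [
    BlockMeta("block-0", 0, 99),
    BlockMeta("block-1", 100, 199),
    BlockMeta("block-2", 200, 299),
]

# A query for timestamps 150..250 only needs the last two blocks.
survivors = prune_blocks(blocks, lo=150, hi=250)
fetched = prefetch_blocks(survivors, fake_fetch)
```

The point of the sketch is the ordering: metadata is consulted first so that fewer blocks are read at all, and the reads that remain are issued concurrently so that remote-storage latency overlaps rather than adds up.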
Data engineers, platform engineers, and backend engineers building real-time or user-facing analytics products who are evaluating or already using Apache Iceberg as a data lakehouse foundation.
Vivek Sinha is a Product Manager at StarTree, where he leads the charter to extend Apache Pinot’s query engine to run directly on Apache Iceberg, enabling high-concurrency, low-latency analytics without data duplication. He has over a decade of experience across database systems, ETL, data lakes, and OLAP, and moved from founding engineer to PM at Hevo Data. He has shipped data infrastructure products at Fortune 500 scale across batch and real-time processing. He also leads the AI Initiative charter for the Ingestion and Applications layer on StarTree Pinot, shaping how AI-native workloads are ingested and served at scale.