Vivek Sinha

@vivekwiki

High Concurrency & Low Latency Serving on Apache Iceberg

Submitted May 25, 2026

Description

Everyone is putting their data into Apache Iceberg; almost no one is serving sub-second queries directly from it. Once data lands in Iceberg, a familiar question arises: how do you power real-time experiences without duplicating it in yet another serving system? This challenge is especially sharp in observability workloads like RUM, clickstream, and APM, and in customer-facing analytics dashboards, where the cost of low latency often leads to pipeline sprawl. Data teams want strong isolation and fast queries, but not a web of ETL jobs and redundant data copies.

In this talk, I’ll share how we at StarTree built a fast query layer for Apache Iceberg using Apache Pinot, and what it takes to make Iceberg behave like a low-latency analytics store at scale. I’ll walk through a Kafka to Iceberg to Pinot/StarTree architecture where Iceberg stays the single source of truth while Pinot powers production queries. The session covers the key design decisions: Pinot indexing and pruning for selective reads, parallel prefetching of Iceberg blocks over S3, and trading off local vs. remote storage for cost and latency. I’ll close with benchmark results on roughly 1 TB across realistic query shapes, plus two real-world examples: internal customer metrics exploration and external product analytics, both served without extra ETL or duplicate data.
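To make two of the design decisions named above more concrete, here is a minimal sketch of range-based pruning followed by parallel prefetching. It is an illustration under stated assumptions, not StarTree's or Pinot's implementation: the `Block` type, the `prune`, `fetch`, and `prefetch` helpers, and the `part-N` keys are all hypothetical, and `fetch` only stands in for a real object-store read.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical metadata for one remote data block (not a Pinot or Iceberg type)."""
    key: str     # object-store key, illustrative only
    min_ts: int  # smallest timestamp stored in the block
    max_ts: int  # largest timestamp stored in the block


def prune(blocks, lo, hi):
    """Keep only blocks whose [min_ts, max_ts] range overlaps the query range [lo, hi]."""
    return [b for b in blocks if b.max_ts >= lo and b.min_ts <= hi]


def fetch(block):
    """Stand-in for a remote read; a real system would issue an object-store GET here."""
    return f"bytes-of-{block.key}"


def prefetch(blocks, max_workers=4):
    """Fetch the surviving blocks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, blocks))


blocks = [
    Block("part-0", 0, 99),
    Block("part-1", 100, 199),
    Block("part-2", 200, 299),
]

# A query over timestamps 150..250 can skip part-0 entirely.
selected = prune(blocks, lo=150, hi=250)
payloads = prefetch(selected)
```

The point of the sketch is the ordering: metadata is consulted first so that only blocks that can contain matching rows are read, and the remaining reads are issued concurrently so that remote-storage latency overlaps instead of accumulating.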

Takeaways

  1. A practical blueprint for adding a low-latency serving layer on top of Apache Iceberg without data duplication or additional ETL pipelines.
  2. A reference Kafka to Iceberg to Pinot/StarTree architecture with concrete design choices and benchmark data (QPS and latency on roughly 1 TB) for filters and aggregates across primitive and complex types.

Who Should Attend

Data engineers, platform engineers, and backend engineers building real-time or user-facing analytics products who are evaluating or already using Apache Iceberg as a data lakehouse foundation.

Bio

Vivek Sinha is a Product Manager at StarTree, where he leads the charter to extend Apache Pinot’s query engine to run directly on Apache Iceberg, enabling high-concurrency, low-latency analytics without data duplication. He has over a decade of experience across database systems, ETL, data lakes, and OLAP, having gone from founding engineer to PM at Hevo Data, and has shipped data infrastructure products at Fortune 500 scale across batch and real-time processing. He also leads the AI Initiative charter for the Ingestion and Applications layer on StarTree Pinot, shaping how AI-native workloads are ingested and served at scale.

