Vivek Sinha

@vivekwiki

High Concurrency & Low Latency Serving on Apache Iceberg — for Applications and AI Agents

Submitted May 25, 2026

Description

Everyone is putting their data into Apache Iceberg; almost no one is serving sub-second queries directly from it. Once data lands in Iceberg, a familiar question arises: how do you power real-time experiences without duplicating the data in yet another serving system? The challenge is especially acute in observability workloads such as RUM, clickstream, and APM, and in customer-facing analytics dashboards, where the cost of achieving low latency often leads to pipeline sprawl. The same problem is now showing up in AI: LLMs and agentic systems need fresh, structured facts as context, and most teams solve this by copying data into a vector store or a separate snapshot, creating yet another data silo. If Iceberg is already your source of truth, there is a better path.
In this talk, I will share how we at StarTree extended Apache Pinot’s query engine to run directly on Apache Iceberg, and what it takes to serve both product applications and AI agents from the same stack. I will walk through a Kafka to Iceberg to Pinot/StarTree architecture in which Iceberg stays the single source of truth, Pinot powers high-concurrency production queries, and AI agents connect to the same query layer as a tool endpoint via MCP or a REST interface. The session covers Pinot indexing and pruning for selective reads, parallel prefetching of Iceberg blocks over S3, and how this same low-latency layer can ground LLM responses in live structured data, without an extra vector store for tabular facts. I will close with benchmark results on roughly 1 TB of data across realistic query shapes, along with real-world patterns for both user-facing product analytics and agentic retrieval workflows.
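
To make the two read-path ideas named above more concrete, here is a minimal Python sketch of pruning by column statistics followed by parallel prefetching. It is an illustration under stated assumptions, not StarTree's or Pinot's implementation: the `Block` type, its `min_ts`/`max_ts` statistics, and the `fetch_block` stand-in (which a real reader would replace with ranged object-store reads) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical descriptor for one chunk of a data file."""
    key: str
    min_ts: int
    max_ts: int


def prune(blocks, lo, hi):
    """Keep only blocks whose [min_ts, max_ts] range overlaps [lo, hi]."""
    return [b for b in blocks if b.max_ts >= lo and b.min_ts <= hi]


def fetch_block(block):
    """Stand-in for a ranged object-store read; returns fake bytes."""
    return f"data:{block.key}".encode()


def prefetch(blocks, max_workers=8):
    """Fetch the surviving blocks concurrently, preserving input order."""
    if not blocks:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_block, blocks))


# Three blocks with made-up timestamp ranges; the filter overlaps two of them.
blocks = [Block("a", 0, 99), Block("b", 100, 199), Block("c", 200, 299)]
selected = prune(blocks, lo=150, hi=250)
payloads = prefetch(selected)
```

The point of the sketch is the ordering: statistics are consulted first so that only overlapping blocks are requested, and the remaining reads are issued concurrently rather than one at a time.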

Takeaways

  1. A practical blueprint for a single Kafka to Iceberg to Pinot/StarTree stack that serves both real-time product applications and AI agents without data duplication, extra ETL, or a separate vector store for structured data.
  2. A concrete pattern for connecting LLM agents to a live Iceberg-backed query layer using tool calling or MCP, so agents retrieve fresh structured context (metrics, aggregates, time-series) at sub-second latency rather than relying on stale snapshots.
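
As a rough sketch of the second takeaway, the Python below shows one way an agent tool could wrap a SQL query layer over REST. It assumes a Pinot-style broker that accepts a POST to `/query/sql` with a JSON body of the form `{"sql": ...}` and returns a `resultTable` payload; the tool name `query_metrics`, the SELECT-only guard, and the helper names are hypothetical and are not taken from the talk.

```python
import json
import urllib.request

# Hypothetical tool description in the JSON-schema style that most
# tool-calling APIs accept; the name and fields are assumptions.
QUERY_TOOL = {
    "name": "query_metrics",
    "description": "Run a read-only SQL query against the serving layer.",
    "parameters": {
        "type": "object",
        "properties": {"sql": {"type": "string"}},
        "required": ["sql"],
    },
}


def build_query_request(broker_url, sql):
    """Build (but do not send) a POST to the broker's SQL endpoint."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_to_records(response):
    """Turn a resultTable payload into a list of dicts for the agent."""
    table = response["resultTable"]
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table["rows"]]


def run_tool(broker_url, arguments):
    """Execute one tool call; this needs a reachable broker."""
    request = build_query_request(broker_url, arguments["sql"])
    with urllib.request.urlopen(request, timeout=5) as resp:
        return rows_to_records(json.load(resp))
```

Returning records keyed by column name, rather than raw rows, keeps the structured context self-describing when it is passed back to the model.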

Who Should Attend

Data engineers, platform engineers, and backend engineers building real-time analytics products or AI-powered applications who are evaluating Apache Iceberg as a data lakehouse foundation and want to avoid duplicating data across multiple serving systems.

Bio

Vivek Sinha is a Product Manager at StarTree, where he leads the charter to extend Apache Pinot’s query engine to run directly on Apache Iceberg, enabling high-concurrency, low-latency analytics without data duplication. With over a decade of experience across database systems, ETL, data lakes, and OLAP, and having progressed from founding engineer to PM at Hevo Data, he has shipped data infrastructure products at Fortune 500 scale across batch and real-time processing. He also leads the AI Initiative charter for the Ingestion and Applications layer on StarTree Pinot, shaping how AI-native workloads are ingested and served at scale.


Hosted by

Jumpstart better data engineering and AI futures