Sweta

No Stats, No Problem: Building Feedback-Driven Optimizers for Lakehouses

Submitted May 9, 2026

Modern query optimizers were designed assuming that the engine has reasonably good statistics: row counts, NDVs, histograms, column correlations, table freshness, and reliable cost models. In many lakehouse environments, that assumption breaks down. This talk is about building a query optimizer that can survive in that world. We will discuss a set of practical techniques for planning under limited statistics: LEO-style learning from executed queries, equivalence sets to compensate for missing NDV and semantic constraints, auto-stats driven by “magic number” sensitivity analysis, and complementary learning from both data and query execution.
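As a rough illustration of the LEO-style idea mentioned above (compare estimated and actual cardinalities after execution, then reuse the correction on later queries), here is a minimal Python sketch. It is not the speakers' implementation: the `FeedbackStore` class, the predicate-signature key, the default selectivity of 0.1, and the smoothing factor are all assumptions made for this example.

```python
class FeedbackStore:
    """Hypothetical store of cardinality corrections learned from executed queries."""

    def __init__(self, default_selectivity=0.1, smoothing=0.5):
        # With no statistics, the optimizer falls back to a fixed guess
        # (a "magic number"); 0.1 is an assumption for this sketch.
        self.default_selectivity = default_selectivity
        self.smoothing = smoothing
        self.adjustment = {}  # predicate signature -> learned actual/estimated ratio

    def _base_estimate(self, table_rows):
        return table_rows * self.default_selectivity

    def estimate(self, signature, table_rows):
        """Return the default estimate, scaled by any learned adjustment."""
        factor = self.adjustment.get(signature, 1.0)
        return max(1.0, self._base_estimate(table_rows) * factor)

    def observe(self, signature, table_rows, actual_rows):
        """After execution, record how far off the uncorrected estimate was."""
        base = max(self._base_estimate(table_rows), 1.0)
        ratio = max(actual_rows, 1) / base
        old = self.adjustment.get(signature)
        if old is None:
            self.adjustment[signature] = ratio
        else:
            # Exponential smoothing, so one unusual run does not dominate.
            s = self.smoothing
            self.adjustment[signature] = (1 - s) * old + s * ratio


store = FeedbackStore()
sig = "orders.status = ?"               # hypothetical predicate signature
before = store.estimate(sig, 1000)      # default guess, no feedback yet
store.observe(sig, 1000, 400)           # execution actually returned 400 rows
after = store.estimate(sig, 1000)       # estimate now reflects the feedback
```

The adjustment is stored as a ratio rather than an absolute row count, so it can still be applied when the table grows between runs.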

The second half of the talk presents a research direction: online parametric query optimization for recurring BI workloads. Many lakehouse queries are templatized: the same SQL shape runs repeatedly with different customer IDs, time windows, geographies, product lines, or account bindings. Most bindings behave like the common case, but a few create plan cliffs — for example, a whale account or an unusually broad date range. We will examine how an optimizer can learn compact parameter-risk regions, maintain a bounded set of useful plans, and reduce tail-latency regret.
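One way to picture "parameter-risk regions" with a bounded plan set is the following Python sketch. It is an assumption-laden toy, not the approach the talk presents: the `ParametricPlanCache` class, the plan names, the use of a per-binding row count as the region feature, and the log10 bucketing are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict


class ParametricPlanCache:
    """Hypothetical per-template cache: maps parameter regions to the best known plan."""

    def __init__(self, default_plan, max_plans=3):
        self.default_plan = default_plan
        self.max_plans = max_plans
        self.known_plans = {default_plan}
        # region -> plan -> [total latency in ms, number of runs]
        self.stats = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))

    @staticmethod
    def region(binding_rows):
        # Assumption: bindings are grouped by order of magnitude of the rows
        # they select, so a "whale" binding lands in its own region.
        return int(math.log10(max(binding_rows, 1)))

    def record(self, binding_rows, plan, latency_ms):
        """Record one execution; refuse plans beyond the bounded plan set."""
        if plan not in self.known_plans:
            if len(self.known_plans) >= self.max_plans:
                return False
            self.known_plans.add(plan)
        entry = self.stats[self.region(binding_rows)][plan]
        entry[0] += latency_ms
        entry[1] += 1
        return True

    def choose(self, binding_rows):
        """Pick the plan with the lowest mean latency seen in this region."""
        plans = self.stats.get(self.region(binding_rows))
        if not plans:
            return self.default_plan
        return min(plans, key=lambda p: plans[p][0] / plans[p][1])


cache = ParametricPlanCache("broadcast_join", max_plans=2)
cache.record(50, "broadcast_join", 20.0)            # common-case binding
cache.record(5_000_000, "broadcast_join", 9000.0)   # whale binding, plan cliff
cache.record(5_000_000, "shuffle_join", 1200.0)     # alternative plan for the whale
typical_plan = cache.choose(70)
whale_plan = cache.choose(8_000_000)
```

Capping the number of distinct plans keeps the cache small, and falling back to the default plan for unseen regions avoids switching plans without evidence.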

Attendees will leave with a mental model for optimizer design when statistics are incomplete by default: what to estimate, what to learn, what to collect, what to treat as uncertain, and where robust planning beats blind adaptivity.

The session is aimed at database engineers, query optimizer developers, data platform teams, and practitioners running analytical SQL over lakehouse or object-store-backed systems.

Slide: https://docs.google.com/presentation/d/1oze2xvJuLeavgb9S2LCQrZRoZiLDldZD/edit?usp=sharing&ouid=103492819106331508633&rtpof=true&sd=true

Sweta Singh leads the SQL query optimizer team at e6data. She has over two decades of experience in database systems, query optimization, distributed systems, performance engineering, and workload management. Before e6data, she spent 19 years on the IBM Db2 development team. Her work spans cost-based optimization, statistics approximation, learning optimizers, join enumeration, workload management, distributed systems, and OLTP performance engineering.
Renu Pinky Sumam is a Senior Software Engineer on the Query Optimizer team at e6data, with nearly 19 years of experience across relational database technology, cloud systems, and AI. Before joining e6data, she worked at IBM on Db2 and IBM Cloud Object Storage, where she helped rearchitect the Cloud Object Storage billing infrastructure into a serverless, cloud-native architecture.

