Speak at The Fifth Elephant 2026 Annual Conference
Share your work with the community
Jul 2026
27 Mon
28 Tue
29 Wed
30 Thu
31 Fri 09:00 AM – 06:00 PM IST
1 Sat
2 Sun
Apurva
@apurvarathi
Submitted Jun 23, 2026
Lakehouses built on open table formats have emerged as the de facto architecture for modern analytical data systems, yet few practitioners appreciate how deeply database internals underpin their design. Modern open table formats are often described as metadata layers on top of Parquet files, but beneath the surface they have quietly reinvented many of the core ideas that powered databases for decades. Delta Lake uses transaction logs reminiscent of database write-ahead logs (WALs), Iceberg relies on hierarchical metadata structures that behave like indexes, while Hudi and Paimon draw inspiration from storage-engine concepts such as log-structured storage and compaction.
This talk explores how the move to object storage forced data systems to reimplement transactions, concurrency control, indexing, catalogs, compaction, and snapshot isolation, and what these design choices reveal about the convergence of databases and lakehouses. Through architectural comparisons of Delta Lake, Apache Iceberg, Apache Hudi, and Apache Paimon, attendees will gain a deeper understanding of the storage and metadata foundations that power modern analytical platforms.
We conclude by exploring a growing industry trend toward tiered architectures that combine specialised storage engines for hot data with open table formats for historical data, raising an intriguing question: are we beginning to come full circle after a decade of rebuilding database primitives on top of object stores?
Note: The talk assumes familiarity with data lakes and SQL analytics but does not require prior knowledge of open table formats.
This session is intended for data engineers, data platform engineers, software engineers, and technical architects who want to understand the database and storage-system concepts that underpin modern lakehouse architectures built on open table formats such as Delta Lake, Iceberg, Hudi, and Paimon. Familiarity with data lakes or analytical data platforms is helpful, but no prior knowledge of open table formats is required.
Apurva Rathi is a data platform engineer with over a decade of experience building large-scale data systems at Atlassian, Meta and Mastercard. Her work has spanned everything from data engineering to building self-service, governed data platforms that enable organizations to make data-driven decisions at scale.
Most recently, she led the development of core components of Atlassian’s next-generation data platform, including self-service transformation, ingestion, data sharing, and metrics capabilities. During her sabbatical, she has been exploring the internals of modern data systems, with a particular focus on open table formats, storage and query engines, stream processing, and AI-native data architectures. She has authored a series of technical deep-dives on lakehouse and streamhouse architectures and enjoys connecting contemporary data systems back to the fundamental concepts that inspired them.
Coming Soon
ETA: 25th June