Speak at The Fifth Elephant 2026 Annual Conference
Share your work with the community
Jul 31, 2026 (Fri), 09:00 AM – 06:00 PM IST
Aayush Naik
@naikaayush
Submitted Jun 25, 2026
Delta Lake makes a table mutable even though the underlying Parquet files are physically immutable. In this talk, we will dive into the internals of the INSERT, UPDATE, and DELETE operations. We start with the building blocks: Parquet (columnar storage) and the Delta log (a transaction record of all add and remove actions), which together constitute a Delta table. We then cover INSERT, which is straightforward: write a new Parquet file and append an add action to the log. The interesting one is UPDATE.
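As a rough illustration of the model described above, here is a minimal in-memory sketch in Python. It is not Delta Lake's implementation: `ToyDeltaTable`, the `part-NNNNN` file names, and the tuple-based action format are all hypothetical stand-ins, with a dict playing the role of Parquet storage and a list playing the role of the log.

```python
import itertools


class ToyDeltaTable:
    """Toy model: `files` stands in for immutable Parquet files,
    `log` stands in for the Delta transaction log (hypothetical layout)."""

    def __init__(self):
        self.files = {}  # file_id -> tuple of rows; never modified once written
        self.log = []    # ordered commits; each commit is a list of (action, file_id)
        self._ids = itertools.count()

    def _write_file(self, rows):
        # "Write a Parquet file": store an immutable copy of the rows.
        file_id = f"part-{next(self._ids):05d}"
        self.files[file_id] = tuple(dict(r) for r in rows)
        return file_id

    def insert(self, rows):
        # INSERT: write a new file, then append an add action to the log.
        file_id = self._write_file(rows)
        self.log.append([("add", file_id)])

    def active_files(self):
        # Replay the log in order to find the files that make up the table.
        live = []
        for commit in self.log:
            for action, file_id in commit:
                if action == "add":
                    live.append(file_id)
                else:  # "remove"
                    live.remove(file_id)
        return live

    def scan(self):
        # Read the table: all rows from all currently active files.
        return [row for f in self.active_files() for row in self.files[f]]
```

For example, two calls to `insert` leave two files on "disk" and two commits in the log, and `scan()` returns the rows of both files in commit order.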
The bulk of the talk concentrates on UPDATE, where we walk through copy-on-write semantics. Starting with the naive approach (rewrite every row in the table for a one-row change), we show why it collapses and then build the real mechanism. A first scan finds the rows that need to be updated and retrieves the IDs of the files that contain them. A second scan reads only those files, and we modify the rows according to the query. After that, we do three things.
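The two-scan flow can be sketched on a toy in-memory table as follows. This is an illustration under assumptions, not the talk's or Delta Lake's actual code: `update_copy_on_write`, `active_files`, and the file naming are hypothetical, and the commit step shown (write a new file, log a remove for the old file, log an add for the new one) is one common way to record copy-on-write, not necessarily the steps the talk lists.

```python
def active_files(log):
    """Replay (action, file_id) pairs to find the files currently in the table."""
    live = set()
    for commit in log:
        for action, file_id in commit:
            if action == "add":
                live.add(file_id)
            else:  # "remove"
                live.discard(file_id)
    return live


def update_copy_on_write(files, log, predicate, assign):
    """Toy copy-on-write UPDATE; returns the IDs of the files that were rewritten."""
    # Scan 1: find the active files that contain at least one matching row.
    touched = sorted(
        f for f in active_files(log) if any(predicate(r) for r in files[f])
    )
    if not touched:
        return []

    commit = []
    for old_id in touched:
        # Scan 2: read only the touched files and copy every row,
        # applying the assignment to the rows that match.
        new_rows = tuple(
            {**r, **assign(r)} if predicate(r) else dict(r) for r in files[old_id]
        )
        # Toy naming scheme (assumes sequential part-N ids starting at 0).
        new_id = f"part-{len(files):05d}"
        files[new_id] = new_rows          # the old file is left untouched
        commit.append(("remove", old_id))
        commit.append(("add", new_id))

    log.append(commit)  # all actions land in a single commit
    return touched
```

Note that a one-row change still copies every row of the touched file, which is the write amplification the talk is about; files with no matching rows are never read in the second scan.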
We then extend the same idea to DELETE, introducing three types of deletes: full-table, partition, and predicate-based. We close by introducing the merge-on-read concept with the help of the deletion vectors feature.
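To make the merge-on-read idea concrete, here is a minimal sketch of a predicate-based delete that marks row positions instead of rewriting files. The function names and the dict-of-sets representation are hypothetical simplifications; a real deletion vector is a compact bitmap tracked alongside the file's log entry, not a Python set.

```python
def delete_with_deletion_vectors(files, deletion_vectors, predicate):
    """Toy merge-on-read DELETE: record the positions of deleted rows per file."""
    for file_id, rows in files.items():
        hits = {i for i, row in enumerate(rows) if predicate(row)}
        if hits:
            # The data file is not rewritten; only its deletion vector grows.
            deletion_vectors[file_id] = deletion_vectors.get(file_id, set()) | hits


def scan_merge_on_read(files, deletion_vectors):
    """Read the table, skipping the row positions marked as deleted."""
    out = []
    for file_id, rows in files.items():
        dead = deletion_vectors.get(file_id, set())
        out.extend(row for i, row in enumerate(rows) if i not in dead)
    return out
```

The trade-off is visible even in the toy version: the delete writes only a small set of positions, while every later read pays the cost of filtering those positions out.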
Footnotes: data-skipping optimizations such as partition pruning, min/max values, and column statistics are mentioned but not covered in detail.
Takeaways
Audiences
Data engineers working with lakehouse architectures (Delta/Iceberg/Hudi), platform engineers who own pipeline performance and storage costs, and anyone who has wondered why a small UPDATE rewrites large amounts of data. Also useful for those debugging write amplification or evaluating when to enable deletion vectors.
Bio
I work at e6data on the query engine team. I am currently adding DML support for Delta Lake tables to the Rust-based query engine.
Presentation Link
https://pitch.com/v/delta-lake-internals-tjm7wv
A link to the workshop I conducted as part of Rootconf
https://hasgeek.com/rootconf/build-a-sql-query-engine-from-scratch-workshop/