Rows, columns, and consequences

Speak at Rootconf’s Special Edition on Databases

Anirudh Rowjee


@Rowjee

A Survey of LSM Trees backed by Object Storage

Submitted Apr 25, 2026

Ever since Patrick O’Neil’s seminal 1996 paper, Log-Structured Merge Trees (LSM Trees) have paved the way for storage systems with excellent write performance and very competitive read performance. As data volumes grow, the LSM Tree emerges as an ideal storage structure for large volumes of high-throughput data. With the rise of AI systems and with humans generating more data than ever, the scalability and performance of storage systems are of paramount importance. At the same time, we’re seeing dramatic improvements in the capacity and performance of large-scale object storage services: AWS S3 now has a native vector search engine, a new way to perform atomic writes, and a way to access S3 as a filesystem, highlighting the scale and quality of workloads that are moving to object storage. This, combined with the emergence of data lake and warehouse formats such as Iceberg and Delta Lake, shows that object storage is no longer just a destination for backups. It is emerging as a first-class storage tier, especially for massive volumes of data.

The LSM Tree, with its immutable and tiered-by-design structure, is uniquely positioned to take advantage of this new wave of storage systems. But as we know, there’s no free lunch: tradeoffs are inevitable! This talk walks through the foundational design of the LSM Tree, how the present state of the art pushes that design to evolve, and some of the latest advancements in the space.

We examine questions like: What does data compaction look like with object storage in the loop? How is read latency affected by having to read from different tiers of storage at once? What do these systems cost, given object storage pricing models? We’ll walk away with more insight into what really makes these object-storage-backed LSM Trees tick and, more importantly, fail.
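To make these questions concrete, here is a minimal Python sketch of the basic idea, not the design of any particular system. `ToyObjectStore` and `ToyLSM` are hypothetical names invented for this illustration, and the in-memory dictionary only stands in for a real object store. Writes land in a memtable, each flush becomes one immutable sorted object, and a read that misses the memtable pays one GET per run it has to consult.

```python
from bisect import bisect_left


class ToyObjectStore:
    """Hypothetical stand-in for an object store: objects are write-once."""

    def __init__(self):
        self._objects = {}
        self.get_requests = 0  # counted because GETs add latency and cost

    def put(self, name, value):
        if name in self._objects:
            raise ValueError(f"object {name!r} already exists")
        self._objects[name] = value

    def get(self, name):
        self.get_requests += 1
        return self._objects[name]


class ToyLSM:
    """Toy LSM Tree: an in-memory memtable plus immutable sorted runs."""

    def __init__(self, store, memtable_limit=2):
        self.store = store
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.runs = []  # object names, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Each flush writes a brand-new object; existing runs are never modified.
        name = f"run-{len(self.runs):06d}"
        self.store.put(name, tuple(sorted(self.memtable.items())))
        self.runs.append(name)
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search newest run first so the latest write for a key wins.
        for name in reversed(self.runs):
            run = self.store.get(name)
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


store = ToyObjectStore()
db = ToyLSM(store)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    db.put(key, value)

newest_a = db.get("a")  # found in the newest run after one GET
older_b = db.get("b")   # misses the newest run, found in the older one
```

Without compaction, the number of runs (and so the number of GETs on a miss) grows with every flush. Real systems bound this with compaction, caching, and filters, which is where the tradeoffs discussed in this talk come in.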

Key Takeaways:

  1. How state-of-the-art object-storage-backed LSM Trees are architected
  2. What the research tells us about the unique failure modes of these storage engines
  3. What tradeoffs we need to make when building something like this

This talk will be beneficial for students, engineers, and anyone curious about how LSM Trees are evolving to meet new storage demands. The audience is expected to be familiar with storage systems and to have a passable understanding of distributed systems.

Anirudh Rowjee is a Software Engineer at Couchbase, working on Magma, Couchbase’s high-density, LSM-Tree-based OLTP storage engine. He also started the Bengaluru Systems Meetup with friends, and is an avid reader, runner, and systems engineering and distributed systems enthusiast.

