Rootconf Mini 2024 (on 22nd & 23rd Nov)

Geeking out on systems and security since 2012

Srikanth Venugopalan

@sriv_e6x Author

Object storage for new use cases through Indexes on lakehouses

Submitted Oct 24, 2024

ABSTRACT:

Object storage has been around for a long time. While it is cheap and scalable, it has traditionally been limited to use cases such as storing unstructured data or serving as blob storage for binary data. With data footprints growing exponentially, object storage is now being used for workloads that were previously thought impractical on it. The best-known example is SQL analytics on structured data, but engineering teams are also exploring it for logs and metrics, IoT and sensor data, geospatial data, vector search, and more.

In this talk, we will explore how a few techniques can help object storage serve large volumes of data under low-latency and high-concurrency requirements. Specifically, we will look at indices, table formats, and vectorized data fetching, and how each contributes to this goal.
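As a rough illustration of how an index-like structure avoids reading data from object storage (a generic sketch, not the speaker's implementation), consider per-file min/max statistics. Lakehouse table formats keep column statistics of this kind in their metadata. The `FileStats` layout, the `prune_files` helper, and the `s3://bucket/...` paths below are hypothetical. The query engine checks each file's range against the predicate and only issues GET requests for files that could contain matching rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical, simplified per-file statistics for one column (a timestamp)."""
    path: str
    min_ts: int
    max_ts: int


def prune_files(files, lo, hi):
    """Return paths of files whose [min_ts, max_ts] range overlaps [lo, hi].

    Files whose range cannot overlap the predicate are skipped entirely,
    so no object-store requests are made for them.
    """
    return [f.path for f in files if f.max_ts >= lo and f.min_ts <= hi]


if __name__ == "__main__":
    manifest = [
        FileStats("s3://bucket/events/part-000.parquet", 0, 99),
        FileStats("s3://bucket/events/part-001.parquet", 100, 199),
        FileStats("s3://bucket/events/part-002.parquet", 200, 299),
    ]
    # Query: 150 <= ts <= 220 -> part-000 is skipped without being read.
    print(prune_files(manifest, 150, 220))
```

Pruning only pays off when each file covers a narrow range of values. For that reason, clustering or sorting data on write is a common companion to this kind of index.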

We will also discuss heterogeneous data: how it can be unified at retrieval time to build a fit-for-purpose result set, and how indexing helps achieve this.

We will share benchmarks from experiments we have been running on information retrieval at scale across various cloud platforms.

KEY TAKEAWAYS:

  1. Learn about database internals around indices (we will go into some depth on two types of database index)

  2. Gain insight into some of the limitations you may hit when accessing data at scale from object stores

  3. Get introduced to some useful write patterns that can help simplify retrieval
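One commonly cited limitation when reading at scale from object stores is that every request carries a fixed latency cost. As a result, many small range reads, such as fetching individual columns or pages from a columnar file, can be slower than fewer, larger ones. The sketch below is an illustration only, not taken from the talk. It uses a hypothetical `coalesce_ranges` helper that merges nearby byte ranges into fewer GET requests, at the cost of reading up to `max_gap` unneeded bytes between them.

```python
def coalesce_ranges(ranges, max_gap):
    """Merge byte ranges (start, end_exclusive) separated by at most max_gap bytes.

    Each merged range becomes a single ranged GET. Overlapping ranges are
    merged too (their "gap" is negative).
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    wanted = [(0, 100), (150, 300), (10_000, 10_500), (290, 400)]
    # Four small reads collapse into two requests with a 64-byte gap tolerance.
    print(coalesce_ranges(wanted, max_gap=64))
```

Choosing `max_gap` is a trade-off between request count and wasted bytes. Apache Arrow's Parquet reader applies a similar idea in its pre-buffering mode.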

AUDIENCE:

  1. Data Engineers - Individuals who build data infrastructure and platforms and typically handle large-scale data processing workloads.

  2. Cloud Architects - Those who design the strategy for use cases that require information retrieval or analytics on large datasets stored in an object store.

  3. Database internals developers/enthusiasts - Anyone who builds databases, is interested in building one, or is simply curious about how lakehouse engines handle large datasets.

