22 Nov 2024 (Fri), 09:00 AM – 05:10 PM IST
Srikanth Venugopalan
ABSTRACT:
Object storage has been around for a long time. Although it is a cheap and scalable storage option, it has traditionally been limited to storing unstructured data or serving as blob storage for binary data. With data footprints growing at an exponential rate, object storage is now being used for a class of use cases that were previously thought to be impossible. The best-known example is SQL analytics on structured data, but engineering teams are also exploring it for logs and metrics, IoT and sensor data, geospatial data, vector search, and more.
In this talk, we will explore how a few techniques can help object storage serve large amounts of data under low-latency and high-concurrency requirements. Specifically, we will look at indices, table formats, and vectorized data fetching.
We will also talk about heterogeneous data, how it can be unified at retrieval time to build a fit-for-purpose result set, and how indexing helps achieve this.
We will show benchmarks from the experiments we have been running on information retrieval at scale across various cloud platforms.
KEY TAKEAWAYS:
Learn about database internals around indices (we will go into some depth on two types of database indices)
Get insights into some of the limitations you may hit when you try to access data at scale from object stores
Get introduced to some useful write patterns that can help simplify retrieval
AUDIENCE:
Data Engineers - Individuals who build data infrastructure and platforms and typically handle large-scale data processing and related workloads.
Cloud Architects - Those who design the strategy for use cases that require information retrieval or analytics on large datasets stored in an object store.
Database internals developers/enthusiasts - Anyone who builds databases, is interested in building one, or is simply curious about how Lakehouse engines work their way around large data.
Venue host - Rootconf workshops