The Fifth Elephant 2024 Annual Conference (12th & 13th July)

Maximising the Potential of Data — Discussions around data science, machine learning & AI

Aditi Ahuja

Vector Databases: A Bird's Eye View

Submitted Jun 3, 2024

This talk aims to equip the audience with an overall understanding of the current vector database landscape and of how vector databases work internally, with a focus on a few common algorithms.

Audience

Primarily intended for engineers looking to understand the intersection of information retrieval with artificial intelligence.

Scope

  1. Algorithms to store and retrieve vector embeddings are not a recent development. However, in a growing market for vector DBs, they are coming to the forefront, with databases racing to build competitive vector indexing and retrieval capabilities.
  2. What is driving this shift to a new paradigm and why?
  3. Explore why established, commonly used indexing algorithms are not up to the task.
  4. Understand some of the popular algorithms that address these limitations and the vector indexing offerings that have adopted them.
  5. Evaluate the capabilities offered by custom-built, dedicated vector databases versus those offered by integrating vector search into an existing database.

Takeaway

  1. An understanding of the current vector database landscape.
  2. An understanding of the broad classification of Approximate Nearest Neighbour search algorithms.
  3. A foundation, built on the above, for understanding the fundamental algorithms behind vector indexing and search.

Pain Points

Since solutions to store and retrieve vector embeddings are now a key infrastructural component of RAG pipelines, evaluating the various offerings requires an understanding of the most common algorithms.

This involves understanding some aspects of multiple moving parts, such as tuning embedding models and leveraging the embeddings to provide context in RAG pipelines.

This talk aims to provide a system overview to build upon when surveying the developments in this domain.

Format

The narrative will start from why vector databases are needed and the recent developments driving this demand. It will then transition into discussing the internals of vector databases and the specific trade-offs to consider when evaluating vector databases compared to non-vector ones.

