Karthika Vijayan

@karthikav

Enhancing retrieval in RAG: The fused features way

Submitted May 13, 2025

Retrieval-Augmented Generation (RAG) has emerged as a dominant framework for leveraging Large Language Models (LLMs) to generate responses grounded in extensive textual corpora. Within this architecture, the retrieval component plays a critical role in determining the overall system accuracy by surfacing the most relevant text chunks based on semantic similarity to the user query. Typically, this similarity is computed via cosine scores between vector embeddings of the query and document segments. This talk will highlight the limitations of conventional retrieval methods and motivate the need for more expressive and effective embedding strategies.
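As a rough illustration of the cosine-score retrieval described above, the following minimal sketch ranks chunks by cosine similarity to a query vector. The function names `cosine` and `top_k`, and the toy two-dimensional vectors, are assumptions made for illustration; a real pipeline would obtain its vectors from an embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy vectors standing in for real embeddings (hypothetical values).
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.1]
print(top_k(query_vec, chunk_vecs, k=2))
```

The ranking depends entirely on how well a single embedding captures the query's intent, which is the limitation the talk sets out to address.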

We will look at both sparse and dense embeddings, and at how each captures different aspects of meaning in text. The talk will focus on how combining these embeddings can give a more complete representation of queries and documents. I will explain simple yet powerful techniques to fuse multiple embeddings and show how this improves retrieval results. Through practical examples and empirical insights, the session will demonstrate how such fusion techniques significantly outperform single-embedding baselines in RAG pipelines.
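One simple way to picture such a fusion, offered here only as a sketch and not necessarily the technique presented in the talk, is weighted concatenation of a normalized sparse vector and a normalized dense vector. The helper names (`sparse_embed`, `fuse`), the term-count sparse features, the `weight` parameter, and the toy dense vectors are all assumptions for illustration.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def sparse_embed(text, vocabulary):
    """Toy sparse embedding: raw term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def fuse(sparse_vec, dense_vec, weight=0.5):
    """Concatenate unit-length sparse and dense vectors.

    Scaling the parts by sqrt(weight) and sqrt(1 - weight) makes the dot
    product of two fused vectors equal to
    weight * cos_sparse + (1 - weight) * cos_dense.
    """
    sparse_part = l2_normalize(sparse_vec)
    dense_part = l2_normalize(dense_vec)
    w_sparse, w_dense = math.sqrt(weight), math.sqrt(1.0 - weight)
    return [w_sparse * x for x in sparse_part] + [w_dense * x for x in dense_part]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


vocabulary = ["rag", "retrieval", "embedding"]  # hypothetical vocabulary
query_fused = fuse(sparse_embed("RAG retrieval", vocabulary), [1.0, 0.0], weight=0.3)
chunk_fused = fuse(sparse_embed("retrieval embedding", vocabulary), [0.6, 0.8], weight=0.3)
print(dot(query_fused, chunk_fused))  # similarity between the fused vectors
```

Because each part is normalized before scaling, `weight` directly controls how much lexical overlap counts relative to semantic similarity, and the fused vectors can be stored in an ordinary vector index.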

Outline of the talk

  • Introduction to embeddings as an effective representation of text
  • What sparse and dense embeddings are and what they really represent
  • Ways to combine multiple embeddings to form fused or composite features
  • Retrieval scores in RAG, showcasing the effectiveness of fused features
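
To compare a fused-feature retriever against a single-embedding baseline, one common measure is the hit rate at k: the fraction of queries whose known relevant chunk appears among the top k results. The sketch below assumes one relevant chunk per query; the function name and the toy rankings are hypothetical, and the talk may report different metrics.

```python
def hit_rate_at_k(ranked_ids_per_query, relevant_id_per_query, k=3):
    """Fraction of queries whose relevant chunk id appears in the top-k ranking."""
    if not relevant_id_per_query:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids_per_query, relevant_id_per_query)
        if relevant in ranked[:k]
    )
    return hits / len(relevant_id_per_query)


# Made-up rankings for two queries; chunk 0 is the relevant chunk for both.
rankings = [[2, 0, 1], [1, 2, 0]]
relevant = [0, 0]
print(hit_rate_at_k(rankings, relevant, k=2))
```

Running the same function over the rankings produced by each retriever, on the same query set, gives a like-for-like comparison.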

Takeaways

  • Learn about underlying mechanisms of text embeddings
  • Learn cool ways to build features from text: simple refinements that make a huge difference in RAG
  • Hear the nuances of some original work that I have done for in-house projects

If you’re a Gen AI enthusiast who plans to build many RAG systems and is curious about LLMs, this talk is for you!

Speaker bio
Dr. Karthika Vijayan is a Solution Consultant at Sahaj Software. She has been conducting research in conversational AI with voice and text data for almost a decade. Her research has been published in several journals and presented at various international conferences. Prior to joining Sahaj Software, she worked as a research fellow at the National University of Singapore and at IISc Bangalore. She earned her PhD from IIT Hyderabad.

Previous talk links
https://www.youtube.com/watch?v=o6YHcDLod8A
https://www.youtube.com/watch?v=-uoUwGpzIL0
https://www.youtube.com/watch?v=kphYc_lvKIk&list=PLkPaq00oPRfzz9O4q06rOL2dHCEX7PQwU&index=18
https://www.youtube.com/watch?v=gvJhtBdmUi8&t=897s

Profile links:
https://scholar.google.com/citations?user=fJp6O0UAAAAJ&hl=en
https://www.linkedin.com/in/karthika-vijayan/
https://www.researchgate.net/profile/Karthika-Vijayan
