Abhijeet Kumar

@abhijeet3922

Rachna Saxena

@rachnas Speaker

Develop Visual Augmented Q&A using multi-vector Vision Embeddings with Vector DB

Submitted May 6, 2025

Problem Statement: Most documents include infographics (visual elements such as tables, charts and images) that are often used to convey complex information to readers. Multi-modal LLMs are powerful tools for question answering on such complex documents. However, two challenges limit their productivity and value add:

  1. Multi-modal LLM performance degrades with longer context (the "lost in the middle" issue)
  2. Existing text-based RAG Q&A systems can only process textual information (text embeddings)

Solution: Vision-RAG systems are modern, state-of-the-art architectures that encode text and infographics jointly to answer user queries. Vision Language Models like ColPali can encode visual elements along with textual information.

Why it matters: Many analyst teams in businesses or captive companies manually research complex documents, with turnaround times of days to weeks.

Outline
It will be a hands-on session for participants and will cover the following modules:

  • Module 1: What is Visual Augmented Q&A (talk)

    • Introduction to Multi-modal LLMs
    • Introduction to Vision Language Models
  • Module 2: Foundation: Prompting for Q&A using a Multi-modal LLM

  • Module 3: Setting up Vision based RAG:

    • Vision Embedding using ColPali (talk)
    • Setting up late-interaction retrieval using ColPali
    • Hands-on: Develop end-to-end Visual Augmented Q&A
  • Module 4: Practical challenges with Vision based RAG (talk)

  • Module 5: Integration with Vector DB

    • Overall Architecture (talk)
    • Hands-on: Storing multi-vector representations in Vector DB
    • Hands-on: Embedding-based retrieval & ColPali-based re-ranker
    • End-to-end Python process (demo)
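
The retrieval idea behind Modules 3 and 5 can be sketched roughly as follows. This is a minimal illustration, not the workshop's code: random unit vectors stand in for real ColPali page and query embeddings, no actual Vector DB is involved, and the function names (`l2_normalize`, `maxsim_score`, `two_stage_search`) are hypothetical. It assumes a cheap first stage that ranks pages by one mean-pooled vector each, followed by a late-interaction (MaxSim) re-rank over the full multi-vector representation.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def maxsim_score(query_vecs: np.ndarray, page_vecs: np.ndarray) -> float:
    """Late-interaction score: for each query vector take its best-matching
    page vector, then sum those maxima over all query vectors."""
    sim = query_vecs @ page_vecs.T  # shape: (n_query_vecs, n_page_vecs)
    return float(sim.max(axis=1).sum())


def two_stage_search(query_vecs, pages, first_k=3, final_k=1):
    """Stage 1: shortlist pages using one mean-pooled vector per page
    (an assumed stand-in for a single-vector index lookup).
    Stage 2: re-rank the shortlist with the full multi-vector MaxSim score."""
    pooled_query = l2_normalize(query_vecs.mean(axis=0))
    pooled_pages = l2_normalize(np.stack([p.mean(axis=0) for p in pages]))
    coarse = pooled_pages @ pooled_query
    candidates = np.argsort(-coarse)[:first_k]
    reranked = sorted(
        candidates,
        key=lambda i: maxsim_score(query_vecs, pages[i]),
        reverse=True,
    )
    return [int(i) for i in reranked[:final_k]]


# Toy data: 5 "pages", each with 16 patch vectors of an illustrative dimension.
rng = np.random.default_rng(0)
dim = 128
pages = [l2_normalize(rng.normal(size=(16, dim))) for _ in range(5)]

# Build a query from slightly perturbed patches of page 2,
# so page 2 is constructed to be the relevant page.
noise = 0.01 * rng.normal(size=(8, dim))
query_vecs = l2_normalize(pages[2][:8] + noise)

top_pages = two_stage_search(query_vecs, pages, first_k=3, final_k=1)
print("Top page index:", top_pages)
```

In a real pipeline the page vectors would come from the vision model and the first-stage lookup from the vector database; only the re-ranking step needs every per-patch vector of the shortlisted pages.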

Takeaways
By the end of the Multi-modal RAG workshop, participants will be able to:

  • Understand how to work with a vision-based Vector DB
  • Develop an end-to-end process for Visual Augmented Q&A
  • Recognize practical challenges and apply strategies to address them

Audience

  • Aspiring Data Scientists
  • AI/DevOps Engineers
  • Researchers in Gen AI space

Biography
I am Director, Data Science, with 12+ years of relevant experience in solving problems using advanced analytics, machine learning and deep learning techniques. I started my career as a computer scientist in a government research organization (Bhabha Atomic Research Center) and did research in a variety of domains such as conversational speech, satellite imagery and text.

As part of my work, I have published and presented several research papers at multiple research conferences over the years. I have had the opportunity to speak at past editions of The Fifth Elephant and PyCon conferences. I have trained professionals in machine learning (M.Tech course) as Guest Faculty in the WILP program at BITS, Pilani.

Workshop Material
In Progress: https://github.com/abhijeet3922/vision-RAG/

