The Fifth Elephant 2025 Annual Conference CfP
Speak at The Fifth Elephant 2025 Annual Conference
Submitted May 6, 2025
Problem Statement Most of the documents include infographics (visual elements of information) such as tables, charts, images etc. often used to convey complex information to readers. Multi-modal LLMs are powerful tools that can be used for question answering on such complex documents. However, there are two challenges which limits the productivity and value add:
Solution: Vision-RAG systems are modern state-of-the-art architectures which encodes text and infographics jointly to answer user’s queries. Vision Language Models like ColPali can encodes visual elements along with text information.
Why matter ? Many analyst teams in business or captive companies manually research complex documents with turnaround time of days to weeks.
Outline
It will be a hand-on session for participants and will cover following modules:
Module 1: What is Visual Augmented Q&A (talk)
Module 2: Foundation: Prompting for Q&A using Multi-modal LLM.
Module 3: Setting up Vision based RAG:
Module 4: Practical challenges with Vision based RAG (talk)
Module 5: Integration with Vector DB
Takeaways
By end of the Multi-modal RAG workshop, participants will be able to:
Audience
Biography
I am Director, Data Science at Fidelity Investments with 12+ years of relevant experience in solving problems leveraging advanced analytics, machine learning and deep learning techniques. I started my career as a computer scientist in a government research organization (Bhabha Atomic Research Center) and did research on variety of domains such as conversational speech, satellite imagery and texts.
As part of my work, I have published and presented several research papers in multiple research conferences over years. I had an opportunity to be speaker in past 5th Elephant & PyCon conferences in past years. I had trained professionals in machine learning (M.Tech course) as Guest Faculty at BITS, Pilani, WILP program.
Workshop Material
In Progress: https://github.com/abhijeet3922/vision-RAG/
{{ gettext('Login to leave a comment') }}
{{ gettext('Post a comment…') }}{{ errorMsg }}
{{ gettext('No comments posted yet') }}