Speak at The Fifth Elephant 2026 Annual Conference
Share your work with the community
Fri, 31 Jul 2026, 09:00 AM – 06:00 PM IST
Pranav Balasaheb Bhosale
@pranavbhosale
Submitted Jun 25, 2026
Large Language Models are strong at reasoning and recommendation, but they routinely hallucinate entities that do not exist in an organization’s proprietary data. The common response is to build a Retrieval-Augmented Generation (RAG) stack with embeddings, a vector database, and retrieval infrastructure. However, many enterprise datasets are not collections of long documents. They are structured master data such as product catalogs, audience segments, taxonomies, brands, and reference lists containing thousands of short records. In our case, we needed to ground an agent’s recommendations in a proprietary catalog while keeping latency, operational complexity, and deployment footprint low.
This session presents a practical evaluation of lightweight retrieval approaches for agent grounding without a vector database. We benchmarked BM25 variants, MiniLM semantic embeddings, and Model2Vec static embeddings across real-world planning queries and evaluated them not only on recall, but also on a “trap rate” metric that measures how often retrieval surfaces wrong-but-plausible results that could mislead a downstream LLM. The findings were surprising: lightweight static embeddings came within a point of transformer-based semantic retrieval while requiring a fraction of the runtime footprint, and enriching records with descriptions often increased the number of confusable false positives without significantly improving recall. We will share the evaluation methodology, results, and a practical framework for selecting retrieval strategies based on how agents consume proprietary data.
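To make the evaluation idea concrete, here is a minimal Python sketch of scoring a retriever on both recall and a trap rate over a small in-memory catalog, with no vector database involved. Everything in it is an assumption for illustration: the toy catalog, the queries, the hand-labeled trap sets, and the definition of trap rate used here (the fraction of queries whose top-k results include at least one labeled wrong-but-plausible record). The retriever is a generic Okapi BM25, not necessarily one of the BM25 variants benchmarked in the talk.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; enough for short catalog records."""
    return text.lower().split()


class BM25:
    """Minimal in-memory Okapi BM25 (generic textbook form, illustrative only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.avgdl = sum(len(t) for t in self.tokens) / len(self.tokens)
        n = len(docs)
        # Document frequency: number of records containing each term.
        df = Counter(term for t in self.tokens for term in set(t))
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        tf = Counter(self.tokens[i])
        dl = len(self.tokens[i])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            num = tf[term] * (self.k1 + 1)
            den = tf[term] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * num / den
        return total

    def top_k(self, query, k):
        """Indices of the k best-scoring records; zero-score records are dropped."""
        scored = [(self.score(query, i), i) for i in range(len(self.tokens))]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for _, i in scored[:k]]


def recall_at_k(retrieved, relevant):
    """Share of the relevant records that appear in the retrieved list."""
    return len(set(retrieved) & relevant) / len(relevant)


def trap_rate(results, traps):
    """Assumed definition: fraction of queries with >= 1 labeled trap in top-k."""
    hits = sum(1 for q, got in results.items() if set(got) & traps.get(q, set()))
    return hits / len(results)


# Hypothetical catalog of short master-data records.
catalog = [
    "Acme Running Shoes Men",
    "Acme Running Shorts Men",
    "Acme Trail Shoes Women",
    "Zenith Yoga Mat",
]
# Hypothetical labels: relevant record ids and confusable "trap" ids per query.
relevant = {"acme running shoes men": {0}, "yoga mat": {3}}
traps = {"acme running shoes men": {1}, "yoga mat": set()}

index = BM25(catalog)
results = {q: index.top_k(q, k=2) for q in relevant}
mean_recall = sum(recall_at_k(results[q], relevant[q]) for q in relevant) / len(relevant)
observed_trap_rate = trap_rate(results, traps)

print("results:", results)
print("mean recall@2:", mean_recall)
print("trap rate@2:", observed_trap_rate)
```

In this toy run, both queries retrieve their correct record, yet the first query also surfaces the near-duplicate "Running Shorts" entry. Recall alone would report a perfect score, while the trap metric flags that a downstream LLM could be handed a confusable candidate.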
This session will be valuable for: