Santosh Kewat

@kewats

Your AI Agent Is Guessing. Give It Context It Can Trust.

Submitted Jun 25, 2026

Session Description

Everyone is plugging LLMs into their data right now. You wire up an MCP server, point Claude or Cursor at your warehouse, ask “what was revenue last quarter?” and get back a confident, well-formatted, wrong answer. The model picked the staging table. It used a deprecated column. It invented a join. It had no idea your team redefined “active user” six months ago.

The bottleneck in agentic AI usually isn’t the model. It’s context. The agent doesn’t know which of your 400 tables to trust, what your metrics mean, where the data came from, or whether it’s fresh. So it guesses, and you pay for that guess in tokens, retries, and lost trust.

This talk is about closing that gap using only open-source tooling. We’ll walk through how DataHub Core, an open-source metadata platform with 3M+ downloads a month, works as a context layer for AI agents, and how you can run the whole thing yourself.

We’ll cover, with live demos:

  • The context problem, made concrete. Why RAG and prompt engineering alone don’t fix fragmentation, drift, freshness, or provenance. We’ll show a real text-to-SQL agent getting a question wrong, then watch it get the same question right once it has context to work from.
  • MCP as the open plug. The DataHub MCP Server (open source, run with uvx mcp-server-datahub@latest) gives any MCP-compatible agent a set of tools to call: search, lineage, schema, SQL generation from real query history, and glossary lookups. We’ll connect it live to Claude Desktop and to Block’s open-source Goose agent.
  • What “context” actually means in practice. Three kinds of metadata: technical metadata like schemas and lineage, operational signals like freshness, and business knowledge like glossary terms and ownership. We’ll look at why an agent needs all three to stop making things up.
  • Self-hosting it end to end. Ingest from Postgres, Snowflake, dbt, or Airflow. Build a context graph. Point your agent at your own GMS endpoint with a personal access token. No vendor, no cloud account, runs on your laptop or your cluster.
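
As a rough illustration of the self-hosting step above, here is a minimal Python sketch that builds the kind of MCP client entry Claude Desktop reads from its config file. The uvx command and package name come from the session description. The environment variable names (DATAHUB_GMS_URL, DATAHUB_GMS_TOKEN), the local GMS address, and the helper name datahub_mcp_config are assumptions for illustration; check them against the MCP server's README before relying on them.

```python
import json


def datahub_mcp_config(gms_url: str, token: str) -> dict:
    """Build an MCP client config entry that launches the DataHub MCP server via uvx.

    Hypothetical helper for illustration; not part of DataHub itself.
    """
    return {
        "mcpServers": {
            "datahub": {
                "command": "uvx",
                "args": ["mcp-server-datahub@latest"],
                # Assumed variable names; verify against the server's README.
                "env": {
                    "DATAHUB_GMS_URL": gms_url,
                    "DATAHUB_GMS_TOKEN": token,
                },
            }
        }
    }


if __name__ == "__main__":
    # Assumed local GMS endpoint and a placeholder token.
    config = datahub_mcp_config(
        "http://localhost:8080", "<your-personal-access-token>"
    )
    print(json.dumps(config, indent=2))
```

The printed JSON can be merged into the MCP client's configuration file. Keeping the token in the env block rather than in args keeps it out of the process command line.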

You’ll leave knowing how to take your existing data, wherever it lives, and give your AI apps a context layer they can actually rely on, using only FOSS components and the open MCP standard.

Key Takeaways

  • Why throwing a bigger model at wrong answers doesn’t help, and why better context does.
  • How MCP works as a vendor-neutral protocol any agent can speak.
  • A reproducible, fully open-source setup you can copy: DataHub Core, the MCP Server, your own data, your own agent.
  • How to tell when your agent is failing on context versus failing on the model, so you fix the right thing.

Target Audience

  • Data Engineers
  • Analytics Engineers
  • AI/ML Engineers
  • Data Architects

Speaker Bio

Santosh Kewat

Santosh Kewat is a software engineer at DataHub, the open-source context platform for AI agents. With around two decades of experience, he’s built and scaled distributed systems across data platforms, e-commerce, supply chain, and finance, including engineering roles at Alteryx, Walmart Labs, BlueStone, and Goldman Sachs. At DataHub he leads the platform architecture’s move to multi-tenancy while running the platform at scale on Kubernetes across AWS, GCP, and Azure, which is the infrastructure that keeps that context reliable in production. He is based in Bengaluru.

Chakravarthy Racharla

Chakravarthy Racharla is a software engineer at DataHub, where he works on the systems that ingest, connect, and serve metadata across the platform, the same foundation that now powers context for AI agents. He’s spent over twenty years building large-scale infrastructure, first as a master technologist at Hewlett Packard Enterprise, where he worked on OpenStack and cloud systems like OneView and CloudSystem through fifteen-plus years of enterprise platform engineering, and then as a software architect at Cisco. That long arc of building systems that have to hold up in production is what he brings to the data-and-AI context problem today.

Hosted by

Jumpstart better data engineering and AI futures