BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//HasGeek//NONSGML Funnel//EN
DESCRIPTION:A members-only meet-up by The Fifth Elephant
X-WR-CALDESC:A members-only meet-up by The Fifth Elephant
NAME:Enterprise AI in Production
X-WR-CALNAME:Enterprise AI in Production
REFRESH-INTERVAL;VALUE=DURATION:PT12H
SUMMARY:Enterprise AI in Production
TIMEZONE-ID:Asia/Kolkata
X-PUBLISHED-TTL:PT12H
X-WR-TIMEZONE:Asia/Kolkata
BEGIN:VEVENT
SUMMARY:Check-in
DTSTART:20260619T080000Z
DTEND:20260619T081500Z
DTSTAMP:20260724T232434Z
UID:session/85YbQbhYaj8qJkgxvUSfx8@hasgeek.com
SEQUENCE:3
CREATED:20260616T045249Z
LAST-MODIFIED:20260616T045300Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Check-in in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Introduction by Ramakrishna Reddy Yekulla
DTSTART:20260619T081500Z
DTEND:20260619T082000Z
DTSTAMP:20260724T232434Z
UID:session/HG1cj7J3LRU9By7GYRFAhZ@hasgeek.com
SEQUENCE:1
CREATED:20260616T044858Z
LAST-MODIFIED:20260616T044900Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Introduction by Ramakrishna Reddy Yekulla in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Anchor talk: When the agent workflow survives production but the M
 CP server splits an RCE
DTSTART:20260619T082000Z
DTEND:20260619T085500Z
DTSTAMP:20260724T232434Z
UID:session/7mXH27MQkwzpacame65BAD@hasgeek.com
SEQUENCE:2
CATEGORIES:Anchor talk (30 mins)
CREATED:20260616T043311Z
DESCRIPTION:Every enterprise AI workflow that matters in 2026 routes throu
 gh MCP servers\, the tools that give your AI agents access to everything
  from files\, databases\, and APIs to shell commands. But the security
  posture o
 f these servers is systematically poor: 43% have command injection vulnera
 bilities\, 36% have SSRF exposure\, and the real CVEs (CVE-2025-6514\, CVS
 S 9.6) are execSync(args.cmd). These are bugs that a static analyzer catch
 es in 10 seconds.\n\nThis talk is about **MCPeek**: an offline TypeScript 
 static analyzer that scans MCP server source code and fails the CI build b
 efore a vulnerable server reaches production. I'll cover what the enterpri
 se AI production stack actually looks like from a security perspective\, h
 ow taint analysis works on MCP handler code\, and what a first-of-its-kind
  audit of 50+ production MCP servers found.\n\n**Key Takeaways:**\n  - How
  to evaluate a third-party MCP server before connecting it to your agent
  wo
 rkflow (the "due diligence\, not runtime guarantee" model)\n  - The specif
 ic vulnerability patterns to look for: command injection\, path traversal
 \, SSRF\, hardcoded credentials\, tool poisoning via description fields
 \n  -
  How to add MCPeek to your CI pipeline (GitHub Action\, two lines\, fails 
 on high-severity findings\, SARIF output to GitHub Code Scanning)\n  - Wha
 t the limits of static analysis are and which runtime tools to pair with i
 t for the threats it can't see\n\nThis is not a research talk. Every findi
 ng I present maps to a public CVE. The tool is open-source ([GitHub Reposi
 tory](https://github.com/iamakash-06/MCPeek)\, [npm repository](https://ww
 w.npmjs.com/package/mcpeek)). The CI integration I'll show is production-r
 eady today. The audit data I'll present is from real MCP servers your team
  may already be using.\n\n**Target Audience:** AI Engineers\, AppSec\, Dev
 SecOps Engineers\, Engineering Leads\, Platform and Infrastructure Enginee
 rs.\n\n**Speaker Bio:** I'm a Solution Consultant at Sahaj Software in Che
 nnai. I've been neck-deep in MCP\, Claude Code\, and agentic architectures
  since before they had proper names. I've spoken at GitTogether 2025\, The
  Fifth Elephant 2025\, and many other conferences across AI-assisted devel
 opment\, MCPs and privacy-preserving ML. Right now I'm obsessed with: what
  happens to developer cognition when AI writes most of your code\, why nob
 ody's building agent runtimes for the browser yet\, and MCP security. [Lin
 kedIn](www.linkedin.com/in/akash-sathish-74a4271ba)
LAST-MODIFIED:20260616T045203Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/fifthelephant/enterprise-ai-in-production-meetup/s
 chedule/when-the-agent-workflow-survives-production-but-the-mcp-server-spl
 its-an-rce-7mXH27MQkwzpacame65BAD
BEGIN:VALARM
ACTION:display
DESCRIPTION:Anchor talk: When the agent workflow survives production but t
 he MCP server splits an RCE in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Lightning talk: Making Generative UI work in production
DTSTART:20260619T090000Z
DTEND:20260619T091000Z
DTSTAMP:20260724T232434Z
UID:session/X3daZ4jUy7WTJhsyzRT7Kw@hasgeek.com
SEQUENCE:5
CREATED:20260616T043426Z
DESCRIPTION:# Format\nLightning Talk\n\n# Speaker\n Zahle Khan\, Founding 
 Engineer\, Thesys\n[LinkedIn](https://www.linkedin.com/in/zahle-khan/)\n[T
 witter](https://x.com/zahlekhan)\n\n# Abstract\nUsers get bored by agents
  that respond with a wall of text. That's why ChatGPT and Claude
  introduced Generative UI. Instead of hardcoding UI templates\, the agent
  is given the ability to compose contextual UI.\nWe shipped our first
  version on JSON. After a year of working with 10\,000+ developers and
  managing production traffic\, JSON held up in demos but drifted under
  complexity. Toke
 n counts were high\, latency scaled with tokens\, malformed output forced 
 retries. The format was the bottleneck.\nSo we built OpenUI Lang. Compact\
 , line-oriented\, streaming-first. Syntax shaped like code\, which is what
  LLMs were trained on.\nBenchmarks\, reproducible from the public repo\, i
 dentical UIs\, same model\, temperature zero: 67.1% fewer tokens than Verc
 el JSON-Render. 65.4% fewer than our own previous JSON. 61.4% fewer than Y
 AML. 2 to 3x faster render latency at 60 tokens per second. Near-zero malf
 ormed output.\n\nMore information on OpenUI at openui.com\nBenchmarks avai
 lable at https://github.com/thesysdev/openui/tree/main/benchmarks
LAST-MODIFIED:20260616T045419Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/fifthelephant/enterprise-ai-in-production-meetup/s
 chedule/making-generative-ui-work-in-production-X3daZ4jUy7WTJhsyzRT7Kw
BEGIN:VALARM
ACTION:display
DESCRIPTION:Lightning talk: Making Generative UI work in production in 5 m
 inutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Lightning talk: Why data quality matters when working with data at
  scale
DTSTART:20260619T091500Z
DTEND:20260619T092500Z
DTSTAMP:20260724T232434Z
UID:session/Kvz4UC8VhdiUhC9bow6aaA@hasgeek.com
SEQUENCE:5
CATEGORIES:Lightning talks (10 mins)
CREATED:20260616T044042Z
DESCRIPTION:Most data quality problems aren't bugs in the data. They're br
 oken contracts between producers and consumers. The contract gets implicit
 ly defined when the first staging pipeline runs\, then quietly violated in
  production when an upstream service ships a "harmless" schema change\, a 
 field gets nullified\, or volume changes by 10x without warning. By the ti
 me the dashboards look wrong\, the bad data has propagated everywhere.\n\n
 This lightning talk walks through a practical two-layer framework for enfo
 rcing data quality as a first-class engineering concern rather than retrof
 itting it as cleanup. Producer-level enforcement using strict schemas\, sc
 hema registries\, and Avro-formatted contracts with forward and backward c
 ompatibility checks. Processing-layer enforcement using the Apache Iceberg
  Write-Audit-Publish (WAP) pattern\, with blocking and non-blocking qualit
 y checks before data is committed to live tables. The operational reality 
 of running this at billions of events daily across petabytes\, and the one
  architectural decision that didn't survive contact with production.\n\n1.
  A concrete two-layer architecture for enforcing data quality in productio
 n: producer-layer schema contracts plus processing-layer Write-Audit-Publi
 sh. What each layer catches\, what it misses\, and how they compose.\n\n2.
  The operational reality of WAP at scale: how to decide what's a blocking 
 check versus a warning\, how audit failures get handled without paging on-
 call at 3am\, and the architectural decision I'd change if I were doing it
  again today.\n\nAudiences:\nSenior data engineers\, data architects\, ML 
 platform engineers\, and engineering leaders responsible for production da
 ta infrastructure at scale. The talk assumes familiarity with streaming da
 ta systems and schemas\, but doesn't assume prior hands-on experience with
  Iceberg or the Write-Audit-Publish pattern specifically.\n\nBio\nAsif Man
 soor Amanullah is a Lead Data Engineer at Apple\, specializing in large-sc
 ale data infrastructure\, real-time streaming systems\, and privacy-first 
 analytics platforms. He has over a decade of experience building data syst
 ems at some of the world's most technically demanding technology companies
 \, leading the design and implementation of revenue-critical pipelines\, u
 nified analytics platforms\, and audience data systems processing billions
  of events daily across petabytes of data.
LAST-MODIFIED:20260720T093713Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/fifthelephant/enterprise-ai-in-production-meetup/s
 chedule/why-data-quality-matters-when-working-with-data-at-scale-Kvz4UC8Vh
 diUhC9bow6aaA
BEGIN:VALARM
ACTION:display
DESCRIPTION:Lightning talk: Why data quality matters when working with dat
 a at scale in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Lightning talk: When AI Agents access your data: controlling data 
 flow in modern pipelines
DTSTART:20260619T093000Z
DTEND:20260619T094000Z
DTSTAMP:20260724T232434Z
UID:session/Cg46NGaTAkkayCem9jRLP5@hasgeek.com
SEQUENCE:8
CATEGORIES:Lightning talks (10 mins)
CREATED:20260616T044711Z
DESCRIPTION:Modern AI systems inc
 reasingly rely on agents and automated workflows that interact with databa
 ses\, APIs\, and external services. While databases like PostgreSQL provid
 e strong controls for who can access data\, they do not govern how that da
 ta is used once retrieved.\nIn practice\, agent-driven pipelines can quer
 y sensitive data and then propagate it across systems\, often without visi
 bility\, control\, or enforcement. This creates a critical gap where seemi
 ngly valid operations can lead to unintended data exposure.\nThis lightnin
 g talk demonstrates a production-style agent workflow where an LLM-powered
  agent retrieves data and decides how to act on it. Through a live\, minim
 al system\, we show how data can be exfiltrated via agent actions and how 
 introducing a platform-level governance layer enforces policy controls on 
 outbound data usage. The session focuses on execution traces\, failure han
 dling\, and how centralized enforcement provides safety without modifying 
 application logic.\n\nTakeaways:\n1. Why controlling data access is not
  enough in AI systems: you must also control data usage and movement\n2.
  How to introduce policy-based enforcement for agent actions to prevent
  unintended data exfiltration\n\nAudience:\nBackend engineers working
  with data pipelines and APIs\, and engineers interested in governance\,
  security\, and observability in AI systems.\n\nBio:\nKanika Singhal is a
  technical leader in Flow Network Security at Nutanix with over 15 years
  of experience in networking and security. Prior to Nutanix\, she worked
  on SD-WAN technologies at VMware.\nLinkedIn -
  https://www.linkedin.com/in/kanika-singhal-8bb3622b/
LAST-MODIFIED:20260617T095310Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/fifthelephant/enterprise-ai-in-production-meetup/s
 chedule/when-ai-agents-access-your-data-controlling-data-flow-in-modern-pi
 pelines-Cg46NGaTAkkayCem9jRLP5
BEGIN:VALARM
ACTION:display
DESCRIPTION:Lightning talk: When AI Agents access your data: controlling d
 ata flow in modern pipelines in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Anchor talk: ₹11 lakh/month: how we took the GPU out of Face Mat
 ch
DTSTART:20260619T094500Z
DTEND:20260619T102000Z
DTSTAMP:20260724T232434Z
UID:session/42UkHPMwHVNK6WjXv2VLrH@hasgeek.com
SEQUENCE:10
CATEGORIES:Anchor talk (30 mins)
CREATED:20260616T044542Z
DESCRIPTION:Face matching is one of the highest-volume workloads in identi
 ty verification. At IDfy\, a single GPU pod handling 1 RPS cost us ₹3\,5
 00/day. After moving the model to BF16 inference on Intel CPUs via OpenVIN
 O\, the same 1 RPS pod cost ₹350/day. Same TAT\, same throughput\, same
  accuracy envelope. At our traffic shape (50 RPS sustained for the peak
  hour\, 10 RPS for the remaining 23)\, that translates to roughly ₹11
  lakh a
  month in savings on this single workload\, before you account for the GPU
  capacity it freed up for workloads that genuinely need it.\n\nThis talk i
 s not a “CPU beats GPU” pitch. It is the operational story of how we g
 ot there: the calibration set we built\, the operators that refused to qua
 ntize cleanly\, the one architectural tweak we made so OpenVINO could fuse
  properly\, and the production canary we ran to convince ourselves the acc
 uracy was stable. I’ll share two more migrations from IDfy’s 40+ model
  fleet\, including one where the move failed in production and what teleme
 try caught it before users did.\n\nTakeaways:\n\n1. A cost-vs-latency d
 ecision matrix for GPU vs quantized CPU inference\, with the metrics that 
 actually predict whether a migration will survive production.\n2. T
 he three quantization failure modes we see most often\, and the observabil
 ity signals that catch them before users do.\n\nAudience:\n\nProduction ML
  and AI engineers\, platform and infra teams\, and engineering leaders who
  own inference cost-to-serve at scale.\n\nBio:\n\nVivek Kalyanarangan is S
 r. Technical Architect\, AI at IDfy\, where a 20-person team operates 40+ 
 production ML models across biometric authentication\, document
  recognition and OCR\, fraud detection\, and large-scale NLP. He has 13+
  years across a
 nalytics\, big data\, and deep learning.\n\nAuthor of Quantization and Fas
 t Inference (Manning\, MEAP 2026) and freeCodeCamp course LLMs from Scratc
 h. Contributor to open source ML and published papers.
LAST-MODIFIED:20260720T091756Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/fifthelephant/enterprise-ai-in-production-meetup/s
 chedule/rs11-lakh-month-how-we-took-the-gpu-out-of-face-match-42UkHPMwHVNK
 6WjXv2VLrH
BEGIN:VALARM
ACTION:display
DESCRIPTION:Anchor talk: ₹11 lakh/month: how we took the GPU out of Face
  Match in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Break
DTSTART:20260619T102000Z
DTEND:20260619T105000Z
DTSTAMP:20260724T232434Z
UID:session/AuStizkVqCA7YAWS6rYXUq@hasgeek.com
SEQUENCE:4
CREATED:20260616T044617Z
LAST-MODIFIED:20260616T045509Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Break in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Anchor talk: Debugging Agents in Production
DTSTART:20260619T105000Z
DTEND:20260619T112500Z
DTSTAMP:20260724T232434Z
UID:session/NqSWbpVqSEGETCWWFZffee@hasgeek.com
SEQUENCE:8
CATEGORIES:Anchor talk (30 mins)
CREATED:20260616T044640Z
DESCRIPTION:**Debugging Agents in Production**\n\nEvery distributed system
  is impossible to debug without custom-built observability and tracing too
 ls. Multi-agent systems are no different.\n\nA single request from a user 
 may result in a number of concurrent actions across multiple agents and it
  quickly gets hard to tell where things went wrong in the workflow.\n\nFro
 m a customer perspective\, the most important thing you can optimize for
  is the speed at which you can find an issue and fix it\, particularly
  when the
  system failures cannot always be consistently reproduced.\n\nAt Isotopes 
 AI\, we invested heavily into a production quality observability platform 
 with three main goals\n	- you should not need a shell window to debug\n	- 
 the tooling should indicate problems\n	- there should be a neat way to agg
 regate across multiple instances.\n\nSpecifically\, we needed to follow a 
 single session across multiple machines\, track one agent's behaviour acro
 ss many sessions\, and look into a single worker for every session it was 
 running at a given moment.\n\nThis talk is a practitioner's account of the
  observability tooling we built to meet that need and the architectural
  choices we made to retain the replayability of events.\n\nAt its cor
 e is an event-sourced view of a session: every exchange behind an answer i
 s captured\, so the entire session can be traced and replayed after the fa
 ct. From real production traces\, we will look at three things that help
  us fix customer issues once a system is live:\n\n- **Making the context
  wi
 ndow work for you** - every model carries its own finite context window\
 , and the interesting question isn't "are we close to the limit" but *how*
  that window is being spent: instructions\, tool definitions\, prior conve
 rsation\, injected schema and sample data\, retrieved memory. We will brea
 k down that composition and show how to keep the window lean - trimming
  what an agent doesn't need - without the system losing the context it ac
 tually depends on.\n\n- **Seeing problems that live *between* agents** -
  the hardest failures aren't inside any one agent\; they emerge in the han
 d-offs: a request that quietly retries\, a step that re-enters itself\, an
  issue that only shows up when you look across every agent at once. We wil
 l show how we surface these cross-agent problems in a single view\, and wh
 at they reveal about a system under production load.\n\n- **Spending mode
 l time wisely** - when a request feels slow\, the model itself is usuall
 y not the whole story. A surprising amount of wall-clock time hides in pla
 ces you wouldn't first look\, and once you can *see* where it goes\, the f
 ix is often small and obvious. We will walk through one such case that cha
 nged how we think about latency in agent systems.\n\nThe key takeaway: how
  to follow a single user question through every agent it touched\, and whi
 ch signals actually tell you whether the system is healthy in production.\
 n\n\n**Target audience**: Engineers who build\, operate\, or debug LLM age
 nt systems in production\n\n**BIO**\nRajesh Balamohan has been working as 
 an \; Prior to this\, he worked at Salesforce and Waii\, and at companies
  like Cloudera/Hortonworks on big data performance tuning.\n
 https://www.linkedin.com/in/rajeshbalamohan/\n\nSachin Chaurasiya is
  working as an engineer at Isotopes AI. Prior to that\, he worked at
  Collate\, contributing heavily to OpenMetadata.\n
 https://www.linkedin.com/in/sachin-chaurasiya/
LAST-MODIFIED:20260618T074138Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/fifthelephant/enterprise-ai-in-production-meetup/s
 chedule/debugging-agents-in-production-NqSWbpVqSEGETCWWFZffee
BEGIN:VALARM
ACTION:display
DESCRIPTION:Anchor talk: Debugging Agents in Production in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Lightning talk: AI Governance in Indian banks isn't a policy probl
 em - it is an engineering problem
DTSTART:20260619T113000Z
DTEND:20260619T114000Z
DTSTAMP:20260724T232434Z
UID:session/4qh7YeSgYmAvVyYepnUMuh@hasgeek.com
SEQUENCE:5
CATEGORIES:Lightning talks (10 mins)
CREATED:20260616T044654Z
DESCRIPTION:Every major India
 n bank now has an AI governance policy document. Most are useless in produ
 ction.\nI've spent the last several years operationalizing AI governance f
 rameworks - RBI's FREE AI guidelines\, the EU AI Act (for global banking
  clients)\, ISO 42001\, NIST AI RMF\, and the DPDP Act - across producti
 on ML systems at some of India's largest financial institutions. The gap b
 etween what these frameworks demand on paper and what engineering teams ac
 tually build is enormous.\nThis talk breaks down the specific engineering 
 failures I've seen when banks try to make AI governance real:\nWhat I'll c
 over:\n\nThe model inventory problem. Most banks can't answer "how many ML
  models are in production right now?" - not because they don't care\, bu
 t because there's no reliable system of record. Shadow models\, notebook-d
 eployed scoring functions\, and vendor black-box APIs all escape the gover
 nance net. I'll show what an operational model registry looks like when go
 vernance is a first-class requirement\, not a retrofit.\nAudit trail archi
 tectures that actually survive an RBI inspection. RBI's FREE AI framework 
 demands explainability\, fairness testing\, and decision traceability. The
  typical response is a PDF report generated post-hoc. The production-grade
  response is a lineage pipeline that captures input data snapshots\, featu
 re transformations\, model version\, inference output\, and the human over
 ride (if any) - queryable months after the decision was made. I'll walk
  through the architecture: what we logged\, where we stored it\, how we
  mad
 e it queryable\, and what broke.\nFairness testing in production\, not in 
 notebooks. Every bank runs fairness metrics during model development. Almo
 st none monitor fairness drift in production. I'll share a concrete implem
 entation where caste and gender proxy variables in credit scoring were mon
 itored post-deployment\, including the operational decision of what happen
 s when the fairness threshold is breached mid-quarter - do you kill the
  model\, flag for manual review\, or retrain? Each choice has different
  eng
 ineering and business consequences.\nThe organizational failure mode. The 
 Chief Risk Officer owns the governance policy. The ML engineering team own
 s the models. Neither owns the pipeline that connects them. I'll describe 
 the architectural pattern that bridges this - a governance middleware la
 yer that sits between model serving and downstream consumers - and the o
 ne design decision I would change today.\n\nWhat the audience takes away:
 \n\nA concrete reference a
 rchitecture for AI governance in regulated financial services\nThree speci
 fic failure modes they will encounter (with mitigation patterns)\nOperatio
 nal metrics: what to track\, where governance adds latency\, and the real 
 cost of compliance-grade logging\nThe single biggest architectural mistake
  in AI governance (treating it as a reporting layer instead of an infrastr
 ucture layer)\n\nThis is not a talk about why governance matters. Everyone
  in the room already knows that. This is a talk about the engineering deci
 sions that make governance survive contact with production.\n\nRamanuj is
  an AI architecture and governance pract
 itioner with 16+ years of enterprise experience. He has deployed productio
 n AI systems across India's largest banks and insurance companies\, operat
 ionalizing regulatory frameworks including RBI's FREE AI guidelines\, ISO 
 42001\, and the DPDP Act. His career spans Oracle (Principal - AI Archit
 ecture & Strategy)\, AWS (where he led AWS's AI initiatives in India's FSI
  practice)\, EY\, Cognizant\, and HDFC Life. He teaches Business Analytics
  at TAPMI Bangalore and publishes "Signal at the Top\," a newsletter on AI
  strategy for business leaders.
LAST-MODIFIED:20260616T045855Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/fifthelephant/enterprise-ai-in-production-meetup/s
 chedule/ai-governance-in-indian-banks-isnt-a-policy-problem-its-an-enginee
 ring-problem-4qh7YeSgYmAvVyYepnUMuh
BEGIN:VALARM
ACTION:display
DESCRIPTION:Lightning talk: AI Governance in Indian banks isn't a policy p
 roblem - it is an engineering problem in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Lightning talk: Sovereign LLM Inference on heterogenous AI acceler
 ators using llm-d and vLLM
DTSTART:20260619T114500Z
DTEND:20260619T115500Z
DTSTAMP:20260724T232434Z
UID:session/P23UrY5oTFawiGUq8Ms7ff@hasgeek.com
SEQUENCE:7
CATEGORIES:Lightning talks (10 mins)
CREATED:20260616T044226Z
DESCRIPTION:**Description**\n\nMost production inference clusters today ar
 e single-vendor - not because it is\noptimal\, but because it is the sim
 plest way to set things up. Real fleets are\naccumulating heterogeneity\, 
 through procurement cycles\, supply constraints\,\nand the widening cost g
 ap between accelerators. The open question is whether a\nsingle Kubernetes
 -native serving layer can take a heterogeneous GPU fleet and\nbeat plain k
 8s round-robin on throughput and time-to-first-token\, with no\napplicatio
 n-level changes. This lightning talk reports what we measured.\n\nWe bench
 marked **llm-d** (a CNCF inference framework built on vLLM and the\nGatewa
 y-API InferencePool) on the **[NxtGen](https://www.nxtgen.com) sovereign c
 loud's 3-vendor cluster**:\n4× NVIDIA H100-NVL + 8× AMD MI325X + 8× Int
 el Gaudi3 over a shared 100 G RoCE\nfabric\, serving `ibm-granite/granite-
 4.1-8b` and `sarvamai/sarvam-30b`. Across\nsingle-vendor pools (NVIDIA-onl
 y\, AMD-only\, Gaudi-only) and heterogeneous pools\n(NVIDIA+AMD\, NVIDIA+A
 MD+Gaudi)\, llm-d's prefix-cache-aware routing\ndelivers **+25 to +91% thr
 oughput** and **5–22× better TTFT** vs plain\nKubernetes round-robin
  - and the advantage grows with pool size and\nheterogeneity. The biggest
  win is on the 20-pod 3-vendor pool\, where llm-d\nhits +91% throughput at
  the same load.\n\n\nArtifacts are available in https://github.com/pravein
 gk/llmd-benchmarking-nxtgen\n\n\n**Take-aways:**\n\n- **Heterogeneous GPU 
 fleets stop being a tax once routing is cache- and\n  load-aware.** A sing
 le Kubernetes serving layer can absorb NVIDIA + AMD +\n  Intel concurrentl
 y and beat round-robin by close to 2× on throughput\, with\n  the same po
 ds\, same vLLM\, same flags - only the routing layer differs. The\n  win
  is unambiguously attributable to llm-d's prefix-cache-aware router.\n- **
 Sovereign and on-prem inference is now operationally viable on\n  mixed-ve
 ndor hardware.** Procurement no longer has to align with a single\n  vendo
 r's roadmap to get good aggregate throughput\; older accelerators can\n  a
 bsorb low-priority workloads while premium hardware handles\n  latency-sen
 sitive paths.\n\n\n**Audience:**\n\n- Platform and SRE teams running on-pr
 em / sovereign / hybrid-cloud LLM inference\n- ML infrastructure engineers
  evaluating Kubernetes-native serving stacks\n  (vLLM\, llm-d\, KServe)\n-
  Teams considering or already running heterogeneous GPU fleets\n  (NVIDIA 
 + AMD + Intel) and worried about how to schedule across them\n- Sovereign-
 cloud and regulated-industry teams (BFSI\, government\, healthcare)\n  who
  need on-prem inference and cannot rely on hyperscaler-only stacks\n- Open
 -source contributors interested in the llm-d / vLLM / Gateway-API\n  Infer
 ence Extension projects\n\n**Bio:**\n\nPravein Govindan Kannan is a Staff 
 Research Scientist at IBM Research working on Systems and Networking for A
 I Inference. He contributes to open-source projects like llm-d\, UCCL and
  NIXL.\n
LAST-MODIFIED:20260722T140705Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/fifthelephant/enterprise-ai-in-production-meetup/s
 chedule/sovereign-llm-inference-on-heterogenous-ai-accelerators-using-llm-
 d-and-vllm-P23UrY5oTFawiGUq8Ms7ff
BEGIN:VALARM
ACTION:display
DESCRIPTION:Lightning talk: Sovereign LLM Inference on heterogenous AI acc
 elerators using llm-d and vLLM in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Lightning talk: Grounding AI Agents in production: a practitioner'
 s implementation guide
DTSTART:20260619T120000Z
DTEND:20260619T121000Z
DTSTAMP:20260724T232434Z
UID:session/8sLZhQRji6vor5Nqx3Fe1S@hasgeek.com
SEQUENCE:8
CATEGORIES:Lightning talks (10 mins)
CREATED:20260616T044727Z
DESCRIPTION:## Session Description\nMost enterprise AI agents that fail in
  production don't fail because of model quality - they fail because they h
 ave no ground beneath them. Without structured semantic context\, agents h
 allucinate over ambiguous schemas\, misfire on intent\, and produce answer
 s that are technically fluent but operationally wrong. This session walks 
 through the full grounding stack - from semantic metadata and knowledge gr
 aph design to hybrid Text-to-SQL and RAG pipelines - drawing on a real dep
 loyment built for a Fortune 500 client's enterprise data platform. We'll e
 xamine how a multi-agent system was constructed on top of complex relation
 al infrastructure\, how a knowledge graph was introduced to give agents th
 e contextual backbone they needed to reason correctly across domains\, and
  how a query validation gate was added after observing the failure modes t
 hat emerge when agents are trusted without checks.\n\nThe session goes bey
 ond architecture slides. We cover the operational decisions that only beco
 me visible under real workloads: why a custom orchestration approach was c
 hosen over a managed platform\, how intent classification was layered in b
 efore query generation\, where the RAG and structured-query paths conflict
  and how those conflicts were resolved at runtime\, and what the tracing a
 nd observability layer revealed about agent behaviour that offline testing
  never surfaced. Attendees will leave with a concrete implementation model
  - not a framework pitch\, but a set of grounded architectural patterns bu
 ilt around knowledge graphs\, semantic injection\, and evaluation signals 
 that transfer across domains and industries.\n\n---\n\n## Key Takeaways\n\
 n1. A repeatable grounding architecture - covering knowledge graph design\
 , semantic metadata injection\, intent classification\, query validation g
 ates\, and RAG-structured-query conflict resolution - that you can adapt f
 or any enterprise AI agent operating over complex or semi-structured data.
 \n\n2. A practical failure taxonomy drawn from production traces: the spec
 ific points where ungrounded agents break\, how observability tooling surf
 aces those failures\, and which architectural interventions - including kn
 owledge graph grounding - actually reduced hallucination and improved reli
 ability at scale.\n\n---\n\n## Target Audience\n\nThis session is most val
 uable for System Architects designing or operating multi-agent systems in 
 enterprise environments\, platform and backend engineers integrating
 LLM-base
 d capabilities with existing data infrastructure\, and engineering leaders
  making build-vs-buy and infrastructure decisions around GenAI platforms.\
 n\nAnyone asking **"how do we make this reliable in production?"** will fi
 nd direct\, transferable answers here.\n\n---\n\n## Speaker Bio\n\nNabendu
  Karmakar is a technology leader and product builder with over 14 years of
  experience at the frontier of Agentic AI\, Generative AI\, Data and Fulls
 tack Engineering. His career has been shaped by a single question most tea
 ms skip: will this system still work a few months later\, inside a messy e
 nterprise\, with real data and real constraints?\n\nHe currently serves as
  Principal Architect at [Fractal Analytics](https://fractal.ai)\, where he
  leads a team of engineers building large-scale data and AI platforms for 
 Fortune 500 clients across manufacturing\, insurance\, CPG\, and pharma. H
 is work spans multi-agent systems\, knowledge graph-driven architectures\,
  Text-to-SQL engines\, and RAG pipelines - always with a consistent focus 
 on systems that are operationally sound\, architecturally honest\, and bui
 lt to deliver measurable impact in the real world.\n\nBefore Fractal\, he 
 was part of Jio Haptik - one of the world's largest conversational AI plat
 forms - and began his career at L&T Infotech\, one of India's most respect
 ed technology firms.
LAST-MODIFIED:20260616T050104Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
URL:https://hasgeek.com/fifthelephant/enterprise-ai-in-production-meetup/s
 chedule/grounding-ai-agents-in-production-a-practitioners-implementation-g
 uide-8sLZhQRji6vor5Nqx3Fe1S
BEGIN:VALARM
ACTION:display
DESCRIPTION:Lightning talk: Grounding AI Agents in production: a practitio
 ner's implementation guide in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
BEGIN:VEVENT
SUMMARY:Closing by Ramakrishna Reddy Yekulla
DTSTART:20260619T121500Z
DTEND:20260619T122000Z
DTSTAMP:20260724T232434Z
UID:session/2kwGWVqUJ4KkBwbabt9LeU@hasgeek.com
SEQUENCE:3
CREATED:20260616T044822Z
LAST-MODIFIED:20260616T045500Z
LOCATION:Inmobi\, Bangalore
ORGANIZER;CN="The Fifth Elephant":MAILTO:no-reply@hasgeek.com
BEGIN:VALARM
ACTION:display
DESCRIPTION:Closing by Ramakrishna Reddy Yekulla in 5 minutes
TRIGGER:-PT5M
END:VALARM
END:VEVENT
END:VCALENDAR