When AI Grades the Humans: Lessons from a Healthcare Trust Layer in Production

Fri, Jun 19, 2026, 02:00 PM – 06:00 PM IST

Submitted Jun 15, 2026

Submission type: Lightning talks (10 mins)

Most AI evaluation frameworks assume the AI is the actor being judged. In production healthcare, we inverted this: AI became the primary evaluator of healthcare providers against quality criteria defined by a US virtual health system. CogniSwitch deployed a Trust Layer that ingests doctor-patient conversations, mines them against medical ontologies and customer-defined QA criteria, and produces a 360-degree quality view across every actor in the system. The pipeline runs on live conversation data and is integrated with the customer’s Snowflake and Postgres through a versioned orchestration layer.

Getting this to production was a marathon with surprises along the way. Covering the full distribution of real-world cases in our data took longer than the model work. Schema violations cascaded through the pipeline in non-obvious ways. And the LLM Gateway we instrumented for token cost and behavior analysis surfaced model patterns that no offline evaluation had caught. This talk covers what worked, what broke, and everything in between that got us to production.

Takeaways:

Your staging data will not cover production.
AI-first creates two change management problems — yours and your customer’s. Both need a plan.
In regulated domains, design for the human handoff — not around it.

Audience:
ML and backend engineers building AI pipelines in regulated or high-stakes domains; anyone responsible for operating AI systems where outputs affect real decisions; teams wrestling with evaluation that goes beyond offline accuracy metrics.

Bio:
Hi, I’m Joshua, Co-Founder and CTO at CogniSwitch, a Trust Layer for Agents (AI or human, no discrimination) in Regulated Industries (Healthcare, Finance). Previously at Aikon Labs. A decade of engineering across software, data, ML, DL, CL, and IR. Built iEngage.ai (a platform used by enterprises to power ~100 use cases and apps) and Ariv.ai (a knowledge bot using conversations in MS Teams and Slack, pre-GPT-3).

Enterprise AI in Production
