End-to-End Observability for AI Systems

Fri, Jul 31, 2026, 09:00 AM – 06:00 PM IST

Submitted Jun 25, 2026

I am submitting for: Track 1 - Data engineering & infrastructure
Type of session: Hands-on workshop - 2-4 hours

Speaker bio

I am Khushi, Lead ML Architect at Finantic.AI, a company in the fintech space.
I am enthusiastic about new technologies that build on existing fundamental
technologies and work just a little better.

Abstract

In AI systems, we need to measure not only the AI-specific metrics such as tokens, cost, and latency, but also the many non-AI workflows around the model that add overhead and can break the pipeline. Tools for each kind of observability have existed for years (for example OpenTelemetry and ELK/EFK stacks), but agentic development requires consistent, robust tracking across the entire software lifecycle, with the observability of agents and non-agents integrated seamlessly.

In this workshop we instrument an entire multi-stage AI pipeline end to end and treat the deterministic stages and the non-deterministic model stages as one connected trace. The demo is a generic customer-support resolver (take data → classify the request → retrieve the information → validate → route the response), deliberately not tied to any one domain, so the patterns transfer to whatever pipeline you actually run. We build it with OpenTelemetry-style spans and view it in Langfuse, which is open source and can be self-hosted in minutes.
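The connected-trace idea can be sketched with the standard library alone. This is a minimal sketch, not the workshop's starter repo and not the OpenTelemetry or Langfuse API: the `Span` class, the `span` context manager, the `FINISHED` list, and the stage logic are all hypothetical stand-ins, and the classify step is a stub where a model call would go.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    kind: str  # "deterministic" or "generation"
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


FINISHED: list = []  # hypothetical in-memory exporter
_stack: list = []    # currently open spans (single-threaded sketch)


@contextmanager
def span(name: str, kind: str = "deterministic"):
    """Open a span that inherits the trace id of the enclosing span, if any."""
    parent = _stack[-1] if _stack else None
    s = Span(
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        name=name,
        kind=kind,
        start=time.perf_counter(),
    )
    _stack.append(s)
    try:
        yield s
    finally:
        s.end = time.perf_counter()
        _stack.pop()
        FINISHED.append(s)


def resolve(request: str) -> str:
    """Run the five demo stages under one root span."""
    with span("resolve_request") as root:
        with span("ingest") as s:
            text = request.strip()
            s.attributes["chars"] = len(text)
        with span("classify", kind="generation") as s:
            # Stub: a real pipeline would call a model here.
            label = "billing" if "invoice" in text.lower() else "general"
            s.attributes["label"] = label
        with span("retrieve") as s:
            docs = {"billing": ["refund-policy"], "general": ["faq"]}[label]
            s.attributes["doc_count"] = len(docs)
        with span("validate") as s:
            ok = len(docs) > 0
            s.attributes["valid"] = ok
        with span("route") as s:
            queue = f"{label}-queue" if ok else "human-review"
            s.attributes["queue"] = queue
        root.attributes["queue"] = queue
        return queue


queue = resolve("  Where is my invoice?  ")
```

Every stage, model-backed or not, ends up in `FINISHED` with the same `trace_id` and a `parent_id` pointing at the root span, which is the property that lets a viewer render the request as one tree.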

The pipeline is observed on two kinds of telemetry at once:

Deterministic stages become assertable spans, as in any general logging and tracing system.
Non-deterministic stages (the model generations) carry tokens, cost, and latency attached to the trace, so an LLM step sits in context next to the deterministic steps around it.
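The two kinds of telemetry can share one record shape. The sketch below is illustrative only: `SpanRecord`, `generation_span`, `deterministic_span`, and the `PRICE_PER_1K` table are hypothetical names, and the prices are made-up numbers, not any provider's rates.

```python
from dataclasses import dataclass, field

# Hypothetical prices in USD per 1,000 tokens (assumption, not a real price list).
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}


@dataclass
class SpanRecord:
    name: str
    kind: str  # "deterministic" or "generation"
    latency_ms: float
    attributes: dict = field(default_factory=dict)


def generation_span(name: str, input_tokens: int, output_tokens: int,
                    latency_ms: float) -> SpanRecord:
    """Record a model call with token counts and a derived cost."""
    cost = (input_tokens / 1000) * PRICE_PER_1K["input"] \
        + (output_tokens / 1000) * PRICE_PER_1K["output"]
    return SpanRecord(name, "generation", latency_ms, {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(cost, 6),
    })


def deterministic_span(name: str, latency_ms: float, **attributes) -> SpanRecord:
    """Record a plain code stage with whatever facts are worth asserting on."""
    return SpanRecord(name, "deterministic", latency_ms, dict(attributes))


# One trace, in pipeline order, mixing both kinds of span.
trace = [
    generation_span("classify", input_tokens=400, output_tokens=20, latency_ms=350.0),
    deterministic_span("retrieve", 12.0, doc_count=3),
    deterministic_span("validate", 1.5, valid=True),
]

# Deterministic spans are assertable: their attributes have exact expected values.
retrieve = next(s for s in trace if s.name == "retrieve")
assert retrieve.attributes["doc_count"] > 0

# Generation spans are measured rather than asserted on exact output.
total_cost = sum(s.attributes.get("cost_usd", 0.0) for s in trace)
```

Keeping both kinds in the same list is the point of the design: the cost of `classify` can be read next to the latency of `retrieve` without joining two systems.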

Then we debug by trace: we take a request, open its trace, and watch the trace localize each one of the stages.
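Debugging by trace amounts to asking the span list where a request went wrong or where its time went. A minimal sketch, assuming spans are exported as plain dictionaries in execution order; the field names (`status`, `latency_ms`, `error`) and the sample values are hypothetical, not the Langfuse data model.

```python
from typing import Optional

# Hypothetical exported trace for one request, in execution order.
trace = [
    {"name": "ingest", "kind": "deterministic", "latency_ms": 2.0, "status": "ok"},
    {"name": "classify", "kind": "generation", "latency_ms": 410.0, "status": "ok"},
    {"name": "retrieve", "kind": "deterministic", "latency_ms": 35.0, "status": "ok"},
    {"name": "validate", "kind": "deterministic", "latency_ms": 1.0,
     "status": "error", "error": "no documents matched label"},
    {"name": "route", "kind": "deterministic", "latency_ms": 0.5, "status": "skipped"},
]


def first_error(spans: list) -> Optional[dict]:
    """Return the earliest span that failed, or None if every span succeeded."""
    return next((s for s in spans if s["status"] == "error"), None)


def slowest(spans: list) -> dict:
    """Return the span that contributed the most latency."""
    return max(spans, key=lambda s: s["latency_ms"])


def summarize(spans: list) -> str:
    """Produce a one-line diagnosis for the request."""
    failed = first_error(spans)
    slow = slowest(spans)
    parts = [f"slowest stage: {slow['name']} ({slow['latency_ms']:.0f} ms)"]
    if failed is not None:
        parts.append(f"failed stage: {failed['name']} ({failed['error']})")
    return "; ".join(parts)


report = summarize(trace)
```

In this sample the failure sits in a deterministic stage while the latency sits in the model stage, which is the kind of split that a model-only metric would not show.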

Why this is relevant

As the call notes, in closed-loop systems where agents act on data before a human sees it, the standard observability pillars are no longer enough. A model metric tells you the model ran; it doesn’t tell you the pipeline was correct, fast, or affordable. This is a practical, OSS-only pattern for instrumenting a real multi-stage AI pipeline so the whole system is observable.

Who this is for

Data, platform, and AI engineers running multi-step LLM pipelines in production who need to see the whole system, not just the model call, and who want a debug-by-trace and cost-attribution workflow that survives an incident.
Comfortable with Python; no ML background required.
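Cost attribution, in its simplest form, is a group-by over generation spans. The sketch below assumes each exported span carries a `cost_usd` attribute; the rows, stage names, and amounts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical generation spans from two traces: (trace_id, stage, cost_usd).
generation_spans = [
    ("trace-a", "classify", 0.00023),
    ("trace-a", "route", 0.00120),
    ("trace-b", "classify", 0.00025),
    ("trace-b", "route", 0.00090),
]


def cost_by_stage(rows: list) -> dict:
    """Sum model cost per pipeline stage across all traces."""
    totals = defaultdict(float)
    for _trace_id, stage, cost in rows:
        totals[stage] += cost
    return dict(totals)


def cost_share(totals: dict) -> dict:
    """Express each stage's cost as a fraction of the overall spend."""
    overall = sum(totals.values())
    return {stage: cost / overall for stage, cost in totals.items()}


totals = cost_by_stage(generation_spans)
shares = cost_share(totals)
most_expensive = max(totals, key=totals.get)
```

The same grouping works per trace, per customer, or per model once those identifiers are recorded as span attributes.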

Key takeaways

How to instrument a full AI pipeline including deterministic stages and LLM stages.

What attendees need (for the workshop format)

A laptop with Docker and Python. A starter repo is provided, so everyone runs and instruments the same pipeline locally.

https://docs.google.com/presentation/d/1yBwc2RajTaErC18awy7Nzo1lX8dPduwC/edit?usp=drive_link&ouid=117709760681210575135&rtpof=true&sd=true

Speak at The Fifth Elephant 2026 Annual Conference