Manu Manjunath

@manu_manjunath

Agentic debugging with auto-heal in long-running workflows

Submitted Jun 24, 2026

Describe your session in 2 paragraphs

This talk shares practical lessons from building an agentic AI system that performs deep technical investigations of production failures spanning multiple services, long-running jobs and data streaming pipelines. Once a failure has been identified, how do you apply ‘data fixes’ to mitigate it before a permanent fix is rolled out?

We’ll share what worked for us and (more importantly) what didn’t when designing an architecture that lets LLMs reason over complex systems by combining structured data with dynamic tool use. We’ll also discuss how to balance accuracy, extensibility and cost. After all, tokens aren’t free! 🙂

Mention 1-2 takeaways from your session

  1. How agents can act as peer engineers when debugging production incidents, freeing up engineers’ time for more complex and creative work.
  2. How an agent can reduce MTTM and MTTR for your customers in ways humans can’t.

Which audiences is your session going to be beneficial for?

Anyone looking to leverage AI beyond just coding.

Add your bio - who you are; where you work

My LinkedIn profile has everything: https://www.linkedin.com/in/manu-manjunath/

Will share after internal approvals

https://www.loom.com/share/4326a175bc7b419dbfcdcdd7dce63026


Hosted by

Jumpstart better data engineering and AI futures