Speak at The Fifth Elephant 2026 Annual Conference
Share your work with the community
Fri, Jul 31, 2026, 09:00 AM – 06:00 PM IST
FELIX GEORGE
@felix007
Submitted Jun 24, 2026
Like traditional software, AI agents are prone to failure — and one of their most insidious failure modes is the repetitive futile cycle: a loop of unproductive behavior in which the agent keeps invoking tools or sub-agents without making any real progress toward its goal. These cycles arise naturally from the plan–act–observe paradigm, non-deterministic tools, error recovery, and suboptimal planning. Unlike explicit errors or timeouts, futile cycles produce no error signal; they quietly consume compute and tokens while generating no new insight, which makes them especially hard to detect. This work introduces the concept of futile cycles, distinguishes them from productive cycles (where repetition genuinely drives incremental progress), and proposes an unsupervised framework that detects them by combining structural and semantic analysis of agent execution trajectories.
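To make the structural-plus-semantic idea concrete, here is a minimal sketch in Python. It is not the framework from this proposal, which does not spell out its algorithms here. The trajectory shape (a list of `(tool_name, observation)` pairs), the function names, and both thresholds are assumptions for illustration, and token-level Jaccard overlap stands in for whatever semantic-similarity measure a real system would use (typically embeddings).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical trajectory step: (tool_name, observation_text).
Step = Tuple[str, str]


def has_structural_cycle(steps: List[Step], min_repeats: int = 3) -> bool:
    """Cheap structural check: is any tool invoked at least min_repeats times?"""
    counts = Counter(tool for tool, _ in steps)
    return any(n >= min_repeats for n in counts.values())


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a stand-in for embedding similarity (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_futile_cycle(
    steps: List[Step], min_repeats: int = 3, sim_threshold: float = 0.8
) -> bool:
    """Flag a trajectory when a repeated tool keeps returning near-identical output."""
    # Stage 1 (structural): trajectories without repetition skip the semantic stage.
    if not has_structural_cycle(steps, min_repeats):
        return False

    # Stage 2 (semantic): group observations by tool, compare consecutive ones.
    by_tool: Dict[str, List[str]] = {}
    for tool, obs in steps:
        by_tool.setdefault(tool, []).append(obs)

    for obs_list in by_tool.values():
        if len(obs_list) < min_repeats:
            continue
        sims = [jaccard(x, y) for x, y in zip(obs_list, obs_list[1:])]
        # Repetition with no new information suggests a futile cycle.
        if min(sims) >= sim_threshold:
            return True
    return False


# Repeated calls that keep hitting the same error: no progress.
futile = [("get_quote", "error: rate limit exceeded for ticker IBM")] * 4
# Repeated calls that each return something new: a productive cycle.
productive = [
    ("get_quote", "IBM price 180"),
    ("get_quote", "AAPL price 210"),
    ("get_quote", "MSFT price 400"),
]
print(is_futile_cycle(futile), is_futile_cycle(productive))
```

In this sketch, repetition alone is not treated as failure: the productive trajectory calls the same tool three times but learns something new each time, so only the semantic stage separates the two cases.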
As AI agents move into production, reliability and observability become first-class engineering concerns. Existing trajectory-based failure-detection approaches are largely supervised, requiring labeled data that is impractical to obtain at scale — so an unsupervised approach removes a key barrier to real-world adoption. The framework was evaluated on a curated dataset of 1,575 trajectories from a LangGraph-based multi-agent stock-market application, where the hybrid method achieved an F1 score of 0.72 (precision 0.62, recall 0.86), substantially outperforming purely structural (F1 0.08) and purely semantic (F1 0.28) methods. Beyond accuracy, the hybrid design offers a real computational advantage by reducing how many trajectories require expensive semantic-similarity comparison, making it practical to deploy.
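As a quick consistency check on the reported numbers, F1 is the harmonic mean of precision and recall. The short Python snippet below recomputes it from the hybrid method's precision and recall quoted above; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hybrid method figures quoted in the abstract.
hybrid_f1 = f1_score(0.62, 0.86)
print(round(hybrid_f1, 2))  # 0.72
```

The recomputed value agrees with the stated F1 of 0.72. The gap between precision and recall indicates that the hybrid method misses few futile cycles but flags some productive ones.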
This session is intended for engineers, researchers, and practitioners who build, operate, or observe AI agent and multi-agent systems — including those working on agent reliability, LLM observability (e.g. OpenLLMetry / OpenTelemetry-style instrumentation), agent frameworks such as LangGraph, and anyone responsible for controlling the cost and quality of agentic applications in production. A working familiarity with AI agents and execution traces is helpful but not required.
Felix George, Research Software Engineer @ IBM SIL