FELIX GEORGE

@felix007

Unsupervised Cycle Detection in Agentic Applications

Submitted Jun 24, 2026

The problem being addressed

Like traditional software, AI agents are prone to failure — and one of their most insidious failure modes is the repetitive futile cycle: a loop of unproductive behavior in which the agent keeps invoking tools or sub-agents without making any real progress toward its goal. These cycles arise naturally from the plan–act–observe paradigm, non-deterministic tools, error recovery, and suboptimal planning. Unlike explicit errors or timeouts, futile cycles produce no error signal; they quietly consume compute and tokens while generating no new insight, which makes them especially hard to detect. This work introduces the concept of futile cycles, distinguishes them from productive cycles (where repetition genuinely drives incremental progress), and proposes an unsupervised framework that detects them by combining structural and semantic analysis of agent execution trajectories.
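To make the futile-versus-productive distinction concrete, here is a minimal Python sketch of a hybrid check. It is not the author's CDDAG, CDCS, CDSA, or Hybrid method. The `Step` schema, the function names, the 0.9 threshold, and the use of `difflib` character similarity in place of embedding similarity are all illustrative assumptions. A structural pass flags tools that are invoked repeatedly. A semantic pass then keeps only the repeats whose outputs are nearly identical, meaning the repetition produced no new information.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Step:
    """One tool or sub-agent invocation in a trajectory (hypothetical schema)."""
    tool: str
    output: str


def structural_candidates(steps: list[Step], min_repeats: int = 2) -> dict[str, list[int]]:
    """Structural pass: step indices per tool, kept only for tools invoked repeatedly."""
    groups: dict[str, list[int]] = {}
    for i, step in enumerate(steps):
        groups.setdefault(step.tool, []).append(i)
    return {tool: idx for tool, idx in groups.items() if len(idx) >= min_repeats}


def semantic_similarity(a: str, b: str) -> float:
    """Stand-in for embedding cosine similarity: character-level match ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def detect_futile_cycles(steps: list[Step], min_repeats: int = 2, threshold: float = 0.9) -> list[str]:
    """Hybrid pass: run the (costlier) semantic check only on structural candidates."""
    futile = []
    for tool, indices in structural_candidates(steps, min_repeats).items():
        outputs = [steps[i].output for i in indices]
        # Futile if each repeat returns (nearly) the same output as the previous one,
        # i.e. the repetition added no new information.
        if all(semantic_similarity(a, b) >= threshold for a, b in zip(outputs, outputs[1:])):
            futile.append(tool)
    return futile


trajectory = [
    Step("get_price", "error: rate limited"),
    Step("get_price", "error: rate limited"),
    Step("get_price", "error: rate limited"),
    Step("fetch_news", "Apple unveils new chip"),
    Step("fetch_news", "Analysts raise AAPL target"),
]
print(structural_candidates(trajectory))  # both tools repeat, so both are candidates
print(detect_futile_cycles(trajectory))   # ['get_price']
```

Structural repetition alone flags both tools, which matches the high-recall, low-precision pattern the proposal reports for structural-only detection. The semantic check keeps only `get_price`, whose repeats return identical output, and treats the paginated `fetch_news` calls as productive. A real system would presumably compare embeddings of richer step content. The proposal also notes that text-style similarity breaks down on numerical time-series outputs.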

Why it is relevant

As AI agents move into production, reliability and observability become first-class engineering concerns. Existing trajectory-based failure-detection approaches are largely supervised and require labeled data that is impractical to obtain at scale, so an unsupervised approach removes a key barrier to real-world adoption. The framework was evaluated on a curated dataset of 1,575 trajectories from a LangGraph-based multi-agent stock-market application. On this dataset, the hybrid method achieved an F1 score of 0.72 (precision 0.62, recall 0.86), substantially outperforming purely structural (F1 0.08) and purely semantic (F1 0.28) methods. The hybrid design is also cheaper to run because it reduces how many trajectories require expensive semantic-similarity comparison, which makes it practical to deploy.
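As a quick consistency check, the reported hybrid F1 follows from its precision P and recall R under the standard harmonic-mean definition:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.62 \times 0.86}{0.62 + 0.86}
    = \frac{1.0664}{1.48}
    \approx 0.72
```

Rearranging the same formula gives P = F_1 R / (2R - F_1). If the structural-only method's recall is high (say 0.9 or above, an assumption for illustration), its F1 of 0.08 implies a precision of only about 0.04. That is why high recall alone does not make a useful detector.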

Key takeaways attendees can expect

  • A clear definition of futile vs. productive cycles, and why the distinction matters for agent reliability and cost.
  • How an agent trajectory can be represented as a Directed Acyclic Graph (DAG), which captures structure, or as a Call Stack (CS), which preserves temporal order, and the trade-offs of each.
  • The four detection methods — CDDAG and CDCS (structural), CDSA (semantic), and the Hybrid approach — and why combining structural and semantic signals sharply reduces false positives.
  • Concrete empirical results on 1,575 labeled trajectories, including why structural-only and semantic-only methods fall short (high recall, low precision), and where semantic similarity breaks down on numerical time-series data — motivating future multimodal methods.
  • A practical, unsupervised blueprint for catching silent, costly agent failures without labeled training data.
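The two trajectory representations can be illustrated with a small Python sketch. The `Span` fields, the helper names, and the repeated-tail check are assumptions for illustration, loosely modeled on OpenTelemetry-style spans. They are not the CDDAG or CDCS algorithms themselves. The graph view aggregates caller-to-callee edges by name and loses ordering. The time-ordered call sequence keeps the order that exposes a back-to-back loop.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """A trace span, loosely modeled on OpenTelemetry-style spans (hypothetical fields)."""
    span_id: str
    parent_id: Optional[str]
    name: str
    start: float


def to_dag(spans: list[Span]) -> Counter:
    """Structural view: (caller name, callee name) edge -> call count. Ordering is lost."""
    by_id = {s.span_id: s for s in spans}
    return Counter((by_id[s.parent_id].name, s.name) for s in spans if s.parent_id is not None)


def to_call_sequence(spans: list[Span]) -> list[str]:
    """Temporal view: invocation names in start-time order (a flattened call-stack trace)."""
    return [s.name for s in sorted(spans, key=lambda s: s.start)]


def repeating_tail(seq: list[str], min_repeats: int = 3) -> Optional[list[str]]:
    """Shortest pattern repeated back-to-back at least min_repeats times at the end of seq."""
    for k in range(1, len(seq) // min_repeats + 1):
        pattern = seq[-k:]
        if seq[-k * min_repeats:] == pattern * min_repeats:
            return pattern
    return None


spans = [
    Span("1", None, "supervisor", 0.0),
    Span("2", "1", "planner", 1.0),
    Span("3", "1", "get_price", 2.0),
    Span("4", "1", "planner", 3.0),
    Span("5", "1", "get_price", 4.0),
    Span("6", "1", "planner", 5.0),
    Span("7", "1", "get_price", 6.0),
]
print(to_dag(spans))  # each supervisor-to-child edge appears 3 times
print(repeating_tail(to_call_sequence(spans)))  # ['planner', 'get_price']
```

The graph view shows that the supervisor called `planner` and `get_price` three times each. Only the ordered sequence reveals that these calls alternate in a repeating (`planner`, `get_price`) pattern at the end of the trajectory.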

Who the session is intended for

This session is intended for engineers, researchers, and practitioners who build, operate, or observe AI agent and multi-agent systems — including those working on agent reliability, LLM observability (e.g. OpenLLMetry / OpenTelemetry-style instrumentation), agent frameworks such as LangGraph, and anyone responsible for controlling the cost and quality of agentic applications in production. A working familiarity with AI agents and execution traces is helpful but not required.

Felix George, Research Software Engineer @ IBM SIL

https://docs.google.com/presentation/d/15Yg5HA1mbxv9X3m7nM7lMZ-hJuWxe7ZP/edit?usp=share_link&ouid=118242377177111751707&rtpof=true&sd=true



Hosted by

Jumpstart better data engineering and AI futures