Gurram Poorna Prudhvi

@poornaprudhvi

Memory Is The Moat: Designing AI Agents That Remember, Retrieve, and Reason

Submitted Sep 23, 2025

This session demonstrates why agent memory is the moat for long‑horizon reliability and personalization, then walks through a practical, production‑ready memory stack: short‑term buffers with rolling summaries, episodic “attempt→outcome” logs, semantic retrieval over summaries, and structured user facts for deterministic recall. Through real examples—support triage pulling similar past tickets, a travel agent recalling airline and allergy preferences, and a research copilot indexing daily summaries—attendees see how selective recall curates sharper context than full‑history prompts, with lower latency and cost.

The talk focuses on implementable patterns and guardrails: summarize‑don’t‑stuff, retrieve‑then‑read, episode linking for “what worked last time,” consolidation/decay jobs, hybrid retrieval, and privacy‑by‑design. Each pattern includes a concrete before/after, token and latency impact, and failure modes to avoid (unbounded memory drift, single‑store anti‑pattern, and oversized context windows that degrade reasoning). Attendees leave with a blueprint, checklists, and proven tips to measure retrieval quality, keep prompts lean, and make memories durable, safe, and useful.

Summaries as first‑class artifacts plus targeted retrieval consistently outperform full‑history stuffing for accuracy, latency, and cost in real systems.

Split memory by function—cache for working context, vector index for semantic recall, and OLTP for deterministic facts—and govern with retention, importance scoring, and PII redaction.
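As a minimal sketch of the governance side of that split (retention, importance scoring, PII redaction), assuming Python; the record shape, the single email-masking rule, and the retention thresholds are illustrative stand-ins, not the talk's exact implementation:

```python
import re
import time
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    text: str
    importance: float  # 0.0-1.0, assigned at write time
    created_at: float = field(default_factory=time.time)

# One example redaction rule; a production system would cover more PII classes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_pii(text: str) -> str:
    """Mask email addresses before a record is persisted."""
    return EMAIL.sub("[EMAIL]", text)

def should_retain(rec: MemoryRecord, *, now: float, ttl_seconds: float,
                  min_importance: float) -> bool:
    """Drop records that are both old and unimportant; keep high-value memories longer."""
    age = now - rec.created_at
    return rec.importance >= min_importance or age < ttl_seconds
```

A consolidation/decay job would run `should_retain` periodically over the store, so memory stays bounded instead of drifting.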

Agenda:

Why memory matters (3 min)

Set the problem: context window ≠ memory; long‑horizon reliability needs selective recall across sessions.

Failure modes we fix (2 min)

Repeats, contradictions, lost plans, high latency/cost from full‑history prompts.

Practical memory stack overview (5 min)

Short‑term buffer with rolling summaries, episodic “attempt→outcome” logs, semantic retrieval over summaries, and structured facts for deterministic recall.
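The short-term-buffer-plus-rolling-summary piece of that stack can be sketched as follows, assuming Python; the string-concatenating `summarize` default is a toy stand-in for the LLM summarizer a production system would use:

```python
from collections import deque

class ShortTermBuffer:
    """Keep the last N turns verbatim; fold older turns into a rolling summary."""

    def __init__(self, max_turns: int = 4, summarize=None):
        self.turns = deque()
        self.max_turns = max_turns
        # `summarize` would call an LLM in production; this default just appends.
        self.summarize = summarize or (lambda old, text: (old + " | " + text).strip(" |"))
        self.summary = ""

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        while len(self.turns) > self.max_turns:
            evicted = self.turns.popleft()
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> str:
        """Summary of older turns plus recent turns verbatim: the prompt-ready window."""
        recent = "\n".join(self.turns)
        return f"[summary] {self.summary}\n{recent}" if self.summary else recent
```

The point of the shape: the prompt stays a bounded size no matter how long the session runs, which is what "summarize, don't stuff" buys you.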

Context curation patterns (6 min)

Summarize‑don’t‑stuff, retrieve‑then‑read, episode linking (“what worked last time”), consolidation/decay.
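A minimal sketch of retrieve-then-read, assuming Python; the word-overlap `score` is a toy stand-in for the embedding similarity a real system would use, and the prompt template is illustrative:

```python
def score(query: str, doc: str) -> float:
    """Toy lexical-overlap relevance score; production would use embedding similarity."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def retrieve_then_read(query: str, summaries: list[str], k: int = 2) -> str:
    """Select only the k most relevant stored summaries, then build a lean prompt."""
    ranked = sorted(summaries, key=lambda s: score(query, s), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Because only the top-k summaries reach the model, the prompt carries the sharp context the question needs rather than the full history.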

Systems choices and guardrails (4 min)

Cache vs vector vs OLTP roles, hybrid retrieval, token caps, importance/recency scoring, privacy‑by‑design.
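The importance/recency scoring can be sketched as a blended rank, assuming Python; the weights and the one-day half-life are illustrative defaults, not tuned values from the talk:

```python
import math

def memory_rank(similarity: float, importance: float, age_seconds: float,
                half_life: float = 86_400.0,
                w_sim: float = 0.6, w_imp: float = 0.25, w_rec: float = 0.15) -> float:
    """Blend semantic similarity, stored importance, and exponential recency decay.

    recency halves every `half_life` seconds, so stale memories lose rank
    gradually instead of being hard-deleted.
    """
    recency = math.exp(-age_seconds * math.log(2) / half_life)
    return w_sim * similarity + w_imp * importance + w_rec * recency
```

Ranking with a blend like this is one way to keep retrieval hybrid: pure similarity search alone would happily surface an obsolete memory over a fresh, important one.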

Real‑life examples (6 min)

Support triage: similar ticket retrieval + preferences; travel agent: recalls airlines/allergies; research copilot: summary‑first indexing with nightly consolidation.

Pitfalls and anti‑patterns (2 min)

Unbounded memory drift, single‑store architectures, oversized contexts degrading reasoning.

Wrap‑up and KPIs (2 min)

Takeaways, blueprint recap, and what to measure: retrieval hit‑rate, precision@k, token usage, p95 latency, success‑per‑try.
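Two of those retrieval-quality KPIs are simple enough to compute inline; a sketch in Python, with the function names and input shapes chosen here for illustration:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved memories that were actually relevant."""
    top = retrieved[:k]
    return sum(1 for doc in top if doc in relevant) / k if k else 0.0

def hit_rate(query_results: list[tuple[list[str], set[str]]], k: int) -> float:
    """Share of queries with at least one relevant memory in the top-k."""
    hits = sum(1 for retrieved, relevant in query_results
               if any(doc in relevant for doc in retrieved[:k]))
    return hits / len(query_results) if query_results else 0.0
```

Tracked over time alongside token usage and p95 latency, these two numbers tell you whether a memory change sharpened recall or just shuffled it.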

Expected audience: AI enthusiasts and practitioners

