Speak at The Fifth Elephant 2026 Annual Conference
Share your work with the community
Fri, 31 Jul 2026, 09:00 AM – 06:00 PM IST
Arjun Mahishi
Submitted Jun 24, 2026
Title: Bringing Down MTTR: Building an AI-Powered Diagnostic Platform for Database Support
Author: Arjun Mahishi (arjun.mahishi@gmail.com; Cockroach Labs)
Session type: Talk (30 mins)
Track: Building & implementing AI tools & agents in production
Submission for: The Fifth Elephant
Status of this doc: Draft (still iterating on it; will be done before 30th June)
When a customer reports a problem with their CockroachDB cluster, support engineers need to sift through debug zips containing logs, system table dumps, CPU/heap profiles, metrics, and traces -- often hundreds of megabytes of diagnostic data. The traditional workflow involved
downloading these artifacts to individual laptops, running ad-hoc shell scripts, and context-switching between ticketing systems, secure file transfer tools, and runbooks. Mean Time to Resolve (MTTR) suffered.
We built a centralized diagnostic platform that makes all customer artifacts available on a cloud-backed filesystem, with investigation tools pre-installed on a VM, accessible through a web app. Then we layered AI agents on top to generate preliminary root cause analyses and
let engineers chat with an agent that can query, search, and correlate across all the diagnostic data.
The key design decision: instead of integrating with off-the-shelf observability platforms like Datadog, Loki, or Grafana -- which are built for live telemetry, not post-mortem analysis of debug artifacts -- we exposed everything as a filesystem. This lets AI agents use ripgrep, jq,
DuckDB, and Python -- tools that are heavily represented in LLM training data -- rather than requiring custom APIs or proprietary query interfaces. The diagnostic data from CockroachDB has a unique shape (system table CSVs, custom profile formats, interleaved multi-node logs) that no single
observability tool models well.
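
To make the filesystem-first idea concrete, here is a minimal Python sketch of the kind of script an agent (or an engineer) might write against such a mount: it merges per-node log files into one time-ordered stream and filters by a pattern. It is an illustration only, not the platform's actual code. The directory layout (`<root>/nodes/<node>/logs/*.log`), the simplified line format (`YYYY-MM-DDTHH:MM:SSZ message`), and the function names are all hypothetical assumptions; real CockroachDB log lines and debug zip layouts differ.

```python
import heapq
import re
from pathlib import Path

# ASSUMPTION: simplified, hypothetical line format "YYYY-MM-DDTHH:MM:SSZ <message>".
# This is NOT CockroachDB's real log format. Fixed-width UTC timestamps are used
# so that plain string comparison matches chronological order.
LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+(?P<msg>.*)$")


def iter_node_log(node_dir):
    """Yield (timestamp, node, message) tuples for one node, in file order.

    Assumes log files sort chronologically by name and that lines within a
    file are already time-ordered. Lines that do not match are skipped.
    """
    node = node_dir.name
    for log_file in sorted((node_dir / "logs").glob("*.log")):
        with log_file.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                match = LINE_RE.match(line.rstrip("\n"))
                if match:
                    yield (match["ts"], node, match["msg"])


def merged_timeline(root, pattern):
    """Merge all nodes' logs into one time-ordered stream, keeping only
    entries whose message matches the regular expression `pattern`."""
    needle = re.compile(pattern)
    # ASSUMPTION: hypothetical layout <root>/nodes/<node>/logs/*.log
    node_dirs = sorted(d for d in (Path(root) / "nodes").iterdir() if d.is_dir())
    streams = [iter_node_log(d) for d in node_dirs]
    # heapq.merge lazily merges already-sorted streams without loading
    # every file into memory at once.
    for ts, node, msg in heapq.merge(*streams):
        if needle.search(msg):
            yield (ts, node, msg)
```

Calling `merged_timeline("/mnt/artifacts/ticket-123", r"slow")` (the path is a made-up example) would yield matching entries from all nodes in timestamp order. The point of the sketch is that nothing here needs a custom API: ordinary file paths and standard-library calls are enough, which is the property the filesystem design relies on.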
The talk includes a live demo of the platform.