Jun 12–13, 2026 (Fri–Sat), 09:00 AM – 06:00 PM IST
Sohham Seal
@sohham
Submitted Apr 30, 2026
We sized the storage layer for Nutanix’s Panacea.AI platform — 5 TB and 5 billion log lines a day — three different ways and got three answers an order of magnitude apart. Same workload, same retention, same ingest rate; the engines disagreed on storage by 37× and on CPU by 3×.
| Sizing for 5 TB/day, 30-day retention | Disk | RAM | CPU cores | Compression | Ingest |
|---|---|---|---|---|---|
| Inverted-index, heavy | 750 TB | 24 TB | 1,200+ | 1.5 : 1 | ~25k rows/s |
| Inverted-index, lean | 500 TB | 10 TB | 800 | 1 : 1 | ~50k rows/s |
| Columnar (what we ship) | 20 TB | 4 TB | 400 | 10 : 1 | 870k+ rows/s |
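As a quick sanity check on the headline ratios, the arithmetic behind the table can be sketched as below. The figures are copied from the table; the dictionary keys and variable names are illustrative. The table's disk figures are not derivable from the compression column alone (they presumably include index overhead, replication, or headroom that the table does not break out), so this sketch only checks the ratios and the data-only footprint.

```python
# Back-of-envelope check of the sizing table. All figures are copied from the
# table above; names are illustrative only.
DAILY_INGEST_TB = 5
RETENTION_DAYS = 30

# Raw log volume held inside the retention window.
raw_tb = DAILY_INGEST_TB * RETENTION_DAYS

sizing = {
    "inverted_heavy": {"disk_tb": 750, "cpu_cores": 1200},
    "inverted_lean": {"disk_tb": 500, "cpu_cores": 800},
    "columnar": {"disk_tb": 20, "cpu_cores": 400},
}

# Gap between the heaviest and the lightest configuration.
disk_gap = sizing["inverted_heavy"]["disk_tb"] / sizing["columnar"]["disk_tb"]
cpu_gap = sizing["inverted_heavy"]["cpu_cores"] / sizing["columnar"]["cpu_cores"]

# Data-only footprint at the columnar row's 10:1 compression ratio.
columnar_data_tb = raw_tb / 10

print(f"raw data in window: {raw_tb} TB")
print(f"disk gap: {disk_gap:.1f}x, CPU gap: {cpu_gap:.1f}x")
print(f"columnar data at 10:1: {columnar_data_tb:.0f} TB")
```

The 37× and 3× figures quoted in the abstract correspond to `disk_gap` and `cpu_gap` here.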
This 15-minute talk is the engineering case for why that gap exists, and what it takes to land on the small end of it in production.
We’ll cover why log analytics is unusually well-suited to a columnar layout — access patterns that are almost always per-bundle and time-windowed, compression headroom on raw log messages, and the operational economics that fall out of those two — and the four schema patterns that did the real work in production:
| Schema pattern | What it does | What it bought us |
|---|---|---|
| Delta+ZSTD codec stacking | Stacks delta encoding under ZSTD on log columns | 9.76 TB raw → 480 GB on disk (20.8×); up to 993× on monotonic IDs |
| LowCardinality(String) | Dictionary-encodes high-frequency strings (levels, hosts, services) | Smaller marks, faster filters, better cache hit rate |
| tokenbf_v1 skip indexes | Bloom-filter-based skip indexes on tokenized log text | Replaces full-text indexing for substring search |
| Monthly partitions + ttl_only_drop_parts=1 | Drops whole parts on TTL instead of mutating | Self-maintaining cluster, no DBA |
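To illustrate why stacking a delta step under a general-purpose compressor pays off so heavily on monotonic IDs, here is a minimal sketch. It is not the ClickHouse codec: it uses Python's standard-library `zlib` as a stand-in for ZSTD, a synthetic run of sequential IDs as hypothetical data, and helper names (`pack_u64`, `delta_encode`, `delta_decode`) invented for this example. It also assumes a non-decreasing column, so every delta fits an unsigned 64-bit field.

```python
import struct
import zlib


def pack_u64(values):
    """Serialize non-negative integers as little-endian unsigned 64-bit words."""
    return struct.pack(f"<{len(values)}Q", *values)


def delta_encode(values):
    """Keep the first value, then store each value's difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Hypothetical monotonic ID column (assumption: strictly increasing by 1).
ids = list(range(1_000_000, 1_100_000))

# Compress the raw column, then the delta-encoded column, with the same settings.
plain = zlib.compress(pack_u64(ids), 6)
stacked = zlib.compress(pack_u64(delta_encode(ids)), 6)

print(f"raw bytes:           {len(pack_u64(ids))}")
print(f"compressed only:     {len(plain)}")
print(f"delta + compressed:  {len(stacked)}")
```

After the delta step the column is almost entirely the constant `1`, which the compressor collapses to a tiny fraction of the size it reaches on the raw values. Real log columns are less regular than this synthetic run, so the achievable ratio depends on the data.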
The cluster today holds 154 billion rows across logs, metrics, traces, and AI-generated incident reports, sustains 870k+ inserts/sec on a single node, and runs without a dedicated DBA. We’ll close with the one search-latency trade-off we accepted to land here, what it cost, what it didn’t, and the framework we now use to re-evaluate it quarterly.
The four schema patterns (Delta+ZSTD, LowCardinality, tokenbf_v1, partition-aligned TTLs) that delivered the 37× disk reduction.

SREs, platform engineers, and DBAs operating high-volume log telemetry. Engineering managers and architects evaluating storage engines for petabyte-class observability, log search, or AI workloads. Anyone running an inverted-index backend today who suspects the access patterns of their workload have outgrown the model they started with.
Sohham Seal is an SDE-2 at Nutanix on the Panacea.AI platform, where he works on the columnar ingestion and query layer that powers AI-driven incident triage across Nutanix’s customer fleet. His recent AI-related work includes GNN-based recommendation systems, biometric security, and EMG-based gesture prediction (IEEE TIFS, PCEMS Best Paper). Will present in Bangalore.
Mohit Gurnani is an SDE-4 at Nutanix who architects and leads Panacea.AI, an agentic AI platform processing 20 PB+ of observability data annually across 29,000+ enterprise clusters. The stack is ClickHouse for high-cardinality analytics, Kubernetes-native event-driven pipelines, and LangGraph agents on top. IEEE-published author; previously presented at ICDMAI 2017. Designed the migration described in this talk. Will join Q&A virtually from Columbus, OH.