Call for submissions: Platform Engineering Meet-ups

Share your journey of building platforms that power engineering teams

Vishnu Raj V

@vishnuraj555

Signal > Noise: An InMobi Story

Submitted Oct 17, 2025

Elevator Pitch

We turned observability from “more dashboards” into a capability that engineers use every day. This talk shares how InMobi runs Prometheus+Thanos, Loki+Promtail, and Sentry, the bumps we hit, the fixes that worked, what still hurts, and the practical patterns that cut noise/cost and improve MTTR—without a platform rewrite. We’ll also cover how GitOps made observability a product engineers own.

Talk outline

From Dashboards to Daily Habit
How we reframed observability as a product capability, not a tool shelf—owned by engineers, tied to SLOs and incident flow.

Metrics that Scale (Prometheus → Thanos)
Operational patterns that kept query cost/cardinality in check and made long-term metrics actually fast.

Logs that Answer Questions (Promtail → Loki)
Pipelines that turn noisy logs into incident timelines: structure, sampling, retention by value. Scale: 320 TB of compressed logs held for 7 days.
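
As a rough illustration of "sampling by value", the sketch below keeps every error and warning line and only a fixed fraction of lower-value lines. It is a minimal Python sketch under assumed names (`KEEP_RATES`, `keep_line`) and assumed rates, not the pipeline InMobi runs; in practice this kind of rule would usually live in the log shipper's own pipeline config.

```python
import zlib

# Hypothetical keep rates per log level (assumed values, not from the talk).
KEEP_RATES = {"error": 1.0, "warn": 1.0, "info": 0.1, "debug": 0.01}

def keep_line(level, line, rates=KEEP_RATES):
    """Decide whether to keep a log line, sampling deterministically by level.

    Unknown levels are kept. The decision hashes the line content, so the
    same line always gets the same verdict across restarts and replicas.
    """
    rate = rates.get(level, 1.0)
    bucket = zlib.crc32(line.encode("utf-8")) % 10_000
    return bucket < rate * 10_000

lines = [("error", "db timeout"), ("debug", "cache probe"), ("info", "GET /health")]
kept = [(lvl, msg) for lvl, msg in lines if keep_line(lvl, msg)]
```

Hashing the line instead of calling a random generator keeps the sample reproducible, which helps when comparing what different replicas shipped during an incident.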

Errors & Traces that Drive Triage (Sentry)
Release health, ownership, and trace↔error linkage to route issues to the right team quickly. How we scaled to 13M spans, 300K transactions, 3M errors, and more.

GitOps: Ownership & Guardrails
Repo-as-the-API for observability: self-serve templates for alerts, dashboards, and scrape configs.

The Bumps (and Fixes)
Cardinality spikes, pricey queries, trace usability, uneven standards—what broke, how we fixed it, what still hurts.

Playbooks You Can Copy
Label budgets, naming conventions, downsampling/query patterns, log filters/sampling, opinionated alert design, SLO guardrails.
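
To make "label budgets" concrete, here is a minimal Python sketch of one way such a check could work: count the distinct values each label takes across a set of series and flag any label above a budget. The function name `labels_over_budget`, the sample series, and the budget of 2 are all assumptions for illustration, not the tooling described in the talk.

```python
from collections import defaultdict

def labels_over_budget(series, budget):
    """Return {label: distinct_value_count} for labels exceeding the budget.

    `series` is an iterable of dicts mapping label names to label values,
    one dict per time series.
    """
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return {name: len(vals) for name, vals in values.items() if len(vals) > budget}

# Hypothetical sample: `user_id` is unbounded, `job` and `status` are not.
series = [
    {"job": "api", "status": "200", "user_id": "u1"},
    {"job": "api", "status": "500", "user_id": "u2"},
    {"job": "api", "status": "200", "user_id": "u3"},
]
print(labels_over_budget(series, budget=2))  # {'user_id': 3}
```

A check like this could run in CI against proposed scrape configs or exported series, so that a high-cardinality label is caught in review and not after it reaches production.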

Results and Trade-offs
Impact on MTTR, alert volume, query cost; where we chose “good enough” over perfect.

Q&A / Show-and-Tell
Real dashboards, rules, and pipelines—brief live walkthrough, then questions.

Target Audience

Platform/SRE and full-stack engineers running observability stacks

DevOps teams owning on-call and incident response

Backend engineers responsible for service reliability

Staff+ ICs shaping org-wide standards and tooling

Engineering managers driving SLOs and MTTR improvements

Key Takeaways

Capability over tools: Standards, ownership, and SLOs make signals actionable.

Cut noise, keep insight: Label hygiene, downsampling, and log pipelines reduce cost without losing detail.

Developer-first value: Faster triage, cleaner postmortems, and safer releases become routine.

Speaker Bio

SDE4 at InMobi (6 years), 14 years in backend/platform engineering. I build and operate our org-wide observability stack—Prometheus/Thanos for metrics, Loki/Promtail for logs, and Sentry for errors & traces—focused on practical reliability, developer velocity, and cost control.

Hosted by

We care about site reliability, cloud costs, security and data privacy