Submissions

Call for submissions: Platform Engineering Meet-ups

Share your journey of building platforms that power engineering teams


Dev Kulkarni (Co-author)

Sakib Malik (Co-author)

Worker Controller: A Multi-tenant Consumer Proxy for Consumption at Scale

Session Description: At Zomato, the journey from a customer placing an order to its fulfillment, and beyond, depends on a complex web of systems running seamlessly in the background. To power this at a massive scale, we rely on an event-driven architecture built on Kafka and SQS. With thousands of topics and queues in production, and new ones being added almost daily as teams roll out features or…
  • 6 comments
  • Submitted
  • 01 Oct 2025
Session type: Talk (30 mins)

Deepak Gupta

Unleashing BuildPiper's Magic for Lightning-Fast Enterprise Delivery Pipelines

In today’s fast-paced development landscape, platform engineers are the unsung heroes building robust internal platforms that empower teams to deliver software at scale. This session dives deep into BuildPiper, a fully managed Kubernetes and application delivery platform that transforms chaotic CI/CD workflows into seamless, secure pipelines. We’ll explore how BuildPiper’s intuitive setup, real-t…
  • 0 comments
  • Submitted
  • 01 Oct 2025
Session type: Demo (5-10 mins)

Amit Khanna

Oxygen: Breathing Simplicity into ETL at Scale

At InMobi, the data platform powers a diverse ecosystem of analytics and machine learning workloads, processing billions of events daily. To orchestrate, optimize, and scale these ETL pipelines, we leverage Apache Airflow for workflow management, Apache Spark for distributed SQL-based data processing, and Kubernetes for dynamic infrastructure provisioning.
  • 3 comments
  • Submitted
  • 02 Oct 2025
Session type: Talk (30 mins)

Vivek Singh

TARA: GenAI Customer Assistant on a Platform-First Stack

TARA (Travel Assistance & Recommendation Agent) is an AI-first, platform-driven chatbot that empowers business teams to create, deploy, and iterate conversational flows without developer intervention. This talk presents the architecture, engineering lessons, and real-world impact of TARA, covering platform-based flow management, workflow automation at scale (300k executions/day), hybrid guided +…
  • 1 comment
  • Submitted
  • 11 Oct 2025
Session type: Talk (30 mins)

Sahiti

No HR, No Budget, No BDFL: Infrastructure Patterns for Volunteer Communities

I’ve learned some things from volunteering to run infrastructure for communities I care about. Like any other tech support story, there’s a lot of toil, and trying to automate toil always brings up novel engineering challenges.
  • 1 comment
  • Submitted
  • 14 Oct 2025
Session type: Talk (30 mins)

Ravindra Harish

CASE STUDY: Banking Tech Observability - 100% migration from AppDynamics to Open-Source

Our session will cover the practical journey of migrating a large-scale banking mobile application and netbanking platform from AppDynamics to a fully open-source observability stack. We will provide an in-depth walkthrough of how we leveraged OpenTelemetry for distributed tracing, adopted Micrometer and prom-client for robust metrics collection, and integrated Istio to tap into network/data plan…
  • 1 comment
  • Submitted
  • 16 Oct 2025
Session type: Talk (30 mins)

Dipal Patel

Building an Internal Developer Platform: From Jenkins to Self-Service CI/CD

Session Abstract: When your Internal Developer Platform’s CI/CD infrastructure becomes a developer productivity bottleneck (5 instances, 2-3k builds daily, $50k annual costs, and compliance gaps), you need a platform engineering approach that works. This session covers our real-world journey from Jenkins to GitHub Actions, focusing on how we built a self-service, scalable IDP that empowers developers…
  • 2 comments
  • Submitted
  • 17 Oct 2025
Session type: Talk (30 mins)

Vishnu Raj V

Signal > Noise: An InMobi Story

Elevator Pitch: We turned observability from “more dashboards” into a capability that engineers use every day. This talk shares how InMobi runs Prometheus+Thanos, Loki+Promtail, and Sentry, the bumps we hit, the fixes that worked, what still hurts, and the practical patterns that cut noise/cost and improve MTTR, without a platform rewrite. We’ll also cover how GitOps made observability a product en…
  • 1 comment
  • Submitted
  • 17 Oct 2025
Session type: Talk (30 mins)

Nagesh

Sustainability @Scale

InMobi Ad Exchange operates at a vast scale with infrastructure stats like 500K vCPUs, 2M GB RAM, 5 PB data storage, 1.5M GB daily network egress, and 2.5M GB daily load balancer processing. These resources result in a cloud bill resembling a mobile number.
  • 2 comments
  • Submitted
  • 17 Oct 2025
Session type: Talk (30 mins)

Saurabh Jain

Bringing Order to Distributed Chaos: A Functional Architecture for Multi-Tenant Task Execution

Session Description: Elastic compute environments promise flexibility, but achieving determinism and observability at scale is far from trivial. Workloads often involve shared datasets, unpredictable runtimes, and heterogeneous client configurations, all of which challenge traditional scheduling and orchestration paradigms.
  • 3 comments
  • Submitted
  • 18 Oct 2025
Session type: Talk (30 mins)

Jinu Rose John

Building Reliable Services with SLOs and Error Budgets

Session Overview: In this session, we will delve into the essential concepts of Service Level Objectives (SLOs) and Error Budgets within the framework of Site Reliability Engineering (SRE). Attendees will learn how to effectively establish structured SLOs that align with user expectations and business goals, allowing for a strategic balance between maintaining high system reliability and enabling…
  • 1 comment
  • Submitted
  • 17 Dec 2025
Session type: Talk (30 mins)

Shubham Patil

Sub-Second Decisions: Design Principles for Real-Time Go Workflow Engines on GCP for Martech Scale

Description: Processing massive event streams for real-time Martech decisions, such as session personalization, DCO, NBA, and personalized ad targeting, frequently requires complex, stateful logic best modeled as workflows under strict sub-second latency, but traditional engines can introduce unacceptable overhead. Building such systems on modern cloud platforms presents challenges: how do you orch…
  • 2 comments
  • Submitted
  • 18 Dec 2025
Session type: Talk (30 mins)

Vivek Pemawat

Designing Scalable Unit Test Reporting for Distributed Systems

Modern engineering teams run thousands of unit tests across multiple services and repositories, yet test reporting often remains fragmented, noisy, and hard to act upon. Developers struggle with flaky tests, poor failure context, oversized logs, and limited visibility into trends across builds and teams. As systems scale, these gaps significantly impact develo…
  • 0 comments
  • Submitted
  • 19 Dec 2025
Session type: Demo (5-10 mins)

Siddhant N Trivedi

Disaster Recovery for Stateful K8s Workloads with Portworx DR

Abstract: Disaster recovery (DR) is no longer optional, especially when dealing with stateful workloads in Kubernetes environments. In this talk, I’ll dive into how Portworx DR makes it easier to build a resilient infrastructure by enabling efficient disaster recovery strategies tailored for Kubernetes.
  • 1 comment
  • Submitted
  • 20 Dec 2025
Session type: Vendor pitch - dev tool/solution (15 mins)

Snehasish Roy

Clockwork: The Backbone of PhonePe’s 2 Billion Daily Job Execution Platform

Overview: Have you ever had an alarm fail to wake you up, causing a ripple effect of chaos in your morning? At PhonePe, we understand the criticality of such ‘alarms’ in our digital ecosystem.
  • 0 comments
  • Submitted
  • 25 Dec 2025
Session type: Talk (30 mins)

Srinivas Anant

Observability in Kubernetes: The good, the bad, and the ugly

Overview: As our Kubernetes footprint grew, so did the challenge of understanding what was happening inside our cluster. We found ourselves drowning in logs from multiple sources, missing critical alerts, and struggling to connect the dots between metrics, logs, and events when things went wrong.
  • 0 comments
  • Submitted
  • 02 Jan 2026
Session type: Talk (30 mins)

Hosted by

We care about site reliability, cloud costs, security and data privacy