Building Real-Time CDC and Fabric Mirroring (Streaming Data) at Scale: Solving Replication Lag and Schema Evolution in Real-Time Data Platforms

31 Jul 2026 (Fri), 09:00 AM to 06:00 PM IST

Submitted Jun 25, 2026

I am submitting for: Track 1 - Data engineering & infrastructure
Type of session: 30-minute talk

{Describe your session in 2 paragraphs}
Real-time analytics platforms promise fresh data without complex ETL pipelines, but operating them at cloud scale introduces a very different set of challenges. In this talk, we share lessons learned while building Microsoft Fabric Mirroring, a system that continuously replicates Azure SQL Database workloads into OneLake with near real-time latency, enabling immediate analytics experiences. We will start with a brief overview of Fabric Mirroring and how data is replicated to OneLake, then move on to the challenges we faced in production systems and how we solved them. As adoption grew, we encountered production issues where transaction logs were being generated faster than replication could process them, leading to replication lag, log growth, and potential customer impact. We will discuss how we investigated the bottlenecks, redesigned the log capture pipeline, introduced adaptive throughput tuning, and built an Auto-Reseed mechanism to prevent the transaction log from filling up. We will also discuss how we fine-tuned key parameters and made effective use of the log pool cache to increase capture speed.

We will also dive into one of the hardest problems in continuously replicated systems: schema evolution. Production databases constantly change through application deployments, DDL operations, and index maintenance. Supporting these changes without interrupting replication required fundamental architectural changes in metadata propagation, consistency management, and failure recovery. Through real incidents, design trade-offs, and operational lessons, attendees will gain practical insights into building large-scale data replication systems that remain fast, reliable, and resilient in the face of constant change.

{Mention 1-2 takeaways from your session}

How replication works and how data is replicated to OneLake.
Techniques for handling replication lag and preventing log growth in large-scale streaming replication systems.
Architectural patterns for safely supporting schema evolution without disrupting real-time analytics.
How automated recovery systems such as Auto-Reseed improve reliability and reduce operational burden.

{Which audiences is your session going to be beneficial for?}
Data platform engineers
Database engineers
Distributed systems engineers
Cloud infrastructure engineers
Architects building real-time analytics systems
Engineers operating CDC, replication, and data movement platforms
Engineering leaders interested in large-scale reliability and operational excellence

{Add your bio - who you are; where you work}
Sourav Bhuwalka is a Senior Software Engineer at Microsoft working on Azure SQL Database replication and Fabric Mirroring. His work includes building the replication platform and infrastructure that power real-time CDC and Fabric Mirroring for thousands of Azure SQL Database customers, while ensuring scalability, availability, and operational resilience.

{Add the link to draft slides - PDF/PPT - with comments access}
I will submit the PPT with the detailed solution if required and if the talk is selected.
Fabric Mirroring overview: https://learn.microsoft.com/en-us/fabric/mirroring/overview
Draft slides: https://onedrive.live.com/personal/1F71BAA556871266/_layouts/15/Doc.aspx?sourcedoc={2EDE0031-D31A-492B-ABD3-5170ED64E539}&file=Fabric_Mirroring_CDC_at_Scale_FifthElephant2026.pptx&action=edit&mobileredirect=true

{Add the link to 2-min elevator pitch video}

Speak at The Fifth Elephant 2026 Annual Conference