Anudeep
CDC - Clear Data Colon: Flushing Out Bottlenecks for Smooth-Flowing Analytics
Submitted Mar 30, 2025
Topic of your submission:
Distributed data systems
Type of submission:
30 mins talk
I am submitting for:
Rootconf Annual Conference 2025
In this session we will talk about how we used change data capture (CDC) to scale the performance and reliability of analytics by combining it with scalable state management (Apache Hudi) and a powerful OLAP engine (Trino), improve database reliability by offloading analytics workloads, and meet our data governance needs.
The session also touches on the architectural aspects that made this solution not just a fix for one problem but a platform that multiple teams could use to onboard their own workloads onto the same pattern without writing a single line of code. The platform currently replicates 200+ tables (and growing) from 5+ databases at sub-minute latency, which has reduced load on those databases by 50,000+ QPS, sped up analytical queries touching those DBs by 10x, and cut platform cost by 10% by reducing the databases' read-replica counts.
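The full Debezium/Hudi/Trino stack is beyond a short snippet, but the core CDC idea the platform builds on, replaying an ordered stream of row-level change events against a materialized replica so analytics never touch the source OLTP database, can be sketched in a few lines. All table and field names below are illustrative, not from the talk:

```python
# Minimal sketch of change data capture: an ordered stream of
# row-level change events (as a CDC tool such as Debezium would emit)
# is replayed against a materialized replica keyed by primary key,
# analogous to Hudi upserts. Names and event shapes are hypothetical.

def apply_cdc_event(table: dict, event: dict) -> None:
    """Apply one insert/update/delete event to an in-memory replica."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        table[key] = event["row"]   # upsert: last write for a key wins
    elif op == "delete":
        table.pop(key, None)        # idempotent delete

replica: dict = {}
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "status": "new"}},
    {"op": "update", "key": 1, "row": {"id": 1, "status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "status": "new"}},
    {"op": "delete", "key": 2},
]
for e in events:
    apply_cdc_event(replica, e)

print(replica)  # {1: {'id': 1, 'status': 'shipped'}}
```

Because events are applied in log order and keyed by primary key, the replica converges to the source table's current state, which is what lets analytical queries be served entirely from the replicated copy.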
Some of these aspects are covered in this blog post: https://medium.com/allthatscales/from-transactional-bottlenecks-to-lightning-fast-analytics-74e0d3fff1c0
This session also aims to communicate the need to understand your data access patterns in order to choose the right data management solution for your organization's cost and performance needs. One size does not fit all.
Any team struggling to unify their databases, or trying to make their existing databases more reliable, should attend this session to pick up data engineering concepts and ideas that can help them solve these problems without loss of continuity.
I am the Head of Data and Platform Engineering at Uptycs, Inc. - a CNAPP and XDR platform company that develops cybersecurity solutions for:
EDR
XDR
CWPP
CIEM
CSPM
KSPM
SSPM
SCSM
AISPM
DSPM
... and more ... with an aim to provide a unified platform that gives an enterprise the ability to manage the security of its entire infrastructure, from code to cloud. What this translates to is a data platform that ingests 100+ million EPS (events per second) - close to a petabyte of data daily - and runs 500k+ queries daily, scanning 500TB+ of data per day on a data platform approaching exabyte scale. I will be presenting with https://www.linkedin.com/in/aakashsankritya/ - a brilliant data engineer from Uptycs who recently moved to Swiggy - who co-authored the blog.