Cache Me If You Can: Building a Lightning-Fast Analytics Cache

Submitted Mar 30, 2025

Topic of your submission: Platform engineering
Type of submission: 30 mins talk
I am submitting for: Rootconf Annual Conference 2025

This session covers how we built an ETL-free, asynchronous, on-demand analytics results caching platform to power our CNAPP and XDR platform. We will also discuss the unique challenges we faced in performance, scale, usability, and access control and, most importantly, the state and nature of the product that necessitated building the solution we did. Some of these challenges stem from our analytics engine, which runs 500k+ analytics queries scanning 500TB+ of data daily on an exabyte-scale data lake storing data in HDFS and S3, with repeat accesses to dashboards and results running into the thousands per day. The distributed cache sits on 5 boxes (horizontally scalable), each with 8 cores, 32GB, and an attached 500TB high-throughput, high-IOPS disk.
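To make the "asynchronous, on-demand" idea concrete, here is a minimal sketch of a results cache that computes a query only when it is first requested and lets concurrent requests for the same query share one execution. It is an illustration under stated assumptions, not the platform's implementation: the `ResultCache` class, the `run_query` callable, the `tenant` scoping, and the in-memory dictionaries are all hypothetical stand-ins (the real system is described as sitting on attached disks and OSS components).

```python
import asyncio
import hashlib


class ResultCache:
    """Toy in-memory stand-in for an on-demand query-results cache."""

    def __init__(self, run_query):
        # run_query is a hypothetical async callable: sql -> rows
        self._run_query = run_query
        self._results = {}   # cache key -> finished result
        self._inflight = {}  # cache key -> Task for a query still running

    @staticmethod
    def key(sql, tenant):
        # Assumption: results are scoped per tenant, so one tenant's cached
        # rows are never served to another. Normalization here is naive and
        # only collapses whitespace.
        normalized = " ".join(sql.split())
        return hashlib.sha256(f"{tenant}\n{normalized}".encode()).hexdigest()

    async def get(self, sql, tenant):
        k = self.key(sql, tenant)
        if k in self._results:
            return self._results[k]  # repeat access: served from cache
        task = self._inflight.get(k)
        if task is None:
            # First request for this key: start the query exactly once.
            task = asyncio.create_task(self._run_query(sql))
            self._inflight[k] = task
        try:
            result = await task  # concurrent callers wait on the same task
        finally:
            self._inflight.pop(k, None)
        self._results[k] = result
        return result
```

Nothing is precomputed in this sketch, which is the sense in which it is ETL-free: a result exists in the cache only because someone asked for it. A production version would also need eviction, invalidation when the underlying data changes, and durable storage, none of which are shown here.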

We will also touch upon how we used OSS such as Alluxio, DuckDB, and Trino to build this platform; the lessons learnt apply across the wider platform and data engineering domains. We will also cover how this platform ended up giving us a path to a more reliable, simpler, and more scalable data management system that would unify some of our databases. The talk is based on a popular, recently published blog post: https://medium.com/allthatscales/cache-me-if-you-can-building-a-lightning-fast-analytics-cache-at-terabyte-scale-c6beb58b24ce

Finally, we will talk about how this platform aligned with our need to incorporate GenAI into our product to make it more usable for our customers.

This session would be beneficial for any data engineering enthusiast or practitioner, as well as anyone interested in learning how to build a scalable analytics platform.

I am the Head of Data and Platform Engineering at Uptycs, Inc., a CNAPP and XDR platform company that develops cybersecurity solutions for EDR, XDR, CWPP, CIEM, CSPM, KSPM, SSM, SCSM, AISPM, DSPM, and more, with the aim of providing a unified platform that lets an enterprise manage the security of its entire infrastructure from code to cloud. In practice, this translates to a data platform that ingests 100+ million EPS and close to a petabyte of data daily, and runs 500k+ queries daily that scan 500TB+ of data, on a data platform approaching exabyte scale.
I have multiple patents granted or pending on using big data and ML to solve problems in the cybersecurity domain, and my expertise lies in building scalable, cost-efficient platforms and products.

I will be co-presenting with an Architect on the Data Engineering team, https://www.linkedin.com/in/sureshkumar-veerapathiran/, who is a co-author of the blog post above.

Rootconf 2025 Annual Conference CfP
