Shril Kumar

@shril

Cut the Thread: Deleting Customers Without Touching the Data

Submitted Jun 20, 2026

Abstract

You may have asked Google or Meta (Facebook) to delete your personal data. “Delete this customer’s data” sounds like a one-line request. At petabyte scale it can spawn a job that runs for days and burns serious compute. Once data lands in cloud object storage (S3, GCS, Azure Blob) as immutable files, a single GDPR or CCPA deletion request means rewriting and repacking those files, chasing the same IDs as they spread into copies and downstream tables, then proving to auditors that every copy is gone. As privacy law tightens and AI pipelines fan personal data into ever more derived datasets, brute-force deletion is becoming both fragile to operate and hard to justify on cost. This session makes a simple argument: don’t delete the data, cut the thread that ties it to a person.

The core design pattern is Identifier Severance. Store every record against a stand-in “virtual” ID, and keep the lookup table that links it to the real account isolated. To “delete” someone, destroy their row in that table: the historical records are orphaned with no path back to the individual, and a lake-wide rewrite collapses into a small, fast, audited change.
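The pattern above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's implementation: `IdentityVault`, `ingest`, `events_for`, and the in-memory `lake` list are hypothetical names standing in for an isolated mapping service and immutable object-store files.

```python
import uuid


class IdentityVault:
    """Hypothetical isolated lookup table: real account ID -> virtual ID."""

    def __init__(self):
        self._virtual_by_account = {}
        self.audit_log = []

    def virtual_id_for(self, account_id):
        # The virtual ID is random, not derived from the account ID,
        # so it cannot be recomputed once the mapping row is gone.
        if account_id not in self._virtual_by_account:
            self._virtual_by_account[account_id] = uuid.uuid4().hex
        return self._virtual_by_account[account_id]

    def resolve(self, account_id):
        return self._virtual_by_account.get(account_id)

    def sever(self, account_id):
        # "Deleting" a person means destroying one mapping row.
        removed = self._virtual_by_account.pop(account_id, None) is not None
        # The audit record deliberately omits the virtual ID, so the log
        # itself cannot be used to rebuild the link.
        self.audit_log.append(("sever", account_id, removed))
        return removed


vault = IdentityVault()
lake = []  # stand-in for immutable files; never rewritten below


def ingest(account_id, event):
    # Records carry only the virtual ID, never the real account ID.
    lake.append({"vid": vault.virtual_id_for(account_id), "event": event})


def events_for(account_id):
    vid = vault.resolve(account_id)
    if vid is None:
        return []  # no path from the person to their records
    return [row["event"] for row in lake if row["vid"] == vid]


ingest("alice@example.com", "play")
ingest("alice@example.com", "pause")
ingest("bob@example.com", "play")

before = events_for("alice@example.com")
vault.sever("alice@example.com")
after = events_for("alice@example.com")
```

After `sever`, the three rows in `lake` are untouched, but nothing in the system can connect two of them to the account any more. The sketch only holds if the vault is the sole place the mapping lives; a copy in a backup, replica, or log would restore the link.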

For the most sensitive fields, we layer on a second pattern, Crypto-Shredding: encrypt each subject’s data under its own key and destroy the key on request. The concept is the easy part. This talk digs into where it breaks: why severed-link data can still be personal data, why the guarantee collapses if a single copy of the old mapping survives in a backup, replica, or log, and how residual signals like device or location can still re-identify someone. You’ll leave knowing when severing the link is enough and when you have to destroy the keys outright.
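A rough sketch of crypto-shredding follows, using only the Python standard library. Everything here is an assumption made for illustration: `KeyStore`, `encrypt`, `decrypt`, and `read_field` are hypothetical names, and the HMAC-SHA256 keystream is a toy stand-in for a vetted authenticated cipher (such as AES-GCM from a maintained library), which a real system should use instead.

```python
import hashlib
import hmac
import secrets


class KeyStore:
    """Hypothetical per-subject key store; in practice a KMS or HSM."""

    def __init__(self):
        self._keys = {}

    def key_for(self, subject_id):
        if subject_id not in self._keys:
            self._keys[subject_id] = secrets.token_bytes(32)
        return self._keys[subject_id]

    def get(self, subject_id):
        return self._keys.get(subject_id)

    def shred(self, subject_id):
        # Erasure request: destroy the key, leave the ciphertext alone.
        return self._keys.pop(subject_id, None) is not None


def _keystream(key, nonce, length):
    # Toy keystream for illustration only, NOT a production cipher.
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key, plaintext):
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def read_field(store, subject_id, blob):
    key = store.get(subject_id)
    if key is None:
        return None  # key destroyed: the stored bytes are unreadable
    return decrypt(key, blob)


store = KeyStore()
secret_value = b"home: 12 Example Street"
blob = encrypt(store.key_for("subject-42"), secret_value)  # written once

readable_before = read_field(store, "subject-42", blob)
store.shred("subject-42")
readable_after = read_field(store, "subject-42", blob)
```

The encrypted `blob` is never rewritten; the erasure is the single `shred` call. The guarantee is only as strong as key hygiene: any surviving copy of the key, for example in a backup of the key store, undoes it.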

Key Takeaways

  1. A design pattern for “delete the link, not the data” on object-store lakes: identifier severance as the default, cryptographic erasure for the most sensitive fields. The patterns are tool-agnostic, grounded in public standards, and shaped by running erasure at petabyte scale under real deletion deadlines.
  2. A decision framework for which technique to apply where, driven by data sensitivity, query needs, and how strong an erasure guarantee you have to defend.
  3. A checklist of the failure modes that turn “deletion” into theater: a surviving mapping in a backup, replica, or log, and identifiers that leaked into derived datasets before severance.

Who Should Attend

Data platform owners, data engineers, and architects running compliant enterprise platforms: anyone who has to honor GDPR / CCPA erasure at lake scale across immutable storage and a sprawl of derived datasets. No privacy-law background needed; comfort with the basics of a data lake is enough.

Bio

Shril Kumar is a Senior Software Engineer at Roku.
He works on the Ad-Tech Data Platform, which powers 100M+ Roku devices worldwide. He has 8 years of work experience and previously helped build the Marketing Platform at Groupon.



Hosted by

Jumpstart better data engineering and AI futures