Thu, 4 Dec 2025, 09:00 AM – 05:15 PM IST
Submitted Nov 10, 2025
In AI-driven workflows, a critical bottleneck is the scarcity of realistic training datasets, and strict privacy and security rules forbid using actual customer records. To address these challenges, we have developed an agentic synthetic data generation pipeline that produces domain-rich, realistic, and coherent datasets for training PII (Personally Identifiable Information) and sensitive-reference detection models while preserving customer privacy. This LLM-driven workflow autonomously curates synthetic samples across varied industries, such as finance, healthcare, and legal, while incorporating guardrails to ensure that generated content remains non-toxic, unbiased, and contextually safe.
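As a rough illustration of the idea described above (not the pipeline presented in the session), the sketch below generates labeled synthetic samples per domain and filters them through a simple guardrail. Everything in it is an assumption made for illustration: the templates stand in for LLM generation, and `TEMPLATES`, `FAKE_VALUES`, `BLOCKLIST`, `generate_sample`, `passes_guardrail`, and `build_dataset` are hypothetical names.

```python
import random
from string import Formatter

# Hypothetical per-domain templates; a real pipeline would prompt an LLM instead.
TEMPLATES = {
    "finance": "Please wire the refund to {NAME}, account {ACCOUNT}.",
    "healthcare": "Patient {NAME} can be reached at {PHONE} after discharge.",
    "legal": "Counsel for {NAME} filed the motion; contact {EMAIL}.",
}

# Made-up values only; no real customer data is involved.
FAKE_VALUES = {
    "NAME": ["Asha Rao", "John Carter"],
    "ACCOUNT": ["0000-1111-2222", "9999-8888-7777"],
    "PHONE": ["555-0100", "555-0199"],
    "EMAIL": ["asha@example.com", "john@example.org"],
}

# Placeholder guardrail: a tiny blocklist standing in for toxicity/bias checks.
BLOCKLIST = {"idiot", "stupid"}


def generate_sample(domain, rng):
    """Fill one template and record (start, end, label) for each PII span."""
    parts, spans, pos = [], [], 0
    for literal, field, _spec, _conv in Formatter().parse(TEMPLATES[domain]):
        parts.append(literal)
        pos += len(literal)
        if field is not None:
            value = rng.choice(FAKE_VALUES[field])
            spans.append((pos, pos + len(value), field))
            parts.append(value)
            pos += len(value)
    return {"domain": domain, "text": "".join(parts), "spans": spans}


def passes_guardrail(sample):
    """Reject samples containing any blocklisted word."""
    words = {w.strip(".,;").lower() for w in sample["text"].split()}
    return words.isdisjoint(BLOCKLIST)


def build_dataset(n, seed=0):
    """Cycle through domains, generate n samples, keep those passing the guardrail."""
    rng = random.Random(seed)
    domains = sorted(TEMPLATES)
    samples = (generate_sample(domains[i % len(domains)], rng) for i in range(n))
    return [s for s in samples if passes_guardrail(s)]


if __name__ == "__main__":
    for sample in build_dataset(3):
        print(sample["domain"], "|", sample["text"], "|", sample["spans"])
```

Recording character offsets at generation time means each sample arrives with exact span labels, which is the property that makes synthetic text directly usable for training a detection model.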
In this session, we will present NetApp’s end-to-end framework for detecting sensitive data and sensitive references, powered by synthetic data. Our approach demonstrates how synthetic datasets can effectively bridge the data availability gap while maintaining strong alignment with real-world linguistic patterns. Through extensive experimentation, we observed progressive improvements in detection accuracy as synthetic data volume and diversity increased. The session will delve into the architecture of the agentic pipeline, data quality validation strategies, and domain adaptation techniques. Attendees will gain insights into how synthetic data can enable responsible AI development, reinforce data governance, and ensure compliance without exposing or relying on real customer information.
This session will be particularly beneficial for machine learning engineers, data scientists, and AI researchers working on privacy-sensitive applications or responsible AI initiatives. It will also provide valuable insights for leaders and architects working in sensitive or high-security domains where data governance and compliance play an important role. Attendees from organizations dealing with regulated data, such as those in the finance, healthcare, and government sectors, will gain an understanding of how synthetic data can be strategically leveraged to enhance model performance while maintaining strict privacy guarantees.
Presenter:
Darshan Adiga,
Senior Data Scientist at NetApp
Co-author:
Lakshya Daulani,
Data Scientist at NetApp