Enterprise-ready & Compliant Synthetic Data Generation for Data Governance

Submitted Nov 10, 2025

Type of submission: 15 mins talk

In AI-driven workflows, a critical bottleneck is the scarcity of realistic training datasets, and strict privacy and security rules forbid using actual customer records to fill the gap. To address these challenges, we have developed an agentic synthetic data generation pipeline that produces domain-rich, realistic, and coherent datasets for training PII (Personally Identifiable Information) and sensitive-reference detection models while preserving customers' privacy. This LLM-driven workflow autonomously curates synthetic samples across varied industries, such as finance, healthcare, and legal, while incorporating guardrails to keep generated content non-toxic, unbiased, and contextually safe.

In this session, we will present NetApp’s end-to-end framework for detecting sensitive data and sensitive references, powered by synthetic data. Our approach demonstrates how synthetic datasets can effectively bridge the data availability gap while maintaining strong alignment with real-world linguistic patterns. Through extensive experimentation, we observed progressive improvements in detection accuracy as synthetic data volume and diversity increased. The session will delve into the architecture of the agentic pipeline, data quality validation strategies, and domain adaptation techniques. Attendees will gain insights into how synthetic data can enable responsible AI development, reinforce data governance, and ensure compliance without exposing or relying on real customer information.
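To make the idea of labeled synthetic samples plus a guardrail check concrete, here is a minimal sketch. It is not the pipeline described in this proposal: the templates, filler values, blocklist, and every function name are hypothetical, and a template fill stands in for the LLM-driven generation step. It only illustrates how a generated sample can carry PII span labels and be filtered before it reaches a training set.

```python
import random
import re

# Hypothetical per-domain templates; an LLM would write these in a real pipeline.
TEMPLATES = {
    "finance": "Wire transfer approved for {NAME}, account email {EMAIL}.",
    "healthcare": "Patient {NAME} can be reached at {PHONE} after discharge.",
    "legal": "Counsel for {NAME} filed the motion; contact {EMAIL}.",
}

# Fabricated filler values (assumption: no real customer data is involved).
FILLERS = {
    "NAME": ["Asha Rao", "John Miller", "Mei Lin"],
    "EMAIL": ["asha.rao@example.com", "j.miller@example.org"],
    "PHONE": ["555-0100", "555-0199"],
}

# Toy stand-in for a toxicity guardrail.
BLOCKLIST = {"idiot", "stupid"}

PLACEHOLDER = re.compile(r"\{([A-Z]+)\}")


def generate_sample(domain, rng):
    """Fill one template and record (start, end, label) for each PII value."""
    template = TEMPLATES[domain]
    parts, spans = [], []
    cursor = 0    # read position in the template
    out_len = 0   # write position in the generated text
    for match in PLACEHOLDER.finditer(template):
        literal = template[cursor:match.start()]
        parts.append(literal)
        out_len += len(literal)
        label = match.group(1)
        value = rng.choice(FILLERS[label])
        spans.append((out_len, out_len + len(value), label))
        parts.append(value)
        out_len += len(value)
        cursor = match.end()
    parts.append(template[cursor:])
    return {"domain": domain, "text": "".join(parts), "spans": spans}


def passes_guardrails(sample):
    """Reject samples with blocklisted words or out-of-bounds span labels."""
    words = set(re.findall(r"[a-z]+", sample["text"].lower()))
    if words & BLOCKLIST:
        return False
    text_len = len(sample["text"])
    return all(0 <= s < e <= text_len for s, e, _ in sample["spans"])


def build_dataset(n_per_domain, seed=0):
    """Generate samples for every domain and keep those that pass the checks."""
    rng = random.Random(seed)
    dataset = []
    for domain in TEMPLATES:
        for _ in range(n_per_domain):
            sample = generate_sample(domain, rng)
            if passes_guardrails(sample):
                dataset.append(sample)
    return dataset


if __name__ == "__main__":
    for row in build_dataset(1):
        print(row["domain"], "|", row["text"], "|", row["spans"])
```

Recording character offsets at generation time means the labels come for free and never need a separate annotation pass, which is one common reason to prefer generated data for span-detection training.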

Takeaways:

Real-world data scarcity need no longer bottleneck model training or fine-tuning: high-quality synthetic corpora can fill the gap. Diverse, coherent synthetic datasets are key to achieving robust, generalizable performance across domains.
By leveraging agentic synthetic data generation, we create datasets that mimic real-world documents so closely that they are indistinguishable from genuine records. We have also observed consistent performance improvements with each increment of quality synthetic samples, which motivates continued investment in this approach.

Target audiences

This session will be particularly beneficial for machine learning engineers, data scientists, and AI researchers working on privacy-sensitive applications or responsible AI initiatives. It will also provide valuable insights for leaders and architects working in sensitive or high-security domains where data governance and compliance play an important role. Attendees from organizations dealing with regulated data, such as those in the finance, healthcare, and government sectors, will gain an understanding of how synthetic data can be strategically leveraged to enhance model performance while maintaining strict privacy guarantees.

Authors

Presenter:
Darshan Adiga,
Senior Data Scientist at NetApp

Co-author:
Lakshya Daulani,
Data Scientist at NetApp

The Fifth Elephant 2025 Winter Edition Call for Proposals