The Fifth Elephant

The Fifth Elephant 2025 Annual Conference

Less hype. More engineering.

Jul 19, 2025 (Sat), 08:45 AM – 05:55 PM IST

Bangalore International Centre, Bangalore


This submission has been added to the schedule

A Pragmatic Guide to Robust Data Quality Checks

Submitted May 16, 2025

I am submitting for: Speaking at the Fifth Elephant 2025 Annual Conference
Type of submission: 30 mins talk
Choose the topic your submission falls under: Data & ML Infrastructure track

Ignoring data quality invites flawed insights and poor business outcomes. This 30-minute talk moves beyond simplistic validation and walks through a multi-tiered approach to data quality assurance. We’ll start with foundational checks: schema validation, data volume monitoring, and value ranges (high/low thresholds) that flag outliers immediately. These establish baseline reliability.
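The three foundational checks can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the speaker's implementation or any framework's API: the `EXPECTED_SCHEMA` mapping, the column names, the function names, and the thresholds are all hypothetical.

```python
# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "country": str}


def check_schema(rows, schema):
    """Return human-readable violations: missing columns and wrong types."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for col, expected in schema.items():
            if col in row and not isinstance(row[col], expected):
                actual = type(row[col]).__name__
                errors.append(
                    f"row {i}: {col} is {actual}, expected {expected.__name__}"
                )
    return errors


def check_volume(row_count, low, high):
    """True when the batch size falls inside the expected band."""
    return low <= row_count <= high


def check_range(rows, column, low, high):
    """Return indices of rows whose value lies outside [low, high]."""
    return [
        i for i, row in enumerate(rows)
        if column in row and not (low <= row[column] <= high)
    ]


# Hypothetical sample batch used only to exercise the checks.
sample_rows = [
    {"order_id": 1, "amount": 10.0, "country": "IN"},
    {"order_id": "2", "amount": 99999.0, "country": "IN"},
    {"order_id": 3, "amount": 5.0},
]
schema_errors = check_schema(sample_rows, EXPECTED_SCHEMA)
volume_ok = check_volume(len(sample_rows), low=2, high=10)
out_of_range = check_range(sample_rows, "amount", low=0.0, high=1000.0)
```

In practice the thresholds would come from historical observations of the dataset rather than being hard-coded as they are here.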

Next, the talk explores more advanced techniques: anomaly-based checks that identify unusual patterns, and inter-dataset consistency checks (e.g. verifying that different datasets overlap as much as we expect). We’ll also see how custom metrics can be used to monitor data drift.
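As a rough sketch of what these advanced checks might look like, the snippet below uses a z-score for anomaly detection, a key-overlap ratio for inter-dataset consistency, and the population stability index (PSI) as one possible custom drift metric. The abstract does not name specific techniques, so these choices, along with every function name and threshold, are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def is_anomalous(history, latest, z_threshold=3.0):
    """Flag `latest` when it is more than z_threshold sample standard
    deviations from the mean of `history` (needs at least two points)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


def overlap_ratio(left_keys, right_keys):
    """Fraction of distinct keys in the left dataset also present in the right."""
    left = set(left_keys)
    if not left:
        return 1.0
    return len(left & set(right_keys)) / len(left)


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between two non-empty numeric samples.
    Bin edges are derived from the `expected` (baseline) sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant samples

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical inputs: daily row counts, join keys, and a feature sample.
daily_counts = [100, 102, 98, 101, 99]
baseline = [float(v) for v in range(100)]
shifted = [v + 50.0 for v in baseline]
```

A PSI of zero means the two samples fall into the baseline bins in identical proportions; larger values indicate more drift, and the alerting threshold is a judgment call for each dataset.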

This talk is designed to equip attendees with a clear understanding of how to layer these different checks to create a comprehensive and resilient data quality framework. The presentation will cover practical implementation considerations within data stacks and showcase examples using popular open-source data quality frameworks.

Key Takeaways:

Attendees will gain a clear framework for implementing a multi-layered data quality strategy, progressing from basic validation to advanced anomaly and drift detection.
Participants will understand how to identify and address common data quality pitfalls, including inter-dataset inconsistencies and data duplication, leading to more trustworthy data assets, with insights into practical tooling.
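To illustrate the duplication pitfall, a minimal check might count how often each business key appears in a batch. The function name, key columns, and sample rows below are hypothetical and only meant as a sketch.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return {key tuple: occurrence count} for keys seen more than once."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical batch in which one (order_id, country) pair is repeated.
orders = [
    {"order_id": 1, "country": "IN"},
    {"order_id": 1, "country": "IN"},
    {"order_id": 2, "country": "SG"},
]
dupes = duplicate_keys(orders, ["order_id", "country"])
```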

If you’ve struggled with poor quality data leading to data cascades, this talk is for you.

Speaker Bio

Anay Nayak is a Solution Consultant at Sahaj Software with over 20 years of experience driving innovation and success in the design and delivery of large-scale enterprise projects across diverse domains. Over the last 6+ years, he has been actively working on building data platforms and integrating data science models to deliver reliable and actionable business insights.

