The Fifth Elephant 2025 Annual Conference CfP

Anay Nayak

@anaynayak

A Pragmatic Guide to Robust Data Quality Checks

Submitted May 16, 2025

Ignoring data quality introduces significant risks such as flawed insights and poor business outcomes. This 30-minute talk moves beyond simplistic validation and walks through a multi-tiered approach to data quality assurance. We’ll start with foundational checks: schema validation, data volume monitoring, and value ranges (high/low thresholds) for immediate outlier detection. These establish baseline reliability.
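
As a rough illustration of these foundational checks, here is a minimal sketch in plain Python. The schema, column names, and thresholds are hypothetical and chosen only for the example; this is not the tooling the talk will present, and a real pipeline would more likely express the same rules in a data quality framework.

```python
# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "country": str}


def check_schema(rows, schema):
    """Return human-readable violations: missing columns or wrong types."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for col, expected_type in schema.items():
            if col in row and not isinstance(row[col], expected_type):
                errors.append(
                    f"row {i}: {col} is {type(row[col]).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return errors


def check_volume(row_count, min_rows, max_rows):
    """True when the batch size falls inside the expected band."""
    return min_rows <= row_count <= max_rows


def check_range(rows, column, low, high):
    """Return rows whose numeric value in `column` lies outside [low, high]."""
    return [
        row
        for row in rows
        if isinstance(row.get(column), (int, float))
        and not (low <= row[column] <= high)
    ]
```

Each function reports what it found rather than raising, so a caller can decide which failures block a pipeline and which only raise an alert.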

Next, the talk explores advanced techniques for ensuring high-quality data. It covers anomaly-based checks that flag unusual patterns, and “gradual drift”: subtle changes in data distribution that affect models or analytics. It also covers inter-dataset consistency (e.g. checking that related datasets overlap to the extent we expect) and strategies for enforcing uniqueness and deduplicating data.
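
The sketch below shows one possible shape for each of these advanced checks, again in plain Python. The specific techniques are assumptions made for illustration, not necessarily the ones the talk will use: a z-score test stands in for anomaly detection, the Population Stability Index (PSI) stands in for drift detection, and all function names and thresholds are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev


def zscore_anomaly(history, latest, threshold=3.0):
    """Flag `latest` when it sits more than `threshold` sample standard
    deviations away from the mean of `history` (needs >= 2 points)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


def population_stability_index(baseline, current, bins=10):
    """PSI between two numeric samples, using equal-width bins derived from
    the baseline. Values outside the baseline range go to the edge bins."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor at a small epsilon so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def overlap_ratio(left_keys, right_keys):
    """Fraction of distinct left keys that also appear on the right."""
    left, right = set(left_keys), set(right_keys)
    if not left:
        return 1.0
    return len(left & right) / len(left)


def find_duplicates(keys):
    """Map each key that occurs more than once to its occurrence count."""
    return {k: n for k, n in Counter(keys).items() if n > 1}
```

A PSI above roughly 0.25 is a commonly cited rule of thumb for a significant shift, but the right cutoff depends on the dataset. Comparing each new batch against a fixed baseline window, rather than only against the previous batch, is what lets a check like this catch slow drift that per-batch thresholds miss.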

This talk is designed to equip attendees with a clear understanding of how to layer these different checks to create a comprehensive and resilient data quality framework. The presentation will cover practical implementation considerations within modern data stacks and showcase illustrative examples using popular open-source data quality frameworks.

Key Takeaways:

  1. Attendees will gain a clear framework for implementing a multi-layered data quality strategy, progressing from basic validation to advanced anomaly and drift detection.
  2. Participants will understand how to identify and address common data quality pitfalls, including inter-dataset inconsistencies and duplicate data, leading to more trustworthy data assets, with insights into practical tooling.

If you’ve struggled with poor quality data leading to data cascades, this talk is for you.

Speaker Bio

Anay Nayak is a Solution Consultant at Sahaj Software with over 19 years of experience driving innovation and success in the design and delivery of large-scale enterprise projects across diverse domains. Over the last 6+ years, he has been actively working on building data platforms and integrating data science models to deliver reliable and actionable business insights.
