The Fifth Elephant 2024 Annual Conference (12th & 13th July)

Maximising the Potential of Data — Discussions around data science, machine learning & AI

Mukund Tripathi

Shikhar Vaish

@shikharvaish

Rahul Madan

@rahulmadan

Bichitra Kumar Sahoo

@bichitra

Ensuring Data Quality with Data Contracts and OpenLineage

Submitted Jun 10, 2024

Abstract

Reliable analytics and decision-making depend on data quality and integrity. This talk explores Data Contracts used as a schema registry, with data quality (DQ) checks built into each contract and OpenLineage used to capture compliance failures. With Data Contracts in place, organizations can enforce data quality standards at the point where data is produced and use lineage to trace which downstream datasets and jobs a discrepancy affects. The approach improves data reliability and gives clear visibility into data workflows, which supports better decision-making and accountability.

  1. Introduce Data Contracts: Explain the concept of Data Contracts and their role as a schema registry to enforce data standards.
  2. Implement Data Quality Checks: Demonstrate how to incorporate data quality checks within Data Contracts to ensure data integrity.
  3. Leverage OpenLineage: Explore the use of OpenLineage for capturing and visualizing data lineage, highlighting the impact of DQ compliance failures.
  4. Practical Applications: Showcase real-world examples and case studies where Data Contracts and OpenLineage have improved data governance and quality.
  5. Future Trends: Discuss the future of data quality management and the evolving role of schema registries and lineage tracking in data ecosystems.
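
To make the first two objectives concrete, here is a minimal sketch of a data contract that carries both schema rules and data quality checks. It is not the speakers' implementation: the `FieldRule` and `DataContract` classes, the `orders` dataset, and the `non_negative` check are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FieldRule:
    """One column in the contract: expected type plus optional quality checks."""
    name: str
    dtype: type
    nullable: bool = False
    checks: tuple[Callable[[Any], bool], ...] = ()


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: a versioned schema with DQ checks attached."""
    dataset: str
    version: str
    fields: tuple[FieldRule, ...]

    def validate(self, rows: list[dict[str, Any]]) -> list[str]:
        """Return human-readable violations; an empty list means compliant."""
        violations = []
        for i, row in enumerate(rows):
            for rule in self.fields:
                value = row.get(rule.name)
                if value is None:
                    # Missing keys and explicit nulls are treated the same way.
                    if not rule.nullable:
                        violations.append(f"row {i}: {rule.name} is null")
                    continue
                if not isinstance(value, rule.dtype):
                    violations.append(
                        f"row {i}: {rule.name} expected {rule.dtype.__name__}"
                    )
                    continue
                for check in rule.checks:
                    if not check(value):
                        violations.append(
                            f"row {i}: {rule.name} failed {check.__name__}"
                        )
        return violations


def non_negative(value: float) -> bool:
    return value >= 0


# Illustrative contract and sample data (both assumed, not from the talk).
orders_contract = DataContract(
    dataset="orders",
    version="1.0.0",
    fields=(
        FieldRule("order_id", str),
        FieldRule("amount", float, checks=(non_negative,)),
        FieldRule("coupon", str, nullable=True),
    ),
)

sample_rows = [
    {"order_id": "A1", "amount": 10.5, "coupon": None},
    {"order_id": None, "amount": -3.0},
    {"order_id": "A3", "amount": "12"},
]

violations = orders_contract.validate(sample_rows)
for message in violations:
    print(message)
```

Keeping the checks next to the schema means a producer and a consumer validate against the same versioned definition, which is the property a schema registry is meant to provide.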

Audience

  • Data Engineers: Professionals responsible for designing, building, and maintaining data pipelines.
  • Data Scientists: Individuals focused on extracting insights from data and ensuring the quality of their analyses.
  • Data Analysts: Analysts who rely on high-quality data for accurate reporting and decision-making.
  • Data Governance Teams: Teams focused on ensuring data policies, standards, and compliance are met within an organization.
  • IT Managers: Managers overseeing data infrastructure and operations.
  • Compliance Officers: Professionals ensuring data practices comply with relevant regulations and standards.

Agenda

Part 1: Introduction and Fundamentals

  • The Importance of Data Quality in Modern Organizations
  • Understanding Data Contracts: Definition and Benefits
  • Integrating Data Quality Checks within Data Contracts
  • Introduction to OpenLineage: Concepts and Architecture
  • Capturing Compliance Failures with OpenLineage
  • Real-world Applications of Data Contracts and OpenLineage
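
As a rough illustration of capturing a compliance failure, the sketch below builds a run event as a plain dictionary, shaped loosely after the OpenLineage run-event JSON. It deliberately avoids any client library. The `build_failure_event` helper, the `example-namespace` namespace, and the producer URL are assumptions for illustration, and the facet layout (including the `_producer` and `_schemaURL` fields omitted here) should be checked against the OpenLineage specification before use.

```python
import json
import uuid
from datetime import datetime, timezone


def build_failure_event(job_name, dataset, failed_checks,
                        namespace="example-namespace"):
    """Build an OpenLineage-style FAIL event for a dataset that broke its contract.

    `failed_checks` is a list of dicts with "assertion" and "column" keys.
    """
    assertions = [
        {"assertion": c["assertion"], "column": c["column"], "success": False}
        for c in failed_checks
    ]
    return {
        "eventType": "FAIL",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [
            {
                "namespace": namespace,
                "name": dataset,
                # Simplified facet: real facets carry extra metadata fields.
                "inputFacets": {
                    "dataQualityAssertions": {"assertions": assertions}
                },
            }
        ],
        "outputs": [],
        # Hypothetical producer identifier, not a real service.
        "producer": "https://example.com/hypothetical-contract-checker",
    }


event = build_failure_event(
    job_name="load_orders",
    dataset="orders",
    failed_checks=[{"assertion": "not_null", "column": "order_id"}],
)
print(json.dumps(event, indent=2))
```

In a real pipeline this payload would be sent to a lineage backend rather than printed, so the failed assertion is recorded against the same run and dataset that lineage already tracks.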

Part 2: Advanced Techniques and Case Studies

  • Implementing Data Contracts in Your Organization: Best Practices
  • Automating Data Quality Checks: Tools and Techniques
  • Visualizing Data Lineage: Tools and Strategies
  • How Customer X Improved Data Quality with Data Contracts
  • Setting Up and Leveraging OpenLineage for Impact Analysis and Operational Metadata at Customer Y
  • The Evolving Role of Schema Registries and Lineage Tracking in Data Ecosystems
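
Impact analysis over lineage comes down to a graph traversal: starting from the dataset that failed its contract, walk every downstream edge. The sketch below assumes lineage has already been reduced to (upstream, downstream) pairs. The `downstream_of` function and the dataset names are hypothetical and are not drawn from either customer case study.

```python
from collections import deque


def downstream_of(edges, start):
    """Return every dataset reachable downstream of `start`, in breadth-first order."""
    graph = {}
    for upstream, downstream in edges:
        graph.setdefault(upstream, []).append(downstream)

    seen = {start}
    impacted = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                impacted.append(nxt)
                queue.append(nxt)
    return impacted


# Illustrative lineage edges (assumed): upstream dataset -> downstream dataset.
lineage_edges = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "marts.revenue"),
    ("staging.orders", "marts.churn"),
    ("marts.revenue", "dashboard.finance"),
    ("raw.users", "marts.churn"),
]

impacted = downstream_of(lineage_edges, "raw.orders")
print(impacted)
```

Breadth-first order lists the nearest consumers first, which is usually the order in which owners need to be notified.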

