Call for submissions: Platform Engineering Meet-ups

Share your journey of building platforms that power engineering teams

Amit Khanna

Oxygen: Breathing Simplicity into ETL at Scale

Submitted Oct 2, 2025

At InMobi, the data platform powers a diverse ecosystem of analytics and machine learning workloads, processing billions of events daily. To orchestrate, optimize, and scale these ETL pipelines, we leverage Apache Airflow for workflow management, Apache Spark for distributed SQL-based data processing, and Kubernetes for dynamic infrastructure provisioning.

While these technologies are powerful individually, operating them together at scale requires more than integration: it demands a unified framework that ensures reliability, observability, and efficiency.

Challenges We Faced

Steep Learning Curve – Every engineer had to understand both Airflow and Spark constructs in depth before building pipelines.

Complex Dependency Management – Each application has its own requirements, so generalizing dependencies, grouping them, and managing ConfigMaps became increasingly difficult.

DAG Redundancy – Common DAG code and features were written repeatedly across teams, slowing productivity and creating maintenance overhead.

Unoptimized Execution – Spark jobs often ran with inefficient configurations, leading to unnecessary cost and performance bottlenecks.

Limited Observability – Lack of consistent logging, metrics, and monitoring made debugging production ETLs difficult.

Data Discoverability Gaps – Inconsistent data definitions resulted in redundant datasets and repeated feature generation.

Our Solution: Oxygen Framework

To address these challenges, we built Oxygen, InMobi’s internal ETL framework, designed to simplify how teams define and operate data pipelines. Oxygen provides:

Abstraction on Airflow DAGs – Developers declare ETL workflows in a simplified, standardized format, while Oxygen translates them into production-ready DAGs.

Containerization of Applications – All dependencies are packaged within containers, ensuring consistent runtime environments and eliminating version conflicts.

Standardized Input/Output Contracts – Data sources and sinks are defined in a consistent way, enabling seamless integration with internal data catalogs.

Observability by Design – Unified logging, metrics, and alerting across Airflow, Spark, and Kubernetes to help teams debug and monitor pipelines with ease.

Intelligent Optimization – Oxygen automatically tunes Spark job execution parameters and resource allocation to balance cost, performance, and reliability.

Best Practices Enforcement – Built-in validations and guardrails ensure teams follow proven engineering practices when authoring ETLs.

Disaster Recovery Support – Easy cluster switchover for both Kubernetes and Airflow.
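To make the idea of a declarative abstraction with guardrails concrete, here is a minimal sketch in plain Python. It is not Oxygen's actual format or API, which this proposal does not describe. The names `TaskSpec`, `PipelineSpec`, `derive_dependencies`, `validate`, and `execution_order` are hypothetical, as are the specific checks. The sketch assumes each task declares its input and output datasets, so task ordering can be inferred from those contracts instead of being hand-wired.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class TaskSpec:
    # Hypothetical task declaration: a SQL step with explicit data contracts.
    name: str
    sql: str
    inputs: list[str] = field(default_factory=list)
    output: str = ""


@dataclass
class PipelineSpec:
    # Hypothetical pipeline declaration; field names are illustrative only.
    name: str
    schedule: str
    owner: str
    tasks: list[TaskSpec] = field(default_factory=list)


def derive_dependencies(spec: PipelineSpec) -> dict[str, set[str]]:
    """Map each task to the tasks that produce one of its inputs."""
    producers = {t.output: t.name for t in spec.tasks if t.output}
    return {
        t.name: {producers[i] for i in t.inputs if i in producers}
        for t in spec.tasks
    }


def validate(spec: PipelineSpec) -> list[str]:
    """Run guardrail checks and return a list of problems (empty if clean)."""
    problems = []
    if not spec.owner:
        problems.append("pipeline has no owner")
    names = [t.name for t in spec.tasks]
    if len(names) != len(set(names)):
        problems.append("duplicate task names")
    for t in spec.tasks:
        if not t.output:
            problems.append(f"task '{t.name}' declares no output")
    try:
        execution_order(spec)
    except CycleError:
        problems.append("task dependencies contain a cycle")
    return problems


def execution_order(spec: PipelineSpec) -> list[str]:
    """Return task names in an order that respects the derived dependencies."""
    return list(TopologicalSorter(derive_dependencies(spec)).static_order())


# Example declaration: two steps linked only through their datasets.
spec = PipelineSpec(
    name="daily_events",
    schedule="@daily",
    owner="data-platform",
    tasks=[
        TaskSpec(
            name="daily_agg",
            sql="SELECT day, COUNT(*) FROM clean_events GROUP BY day",
            inputs=["clean_events"],
            output="daily_counts",
        ),
        TaskSpec(
            name="clean_events",
            sql="SELECT * FROM raw_events WHERE event_id IS NOT NULL",
            inputs=["raw_events"],
            output="clean_events",
        ),
    ],
)

print(validate(spec))
print(execution_order(spec))
```

In a real framework the ordered tasks would be rendered as Airflow operators. The sketch stops at the dependency graph so that it runs without an Airflow installation.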

Driving Adoption at Scale

With more than 1,000 DAGs already running, teams had built pipelines in multiple ways, following diverse standards. Achieving adoption required seamless migration without burdening users.

To address this, we built migration automation that involves:

AST-Based Conversion – Parsed existing DAGs into abstract syntax trees (ASTs), extracted their components, and mapped them to their Oxygen counterparts.

Automated PR Generation – The automation converted the DAGs and raised pull requests with the updated Oxygen versions.

Validation Framework – Automated comparison of old and new DAGs to build confidence in the migration, followed by a final data validation before sign-off.

This approach ensured migration was automated, reliable, and minimally disruptive, enabling organization-wide adoption of Oxygen at scale.
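The extraction step of an AST-based conversion can be sketched with Python's standard `ast` module. This is an illustration under stated assumptions, not the migration tool itself: the function `extract_components`, the sample `LEGACY_DAG` source, and the output shape are all hypothetical. The sketch only handles tasks assigned to a variable with a literal `task_id` and simple `a >> b` dependencies; chained or list-based dependencies would need more cases.

```python
import ast

# Hypothetical legacy DAG source. It is only parsed, never executed,
# so Airflow does not need to be installed.
LEGACY_DAG = '''
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG("daily_events", schedule="@daily") as dag:
    extract = BashOperator(task_id="extract", bash_command="run_extract.sh")
    load = BashOperator(task_id="load", bash_command="run_load.sh")
    extract >> load
'''


def extract_components(source: str) -> dict:
    """Collect the DAG id, operator tasks, and simple `a >> b` edges."""
    tree = ast.parse(source)
    dag_id = None
    tasks = {}      # variable name -> {"task_id": ..., "operator": ...}
    var_edges = []  # (upstream variable, downstream variable)

    for node in ast.walk(tree):
        if isinstance(node, ast.With):
            # Look for `with DAG("some_id", ...)` and read the literal id.
            for item in node.items:
                call = item.context_expr
                if (
                    isinstance(call, ast.Call)
                    and isinstance(call.func, ast.Name)
                    and call.func.id == "DAG"
                    and call.args
                    and isinstance(call.args[0], ast.Constant)
                ):
                    dag_id = call.args[0].value
        elif isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            # Look for `var = SomeOperator(task_id="...")`.
            call = node.value
            kwargs = {kw.arg: kw.value for kw in call.keywords if kw.arg}
            task_id = kwargs.get("task_id")
            target = node.targets[0]
            if (
                isinstance(task_id, ast.Constant)
                and isinstance(call.func, ast.Name)
                and isinstance(target, ast.Name)
            ):
                tasks[target.id] = {
                    "task_id": task_id.value,
                    "operator": call.func.id,
                }
        elif (
            isinstance(node, ast.Expr)
            and isinstance(node.value, ast.BinOp)
            and isinstance(node.value.op, ast.RShift)
        ):
            # Look for `upstream >> downstream` between two plain names.
            left, right = node.value.left, node.value.right
            if isinstance(left, ast.Name) and isinstance(right, ast.Name):
                var_edges.append((left.id, right.id))

    # Resolve variable names to task ids once all tasks are known.
    edges = [
        (tasks[a]["task_id"], tasks[b]["task_id"])
        for a, b in var_edges
        if a in tasks and b in tasks
    ]
    return {"dag_id": dag_id, "tasks": list(tasks.values()), "edges": edges}


result = extract_components(LEGACY_DAG)
print(result)
```

Running the same extraction on a legacy DAG and on its converted version and comparing the two results is one plausible way to check that a migration preserved task structure, though the proposal does not say how the comparison is actually done.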

Success Criteria

To measure the effectiveness of Oxygen, we defined success along the following dimensions:

Ease of Migration – Minimal time and effort required by client teams to migrate existing DAGs.

Ease of Use & Standardization – Significant reduction in time taken to write new DAGs using Oxygen’s abstractions and standardized patterns.

Observability & Insights – Clear visibility into job performance, with proactive alerts to highlight failures, bottlenecks, or anomalies.

Performance Optimization – Centralized configurations and an intelligent recommendation system driving consistent improvements in resource usage and runtime efficiency.

Dependency Management – Simplified packaging of dependencies within applications, reducing conflicts and ensuring consistency across environments.

Audience & Takeaway

Target Audience: Data engineers, platform engineers, and architects interested in modernizing ETL pipelines with Airflow, Spark, and Kubernetes.

Key Takeaway: How to move beyond vanilla Airflow + Spark deployments and build a scalable, observable, and developer-friendly ETL platform, with automation-driven adoption at scale.

Speaker Bio

Amit Khanna is a Staff Engineering Manager at InMobi, leading the Central Platform team. He and his team build scalable solutions to manage data and machine learning models, powering InMobi’s in-house data and ML platforms.
