Mayur Jadhav

@mjadhav13

An Agent That Builds Agents: AI-Powered Recipe Generation for Local-First ETL

Submitted Jun 24, 2026

Session Description

Most organisations have run Spark or Glue jobs for datasets that comfortably fit in a single machine’s memory.
For about five years now, the real bottleneck has not been infrastructure; it has been tooling. OrcaSheets is a desktop application that lets analysts build full ETL workflows locally, with no cluster provisioning, no cloud egress, and no YAML configs. The remaining challenge was adoption: our internal DSL for transformation pipelines could not be exposed directly to users, and the unlearning curve had to be smooth. So we built an agent that generates agents.
The Recipe Engine takes natural language descriptions and produces validated, repeatable pipelines. These cover not just SQL but also complex algorithms such as k-means clustering and time-series forecasting, all running on a Rust binary with an extremely small footprint. For 90% of real-world workloads, OrcaSheets is a legitimate alternative to cloud ETL.

This talk challenges the default assumption that data engineering requires distributed systems. We’ll show production benchmarks comparing local execution against equivalent Glue/Spark jobs and demonstrate the recipe generation system end-to-end: from a messy 2 GB CSV upload, through AI-generated transformations, to a clean, versioned output with clustering insights. We’ll dive into the meta-problem of building an agent that understands your internal DSL well enough to generate other agents (recipes), covering the prompt engineering, validation loops, and guardrails that make this reliable in production. The audience will leave with a concrete framework for deciding when local-first is the right architecture and an understanding of how an AI-powered recipe engine eliminates the boilerplate that makes ETL expensive.
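To make the "validation loop" idea concrete, here is a minimal sketch in Python of a generate, validate, retry cycle. It is an illustration under stated assumptions, not the OrcaSheets implementation: the recipe shape (a list of step dictionaries), the operation names in ALLOWED_OPS, and the functions validate_recipe and generate_recipe are all hypothetical, and the model call is abstracted as a plain callable so that no particular LLM API is implied.

```python
from typing import Callable

# Hypothetical operation names. The real OrcaSheets DSL is not described
# in this abstract, so these are placeholders for illustration only.
ALLOWED_OPS = {"load_csv", "drop_nulls", "filter", "kmeans", "forecast", "write"}

Recipe = list[dict]
# A generator takes the user's request plus the previous attempt's
# validation errors, and returns a candidate recipe.
Generator = Callable[[str, list[str]], Recipe]


def validate_recipe(recipe: Recipe) -> list[str]:
    """Return human-readable errors; an empty list means the recipe passed."""
    errors: list[str] = []
    if not recipe:
        errors.append("recipe has no steps")
    for i, step in enumerate(recipe):
        if not isinstance(step, dict):
            errors.append(f"step {i}: expected a mapping")
            continue
        op = step.get("op")
        if op not in ALLOWED_OPS:
            errors.append(f"step {i}: unknown op {op!r}")
        elif op == "kmeans" and not isinstance(step.get("k"), int):
            errors.append(f"step {i}: kmeans needs an integer 'k'")
    return errors


def generate_recipe(request: str, generate: Generator, max_attempts: int = 3) -> Recipe:
    """Ask the generator for a recipe, feeding validation errors back on failure."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        recipe = generate(request, feedback)
        feedback = validate_recipe(recipe)
        if not feedback:
            return recipe  # only validated recipes ever leave the loop
    raise ValueError(f"no valid recipe after {max_attempts} attempts: {feedback}")
```

In this sketch the generator never sees the data, only the request and the validator's error messages, and nothing unvalidated is returned to the caller. A real engine would presumably validate far more (column names, types, step ordering), but the control flow would look similar.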

Takeaways

  1. How to build an “agent that creates agents”: the architecture of a recipe generation engine that translates natural language into validated, repeatable transformation pipelines using an internal DSL, including the prompt engineering and validation patterns that make it production-reliable.
  2. A practical framework for when local-first analytics replaces cloud ETL (Spark/Glue), with real cost comparisons showing 10-50x savings and how AI-generated recipes reduce pipeline development time from days to minutes while keeping data fully on-premises.

Target Audience

Data engineers exploring alternatives to cloud ETL for mid-size datasets; ML engineers interested in local-first analytical tooling; platform engineers building internal DSLs or agent-powered developer tools; anyone building AI agents that generate structured, validated outputs rather than freeform text.

Bio

Mayur, founder at OrcaSheets and co-founder of DataOrc. I build practical, scale-proof systems; my path runs from QA, through backend and frontend work, to co-founding one of the fastest analytics engines today.

At OrcaSheets, I’m making analytics that run locally on your PC, handling 100M+ rows in seconds and turning raw data into insights, dashboards, and decisions.

Before this, I co-founded DataOrc, a data engineering consultancy that’s delivered for 75+ enterprises, helping them scale data pipelines, cut cloud bills by up to 90%, and stay production-grade at petabyte scale.


Hosted by

Jumpstart better data engineering and AI futures