Navdeep Agarwal

@orcanavdeep

Mayur Jadhav

@mj_jadhav13

Big Data Analytics on Tiny Machines

Submitted Mar 2, 2026

Talk Title

Big Data Analytics on Tiny Machines: How Rust is Ending the Cloud-First Lie

Abstract

For decades, the data industry has sold us a convenient lie: “Your data is too big — you need a cluster.” We’ve been conditioned to spin up Spark clusters, provision cloud warehouses, and burn through compute budgets for workloads that a single machine could handle — if only the tools were built right.

This talk challenges that narrative. At OrcaSheets, we replaced cloud-heavy Spark pipelines with a Rust-native analytics engine running on a laptop. Multi-gigabyte datasets. Complex SQL queries. Dataframe transforms. All local. All fast. No cluster required.

I’ll walk through how we built this using Rust’s analytics ecosystem — DataFusion for SQL, Polars for dataframe operations, and Apache Arrow as the shared in-memory format — and why this stack is so hard to replicate in garbage-collected languages.

What You’ll Learn

The Lie We’ve Been Told

  • Why the “big data” narrative pushed us toward distributed systems we never needed
  • The real bottleneck was never hardware — it was runtime overhead, memory bloat, and GC pauses
  • How 90% of “big data” workloads fit comfortably on a single modern machine

What We’re Letting Go

  • Cloud-first architectures for analytical workloads that don’t need them
  • JVM-based engines (Spark, Flink) where GC pauses destroy latency predictability
  • The assumption that scaling out is cheaper than scaling smart
  • Data leaving your machine just to be queried

What We’re Gaining

  • Zero-copy data pipelines — Arrow buffers shared across DataFusion and Polars without serialization
  • Deterministic memory management — no GC pauses, predictable performance on every query
  • Local-first analytics — data stays on your machine, queries run in milliseconds
  • Rust’s ownership model as a data engineering superpower — memory safety without a runtime tax

Why Only Rust Makes This Possible

  • Zero-cost abstractions that let you write high-level analytics code with bare-metal performance
  • No garbage collector — when you’re processing gigabytes of columnar data, GC pauses aren’t a nuisance, they’re a dealbreaker
  • Zero-copy Arrow interop — DataFusion query results flow directly into Polars transforms without copying a single byte
  • Fearless concurrency for parallel query execution across cores, not clusters
  • Compile-time guarantees that eliminate an entire class of runtime failures in data pipelines

The Future: Local-First Data Analytics

  • Why the pendulum is swinging from cloud back to local for analytical workloads
  • The emerging Rust-native analytics stack: DataFusion + Polars + Arrow + Iceberg
  • How WASM extends this to the browser — same Rust engine, every platform
  • What this means for data privacy, latency, and cost

Talk Format

30-minute talk with live demo

Speaker Bio

Navdeep is the Co-Founder of Dataorc and OrcaSheets. He built and scaled Dataorc to 60+ enterprise clients and currently manages critical ONDC infrastructure processing 10+ million transactions daily across 15TB+ of data. At OrcaSheets, he is building a Rust-native, local-first analytics platform using DataFusion, Polars, Arrow, and Tauri, proving that big data analytics doesn’t need big infrastructure.

Target Audience

Rust developers, data engineers, and anyone skeptical of their cloud bill.

Key Takeaway

The next generation of data analytics won’t run on clusters — it’ll run on your laptop, written in Rust.

Hosted by

A community of Rust language contributors and end-users from Bangalore. Telegram: https://t.me/RustIndia and https://t.me/fpncr. LinkedIn: https://www.linkedin.com/company/rust-india/