Navdeep Agarwal

@orcanavdeep

Mayur Jadhav

@mj_jadhav13

Big Data Analytics on Tiny Machines

Submitted Mar 2, 2026

Talk Title

Big Data Analytics on Tiny Machines: How Rust is Ending the Cloud-First Lie

Abstract

For decades, the data industry has sold us a convenient lie: “Your data is too big — you need a cluster.” We’ve been conditioned to spin up Spark clusters, provision cloud warehouses, and burn through compute budgets for workloads that a single machine could handle — if only the tools were built right.

This talk challenges that narrative. At OrcaSheets, we replaced cloud-heavy Spark pipelines with a Rust-native analytics engine running on a laptop. Multi-gigabyte datasets. Complex SQL queries. Dataframe transforms. All local. All fast. No cluster required.

I’ll walk through how we built this using Rust’s analytics ecosystem — DataFusion for SQL, Polars for dataframe operations, and Apache Arrow as the shared in-memory format — and why this stack is so hard to replicate in garbage-collected languages.

What You’ll Learn

The Lie We’ve Been Told

  • Why the “big data” narrative pushed us toward distributed systems we never needed
  • The real bottleneck was never hardware — it was runtime overhead, memory bloat, and GC pauses
  • How 90% of “big data” workloads fit comfortably on a single modern machine

What We’re Letting Go

  • Cloud-first architectures for analytical workloads that don’t need them
  • JVM-based engines (Spark, Flink) where GC pauses destroy latency predictability
  • The assumption that scaling out is cheaper than scaling smart
  • Data leaving your machine just to be queried

What We’re Gaining

  • Zero-copy data pipelines — Arrow buffers shared across DataFusion and Polars without serialization
  • Deterministic memory management — no GC pauses, predictable performance on every query
  • Local-first analytics — data stays on your machine, queries run in milliseconds
  • Rust’s ownership model as a data engineering superpower — memory safety without a runtime tax

Why Only Rust Makes This Possible

  • Zero-cost abstractions that let you write high-level analytics code with bare-metal performance
  • No garbage collector — when you’re processing gigabytes of columnar data, GC pauses aren’t a nuisance, they’re a dealbreaker
  • Zero-copy Arrow interop — DataFusion query results flow directly into Polars transforms without copying a single byte
  • Fearless concurrency for parallel query execution across cores, not clusters
  • Compile-time guarantees that eliminate an entire class of runtime failures in data pipelines

The Future: Local-First Data Analytics

  • Why the pendulum is swinging from cloud back to local for analytical workloads
  • The emerging Rust-native analytics stack: DataFusion + Polars + Arrow + Iceberg
  • How WASM extends this to the browser — same Rust engine, every platform
  • What this means for data privacy, latency, and cost

Talk Format

30-minute talk with live demo

Speaker Bio

Navdeep is the Co-Founder of Dataorc and OrcaSheets. He built and scaled Dataorc to 60+ enterprise clients and currently manages critical ONDC infrastructure processing 10+ million transactions daily across 15TB+ of data. At OrcaSheets, he is building a Rust-native, local-first analytics platform using DataFusion, Polars, Arrow, and Tauri, proving that big data analytics doesn’t need big infrastructure.

Target Audience

Rust developers, data engineers, and anyone skeptical of their cloud bill.

Key Takeaway

The next generation of data analytics won’t run on clusters — it’ll run on your laptop, written in Rust.

Hosted by

A community of Rust language contributors and end-users from Bangalore. Telegram: https://t.me/RustIndia and https://t.me/fpncr. LinkedIn: https://www.linkedin.com/company/rust-india/