Srihari

@haleshot

CocoIndex: incrementally processing data with a Rust core and a Python face

Submitted Mar 12, 2026

Session description

CocoIndex is a framework for data pipelines that stay in sync with their sources without reprocessing everything on every run. The engine tracks what changed across runs, skips what hasn’t, and applies only the necessary updates. From the outside, writing pipelines looks like Python: you declare target states and the runtime handles the rest. The engine doing that work is Rust. I’ve been using CocoIndex and contributing to it, and have landed PRs in the Rust layer, including a structured error system for tunneling Python exceptions through Rust and a migration of the logging stack to the tracing crate. That experience shapes this talk: what the Rust core actually does, and why the incremental processing problem is one where Rust earns its place.
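To make "tracks what changed, skips what hasn’t, and applies only the necessary updates" concrete, here is a minimal sketch of the general idea in plain Python. It is not CocoIndex’s API or its Rust implementation: the `reconcile` function, the path strings, and the fingerprint values are all hypothetical, chosen only to illustrate diffing two runs keyed by a stable path.

```python
def reconcile(previous: dict[str, str], current: dict[str, str]):
    """Plan a minimal update from two {stable_path: fingerprint} snapshots.

    Hypothetical helper for illustration; not part of CocoIndex.
    """
    # New paths and paths whose fingerprint changed must be (re)processed.
    to_process = sorted(p for p, fp in current.items() if previous.get(p) != fp)
    # Paths that disappeared need their target state cleaned up.
    to_delete = sorted(p for p in previous if p not in current)
    # Everything else is unchanged and can be skipped.
    skipped = sorted(p for p, fp in current.items() if previous.get(p) == fp)
    return to_process, to_delete, skipped


last_run = {"docs/a.md": "f1", "docs/b.md": "f2", "docs/c.md": "f3"}
this_run = {"docs/a.md": "f1", "docs/b.md": "f2x", "docs/d.md": "f4"}
to_process, to_delete, skipped = reconcile(last_run, this_run)
```

In this toy run only `docs/b.md` (changed) and `docs/d.md` (new) are processed, `docs/c.md` is cleaned up, and `docs/a.md` is skipped.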

The engine needs to do a few things well. It tracks the identity of each processing component across runs using stable path keys, so it always knows what to recompute and what to clean up. Rust’s ownership model is part of why this works cleanly: in-memory state is managed deterministically with no garbage collector involved, and the engine’s reconciliation logic deletes target states when their component paths disappear. Inputs are fingerprinted to skip unchanged work, and so is code: Python hashes each memoized function’s bytecode, and Rust’s global logic registry (a RwLock<HashSet<Fingerprint>>) invalidates the cache the moment that fingerprint changes. Memoization itself operates in two tiers: a cheap fingerprint check first, full validation second.

Writes to the LMDB state store are batched for performance: LMDB allows only one read-write transaction at a time, so batching amortizes that cost across multiple writes. Rust’s type system enforces the ownership of those closures without needing runtime locks everywhere. Concurrent processing runs through Tokio with no GIL in the picture, and PyO3 bridges Python’s asyncio event loop with the Tokio runtime across the boundary. Unlike raw C extensions, where type errors at the boundary tend to surface as segfaults, the Rust side is statically typed, so type errors there surface at compile time.

The CocoIndex blog on Rust’s ownership model gets into why this fits data engines particularly well. There’s also a community contributor actively building a native Rust SDK right now. Towards the end, I’ll connect this to something I’ve been noticing for a while: Python developers have been running Rust without thinking about it (Ruff, uv, Polars, and Pydantic v2 all have Rust cores behind their Python APIs), and there’s now a wave of full rewrites too: Zerobrew (Homebrew), prek (pre-commit), Turso (SQLite). CocoIndex is part of that story, and the talk ends by asking what all of these projects are reaching for.
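The code-fingerprinting idea mentioned above (hash a memoized function’s bytecode, and treat a changed hash as a cache miss) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CocoIndex’s implementation: `code_fingerprint`, `input_fingerprint`, and `MemoCache` are hypothetical names, the real engine keeps its registry on the Rust side, and a `repr`-based input hash is only a stand-in for a proper fingerprint.

```python
import hashlib


def code_fingerprint(fn) -> str:
    """Hash a function's bytecode plus its simple constants (illustrative)."""
    code = fn.__code__
    # Nested code objects are skipped because their repr is not stable.
    consts = tuple(c for c in code.co_consts if not hasattr(c, "co_code"))
    return hashlib.sha256(code.co_code + repr(consts).encode()).hexdigest()


def input_fingerprint(value) -> str:
    """Stand-in input fingerprint; assumes repr(value) is deterministic."""
    return hashlib.sha256(repr(value).encode()).hexdigest()


class MemoCache:
    """Hypothetical cache: an entry is valid only if input AND code match."""

    def __init__(self):
        self._entries = {}  # input fingerprint -> (code fingerprint, result)
        self.calls = 0      # how many times the wrapped logic actually ran

    def run(self, fn, value):
        key = input_fingerprint(value)
        logic = code_fingerprint(fn)
        hit = self._entries.get(key)
        if hit is not None and hit[0] == logic:
            return hit[1]  # same input, same code: skip the work
        self.calls += 1
        result = fn(value)
        self._entries[key] = (logic, result)
        return result


cache = MemoCache()
first = cache.run(lambda x: x + 1, 10)       # computed
again = cache.run(lambda x: x + 1, 10)       # served from the cache
changed = cache.run(lambda x: x * 2 + 1, 10)  # code changed, recomputed
```

Note that the second call hits the cache even though it passes a different lambda object: the two bodies compile to the same bytecode, so their fingerprints match, which is the property that lets a cache survive across process restarts.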

Takeaways

You’ll leave with a real picture of how CocoIndex’s engine works: what stable paths, fingerprinting, write batching, and the PyO3 bridge are each doing and why those specific choices were made.

You’ll also come away with a clearer sense of the pattern behind tools like Ruff, uv, and the growing list of OSS rewrites, and what the incremental processing problem specifically has to do with it.

Target audience

Rust developers who want to see PyO3 in a real production project, and anyone building or thinking about building tools with a Rust core. The talk is also relevant if you work on data pipelines and have wondered whether Rust is worth reaching for.

Bio

Srihari Thyagarajan is a Technical Writer at Deepnote, previously doing educational dev-rel and advocacy at marimo. He works on documentation, developer tooling integrations, tutorials, and community partnerships. He contributes to CocoIndex and helps co-organize SciPy India. He has been attending and speaking at open-source events across India, including PyCon India and IndiaFOSS.


Hosted by

A community of Rust language contributors and end-users from Bangalore. We are present on the following Telegram channels: https://t.me/RustIndia https://t.me/fpncr LinkedIn: https://www.linkedin.com/company/rust-india/