Ishan

Over 6M records per second: Building data streaming logic for Nautilus trading engine

Submitted Mar 8, 2026

Abstract

Trading systems replay hundreds of gigabytes of market data to test trading strategies. The data streaming layer plays a key role in reading that data from storage correctly, quickly, and efficiently. A good data streaming layer is the difference between being able to backtest hundreds of strategies and only a handful.

In this talk, I’ll take you through the journey of Nautilus’s data streaming layer, from reading records at 300k/s to 6M/s. We’ll discuss:

  • Data storage layer
    • Parquet data storage format for trading data
    • Mental models for storing and querying trading data with filters and partitions
    • Key ecosystem libraries and picking the right one
  • Streaming layer
    • Reading data in ascending order of timestamps
    • Merging data across partitions in ascending order
    • Testing for high degree of correctness
  • War stories
    • Oh no, memory is blowing up: using bytehound to identify the issue
    • Oh no, fixing the memory issue hurt performance
    • Finding the right balance
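To make the "querying with filters and partitions" point above concrete, here is a minimal, hypothetical sketch (not Nautilus's actual code) of partition pruning: if each partition file records the min and max timestamp it covers, a query window only needs to open the partitions that overlap it. The `Partition` struct and `prune` function are illustrative names, not part of any real API.

```rust
/// Illustrative partition metadata: the timestamp range a file covers.
#[derive(Debug)]
struct Partition {
    name: &'static str,
    min_ts: u64,
    max_ts: u64,
}

/// Keep only partitions whose [min_ts, max_ts] range overlaps the
/// query window [start, end]; everything else is skipped unread.
fn prune<'a>(parts: &'a [Partition], start: u64, end: u64) -> Vec<&'a str> {
    parts
        .iter()
        // Two ranges overlap unless one ends before the other begins.
        .filter(|p| p.max_ts >= start && p.min_ts <= end)
        .map(|p| p.name)
        .collect()
}

fn main() {
    let parts = [
        Partition { name: "2024-01-01", min_ts: 0, max_ts: 99 },
        Partition { name: "2024-01-02", min_ts: 100, max_ts: 199 },
        Partition { name: "2024-01-03", min_ts: 200, max_ts: 299 },
    ];
    // Only the second and third partitions overlap [150, 250].
    println!("{:?}", prune(&parts, 150, 250));
}
```

Parquet row-group statistics enable the same idea one level down: row groups whose min/max timestamps fall outside the filter can be skipped without decoding.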
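The "merging data across partitions in ascending order" bullet is a k-way merge: each partition is already sorted by timestamp, so a min-heap holding the head record of each partition yields a globally sorted stream. The sketch below, using only the Rust standard library, shows the shape of that merge over plain `u64` timestamps; Nautilus's real implementation streams richer records, but the heap discipline is the same.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge several timestamp-sorted partitions into one ascending stream.
fn merge_sorted(partitions: Vec<Vec<u64>>) -> Vec<u64> {
    // Heap entries are Reverse((timestamp, partition, offset)) so that
    // BinaryHeap, a max-heap, pops the smallest timestamp first.
    let mut heap = BinaryHeap::new();
    for (i, p) in partitions.iter().enumerate() {
        if let Some(&ts) = p.first() {
            heap.push(Reverse((ts, i, 0usize)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((ts, i, off))) = heap.pop() {
        out.push(ts);
        // Re-insert the next record from the partition we just consumed.
        if let Some(&next) = partitions[i].get(off + 1) {
            heap.push(Reverse((next, i, off + 1)));
        }
    }
    out
}

fn main() {
    let parts = vec![vec![1, 4, 7], vec![2, 3, 9], vec![5, 6, 8]];
    println!("{:?}", merge_sorted(parts)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
}
```

Note the memory/performance tension the war stories hint at: holding only one head record per partition keeps memory flat, while buffering larger batches per partition amortizes read cost; the talk's "right balance" lives between those extremes.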

Takeaways

  • Challenges and design decisions in high-performance data streaming from Parquet files
  • Tools and mental models for debugging and fixing memory and performance issues

Audience

Platform and performance engineers and system architects who build and maintain fast systems that go brrr will enjoy this.


Hosted by

A community of Rust language contributors and end-users from Bangalore. We are present on the following Telegram channels: https://t.me/RustIndia and https://t.me/fpncr. LinkedIn: https://www.linkedin.com/company/rust-india/