Apr 18–19, 2026 (Sat–Sun), 09:00 AM – 06:00 PM IST
RamaChaitanya Kandula
Submitted Mar 18, 2026
Distributed AI training runs over RDMA and depends on each GPU using the right NIC. When the scheduler assigns a rank to a NIC that isn’t the best for that GPU (e.g. cross-NUMA or extra PCIe hops), you can lose 20–40% throughput or see unstable runs. Most cluster schedulers don’t understand PCIe/NUMA topology, so we need a small, reliable tool that discovers how NICs and GPUs are connected and outputs placement recommendations—which rank should use which NIC and, optionally, which CPU/NUMA node.
This session walks through building that tool in Rust. We’ll cover where topology comes from (nvidia-smi, sysfs, netlink), how to model NIC–GPU–NUMA affinity and score “best NIC per GPU,” and how to emit placement (e.g. env vars or JSON) for the launcher or scheduler. We’ll show why Rust is a good fit: safe parsing of vendor output, clear data structures for devices and affinity, no unsafe in the policy layer, and a small binary that fits into node agents or sidecars. The same approach applies to any environment where NIC topology and placement matter for AI or HPC.
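The affinity-scoring step described above can be sketched in a few lines of Rust. This is a minimal, hypothetical illustration, not the session's actual code: the struct fields, the scoring weights, and the `RANK{n}_NCCL_IB_HCA` output convention are all assumptions (in a real tool the GPU and NIC data would come from nvidia-smi and sysfs rather than being hard-coded).

```rust
// Hypothetical sketch: score NIC candidates for each GPU by NUMA affinity
// and PCIe distance, then emit one placement line per rank for a launcher.

#[derive(Debug, Clone)]
struct Gpu {
    index: usize,
    numa_node: i32,
}

#[derive(Debug, Clone)]
struct Nic {
    name: &'static str,
    numa_node: i32,
    pcie_hops: u32,
}

/// Lower score is better: a large penalty for crossing NUMA nodes,
/// then break ties by PCIe hop count. Weights are illustrative.
fn score(gpu: &Gpu, nic: &Nic) -> u32 {
    let numa_penalty = if gpu.numa_node == nic.numa_node { 0 } else { 100 };
    numa_penalty + nic.pcie_hops
}

/// Pick the lowest-scoring NIC for a GPU.
fn best_nic<'a>(gpu: &Gpu, nics: &'a [Nic]) -> Option<&'a Nic> {
    nics.iter().min_by_key(|nic| score(gpu, nic))
}

fn main() {
    // In the real tool this topology would be discovered, not hard-coded.
    let gpus = vec![
        Gpu { index: 0, numa_node: 0 },
        Gpu { index: 1, numa_node: 1 },
    ];
    let nics = vec![
        Nic { name: "mlx5_0", numa_node: 0, pcie_hops: 1 },
        Nic { name: "mlx5_1", numa_node: 1, pcie_hops: 1 },
    ];
    for gpu in &gpus {
        if let Some(nic) = best_nic(gpu, &nics) {
            // One env-var line per rank, consumable by a launcher wrapper.
            println!("RANK{}_NCCL_IB_HCA={}", gpu.index, nic.name);
        }
    }
}
```

The policy layer stays entirely in safe Rust: parsing vendor output and sysfs happens elsewhere, and this scoring code only sees plain structs, which is part of the argument the abstract makes for Rust.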
Key takeaways from the session include:
This talk will be relevant to developers and SREs working on AI/ML infrastructure, distributed training, or GPU clusters.
RamaChaitanya is a senior software engineer on the Nutanix Networking team, focused on network acceleration technologies, with experience in high-performance networking and GPU clusters. He is exploring Rust for systems and networking tooling and is interested in NIC topology and placement for AI workloads.