Deploying ML models to edge devices often means integrating multiple platform-specific runtimes such as CoreML, OpenVINO, DirectML, and lightweight mobile engines. This talk explores how Rust can serve as a portable systems layer for building and shipping cross-platform ML inference while keeping performance overhead low.
Talk Overview
Rust as the portability layer
- Using Rust’s cross-platform toolchain to target multiple OS and hardware platforms.
- Designing a dead-simple unified inference interface around models and tensor APIs, while supporting multiple runtime backends.
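A unified interface like the one described above can be sketched as a small Rust trait. This is a hedged, minimal sketch, not the talk's actual API: the names `Tensor`, `Backend`, `InferenceSession`, and the toy `DoublerBackend` are all hypothetical stand-ins for real runtime bindings such as CoreML or OpenVINO.

```rust
/// A plain row-major float tensor: shape plus contiguous data.
#[derive(Debug, Clone, PartialEq)]
pub struct Tensor {
    pub shape: Vec<usize>,
    pub data: Vec<f32>,
}

/// Each platform runtime (CoreML, OpenVINO, ...) would implement this trait.
pub trait Backend {
    fn name(&self) -> &str;
    fn run(&self, inputs: &[Tensor]) -> Result<Vec<Tensor>, String>;
}

/// A toy backend that doubles every element, standing in for a real runtime.
pub struct DoublerBackend;

impl Backend for DoublerBackend {
    fn name(&self) -> &str {
        "doubler"
    }
    fn run(&self, inputs: &[Tensor]) -> Result<Vec<Tensor>, String> {
        Ok(inputs
            .iter()
            .map(|t| Tensor {
                shape: t.shape.clone(),
                data: t.data.iter().map(|x| x * 2.0).collect(),
            })
            .collect())
    }
}

/// Callers see one interface regardless of which backend was selected.
pub struct InferenceSession {
    backend: Box<dyn Backend>,
}

impl InferenceSession {
    pub fn new(backend: Box<dyn Backend>) -> Self {
        Self { backend }
    }
    pub fn infer(&self, inputs: &[Tensor]) -> Result<Vec<Tensor>, String> {
        self.backend.run(inputs)
    }
}

fn main() {
    let session = InferenceSession::new(Box::new(DoublerBackend));
    let input = Tensor { shape: vec![2], data: vec![1.0, 2.0] };
    let out = session.infer(&[input]).unwrap();
    assert_eq!(out[0].data, vec![2.0, 4.0]);
}
```

The key design point is trait objects (`Box<dyn Backend>`): application code links against one interface, and the backend is chosen per platform at build or run time.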
Rust's FFI story in practice
- Integrating with C and C++ runtimes such as OpenVINO and MNN.
- Bridging into Swift and Objective-C for CoreML.
- Managing ABI compatibility and building safe wrappers around system APIs.
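The "safe wrapper" pattern mentioned above typically means pairing raw handles with Rust ownership and `Drop`. Below is a self-contained sketch under stated assumptions: `runtime_create`, `runtime_infer`, and `runtime_destroy` are hypothetical stand-ins for a C runtime's API (implemented here in Rust so the example compiles); a real binding would declare them in an `extern "C"` block, often generated by bindgen.

```rust
use std::os::raw::c_int;

// --- Stand-in for the C runtime (normally behind `extern "C" { ... }`) ---
#[repr(C)]
pub struct RawRuntime {
    scale: c_int,
}

unsafe extern "C" fn runtime_create(scale: c_int) -> *mut RawRuntime {
    Box::into_raw(Box::new(RawRuntime { scale }))
}

unsafe extern "C" fn runtime_infer(rt: *const RawRuntime, x: c_int) -> c_int {
    unsafe { (*rt).scale * x }
}

unsafe extern "C" fn runtime_destroy(rt: *mut RawRuntime) {
    if !rt.is_null() {
        unsafe { drop(Box::from_raw(rt)) };
    }
}

// --- Safe wrapper: ownership plus Drop guarantee the handle is freed once ---
pub struct Runtime {
    raw: *mut RawRuntime,
}

impl Runtime {
    pub fn new(scale: i32) -> Runtime {
        // SAFETY: runtime_create returns a valid, uniquely owned pointer.
        Runtime { raw: unsafe { runtime_create(scale) } }
    }
    pub fn infer(&self, x: i32) -> i32 {
        // SAFETY: self.raw is non-null for the lifetime of Runtime.
        unsafe { runtime_infer(self.raw, x) }
    }
}

impl Drop for Runtime {
    fn drop(&mut self) {
        // SAFETY: raw was created by runtime_create and is freed exactly once.
        unsafe { runtime_destroy(self.raw) }
    }
}

fn main() {
    let rt = Runtime::new(3);
    assert_eq!(rt.infer(7), 21);
}
```

`#[repr(C)]` pins the struct layout to the C ABI, and keeping every `unsafe` call behind a small safe type is what makes the rest of the codebase auditable.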
Minimizing overhead
- Avoiding unnecessary memory copies across FFI boundaries.
- Managing tensor memory layouts carefully to avoid repeated allocations and layout conversions.
- Using efficient caching strategies to keep inference fast on edge devices.
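One concrete way to avoid copies across an FFI boundary, as the bullets above suggest, is to pass borrowed slices (pointer plus length) into a C call that writes results into a caller-provided, reusable buffer. This is a minimal sketch; `fill_output` is a hypothetical stand-in for a C runtime function, implemented here in Rust so the example is self-contained.

```rust
use std::os::raw::c_float;

// Stand-in for a C function with the signature:
//   void fill_output(const float* input, float* out, size_t len);
unsafe extern "C" fn fill_output(input: *const c_float, out: *mut c_float, len: usize) {
    for i in 0..len {
        unsafe { *out.add(i) = *input.add(i) + 1.0 };
    }
}

/// Runs one call into a reusable output buffer: no allocation and no copy of
/// the input; both slices are borrowed directly across the boundary.
pub fn infer_in_place(input: &[f32], out: &mut [f32]) {
    assert_eq!(input.len(), out.len());
    // SAFETY: both pointers are valid for `len` elements and cannot overlap,
    // since `input` is a shared borrow and `out` is an exclusive one.
    unsafe { fill_output(input.as_ptr(), out.as_mut_ptr(), input.len()) };
}

fn main() {
    let input = vec![1.0_f32, 2.0, 3.0];
    // Allocate the output once, then reuse it across calls on the hot path.
    let mut out = vec![0.0_f32; input.len()];
    infer_in_place(&input, &mut out);
    assert_eq!(out, vec![2.0, 3.0, 4.0]);
}
```

Reusing `out` across inference calls is the caching idea in miniature: the allocation cost is paid once, not per frame.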
The realities of raw FFI
- Of course, the theory rarely survives first contact with system libraries.
- We’ll cover the practical “horrors” of raw FFI and platform integrations: undocumented behaviors, fragile bindings, and surprising platform semantics—such as macOS copy-on-write allocations appearing where you least expect them.
Drivers, runtimes, and platform quirks
- Finicky GPU drivers and runtime-specific constraints.
- Operator support differences and backend limitations.
- How portable inference becomes a balancing act between performance, safety, and platform compatibility.
This talk shares practical lessons from building and shipping a real cross-platform inference system in Rust, focusing on the systems engineering challenges behind such portable ML deployment.