The Fifth Elephant 2025 Annual Conference CfP

Preethi Srinivasan

@3pi

Scaling Down to Scale Up: FSDP for Training Large Models on Small GPUs

Submitted May 30, 2025

Abstract

Training large deep learning models such as LLMs and vision transformers has traditionally required high-end GPUs with large memory, which puts it out of reach for many practitioners. This talk explores how Fully Sharded Data Parallel (FSDP) in PyTorch can lower this barrier by enabling large-model training and fine-tuning on smaller GPUs (8–16 GB), using commodity hardware or affordable cloud credits.

We’ll walk through practical experiments that show what is feasible on constrained setups with FSDP. The session will cover configuration techniques such as mixed precision, CPU offload, and activation checkpointing, and analyze how their memory savings trade off against inter-GPU communication overhead.
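To make the activation-checkpointing trade-off concrete, here is a back-of-envelope model in plain Python. The layer count and per-layer sizes are hypothetical, and the assumptions (every layer checkpointed, only layer inputs kept, backward costing about 2× a forward pass) are simplifications rather than measurements. In PyTorch, the recomputation itself is what `torch.utils.checkpoint.checkpoint` performs for a wrapped segment.

```python
# Rough activation-memory and compute model for activation checkpointing.
# ASSUMPTIONS (illustrative, not measured): every layer is checkpointed, only a
# layer's input is kept during forward, and backward costs ~2x a forward pass.


def activation_memory(num_layers: int, full_per_layer: float,
                      input_per_layer: float, checkpoint: bool) -> float:
    """Peak stored activations, in whatever unit the per-layer sizes use."""
    if not checkpoint:
        # Every layer keeps all intermediate activations until backward.
        return num_layers * full_per_layer
    # Only layer inputs are kept; layers are recomputed one at a time in backward,
    # so at most one layer's full activations are live on top of the inputs.
    return num_layers * input_per_layer + full_per_layer


def relative_compute(checkpoint: bool, backward_to_forward: float = 2.0) -> float:
    """Step compute relative to a run without checkpointing."""
    baseline = 1.0 + backward_to_forward  # one forward + one backward
    extra = 1.0 if checkpoint else 0.0    # checkpointing reruns the forward once
    return (baseline + extra) / baseline


if __name__ == "__main__":
    # Hypothetical 32-layer model: 500 MB of activations per layer, 50 MB inputs.
    print(activation_memory(32, 500, 50, checkpoint=False))  # 16000 (MB)
    print(activation_memory(32, 500, 50, checkpoint=True))   # 2100 (MB)
    print(round(relative_compute(checkpoint=True), 3))       # 1.333
```

Under these assumptions, checkpointing cuts stored activations by roughly 7.6× for about a third more compute per step.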

We’ll also explore how FSDP pairs with LoRA and QLoRA for memory-efficient fine-tuning.

Introduction & Motivation

  1. The resource challenge in training large models (LLMs, ViTs)
  2. Why this talk: democratizing access to large-model training
  3. What is FSDP?
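For item 3, a toy simulation may help as a mental model. FSDP keeps only a 1/N shard of the parameters on each GPU, all-gathers the full parameters just before they are needed, and reduce-scatters gradients afterwards so each GPU updates only its own shard. The sketch below assumes plain Python lists in place of tensors and collectives, and a single process in place of N ranks; the helper names are hypothetical, not PyTorch APIs.

```python
# Toy, single-process model of FSDP's data flow (plain Python, no PyTorch).
# Lists stand in for GPU tensors; list index r stands in for rank r.
# Function names are illustrative, not part of any PyTorch API.
from typing import List

Shards = List[List[float]]


def shard(flat: List[float], world_size: int) -> Shards:
    """Split a flat parameter vector into equal per-rank shards (zero-padded)."""
    pad = (-len(flat)) % world_size
    padded = flat + [0.0] * pad
    size = len(padded) // world_size
    return [padded[r * size:(r + 1) * size] for r in range(world_size)]


def all_gather(shards: Shards, numel: int) -> List[float]:
    """Before forward/backward: every rank rebuilds the full (unpadded) vector."""
    return [x for s in shards for x in s][:numel]


def reduce_scatter(full_grads: Shards, world_size: int) -> Shards:
    """After backward: average the ranks' full gradients, keep only own shard."""
    numel = len(full_grads[0])
    avg = [sum(g[i] for g in full_grads) / world_size for i in range(numel)]
    return shard(avg, world_size)


def sgd_step(param_shards: Shards, grad_shards: Shards, lr: float) -> Shards:
    """Optimizer step: each rank updates only the shard it owns."""
    return [[p - lr * g for p, g in zip(ps, gs)]
            for ps, gs in zip(param_shards, grad_shards)]


if __name__ == "__main__":
    world = 2
    params = shard([1.0, 2.0, 3.0, 4.0, 5.0], world)  # [[1, 2, 3], [4, 5, 0]]
    full = all_gather(params, 5)  # full weights exist only transiently for compute
    # Pretend rank 0 and rank 1 computed different gradients on different data.
    grads = reduce_scatter([[1.0] * 5, [3.0] * 5], world)
    params = sgd_step(params, grads, lr=0.5)
    print(all_gather(params, 5))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Each simulated rank persistently holds only half the parameters, yet the update matches what a replicated data-parallel step with averaged gradients would produce.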

Fundamentals of Fully Sharded Data Parallel (FSDP)

  1. What gets sharded: parameters, gradients, optimizer states
  2. What gets parallelized: data (each GPU processes a different mini-batch)
  3. What becomes feasible on small GPUs?
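What becomes feasible can be estimated with a common back-of-envelope accounting for mixed-precision Adam: about 16 bytes of model state per parameter, of which FSDP's full sharding divides every term by the number of GPUs. The 7B model size and the byte breakdown are assumptions for illustration; activations, communication buffers, and allocator overhead are left out.

```python
# Back-of-envelope model-state memory per GPU (decimal GB).
# ASSUMPTION: mixed-precision Adam at ~16 bytes per parameter:
# 2 B low-precision weights + 2 B gradients + 12 B fp32 optimizer state
# (master weights, momentum, variance). Activations, communication buffers,
# and allocator overhead are deliberately left out.
from typing import Iterable

BYTES_PER_PARAM = {"params": 2, "grads": 2, "optimizer": 12}


def model_state_gb(num_params: float, world_size: int,
                   sharded: Iterable[str] = ("params", "grads", "optimizer")) -> float:
    """Memory per GPU; only the states listed in `sharded` are divided by world_size."""
    sharded = set(sharded)
    total_bytes = 0.0
    for state, nbytes in BYTES_PER_PARAM.items():
        divisor = world_size if state in sharded else 1
        total_bytes += num_params * nbytes / divisor
    return total_bytes / 1e9


if __name__ == "__main__":
    n = 7e9  # hypothetical 7B-parameter model
    for gpus in (1, 4, 8):
        replicated = model_state_gb(n, gpus, sharded=())  # DDP-style: nothing sharded
        full_shard = model_state_gb(n, gpus)              # FSDP-style full sharding
        print(f"{gpus} GPU(s): replicated {replicated:.1f} GB, "
              f"fully sharded {full_shard:.1f} GB")
```

Under these assumptions the 7B example needs 112 GB of model state per GPU when replicated but 14 GB per GPU when fully sharded across 8 GPUs, which is still tight on a 16 GB card before activations are counted. Passing `sharded=("grads", "optimizer")` approximates a ZeRO-2-style setup (26.25 GB per GPU on 8 GPUs).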

Case studies (planned/early results)

  1. Configuring FSDP
  2. Parameter wrapping strategies and CPU offloading
  3. PyTorch utilities and best practices
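Why wrapping granularity and CPU offload matter can be shown with a toy model: FSDP only materializes the full weights of one wrapped unit at a time, so the largest unit sets the transient peak, while CPU offload moves the resident shards to host memory. Real PyTorch wrapping is recursive over the module tree (for example via `size_based_auto_wrap_policy` or `transformer_auto_wrap_policy` in `torch.distributed.fsdp.wrap`); the greedy grouping, the model shape, and the helper names below are hypothetical simplifications.

```python
# Toy model of how wrapping granularity and CPU offload change the parameter
# memory FSDP needs on each GPU. ASSUMPTIONS: a flat stack of layers, greedy
# grouping by size (loosely like a size-based auto-wrap policy), bf16 weights
# (2 bytes/param), and no prefetching of the next unit.
from typing import List


def group_into_units(layer_params: List[int], min_num_params: int) -> List[List[int]]:
    """Greedily group consecutive layers until each unit holds >= min_num_params."""
    units: List[List[int]] = []
    current: List[int] = []
    for n in layer_params:
        current.append(n)
        if sum(current) >= min_num_params:
            units.append(current)
            current = []
    if current:  # leftovers form one last unit (like params left to the root)
        units.append(current)
    return units


def peak_param_bytes(layer_params: List[int], min_num_params: int, world_size: int,
                     bytes_per_param: int = 2, cpu_offload: bool = False) -> float:
    """Sharded resident weights + the largest unit while it is all-gathered."""
    total = sum(layer_params)
    # With CPU offload the resident shards live in host memory, not on the GPU.
    resident = 0 if cpu_offload else total * bytes_per_param / world_size
    largest_unit = max(sum(u) for u in group_into_units(layer_params, min_num_params))
    return resident + largest_unit * bytes_per_param


if __name__ == "__main__":
    layers = [200_000_000] * 32  # hypothetical 6.4B-parameter model
    per_layer = peak_param_bytes(layers, 100_000_000, world_size=8)
    no_wrap = peak_param_bytes(layers, 10**12, world_size=8)
    offload = peak_param_bytes(layers, 100_000_000, world_size=8, cpu_offload=True)
    print(per_layer / 1e9, no_wrap / 1e9, offload / 1e9)  # 2.0 14.4 0.4
```

Without nested wrapping the whole model is gathered at once, which erases most of the parameter savings; per-layer units keep the transient peak to a single layer, and offload trades GPU memory for host-to-device transfers.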

Communication Overhead: The Hidden Cost

  1. Measuring communication time as GPU count increases
  2. Where FSDP starts to hurt and how to mitigate it
  3. Trade-offs among memory, speed, and setup complexity
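A bandwidth-only estimate gives a first feel for how communication grows with GPU count. It assumes ring collectives and FSDP full sharding with one parameter all-gather in forward, another in backward, and one gradient reduce-scatter, compared with DDP's single gradient all-reduce. The model size and the 16 GB/s effective bandwidth are hypothetical; latency, topology, and compute/communication overlap are ignored.

```python
# Bandwidth-only estimate of per-GPU communication per training step.
# ASSUMPTIONS: ring collectives, where all-gather and reduce-scatter each send
# (N - 1) / N of the full buffer per GPU and an all-reduce costs one of each;
# latency, compute/communication overlap, and topology are ignored.


def ring_bytes(buffer_bytes: float, world_size: int) -> float:
    return buffer_bytes * (world_size - 1) / world_size


def fsdp_full_shard_bytes(num_params: float, world_size: int,
                          bytes_per_param: int = 2) -> float:
    # All-gather params in forward, all-gather again in backward, reduce-scatter grads.
    return 3 * ring_bytes(num_params * bytes_per_param, world_size)


def ddp_bytes(num_params: float, world_size: int, bytes_per_param: int = 2) -> float:
    # One all-reduce of gradients = reduce-scatter + all-gather.
    return 2 * ring_bytes(num_params * bytes_per_param, world_size)


def seconds(bytes_sent: float, bandwidth_gb_per_s: float) -> float:
    return bytes_sent / (bandwidth_gb_per_s * 1e9)


if __name__ == "__main__":
    n = 1e9    # hypothetical 1B-parameter model, bf16 communication
    bw = 16.0  # hypothetical effective GB/s between GPUs
    for gpus in (2, 4, 8):
        f, d = fsdp_full_shard_bytes(n, gpus), ddp_bytes(n, gpus)
        print(f"{gpus} GPUs: FSDP {f / 1e9:.2f} GB ({seconds(f, bw):.3f} s), "
              f"DDP {d / 1e9:.2f} GB ({seconds(d, bw):.3f} s)")
```

Under these assumptions full sharding moves about 1.5× the bytes of DDP, and the per-GPU volume levels off near three times the model size as N grows. The effective bandwidth (`bw` here) is therefore the main lever, and it typically drops once traffic crosses PCIe or leaves a node.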

FSDP + PEFT (LoRA and QLoRA)

  1. How parameter-efficient fine-tuning complements FSDP
  2. Use cases where this pairing is especially powerful
  3. Discussion of memory footprint vs training stability
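The memory side of this pairing reduces to simple arithmetic. LoRA freezes each weight and trains only a low-rank update, so gradients and optimizer state shrink to the adapter parameters, and QLoRA further stores the frozen weights in 4 bits. What remains for FSDP to shard is mostly the frozen base weights. The layer shape, rank, and 7B model size below are hypothetical, and quantization overheads are ignored.

```python
# Arithmetic for pairing FSDP with LoRA / QLoRA.
# LoRA freezes a d x k weight W and trains a low-rank update B @ A
# (B is d x r, A is r x k), so only r * (d + k) parameters need gradients
# and optimizer state. QLoRA additionally stores the frozen W in 4 bits.
# ASSUMPTION: quantization constants and other overheads are ignored.


def lora_trainable_params(d: int, k: int, r: int) -> int:
    return r * (d + k)


def frozen_weight_gb(num_params: float, bits_per_param: int) -> float:
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    d = k = 4096  # hypothetical square projection
    r = 16        # hypothetical LoRA rank
    print(lora_trainable_params(d, k, r), d * k)                # 131072 16777216
    print(frozen_weight_gb(7e9, 16), frozen_weight_gb(7e9, 4))  # 14.0 3.5
```

Here the adapter for one 4096 × 4096 projection trains about 0.8% as many parameters as the frozen weight, and the 7B base weights drop from 14 GB in 16-bit to 3.5 GB in 4-bit before FSDP shards them.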

Conclusion and Future Directions

  1. Summary of what’s possible with FSDP today
