Preethi Srinivasan

Scaling Down to Scale Up: FSDP for Training Large Models on Small GPUs

Submitted May 30, 2025

Abstract

Training large deep learning models such as LLMs and vision transformers has traditionally required high-end GPUs with large memory, putting such training out of reach for many practitioners. This talk explores how Fully Sharded Data Parallel (FSDP) in PyTorch can overcome this barrier by enabling large-model training and fine-tuning on smaller GPUs (8–16 GB), using commodity hardware or affordable cloud credits.

We’ll walk through practical experiments showing what’s feasible on constrained setups with FSDP. The session will cover configuration techniques such as mixed precision, CPU offload, and activation checkpointing, and analyze the trade-offs they introduce against inter-GPU communication overhead.
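
To make those knobs concrete, here is a minimal sketch, assuming a single-node job launched with torchrun and a small stand-in model; the dtypes and flags shown are illustrative choices, not the talk’s exact configuration:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    CPUOffload,
)
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
)

dist.init_process_group("nccl")  # assumes launch via torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Transformer(d_model=512, nhead=8)  # stand-in for a large model

fsdp_model = FSDP(
    model,
    device_id=torch.cuda.current_device(),
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,   # forward/backward compute in bf16
        reduce_dtype=torch.bfloat16,  # gradient reduce-scatter in bf16
        buffer_dtype=torch.bfloat16,
    ),
    cpu_offload=CPUOffload(offload_params=True),  # park shards in host RAM
)

# Recompute each layer's activations during backward instead of storing them
apply_activation_checkpointing(
    fsdp_model,
    check_fn=lambda m: isinstance(m, nn.TransformerEncoderLayer),
)
```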

We’ll also explore how FSDP pairs with LoRA and QLoRA for memory-efficient fine-tuning.

Introduction & Motivation

  1. The resource challenge in training large models (LLMs, ViTs)
  2. Why this talk: democratizing access to large-model training
  3. What is FSDP?

Fundamentals of Fully Sharded Data Parallel (FSDP)

  1. What gets sharded: parameters, gradients, optimizer states
  2. What gets parallelized: data
  3. What becomes feasible on small GPUs? (see the back-of-envelope sketch after this list)
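
As a rough illustration of why sharding all three states matters, here is a back-of-envelope sketch for a 7B-parameter model trained with Adam; the byte counts assume bf16 parameters and gradients plus fp32 master weights and optimizer state, and are estimates rather than measurements:

```python
# Per-GPU memory for model state alone (activations excluded), assuming
# bf16 params/grads plus fp32 master weights and two fp32 Adam buffers.
n_params = 7e9
bytes_per_param = 2 + 2 + 4 + 4 + 4  # params + grads + master + exp_avg + exp_avg_sq

replicated = n_params * bytes_per_param  # DDP: every GPU holds it all
sharded = replicated / 8                 # FSDP full sharding across 8 GPUs

print(f"replicated: {replicated / 2**30:.0f} GiB per GPU")  # ~104 GiB
print(f"sharded:    {sharded / 2**30:.0f} GiB per GPU")     # ~13 GiB
```

Activations come on top of these numbers, which is where activation checkpointing and CPU offload do the remaining work of fitting into an 8–16 GB budget.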

Case Studies (planned/early results)

  1. Configuring FSDP
  2. Parameter wrapping strategies and CPU offloading (see the wrapping sketch after this list)
  3. PyTorch utilities and best practices
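
As one example of a wrapping strategy, the sketch below wraps each transformer layer as its own FSDP unit, so only one layer’s full parameters are gathered at a time; the layer classes are stand-ins for whatever blocks your model uses:

```python
import functools
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

# Treat each encoder/decoder layer as its own FSDP unit: parameters are
# all-gathered one layer at a time instead of for the whole model at once.
layer_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={nn.TransformerEncoderLayer, nn.TransformerDecoderLayer},
)

fsdp_model = FSDP(
    nn.Transformer(d_model=512, nhead=8),  # stand-in model, as before
    device_id=torch.cuda.current_device(),
    auto_wrap_policy=layer_policy,
    cpu_offload=CPUOffload(offload_params=True),  # shards live in host RAM
)
```

Finer-grained wrapping lowers peak memory but issues more, smaller communication calls, which is exactly the trade-off the next section examines.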

Communication Overhead: The Hidden Cost

  1. Measuring communication time as GPU count increases (see the profiling sketch after this list)
  2. Where FSDP starts to hurt and how to mitigate it
  3. Trade-offs in memory, speed, and setup complexity
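
One straightforward way to surface this cost (an illustrative approach, not necessarily the talk’s exact methodology) is to profile a single training step and inspect the NCCL all-gather and reduce-scatter kernels that FSDP issues. Continuing from the configuration sketch above:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Dummy (seq, batch, d_model) inputs for the stand-in nn.Transformer
src = torch.randn(10, 4, 512, device="cuda")
tgt = torch.randn(10, 4, 512, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    fsdp_model(src, tgt).sum().backward()

# Communication shows up as nccl all_gather / reduce_scatter kernels;
# compare their share of cuda_time_total as world size grows.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```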

FSDP + PEFT (LoRA and QLoRA)

  1. How parameter-efficient fine-tuning complements FSDP (see the sketch after this list)
  2. Use cases where this pairing is especially powerful
  3. Discussion of memory footprint vs. training stability
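
A minimal sketch of the pairing, assuming the Hugging Face transformers and peft libraries; the model name, target modules, and LoRA hyperparameters are illustrative placeholders:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

# LoRA freezes the base weights and injects small trainable adapters,
# so optimizer state exists only for a tiny fraction of the parameters.
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base, lora_cfg)
peft_model.print_trainable_parameters()  # typically well under 1% trainable

# use_orig_params=True lets FSDP shard flat parameters that mix frozen
# base weights with trainable adapters.
fsdp_model = FSDP(
    peft_model,
    device_id=torch.cuda.current_device(),
    use_orig_params=True,
)
```

FSDP shards the mostly frozen base model across GPUs while the optimizer state covers only the adapters, which is why the combination stretches small-GPU budgets further than either technique alone.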

Conclusion and Future Directions

  1. Summary of what’s possible with FSDP today

