The Fifth Elephant 2025 Annual Conference CfP
Speak at The Fifth Elephant 2025 Annual Conference
Submitted May 30, 2025
Training large deep learning models such as LLMs and vision transformers has traditionally required high-end GPUs with large memory, putting such training out of reach for many practitioners. This talk explores how Fully Sharded Data Parallel (FSDP) in PyTorch can help overcome this barrier by enabling large-model training and fine-tuning on smaller GPUs (8–16 GB), using commodity hardware or affordable cloud credits.
We’ll walk through practical experiments showcasing what’s feasible on constrained setups with FSDP. The session will cover configuration techniques such as mixed precision, CPU offload, and activation checkpointing, and will analyze the trade-offs each introduces, particularly in inter-GPU communication overhead.
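To give a flavour of the kind of configuration the session will discuss, here is a minimal sketch of wrapping a model with FSDP using mixed precision, CPU offload, and activation checkpointing. The toy `Block` model, layer sizes, and dtype choices are illustrative placeholders, not the exact code from the talk; it assumes a recent PyTorch (2.x) launched with `torchrun`, one process per GPU.

```python
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
    checkpoint_wrapper,
)
from torch.distributed.fsdp import (
    CPUOffload,
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


class Block(torch.nn.Module):
    """Toy transformer-style block; stands in for a real decoder layer."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.norm1 = torch.nn.LayerNorm(dim)
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm2 = torch.nn.LayerNorm(dim)
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(), torch.nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


# One process per GPU, launched via torchrun; LOCAL_RANK picks the device.
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Sequential(*[Block() for _ in range(8)])

fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,   # shard params, grads, optimizer state
    auto_wrap_policy=functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={Block},               # shard at per-block granularity
    ),
    mixed_precision=MixedPrecision(                  # compute and communicate in bf16
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
    cpu_offload=CPUOffload(offload_params=True),     # spill sharded params to host RAM
    device_id=torch.cuda.current_device(),
)

# Recompute each block's activations in the backward pass to trade compute for memory.
apply_activation_checkpointing(
    fsdp_model,
    checkpoint_wrapper_fn=checkpoint_wrapper,
    check_fn=lambda m: isinstance(m, Block),
)
```

Each of these knobs saves GPU memory at a cost: CPU offload adds host–device transfer time, activation checkpointing adds recomputation, and full sharding adds all-gather and reduce-scatter traffic between GPUs, which is exactly the trade-off space the experiments explore.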
We’ll also explore how FSDP pairs with LoRA and QLoRA for memory-efficient fine-tuning.
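As a rough illustration of that pairing, the sketch below adds LoRA adapters to a pretrained causal LM and then shards it with FSDP. It assumes the Hugging Face `transformers` and `peft` libraries and a process group initialized as in the previous sketch; the checkpoint name and LoRA hyperparameters are illustrative only (QLoRA additionally loads the frozen base weights in 4-bit, which is discussed in the session rather than shown here).

```python
import torch
from peft import LoraConfig, get_peft_model
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; any decoder-only LM with q_proj/v_proj modules works similarly.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
)

lora_cfg = LoraConfig(
    r=16,                                 # low-rank adapter dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)    # freezes base weights, adds small trainable adapters

# FSDP shards the (mostly frozen) base weights across GPUs; optimizer state is kept
# only for the tiny LoRA parameters. use_orig_params=True lets FSDP handle the mix
# of frozen and trainable parameters inside one wrapped module.
fsdp_model = FSDP(model, use_orig_params=True, device_id=torch.cuda.current_device())
```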