The Fifth Elephant 2025 Annual Conference (18 & 19 July)
Less hype. More engineering.
Session: Saturday, 19 July 2025, 08:45 AM – 05:50 PM IST
Submitted May 30, 2025
Training large deep learning models such as LLMs and vision transformers has traditionally required high-end GPUs with large memory, putting such training out of reach for many practitioners. This talk explores how Fully Sharded Data Parallel (FSDP) in PyTorch helps overcome this barrier by enabling large-model training and fine-tuning on smaller GPUs (8–16 GB), using commodity hardware or affordable cloud credits.
We’ll walk through practical experiments showcasing what’s feasible on constrained setups using FSDP. The session will cover configuration techniques such as mixed precision, CPU offload, and activation checkpointing, while analyzing trade-offs with inter-GPU communication overhead.
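To give a flavour of how these options fit together, here is a minimal sketch of wrapping a model with FSDP using mixed precision, CPU offload, and activation checkpointing. It assumes a recent PyTorch release and a multi-GPU node launched with torchrun; the stand-in nn.TransformerEncoder model, size thresholds, and dtypes are illustrative assumptions, not settings prescribed by the talk.

```python
import functools

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    CheckpointImpl,
    apply_activation_checkpointing,
    checkpoint_wrapper,
)
from torch.distributed.fsdp import CPUOffload, MixedPrecision
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

# Launch with: torchrun --nproc_per_node=<num_gpus> fsdp_demo.py
dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Stand-in model: a stack of transformer encoder layers.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True),
    num_layers=12,
)

# Keep compute and gradient reduction in bf16 to cut activation and
# communication memory relative to fp32.
mp_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

model = FSDP(
    model,
    # Shard any submodule above an (illustrative) 1M-parameter threshold.
    auto_wrap_policy=functools.partial(
        size_based_auto_wrap_policy, min_num_params=1_000_000
    ),
    mixed_precision=mp_policy,
    # Park sharded parameters in host RAM between uses; saves GPU memory
    # at the cost of extra host-device transfers.
    cpu_offload=CPUOffload(offload_params=True),
    device_id=local_rank,
)

# Recompute activations during the backward pass instead of storing them.
apply_activation_checkpointing(
    model,
    checkpoint_wrapper_fn=functools.partial(
        checkpoint_wrapper, checkpoint_impl=CheckpointImpl.NO_REENTRANT
    ),
    check_fn=lambda m: isinstance(m, nn.TransformerEncoderLayer),
)
```

Each of these knobs trades GPU memory for extra compute or communication, which is exactly the trade-off space the experiments in this session examine.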
We’ll also explore how FSDP pairs with LoRA and QLoRA for memory-efficient fine-tuning.
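One possible way to pair the two, sketched here with the Hugging Face peft and transformers libraries (the abstract does not name specific libraries, so treat this as an assumption), is to attach LoRA adapters before FSDP wrapping so that only a small fraction of parameters is trainable; the checkpoint name and target modules below are placeholders.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Example base model; swap in whichever checkpoint you are fine-tuning.
base = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    torch_dtype=torch.bfloat16,
)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; model dependent
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of all parameters

# The PEFT-wrapped model can then be sharded with FSDP as in the earlier
# sketch. For QLoRA, the frozen base weights would additionally be loaded
# in 4-bit via bitsandbytes (e.g. BitsAndBytesConfig(load_in_4bit=True, ...)).
```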