The Fifth Elephant 2025 Annual Conference CfP
Speak at The Fifth Elephant 2025 Annual Conference
Submitted May 30, 2025
Training large deep learning models such as LLMs and vision transformers has traditionally required high-end GPUs with large memory, putting such training out of reach for many practitioners. This talk explores how Fully Sharded Data Parallel (FSDP) in PyTorch can help overcome this barrier by enabling large-model training and fine-tuning on smaller GPUs (8–16 GB), using commodity hardware or affordable cloud credits.
We’ll walk through practical experiments showcasing what’s feasible on constrained setups with FSDP. The session will cover configuration techniques such as mixed precision, CPU offload, and activation checkpointing, and will analyze the trade-offs each introduces, particularly in inter-GPU communication overhead.
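To give a flavour of the kind of configuration the session will discuss, here is a minimal sketch of wrapping a model with FSDP using mixed precision, CPU offload, and activation checkpointing. The toy `Block` model, layer sizes, and dtype choices are illustrative placeholders, not the exact code from the talk; it assumes a recent PyTorch (2.x) launched with `torchrun`, one process per GPU.

```python
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
    apply_activation_checkpointing,
    checkpoint_wrapper,
)
from torch.distributed.fsdp import (
    CPUOffload,
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy


class Block(torch.nn.Module):
    """Toy transformer-style block; stands in for a real decoder layer."""

    def __init__(self, dim: int = 1024):
        super().__init__()
        self.norm1 = torch.nn.LayerNorm(dim)
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm2 = torch.nn.LayerNorm(dim)
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(), torch.nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))


# One process per GPU, launched via torchrun; LOCAL_RANK picks the device.
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Sequential(*[Block() for _ in range(8)])

fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.FULL_SHARD,   # shard params, grads, optimizer state
    auto_wrap_policy=functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={Block},               # shard at per-block granularity
    ),
    mixed_precision=MixedPrecision(                  # compute and communicate in bf16
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
    cpu_offload=CPUOffload(offload_params=True),     # spill sharded params to host RAM
    device_id=torch.cuda.current_device(),
)

# Recompute each block's activations in the backward pass to trade compute for memory.
apply_activation_checkpointing(
    fsdp_model,
    checkpoint_wrapper_fn=checkpoint_wrapper,
    check_fn=lambda m: isinstance(m, Block),
)
```

Each of these knobs saves GPU memory at a cost: CPU offload adds host–device transfer time, activation checkpointing adds recomputation, and full sharding adds all-gather and reduce-scatter traffic between GPUs, which is exactly the trade-off space the experiments explore.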
We’ll also explore how FSDP pairs with LoRA and QLoRA for memory-efficient fine-tuning.
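As a rough illustration of that pairing, the sketch below adds LoRA adapters to a pretrained causal LM and then shards it with FSDP. It assumes the Hugging Face `transformers` and `peft` libraries and a process group initialized as in the previous sketch; the checkpoint name and LoRA hyperparameters are illustrative only (QLoRA additionally loads the frozen base weights in 4-bit, which is discussed in the session rather than shown here).

```python
import torch
from peft import LoraConfig, get_peft_model
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; any decoder-only LM with q_proj/v_proj modules works similarly.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
)

lora_cfg = LoraConfig(
    r=16,                                 # low-rank adapter dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)    # freezes base weights, adds small trainable adapters

# FSDP shards the (mostly frozen) base weights across GPUs; optimizer state is kept
# only for the tiny LoRA parameters. use_orig_params=True lets FSDP handle the mix
# of frozen and trainable parameters inside one wrapped module.
fsdp_model = FSDP(model, use_orig_params=True, device_id=torch.cuda.current_device())
```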