Mehant

@kmehant

Enhancing language model tuning throughput

Submitted Mar 17, 2025

Abstract

Getting your first language model tuning job up and running is great; in an enterprise setting, however, you want to extract more value from your expensive infrastructure by completing more training cycles on it. This session presents a range of concepts for increasing language model tuning throughput, covering open-source techniques that span from simple training knobs to more involved optimizations deeper down the stack.

Overview of concepts for throughput enhancement

  • Training knobs, e.g. gradient checkpointing combined with an increased batch size (see the sketches after this list)

  • Faster implementations, e.g. replacing the attention module with Flash Attention and using padding-free batching (sketch below)

  • Data sampling / collation for efficient load distribution, e.g. a multi-pack data sampler

  • Sparse techniques plus an increased batch size, e.g. LoRA and prompt tuning (sketch below)

  • Parallelisms, e.g. data-parallel techniques such as distributed data parallel (DDP) and fully sharded data parallel (FSDP) (sketch below)

  • Fast kernels, e.g. replacing common operations such as cross-entropy loss with Triton kernel implementations

  • Torch compile, e.g. compiling PyTorch code into fused kernels (sketch below)
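
As a brief illustration of the training-knobs item, the sketch below uses Hugging Face TrainingArguments to trade recomputation for memory via gradient checkpointing and spend the freed memory on a larger batch. This is a minimal sketch, assuming the transformers Trainer stack; the values shown are illustrative, not recommendations.

    # A minimal sketch, assuming the Hugging Face transformers Trainer stack.
    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=8,   # raise this once activation memory is freed
        gradient_accumulation_steps=4,   # effective per-device batch of 32
        gradient_checkpointing=True,     # recompute activations instead of storing them
        bf16=True,                       # mixed precision reduces memory and compute cost
    )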
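For the faster-implementations item, a hedged sketch of swapping in Flash Attention when loading a model with transformers; the checkpoint name is a placeholder, and the flash-attn package must be installed separately.

    # A minimal sketch, assuming transformers with the flash-attn package installed.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "your-model-checkpoint",                  # placeholder checkpoint name
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",  # replaces the default attention module
    )

Padding-free batching pairs naturally with this; recent transformers releases ship a DataCollatorWithFlattening collator for that purpose, though you should check availability in your installed version.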
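For the sparse-techniques item, a sketch of wrapping an already loaded model (the `model` from the previous sketch) with LoRA adapters using the PEFT library; the target module names are an assumption and vary by architecture.

    # A minimal sketch, assuming the peft library and an already loaded causal LM `model`.
    from peft import LoraConfig, get_peft_model

    lora_config = LoraConfig(
        r=16,                                  # adapter rank
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # assumed attention projection names
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()         # only a small fraction of weights train

Because far fewer parameters carry gradients and optimizer state, the memory saved can again be spent on a larger batch size.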
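For the parallelism item, a sketch of sharding the model with PyTorch's built-in FSDP wrapper. It assumes the script is launched with torchrun (e.g. `torchrun --nproc_per_node=<gpus> train.py`) so that one process runs per GPU.

    # A minimal FSDP sketch, assuming a torchrun launch and an already constructed `model`.
    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    dist.init_process_group("nccl")                        # one process per GPU
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))   # LOCAL_RANK is set by torchrun
    model = FSDP(model.cuda())                             # shards params, grads, and optimizer state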
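Finally, for the torch compile item, compilation is a one-line change; the first step is slower while kernels are generated, after which steady-state throughput typically improves.

    # A minimal sketch: let PyTorch fuse operations into generated kernels.
    import torch

    model = torch.compile(model)   # compilation happens lazily on the first forward pass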

Takeaways

In-depth conceptual understanding of the various knobs available for maximizing throughput, pinpointing where in the stack each method applies. Attendees should also come away with a working sense of how to apply these concepts in practice.

Which audiences is your session going to be beneficial for?

Tuning users or AI professionals.

Bio

I am Mehant Kammakomati, a research software engineer at IBM Research - India. I work on language model tuning capabilities, aiming to give our users the best possible tuning experience.
