Speed up your LLMs with Triton

About the workshop 📚

One way to make your LLMs faster is through GPU programming, which makes it important to understand Triton, a programming language for writing efficient GPU code. Getting started with GPU programming can be challenging because learning resources are scattered.
This hands-on workshop aims to help participants:

  • Understand the fundamentals of GPU architecture and its programming model.
  • Grasp the core concepts of Triton.
  • Improve the speed of their models.

By the end of the workshop, participants will be able to optimize LLMs for better performance.

  • The workshop is 90 minutes long.
  • This is an in-person workshop.
  • Only 35 seats are open for workshop participation.
  • A recording of the workshop will be made available to The Fifth Elephant members.

Important prerequisites to attend the workshop 📝

  1. Background in how deep learning models are trained, and a basic understanding of model inferencing.
  2. Basic understanding of the architecture of popular deep-learning models.
  3. Participants must have a laptop (with GPU support) ready with the following packages installed:
    • PyTorch: Start Locally | PyTorch
    • OpenAI Triton: Installation — Triton documentation

If a GPU is not available on their laptop, participants can use a Google Colab notebook instead.
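As a rough setup sketch (the exact commands depend on your platform and CUDA version — follow the official install pages linked above):

```shell
# Install PyTorch and Triton (platform-specific wheels may differ;
# see the PyTorch "Start Locally" and Triton installation pages)
pip install torch triton

# Quick check that PyTorch can see a GPU
python -c "import torch; print(torch.cuda.is_available())"
```

If the last command prints `False`, switch to a Colab runtime with a GPU accelerator enabled.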

Workshop outline 🗂️

  • Understanding GPU hardware.
  • Understanding the GPU programming model.
  • Coding a matmul kernel faster than PyTorch in Triton.
  • How to go from here: implementing a complete model from scratch in Triton (no HuggingFace dependencies).
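To give a flavour of the kind of code the workshop builds up to, here is a minimal Triton kernel in the style of the official tutorials — a vector add rather than the matmul covered in the session (the matmul kernel is considerably more involved). It requires an NVIDIA GPU to run:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    # Launch one program per block of 1024 elements.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

if __name__ == "__main__":
    x = torch.rand(4096, device="cuda")
    y = torch.rand(4096, device="cuda")
    print(torch.allclose(add(x, y), x + y))
```

The same pattern — a `@triton.jit` kernel operating on blocks of pointers, launched over a grid — scales up to the matmul kernel written in the workshop.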

Who should attend this workshop 👨‍💻

  • Data Scientists
  • ML engineers
  • AI engineers
  • Early career researchers

How will participants benefit from the workshop 🎓

  • Participants will gain a deeper understanding of how GPUs work and learn to write performant code for GPUs.
  • This interactive workshop will help them improve the throughput or reduce the latency of their existing models.
  • Participants can implement their models from scratch, often rivalling HuggingFace’s implementations in latency and throughput.

About the instructor 🧑‍🏫

Romit works at Meraki Labs as an AI engineer on text-to-speech models and LLM inference optimizations.

How to register

This workshop is free to attend for The Fifth Elephant members or The Fifth Elephant Conference ticket buyers.

This workshop is open to 35 participants only. Seats will be available on a first-come-first-served basis. RSVP to secure a seat. 🎟️

Contact information ☎️

For inquiries about the workshop, contact +91-7676332020 or write to info@hasgeek.com

Videos

Speed up your LLMs with Triton — 2 hours, 12 July 2024

Venue

Underline Centre

3rd Floor, above Blue Tokai

24, 3rd A Cross, 1st Main Rd, Domlur

Bengaluru - 560071

Karnataka, IN


Hosted by

Jumpstarting better data engineering and AI futures