Speed up your LLMs with Triton

Name: Speed up your LLMs with Triton
Start: 2024-07-12T16:00:00+05:30
End: 2024-07-12T17:35:00+05:30
Location: Underline Centre

Workshop

Fri, 12 Jul 2024, 04:00 PM – 05:35 PM IST

Underline Centre, Bengaluru


About the workshop 📚

One way to make your LLMs faster is through GPU programming, which is why it is worth understanding Triton, a programming language for writing efficient GPU code. Getting started with GPU programming can be challenging because the available resources are scattered.
This hands-on workshop aims to help participants:

Understand the fundamentals of GPU architecture and its programming model.
Grasp the core concepts of Triton.
Improve the speed of their models.

By the end of the workshop, participants will be able to optimize LLMs for better performance.

The workshop runs for 90 minutes.
This is an in-person workshop.
Only 35 seats are available.
A recording of the workshop will be made available to The Fifth Elephant members.

Important prerequisites to attend the workshop 📝

A background in how deep learning models are trained, and a basic understanding of model inference.
A basic understanding of the architecture of popular deep learning models.
Participants must have a laptop with GPU support, ready with the following packages installed:
● PyTorch (installation guide: "Start Locally" on the PyTorch website)
● OpenAI Triton (installation guide: "Installation" in the Triton documentation)

If a GPU is not available on your laptop, you can set up the environment in a Google Colab notebook instead.
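Before the session, it may help to confirm that your environment is ready. The snippet below is a minimal, unofficial sketch (not provided by the organisers) that checks whether the `torch` and `triton` packages are installed and whether PyTorch can see a CUDA GPU. The function name `check_environment` is hypothetical; only `importlib.util.find_spec` and `torch.cuda.is_available()` are real library calls.

```python
import importlib.util


def check_environment():
    """Hypothetical helper: report which workshop prerequisites are present.

    Returns a dict of booleans for the `torch` and `triton` packages and
    for whether PyTorch can see a CUDA-capable GPU.
    """
    report = {
        # find_spec returns None when a top-level package is not installed
        "torch": importlib.util.find_spec("torch") is not None,
        "triton": importlib.util.find_spec("triton") is not None,
        "cuda_gpu": False,
    }
    if report["torch"]:
        # Import lazily so the check still runs on machines without PyTorch
        import torch

        report["cuda_gpu"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:10s} {'OK' if ok else 'MISSING'}")
```

If `cuda_gpu` shows MISSING in Google Colab, switching the notebook to a GPU runtime is the usual fix.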

Workshop outline 🗂️

Understanding GPU hardware.
Understanding the GPU programming model.
Coding a matrix multiplication (matmul) kernel in Triton that is faster than PyTorch's.
Next steps: implementing a complete model from scratch in Triton, without depending on HuggingFace.

Who should attend this workshop 👨‍💻

Data Scientists
ML engineers
AI engineers
Early career researchers

How will participants benefit from the workshop 🎓

Participants will gain a deeper understanding of how GPUs work and learn to write performant code for GPUs.
This interactive workshop will help them improve the throughput or reduce the latency of their existing models.
Participants will be equipped to implement models from scratch in Triton, with latency and throughput that can rival HuggingFace's implementations.

Videos

Speed up your LLMs with Triton

2 hours, 12 July 2024

Venue

Underline Centre

3rd Floor, above Blue Tokai

24, 3rd A Cross, 1st Main Rd, Domlur

Bengaluru - 560071

Karnataka, IN


Hosted by

The Fifth Elephant

Jump starting better data engineering and AI futures


Related events

The Fifth Elephant 2024 Annual Conference (12th & 13th July): Maximising the Potential of Data. Discussions around data science, machine learning and AI
