The Fifth Elephant 2024 Annual Conference (12th & 13th July)
Maximising the Potential of Data — Discussions around data science, machine learning & AI
Sat, 13 Jul 2024, 09:00 AM – 06:05 PM IST
Many engineers are interested in using LLMs nowadays, but running them efficiently remains a challenge, and efficient execution is key to mainstream adoption. Running LLMs efficiently requires accelerated hardware such as GPUs. This talk explores the fundamentals of GPU architecture and its programming model, moving beyond model.to('cuda') to understand the inner workings of GPUs. Attendees will gain insights into GPU internals and engage in a hands-on workshop using Triton, a programming language for writing efficient GPU code. We'll cover core Triton concepts and implement essential operations for modern LLMs, empowering participants to optimize their models for better performance.
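To make the "programming model" part concrete before the workshop: GPUs (and Triton) split a problem into many program instances, each handling one fixed-size block of data, with a mask guarding out-of-bounds elements. The following is a minimal CPU-only sketch of that idea in plain Python; the names (block_program, launch) are illustrative, not Triton API:

```python
BLOCK_SIZE = 4

def block_program(pid, x, y, out, n):
    """One 'program instance': handles the pid-th BLOCK_SIZE-wide tile."""
    for lane in range(BLOCK_SIZE):
        offset = pid * BLOCK_SIZE + lane
        if offset < n:                  # mask: skip out-of-bounds lanes
            out[offset] = x[offset] + y[offset]

def launch(x, y):
    """Sequentially emulate a GPU grid launch over all tiles."""
    n = len(x)
    out = [0.0] * n
    num_programs = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    for pid in range(num_programs):     # on a GPU these run in parallel
        block_program(pid, x, y, out, n)
    return out
```

On a real GPU the loop over pid disappears: each program instance runs on its own group of hardware threads, which is exactly the mental model Triton asks you to adopt.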
The ideal audience for this talk/workshop includes data scientists, ML engineers, and AI engineers who want to speed up their models and maximize GPU utilization, and who are interested in learning OpenAI's Triton. The session is also suited to early-career researchers experimenting with custom architectures who want to improve the speed of their models.
Getting started with GPU programming can be challenging: resources are scattered, which makes it tough to know where to begin, and for Triton the resources are even more limited. This talk aims to help the audience understand the basics of the GPU programming model and grasp the core concepts of Triton, making it easier to get started and to improve their models' speed. It will also cover the next steps toward becoming better at GPU programming.
Here is how I am planning to split my talk:
The workshop will cover the basics of GPU hardware and programming models, followed by a hands-on walkthrough of a few kernels in Triton.
Participants will gain a deeper understanding of how GPUs work and learn to write performant code for GPUs. This knowledge will help them improve the throughput or reduce the latency of their existing models. The participants can then go ahead and implement their models from scratch, often rivaling HuggingFace’s implementation in latency and throughput.
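To give a flavour of the hands-on portion, here is a sketch of the canonical Triton vector-add kernel. It assumes the triton and torch packages and a CUDA GPU, and it illustrates the programming model rather than reproducing the workshop's exact material:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide tile of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the final, partial tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)       # one program per 1024-element tile
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The masked load/store pattern shown here is the same one the workshop builds on when implementing the heavier operations used in modern LLMs.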
This should be a workshop-style session lasting 60–90 minutes. I plan to use pen and paper under a projector to draw out the actual computations in front of the audience, and then present my IDE to code them live.
The coding session will be interactive, and I encourage attendees to code along with me.