Demystifying Quantisation in Large Language Models in Plain English with Basic Math

Submitted Jun 28, 2023

Demystifying Quantisation in Large Language Models in Plain English with Basic Math

What will I cover

I will cover some basic maths of sizing up the memory and compute requirements of training and inference of a large language model. Some popular open source models will be used as example.
Quick brush up of data types.
In plain English, how are popular quantisation methods working?
Take an example of a typical computation in a neural network and show what quantisation brings to the table.
What impact does this make on compute, memory requirements? What is the fine print?
Why is this important? How can you apply this in your work?

Why is this topic is important?

Quantisation has emerged as a significant enabler for large language models (LLMs), making them accessible for companies without extravagant budgets (read: throw money at the problem) and paving the way for edge deployments. This talk delves beyond the basic concept of converting floats to integers. I’ll explain the underlying math that governs the memory and computation requirements, demonstrating how quantisation computations facilitate not only inference but also, potentially, training. Additionally, I will illuminate the cost, computational, and business impacts of quantisation.

What can audience learn from it?

Intuitive yet in-depth comprehension of why quantisation is crucial for training or fine-tuning LLMs.
What is, roughly, happening in the maths? Where are the trade-offs?
How does it impact accuracy? What is the evidence for its claims?
How to make informed quantisation trade-offs, equipping them to exploit LLMs across various use cases effectively.

Past Experience

I have in past hosted many talks on ML/AI at Fifth Elephant and other conferences. These includes hands on workshops and short talks. I’m obsessed about giving a clear understanding of underlying maths fundamentals while also explain the business impact.

Profile Links

All submissions

Previous Next

Comments

Aug 2023

7 Mon

8 Tue

9 Wed

10 Thu

11 Fri 09:00 AM – 06:00 PM IST

12 Sat

13 Sun

Hybrid access (members only)

Hosted by

The Fifth Elephant

Jump starting better data engineering and AI futures

Supported by

LlamaIndex

E2E Networks Limited

E2E Cloud is India's first AI hyper scaler, a cloud computing platform providing accelerated cloud-based solutions at maximum optimization and lowest pricing

The Fifth Elephant 2023 Monsoon