- I will cover some basic maths of sizing up the memory and compute requirements of training and inference of a large language model. Some popular open source models will be used as example.
- Quick brush up of data types.
- In plain English, how are popular quantisation methods working?
- Take an example of a typical computation in a neural network and show what quantisation brings to the table.
- What impact does this make on compute, memory requirements? What is the fine print?
- Why is this important? How can you apply this in your work?
Quantisation has emerged as a significant enabler for large language models (LLMs), making them accessible for companies without extravagant budgets (read: throw money at the problem) and paving the way for edge deployments. This talk delves beyond the basic concept of converting floats to integers. I’ll explain the underlying math that governs the memory and computation requirements, demonstrating how quantisation computations facilitate not only inference but also, potentially, training. Additionally, I will illuminate the cost, computational, and business impacts of quantisation.
- Intuitive yet in-depth comprehension of why quantisation is crucial for training or fine-tuning LLMs.
- What is, roughly, happening in the maths? Where are the trade-offs?
- How does it impact accuracy? What is the evidence for its claims?
- How to make informed quantisation trade-offs, equipping them to exploit LLMs across various use cases effectively.
I have in past hosted many talks on ML/AI at Fifth Elephant and other conferences. These includes hands on workshops and short talks. I’m obsessed about giving a clear understanding of underlying maths fundamentals while also explain the business impact.
{{ gettext('Login to leave a comment') }}
{{ gettext('Post a comment…') }}{{ errorMsg }}
{{ gettext('No comments posted yet') }}