GenerativeAI July Meetup

Meet and Greet Humans Talking about Generation using AI



Developing User-Friendly LLMs: Introduction to Supervised and Reinforcement Finetuning

Submitted Jul 9, 2023

Three years ago, OpenAI’s GPT-3 model created a buzz in the Machine Learning community and garnered some media attention. However, it had limited impact on regular users. In contrast, the launch of ChatGPT two and a half years later became a viral sensation, largely because of the user-friendly and enjoyable experience it provided. In this talk, we’ll explore the model finetuning approaches that contributed to the excitement around ChatGPT. We’ll cover when finetuning makes sense and who should do it, and give a quick overview of notebooks used to customize Large Language Models (LLMs) for specific purposes.

Developing a system like ChatGPT involves three stages: pretraining the base language model, supervised finetuning, and finetuning with Reinforcement Learning. The first stage is crucial for good performance, but it’s expensive ($100,000 to $10,000,000) and requires deep expertise. Finetuning an existing model, on the other hand, is cheaper ($100 to $1,000) and easier. With many open-source large language models now available, we’ll show hackers how to customize them for their specific needs. We’ll also discuss different situations and the finetuning strategy that suits each.

This talk is mainly geared towards Machine Learning engineers, as we will go through some code snippets. However, we will also discuss high-level concepts to make it useful for Generative AI enthusiasts.

Personal Bio:

  • I have been training Deep Learning models using Supervised Learning and Reinforcement Learning since 2016.
  • Over the years, I’ve trained Deep Reinforcement Learning agents to play games like Doom and Overcooked, and to execute equity trades efficiently.
  • In 2021, I finetuned the T5 language model for my startup, enabling it to generate questions and answers from paragraphs.
  • I also write about training Large Language Models on my blog at

Slides of the talk are at-


