About the workshop

This workshop provides a comprehensive understanding of LlamaIndex and shows how to use Large Language Models (LLMs) with the LlamaIndex toolkit to build custom data-driven applications. We’ll focus on the Retrieval-Augmented Generation (RAG) paradigm to create systems such as Q&A systems, chatbots, and data agents. A core part of the workshop explores how LlamaIndex serves as a bridge between LLMs and your custom data.

The Bangalore edition of the workshop was held on 26 August 2023. View participant feedback and discussion about the workshop here.

Who should participate?

This workshop is designed for data scientists, Machine Learning (ML) engineers, and researchers interested in developing applications powered by language models. Prior knowledge of language models and some programming experience, preferably in Python, will be beneficial.

Key takeaways - what participants will learn from the workshop

Here is the detailed outline for each module of the workshop.

Module 1: Grasping the RAG Paradigm and LlamaIndex Framework

  1. Introduction to the Retrieval-Augmented Generation (RAG) paradigm
  2. Importance and applications of RAG
  3. Introduction to Large Language Models (LLMs)
  4. Overview of the LlamaIndex Framework
  5. Significance and use cases
  6. Delving into LlamaIndex’s Components
    a. Data Loaders (LlamaHub)
    b. Indexing
    c. Retriever and Response Synthesis
    d. Query Engine/Chat Engine
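
To make the component list above concrete, here is a minimal sketch of the load, index, retrieve, and synthesise steps in plain Python. It does not use LlamaIndex: the function names (build_index, retrieve, build_prompt) are hypothetical, the "index" is a simple word-count vector per document rather than an embedding, and the final step only builds the prompt that an LLM would receive.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing step: pair each document with its term-frequency vector."""
    return [(doc, Counter(tokenize(doc))) for doc in documents]


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(index, query, top_k=2):
    """Retrieval step: return the top_k documents most similar to the query."""
    query_vector = Counter(tokenize(query))
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def build_prompt(query, contexts):
    """Synthesis step (prompt only): combine retrieved context with the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Context:\n{context_block}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    docs = [
        "LlamaIndex connects LLMs to custom data.",
        "Bangalore is a city in India.",
        "Chunk overlap repeats text between chunks.",
    ]
    index = build_index(docs)
    question = "What is chunk overlap?"
    print(build_prompt(question, retrieve(index, question, top_k=1)))
```

In a real RAG system the word-count vectors would be replaced by embeddings and the prompt would be sent to an LLM; the overall flow stays the same.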

Module 2: Delving into RAG Use Cases

  1. Q&A and summarisation systems
  2. Router Engine for routing queries
  3. SubQuestion Query Engine for document comparisons

Module 3: Customising RAG with Open-source LLMs and Embeddings

  1. Customising with Service Context
    a. Chunk size
    b. Chunk overlap
    c. Open-source LLMs
    d. Embeddings
  2. RAG with Open-source LLMs and Embeddings
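
As an illustration of how chunk size and chunk overlap interact, the sketch below splits text into overlapping word windows. It is a simplification under stated assumptions: chunk_text is a hypothetical helper, not a LlamaIndex function, and it counts whitespace-separated words, whereas real text splitters typically count tokens.

```python
def chunk_text(text, chunk_size=8, chunk_overlap=2):
    """Split text into chunks of chunk_size words.

    Consecutive chunks share chunk_overlap words, so each new chunk
    starts (chunk_size - chunk_overlap) words after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("a b c d e f g h i j", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

A larger overlap keeps more shared context across chunk boundaries at the cost of indexing more chunks for the same text.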

Module 4: Managing Metadata When Building a RAG System

  1. Importance of metadata management
  2. Techniques and tools for metadata management in RAG systems

Module 5: Evaluating a RAG System

  1. Response Evaluation
  2. Retrieval Evaluation
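
For retrieval evaluation, two commonly used metrics are hit rate and mean reciprocal rank (MRR). The outline does not say which metrics the workshop covers, so treat the choice here as an assumption. The sketch below computes both in plain Python; the function names and document IDs are hypothetical.

```python
def hit_rate(results, expected):
    """Fraction of queries whose expected document appears in the retrieved list."""
    hits = sum(1 for retrieved, exp in zip(results, expected) if exp in retrieved)
    return hits / len(expected)


def mean_reciprocal_rank(results, expected):
    """Average of 1/rank of the expected document (0 when it was not retrieved)."""
    total = 0.0
    for retrieved, exp in zip(results, expected):
        if exp in retrieved:
            total += 1.0 / (retrieved.index(exp) + 1)
    return total / len(expected)


if __name__ == "__main__":
    # One retrieved list per query, ranked best first (made-up IDs).
    retrieved_per_query = [["d1", "d2"], ["d3", "d4"], ["d5", "d6"]]
    # The document each query should have retrieved.
    expected_per_query = ["d1", "d4", "d9"]
    print(hit_rate(retrieved_per_query, expected_per_query))
    print(mean_reciprocal_rank(retrieved_per_query, expected_per_query))
```

Hit rate only checks whether the right document was retrieved at all, while MRR also rewards ranking it near the top.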

Module 6: Enhancing RAG through Fine-tuning Embeddings

  1. Introduction to fine-tuning and its importance
  2. Discussing various fine-tuning techniques and their impact on RAG systems

Module 7: Constructing a Text2SQL System

  1. Understanding Text2SQL
  2. Text2SQL over multiple tables

Project: Develop a Fine-tuned RAG System with LlamaIndex

  1. In-depth guidance on developing a fine-tuned RAG system

Duration of the workshop

The workshop will be held online. It runs for 8 hours, including breaks, and combines theoretical and practical sessions.

Prior set-up and prep required by participants before coming to the workshop

Participants should have:

  1. Basic knowledge of Python programming and familiarity with language models.
  2. Access to Google Colab, where the session will be conducted.
  3. An OpenAI API key, since we’ll be using GPT-based models (gpt-3.5-turbo and gpt-4) to build applications with LlamaIndex.

About the workshop instructor(s)

The workshop will be conducted by Ravi Theja, a Data Scientist at Glance-InMobi who holds a Master’s degree in Computer Science from IIIT-B and has published research in the field. He is recognized for his open-source contributions to LlamaIndex and brings practical insights from that work and from his industry experience to the workshop.

About The Fifth Elephant

The Fifth Elephant is a community-funded organization. If you like the work that The Fifth Elephant does, and want to support meet-ups and activities in different cities in India, consider contributing by picking up a membership.

Contact

Join The Fifth Elephant Telegram group at https://t.me/fifthel or follow @fifthel on Twitter.
For inquiries, contact The Fifth Elephant at +91-7676332020.

Hosted by

The Fifth Elephant - known as one of the best data science and Machine Learning conferences in Asia - has transitioned into a year-round forum for conversations about data and ML engineering, data science in production, and data security and privacy practices.
