This workshop will provide a comprehensive understanding of LlamaIndex and how to utilize Large Language Models (LLMs) along with the LlamaIndex toolkit to build a variety of custom data-driven applications. We’ll focus on leveraging the Retrieval Augmented Generation (RAG) paradigm to create powerful systems such as Q&A systems, chatbots, and data agents. A core component of the workshop will be exploring how LlamaIndex serves as a crucial bridge between LLMs and your custom data.
The Bangalore edition of the workshop was held on 26 August 2023. View participant feedback and discussion about the workshop here.
This workshop is designed for data scientists, Machine Learning (ML) engineers, and researchers interested in developing applications powered by language models. Prior knowledge of language models and some programming experience, preferably in Python, will be beneficial.
Here is the detailed outline for each module of the workshop.
- Introduction to the Retrieval-Augmented Generation (RAG) paradigm
- Importance and applications of RAG
- Introduction to Large Language Models (LLMs)
- Overview of the LlamaIndex Framework
- Significance and use cases
- Delving into LlamaIndex’s Components
a. Data Loaders (LlamaHub)
c. Retriever and Response Synthesis
d. Query Engine/Chat Engine
- QA and Summarisation Systems
- Router Engine for routing queries
- SubQuestion Query Engine for document comparisons
- Customizing with Service Context
a. Chunk size
b. Chunk overlap
c. Open-source LLMs
- RAG with Open-source LLMs and Embeddings
- Importance of metadata management
- Techniques and tools for metadata management in RAG systems
- Response Evaluation
- Retrieval Evaluation
- Introduction to fine-tuning and its importance
- Discussing various fine-tuning techniques and their impact on RAG systems
- Understanding Text2SQL
- Text2SQL over multiple tables
- In-depth guidance on developing a fine-tuned RAG system
The workshop will be held online and runs for 8 hours, including breaks. It combines theoretical and practical sessions.
Participants should have:
- Basic knowledge of Python programming and familiarity with language models.
- Access to Google Colab, where the sessions will be conducted.
- An OpenAI API key, since we’ll be using GPT-based models (gpt-3.5-turbo and gpt-4) to build applications with LlamaIndex.
The workshop will be conducted by Ravi Theja, a Data Scientist at Glance-InMobi who holds a Master’s degree in Computer Science from IIIT-B and has published research in the field. He is recognized for his open-source contributions to LlamaIndex and brings practical insights from that work and from his industry experience to the workshop.
The Fifth Elephant is a community-funded organization. If you like the work that The Fifth Elephant does, and want to support meet-ups and activities in different cities in India, consider contributing by picking up a membership.