The Fifth Elephant 2024 Annual Conference (12th & 13th July)

Maximising the Potential of Data — Discussions around data science, machine learning & AI

Abhijeet Kumar

@abhijeet3922

RAG Vs Fine-Tuning: Implementation Anecdotes from Data Catalog Enrichment Solution

Submitted Jun 12, 2024

Abstract

This talk will take the audience through our experience building a content-generation solution for a data catalog enrichment effort, from a modeling perspective (RAG with a pre-trained model and RAG with a fine-tuned model).

For this use case, I will talk about the approach taken to:

  • Understand the data inputs in the prompt.
  • Enrich the prompt.
  • Construct a few-shot setup using RAG.
  • Compare fine-tuned Llama with pre-trained Llama and GPT-3.5-turbo.
  • Apply multiple evaluation metrics from a monitoring and governance perspective.
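The few-shot RAG step above can be sketched as follows. This is a minimal illustration: retrieval here is a toy token-overlap similarity over a hypothetical pool of curated examples, standing in for the embedding-based retrieval a real pipeline would use, and all table names and descriptions are invented.

```python
# Sketch: build a few-shot prompt by retrieving the most similar curated
# table descriptions (RAG). Toy retrieval uses token overlap; a real setup
# would use an embedding index. All example data below is hypothetical.

EXAMPLES = [
    {"table": "cust_txn_daily", "columns": "cust_id, txn_amt, txn_dt",
     "description": "Daily customer transaction amounts keyed by customer id."},
    {"table": "acct_balance", "columns": "acct_id, bal_amt, as_of_dt",
     "description": "End-of-day account balances per account."},
    {"table": "emp_roster", "columns": "emp_id, dept, hire_dt",
     "description": "Employee roster with department and hire date."},
]

def normalize(s: str) -> str:
    """Split snake_case names and comma lists into plain tokens."""
    return s.replace("_", " ").replace(",", " ")

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (stand-in for embedding similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_fewshot_prompt(table: str, columns: str, k: int = 2) -> str:
    """Retrieve the k nearest examples and prepend them as few-shot shots."""
    query = normalize(f"{table} {columns}")
    ranked = sorted(
        EXAMPLES,
        key=lambda ex: similarity(query, normalize(f"{ex['table']} {ex['columns']}")),
        reverse=True,
    )
    shots = "\n\n".join(
        f"Table: {ex['table']}\nColumns: {ex['columns']}\nDescription: {ex['description']}"
        for ex in ranked[:k]
    )
    return f"{shots}\n\nTable: {table}\nColumns: {columns}\nDescription:"

prompt = build_fewshot_prompt("cust_txn_monthly", "cust_id, txn_amt, txn_mo")
print(prompt)
```

The retrieved shots anchor the model's style and terminology to already-curated catalog entries, which is the main reason to prefer retrieval over a fixed set of hand-picked examples.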

I will talk about fine-tuning details using the LoRA technique, and will also compare results from three models: few-shot pre-trained Llama2-13B, few-shot fine-tuned Llama2-7B and GPT-3.5-turbo.
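The appeal of LoRA for this kind of fine-tuning is that it trains only small low-rank adapter matrices B and A (so W' = W + (alpha/r)·BA) while the full weight matrix W stays frozen. A back-of-the-envelope sketch of the parameter savings; the hidden size matches Llama2-7B's layers, while the rank and alpha are common choices, not necessarily the exact configuration used in the talk:

```python
# LoRA replaces a full d x d weight update with two low-rank factors
# B (d x r) and A (r x d), applied as W' = W + (alpha / r) * B @ A.
# Numbers below are illustrative, not the talk's actual hyperparameters.

d = 4096        # hidden size of Llama2-7B transformer layers
r, alpha = 8, 16  # common LoRA rank / scaling choices (assumed)

full_params = d * d              # parameters if we fine-tuned W directly
lora_params = d * r + r * d      # parameters in the B and A adapters

print(f"full: {full_params:,}  lora: {lora_params:,}  "
      f"ratio: {lora_params / full_params:.4%}")
```

For this layer the adapters hold well under 1% of the full matrix's parameters, which is what makes fine-tuning a 7B model tractable on modest hardware.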

The talk will draw various insights about the behaviour of these models in terms of content generation, including accuracy (against ground truth), alignment (factual consistency with prompt inputs) and toxicity detection.
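As a rough intuition for the alignment dimension, one can check whether column-like tokens in a generated description actually appear in the prompt inputs. The sketch below is a deliberately simple stdlib stand-in; the metrics discussed in the talk are model-based scorers (e.g. BERTScore F1 for accuracy), and all data here is invented:

```python
# Toy "alignment" check in the spirit of a factual-consistency metric:
# flag generated descriptions that mention column-like (snake_case)
# tokens absent from the prompt inputs. Illustrative only.
import re

def alignment_score(generated: str, prompt_columns: list[str]) -> float:
    """Fraction of snake_case tokens in the output that appear in the inputs."""
    mentioned = re.findall(r"\b[a-z]+(?:_[a-z]+)+\b", generated.lower())
    if not mentioned:
        return 1.0  # nothing column-like claimed, so nothing to contradict
    known = {c.lower() for c in prompt_columns}
    return sum(tok in known for tok in mentioned) / len(mentioned)

cols = ["cust_id", "txn_amt", "txn_dt"]
good = "Stores txn_amt per cust_id keyed by txn_dt."
bad = "Stores txn_amt per cust_id along with acct_bal."  # acct_bal is hallucinated
print(alignment_score(good, cols), alignment_score(bad, cols))
```

A score below 1.0 signals the generation introduced a column the prompt never supplied, which is exactly the failure mode a governance-oriented consistency metric is meant to catch.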

Use-case

An enterprise data catalog is a large effort in any enterprise: it keeps curated metadata about the data for user reference, which mostly means writing descriptions of tables and their columns for business consumption. This has always been a manual effort.

Here, we are talking about hundreds of database schemas, thousands of database tables and millions of columns in the data catalog. Often, curated content covers merely 3-5% of it. The objective is to enrich the data catalog using an AI solution.

Intended Audience

This talk is intended for data engineers, data scientists and researchers in the GenAI space who want to understand model behaviour under different constructs (RAG, fine-tuning etc.).

It is also intended for data leaders, data stewards and data SMEs who are close to enterprise data; the topic may interest them as an initiative to enrich the metadata of enterprise data catalogs.

In general, the talk will be relevant to any professional working on a GenAI use case with Python.

Outline

  • Defining the scope of the use case
  • LLM Solution - Design and Implementation
  • Prompt Engineering
    • Prompt Enrichment
    • Few-shot using RAG
  • Implemented Models: Fine-tuned Llama2-7B, Llama2-13B, GPT-3.5 Turbo
  • Evaluation Metrics and Interpretation
    • BERTScore F1 (Accuracy)
    • Factual Consistency Score (Alignment)
    • Toxicity Detection
  • Learnings and Challenges - Lessons

Impact

  • The solution improves metadata coverage, aiming to enrich the catalog from the current 3-5% to 25% (curated and prepared zones).
  • It saves data stewards' content-curation time by an estimated 50% (notional).
  • It improves search over the data catalog.

