Anustup Mukherjee

@anustup900

Prateek Gupte

@superprat

Journey of Finetuning Open Source Stable Diffusion Models

Submitted May 29, 2025

Abstract:
AI-generated images are transforming the world in many ways, from high-quality creative art to fashion imagery, anime, and more, and diffusion models are at the heart of these use cases. Businesses are building niche products around this technology. However, adapting these general-purpose models to specific use cases, for example, "Can a diffusion model generate an image of an Indo-Nepalese woman with the correct aesthetic?", is always difficult and calls for conventional fine-tuning methods.

Full fine-tuning, though, is genuinely hard: it has multiple pillars that all need to be right. This talk focuses on how we at Caimera fine-tuned the SDXL diffusion model from scratch to solve Indian fashion use cases, and on filling the void of practical information around fine-tuning. The audience will learn the steps involved in fine-tuning a model, the things that need to be taken care of, and some best practices from our learnings.

Agenda:

  • Why fine-tune a model?

    • Understanding when a business actually needs to fully fine-tune a foundation model, rather than training a basic LoRA or solving the use case with another approach such as prompting. Fine-tuning being a GPU-melting job, this question needs to be answered first.
    • We will dive into what to expect when you fine-tune a model: what can be fixed and what cannot. Many people end up fine-tuning just to replicate a style; if I expect a model to regenerate my own images, I need LoRA training, not a full fine-tune, but if I want the model to learn an entire concept or domain of information, I need fine-tuning (see the LoRA vs. full fine-tune sketch after the agenda). We will also show why we at Caimera fine-tuned a model.
  • The gateway to fine-tuning: data collection

    • Data is always the heart of any fine-tuning approach: if we train on a garbage dataset, we should expect garbage as output.
    • We will discuss how we worked out which problems to solve with fine-tuning and how we built a dataset for them. This includes data collection from different web sources, how to select a dataset and deal with copyright, and how much diversity the images should cover.
    • How to augment real-world data with synthetic images, the steps involved, and how we answered the question of the right ratio of real-world to synthetic images.
    • A comparison demo of training results with different data collection methods.
  • Data Pre-Processing:

    • Datasets scraped from the web have multiple issues, mainly around quality and how well they represent the concept we are trying to fine-tune for. This is the key step for removing garbage from a dataset (a minimal quality-filter sketch follows the agenda).
    • We will talk about the approaches we used and how we handled these issues.
    • A comparison demo of training results with different data-processing techniques.
  • Captioning is the Key:

    • We will share our experience with captioning and how it impacts training.
    • Best practices for captioning a dataset, and how to phrase the information so that the model's text encoders learn it well.
    • A comparative analysis of manual captioning versus auto-captioning with LLMs (an auto-captioning sketch follows the agenda).
    • A comparison demo of training results with different captioning strategies, and what worked best.
  • How to choose which model to fine-tune and the best configuration for fine-tuning:

    • The questions that need to be answered to choose a base model for fine-tuning, mostly around output quality and model architecture.
    • The open-source trainers available, such as Kohya-SS, OneTrainer, SimpleTuner, and Diffusers; which one we used, and the reasoning behind choosing the best trainer for a use case.
    • The importance of getting the training configuration right: how learning rates, optimizers, loss functions, and network dimensions work and how they impact training (see the training-step sketch after the agenda).
    • A comparison demo of training results with different configurations, and how we selected the best config.
  • How to tell that a training run is going wrong while it is still running:

    • Training being a GPU-heavy task, there is always the worry that the model turns out poorly and the compute is wasted. We will discuss how we learned to judge, while training is still in progress, whether a run is on track (see the monitoring sketch after the agenda).
    • This includes reading training samples and loss curves.
    • We will also talk about a custom approach of extracting learning maps from each training layer to show what the model is learning.
  • Evaluation Criteria:

    • How to set up an evaluation metric for training, and how we set up ours (an automatic-metric sketch follows the agenda).
    • Best practices for testing each training iteration and finalizing the best-performing version based on metrics.
  • The pathway to superior quality using model merging:

    • Generative AI models are usually treated as single experts, but as Mixtral-style mixture-of-experts models showed, combining multiple experts can produce remarkable results.
    • We will talk about how we scaled our results to a much higher quality by merging our fine-tuned version with other models, along with the steps for model merging (a weight-merging sketch follows the agenda).
  • Out-of-the-box approaches we tried for fine-tuning and how they worked:

    • Alongside conventional fine-tuning, we tried different research- and community-driven approaches such as DPO (Direct Preference Optimization), distillation techniques with ByteDance models, and replacing the CLIP text encoders with a LLaMA model (as proposed in the Playground v3 paper). We will shed light on how we did these, how they worked out, and what we learned.
  • Conclusion

  • QnA
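The sketches referenced in the agenda follow. They are minimal illustrations written against the Hugging Face diffusers ecosystem, which is an assumption on our part; model names, paths, and hyperparameters are placeholders rather than the exact setup used at Caimera.

LoRA vs. full fine-tune. A sketch of the practical difference discussed in the first agenda item, assuming the diffusers and peft libraries:

```python
# Sketch: LoRA adapter vs. full fine-tune on an SDXL UNet (illustrative only).
# In practice you would pick one of the two options, not both.
import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet",
    torch_dtype=torch.float16,
)

# Option 1: LoRA -- small low-rank adapters on the attention projections.
# Enough when the goal is to replicate a style or a single subject.
lora_config = LoraConfig(
    r=16, lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
lora_unet = get_peft_model(unet, lora_config)
lora_unet.print_trainable_parameters()  # only a tiny fraction of the UNet trains

# Option 2: full fine-tune -- every UNet weight is trainable. Needed when the
# model must learn an entire new concept or domain, at a much higher GPU cost.
for p in unet.parameters():
    p.requires_grad_(True)
```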
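Data pre-processing. A first-pass quality filter of the kind described in the pre-processing item; the thresholds are illustrative, and the cleaning pipeline covered in the talk goes further:

```python
# Sketch: drop scraped images that are corrupt, too small, or oddly shaped.
from pathlib import Path
from PIL import Image

MIN_SIDE = 1024        # SDXL trains at roughly 1024px
MAX_ASPECT_RATIO = 2.0

def keep(path: Path) -> bool:
    try:
        with Image.open(path) as img:
            img.verify()               # cheap corruption check
        with Image.open(path) as img:  # reopen: verify() exhausts the file
            w, h = img.size
    except Exception:
        return False
    if min(w, h) < MIN_SIDE:
        return False
    if max(w, h) / min(w, h) > MAX_ASPECT_RATIO:
        return False
    return True

clean = [p for p in Path("dataset/raw").glob("*.jpg") if keep(p)]
print(f"kept {len(clean)} images")
```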
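Auto-captioning. A sketch of bulk auto-captioning with an off-the-shelf vision-language model; BLIP is used here only as a stand-in, while the talk itself compares manual captions against captions from LLM-based pipelines:

```python
# Sketch: caption every image in a folder and write one .txt file per image,
# the layout most open-source trainers (e.g. Kohya-SS) expect.
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-large"
)

for path in Path("dataset/images").glob("*.jpg"):
    image = Image.open(path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=50)
    caption = processor.decode(out[0], skip_special_tokens=True)
    path.with_suffix(".txt").write_text(caption)
```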
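Training configuration. A sketch of a single denoising training step, showing where the learning rate, optimizer, and loss function from the configuration item enter the loop; all values are placeholders and the SDXL-specific conditioning is passed in by the caller:

```python
# Sketch: one SDXL denoising training step (placeholder hyperparameters).
import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler, UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)
noise_scheduler = DDPMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler"
)
# Learning rate and optimizer choice are two of the configuration knobs
# discussed in the talk.
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5, weight_decay=1e-2)

def training_step(latents, text_embeddings, added_cond_kwargs):
    """latents: VAE-encoded images; added_cond_kwargs: SDXL pooled text
    embeddings and size conditioning ("text_embeds" and "time_ids")."""
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
    # The UNet predicts the added noise; the loss is MSE against the true noise.
    pred = unet(
        noisy_latents, timesteps,
        encoder_hidden_states=text_embeddings,
        added_cond_kwargs=added_cond_kwargs,
    ).sample
    loss = F.mse_loss(pred.float(), noise.float())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```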
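Monitoring a run. A sketch of catching a bad run early by logging the loss curve and rendering the same validation prompts with a fixed seed every few hundred steps; the prompt and cadence are illustrative:

```python
# Sketch: fixed-seed validation samples during training.
import torch
from diffusers import StableDiffusionXLPipeline

VALIDATION_PROMPTS = [
    "editorial photo of a woman in a red silk saree, studio lighting",
]

def log_validation(pipeline: StableDiffusionXLPipeline, step: int) -> None:
    # A fixed seed keeps the comparison across checkpoints apples-to-apples.
    generator = torch.Generator(device="cuda").manual_seed(42)
    for i, prompt in enumerate(VALIDATION_PROMPTS):
        image = pipeline(
            prompt, num_inference_steps=25, generator=generator
        ).images[0]
        image.save(f"samples/step_{step:06d}_prompt_{i}.png")

# Inside the training loop (writer being e.g. a TensorBoard SummaryWriter):
#   writer.add_scalar("train/loss", loss, global_step)   # the loss curve
#   if global_step % 500 == 0:
#       log_validation(pipeline, global_step)
```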
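Evaluation. One possible automatic metric, CLIP score for prompt adherence via torchmetrics; this is an assumption for illustration, and the talk's actual evaluation criteria are broader and include human review:

```python
# Sketch: CLIP score as one automatic signal for ranking checkpoints.
import torch
from torchmetrics.multimodal.clip_score import CLIPScore

clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

def score_batch(images: torch.Tensor, prompts: list[str]) -> float:
    # `images` are uint8 tensors of shape (N, 3, H, W) in the 0-255 range.
    return clip_score(images, prompts).item()

# Score each checkpoint on a fixed prompt set and shortlist the best ones
# for human review of the generated samples.
```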
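Model merging. A sketch of the simplest form of merging, a weighted average of two compatible UNets; the weights and paths are placeholders, and the talk covers how the actual merge partners and ratios were chosen:

```python
# Sketch: weighted average of two compatible SDXL UNets (placeholder paths).
import torch
from diffusers import UNet2DConditionModel

unet_a = UNet2DConditionModel.from_pretrained("path/to/our-finetune", subfolder="unet")
unet_b = UNet2DConditionModel.from_pretrained("path/to/other-model", subfolder="unet")

alpha = 0.6  # weight given to our fine-tuned model
state_a, state_b = unet_a.state_dict(), unet_b.state_dict()
merged_state = {
    key: alpha * state_a[key] + (1.0 - alpha) * state_b[key]
    for key in state_a
}
unet_a.load_state_dict(merged_state)
unet_a.save_pretrained("merged-unet")
```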

Takeaways for audience:

  • An understanding of the business needs that determine how and when to fine-tune a model for better quality.
  • An understanding of best practices for fine-tuning a model.
  • An understanding of what a basic end-to-end training approach looks like.
  • An understanding of what fine-tuning can and cannot solve, and a way past the assumption that training always works.

About the speaker:

MLE at Caimera AI; former MLE at Newton School, Dark Horse, Shell, WRI, and Metvy. Contributed to Google TensorFlow (GSoC), Samsung (Prism), and IIT Patna (projects). Founded MBK Health Tech, backed by Supreme Ventures, to apply AI for early detection of cardiac diseases and build a hyper-local support network for patients using wearables. Holds 4 patents on medical-imaging automation using AI algorithms, has multiple research papers, and received the Indian Young Achievers award for contributions to artificial intelligence. Has spoken at Py-Bangalore, the Belgium Py conference, Keras Community Day 23, the Girlscript India Summit, MIT TECH X, HPAIR (as a delegate), and multiple meetups, hackathons, and events.
(LinkedIn: https://www.linkedin.com/in/anustupmukherjee/)

