Personalize your AI image generator by fine-tuning Stable Diffusion

Hands-on workshop - The Fifth Elephant 2025 Annual Conference

🔍 Workshop overview

AI-generated images are transforming industries - from creative art and fashion to anime and niche design applications. At the core of these breakthroughs are diffusion models. While general-purpose diffusion models are powerful, adapting them to specific cultural, aesthetic, or domain-driven requirements (like generating images of Indo-Nepalese women with authentic aesthetics) demands full fine-tuning.

This hands-on workshop will take participants through the complete process of fine-tuning a Stable Diffusion XL (SDXL) model from scratch, based on the team’s work at Caimera for Indian fashion use-cases. The session will uncover practical methods, best practices, challenges, and lessons learned, helping participants decide when to fine-tune, how to build effective datasets, and how to optimize configurations for real-world, high-quality outcomes.

Note

  • This workshop is 3 hours long.
  • It is an in-person, hands-on, advanced workshop.
  • Beginners are welcome, but should be prepared to catch up on:
    • Prior Python knowledge
    • A basic understanding of deep learning fundamentals
    • Some experience playing with image generation models (such as Stable Diffusion)
  • This workshop is better suited for participants with intermediate to advanced AI/ML experience.
  • Limited seats available for participation.
  • Live stream available for The Fifth Elephant members to participate remotely.
  • WIP slides

🧭 Agenda

  • Why fine-tune a model?
    Understanding when businesses truly need to fine-tune a foundation model (versus using LoRA, prompt engineering, etc.) and what can realistically be fixed via fine-tuning.

  • Data collection for fine-tuning
    How to identify the problem to solve, curate datasets from web sources, manage copyright risks, and strike the right ratio between real-world and synthetic images.
    Includes a demo comparing results from different data collection methods.

  • Data pre-processing
    Cleaning, verifying, and ensuring dataset quality and concept representation.
    Demo: impact of different data processing techniques.

  • Captioning is critical
    How captions shape model learning. Best practices for auto-generated vs manual captions, and lessons from using LLM-based captioning.
    Comparison demo: manual vs auto-captioning.

  • Choosing the right base model and training configuration
    Criteria for model selection based on quality and architecture. Overview of open-source trainers (Kohya-SS, OneTrainer, SimpleTuner, Diffusers) and why one was selected.
    Deep dive into optimizers, learning rates, loss functions, network dimensions, and their impact on training.
    Comparison demo: different configurations and learnings.

  • Detecting issues mid-training
    How to monitor loss curves, inspect training samples, and use custom techniques such as extracting layer-wise learning maps to evaluate learning quality during compute-heavy fine-tuning jobs.

  • Evaluation strategy
    Setting up evaluation metrics, best practices for testing iterations, and selecting optimal versions.

  • Path to achieving superior quality via model merging
    Combining fine-tuned models with other expert models to scale results, inspired by mixture-of-experts approaches such as Mixtral.

  • Out-of-the-box experiments
    Experiments with Direct Preference Optimization (DPO), model distillation, replacing CLIP encoders with LLaMA, and learnings from Playground v3 concepts.

  • Conclusion + Q&A
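
The model-merging item above can be illustrated with a minimal sketch: linear weight interpolation of two checkpoints' state dicts, one common merging method. The parameter name and scalar values below are hypothetical toy data; real SDXL checkpoints hold tensors (e.g. loaded via PyTorch), but the arithmetic is the same.

```python
def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Return a new state dict: alpha * sd_a + (1 - alpha) * sd_b.

    Assumes both checkpoints share identical parameter names
    (and, with real tensors, identical shapes).
    """
    if sd_a.keys() != sd_b.keys():
        raise ValueError("checkpoints have mismatched parameter names")
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}


# Toy example: two "checkpoints" with a single scalar weight each.
base = {"unet.weight": 1.0}
expert = {"unet.weight": 3.0}
merged = merge_state_dicts(base, expert, alpha=0.5)
print(merged["unet.weight"])  # 2.0
```

In practice the blend ratio `alpha` is itself tuned per layer or per module, which is part of what makes merging an art rather than a recipe.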

💻 Prerequisites

  • Basic understanding of diffusion models and image generation pipelines
  • Familiarity with Python-based AI tooling
  • Interest in practical ML deployment and customization for real-world applications

👥 Who should attend

  • Machine Learning Engineers and AI practitioners
  • Startup founders and product teams exploring AI-driven design or content generation
  • Product managers and technical decision-makers interested in AI integration
  • Advanced AI enthusiasts with prior exposure to Python, deep learning basics, and hands-on image generation experience

📚 What will participants learn?

  • How and when to fine-tune a diffusion model for specific business problems
  • Best practices for dataset collection, curation, and captioning
  • Model configuration, training optimization, and open-source tools for fine-tuning
  • Evaluation frameworks and quality assurance for fine-tuned models
  • Advanced tricks like model merging and experimental techniques beyond conventional training

👨‍🏫 Instructor bio

Anustup Mukherjee is a Machine Learning Engineer at Caimera AI and former MLE at Newton School, Shell, WRI, and more. Anustup has contributed to Google TensorFlow (GSoC), Samsung (Prism), and IIT Patna research projects.

Founder of MBK Health Tech, Anustup holds four patents in medical imaging AI, has published multiple papers, and is a recipient of the Indian Young Achievers Award for contributions to AI in healthcare. A sought-after speaker, he has presented at events including PyBangalore, Belgium-Py, Keras Community Day, MIT Tech X, and HPAIR.

Connect with Anustup on LinkedIn

How to attend this workshop

This workshop is open for The Fifth Elephant members and for The Fifth Elephant 2025 annual conference ticket buyers.

Seats are limited and available on a first-come, first-served basis. 🎟️

Contact information ☎️

For inquiries about the workshop, call +91-7676332020 or write to info@hasgeek.com.

Venue

Underline Centre, 2nd floor

24, 3rd A Cross, 1st Main Road

Above Blue Tokai

Bengaluru - 560071

Karnataka, India


Hosted by

The Fifth Elephant - jump starting better data engineering and AI futures