- This paper introduces the Latent Diffusion Model (LDM), which led to the release of Stable Diffusion, an open-source text-to-image model released in August 2022 that paved the way for a number of other innovations in text-to-image, image-to-image, and image-editing applications.
- The paper introduced the idea of applying the diffusion process in the latent space of an autoencoder rather than directly on image pixels. This approach significantly reduces computational demands while retaining image synthesis quality and flexibility.
- Given the model's open-source nature and simple building blocks, there has been tremendous community engagement, leading to many interesting contributions built on the model architecture, such as ControlNet (https://arxiv.org/abs/2302.05543), DreamBooth (https://dreambooth.github.io/), and even audio generation (https://github.com/riffusion/riffusion).
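As a rough illustration of why running diffusion in latent space is cheaper, the sketch below compares the number of values in a 512×512 RGB image with the number in its latent representation. The downsampling factor of 8 and the 4 latent channels are assumptions that match the commonly used Stable Diffusion v1 configuration, and the helper names `latent_shape` and `num_values` are hypothetical, not part of any library.

```python
def latent_shape(height, width, downsample_factor=8, latent_channels=4):
    """Return the (channels, height, width) shape of the latent for an image.

    downsample_factor and latent_channels are assumed defaults, chosen to
    match the commonly used Stable Diffusion v1 autoencoder configuration.
    """
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // downsample_factor, width // downsample_factor)


def num_values(shape):
    """Count the scalar values in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


pixel_shape = (3, 512, 512)        # RGB image in pixel space
z_shape = latent_shape(512, 512)   # the same image in latent space
ratio = num_values(pixel_shape) / num_values(z_shape)

print(f"pixel values:  {num_values(pixel_shape)}")
print(f"latent values: {num_values(z_shape)}")
print(f"reduction:     {ratio:.0f}x")
```

Under these assumptions the denoising network operates on about 48 times fewer values per step. The actual savings in time and memory also depend on the network architecture, so this count is only an indication.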
The original paper is published at https://arxiv.org/pdf/2112.10752.pdf
- If you’re interested in text-to-image generation, this session is for you. We will discuss how the image generation process works and how it can be applied to multiple use cases.
- We will approach the paper in the following steps:
  - a high-level system design view of the various components in a Latent Diffusion Model
  - a deep dive into the diffusion process itself and how it works in conjunction with the CLIP Vision-Text transformer
  - a walkthrough of code implementations and other design choices
  - a look at extensions such as ControlNet and DreamBooth and how they work with the model architecture
Sidharth Ramachandran works at a large European media company and has been applying text-to-image techniques as part of building data products for a streaming platform. He is also a part-time instructor and has co-authored a book published by O’Reilly. He is an enthusiastic learner who is fascinated to see where AI research is heading and what applications it can unlock for humanity.
Amiruddin Nagri, founder of Memex AI, who previously worked at Gojek and ThoughtWorks, will lead the discussion and share insights on diffusion models. Amir previously conducted a full-house workshop on Stable Diffusion.
This is an online paper reading session. RSVP to participate via Zoom.
Bharat Shetty Barkur, a member of The Fifth Elephant, is the curator of the paper discussions.
Bharat has worked across different organizations such as IBM India Software Labs, Aruba Networks, Fybr, Concerto HealthAI, and Airtel Labs. He has worked on products and platforms across diverse verticals such as retail, IoT, chat and voice bots, edtech, and healthcare leveraging AI, Machine Learning, NLP, and software engineering. His interests lie in AI, NLP research, and accessibility.
The goal is for the community to understand popular papers in the Generative AI, Deep Learning, and Machine Learning domains. Bharat and other co-curators seek to put together papers that will benefit the community, and to organize reading and learning sessions driven by experts and curious folks in Generative AI, Deep Learning, and Machine Learning.
The paper discussions will be conducted every month - online and in person.
- Suggest a paper to discuss. Post a comment here with the paper you’d like to discuss. A proposal should include slides and code samples that make parts of the paper simpler and easier to understand.
- Moderate/discuss a paper someone else is proposing.
- Spread the word among colleagues and friends. Join The Fifth Elephant Telegram group or WhatsApp group.
The Fifth Elephant is a community-funded organization. If you like the work that The Fifth Elephant does and want to support its meet-ups and activities, online and in person, contribute by picking up a membership.
For inquiries, leave a comment or call The Fifth Elephant at +91-7676332020.