Stable Diffusion Model

- It can (typically) use a U-Net architecture with hundreds of millions of parameters (roughly 860 million in Stable Diffusion v1), making it relatively lightweight for image generation tasks.
- It can (often) be used for tasks like inpainting, outpainting, and image-to-image translations guided by a text prompt.
- It can be associated with a Fine-Tuned Stable Diffusion Model.
- It can be based on a Publicly-Available Latent Diffusion Model.
- It can support:
  - Text-to-Image Generation, such as generating images of landscapes from detailed text descriptions.
  - Image Inpainting, used for filling in missing parts of an image based on prompts.
  - Image Outpainting, used for extending the content of an image beyond its original boundaries.
  - Image-to-Image Translation, used for transforming an image from one representation to another.
- ...
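
Inpainting and outpainting both amount to generating only a masked region while keeping the rest of the image fixed. The sketch below is a minimal illustration, assuming NumPy arrays in H x W x C layout and a simple "replace the known region at every denoising step" strategy; it is not the only approach, since dedicated Stable Diffusion inpainting checkpoints instead feed the mask and masked image to the U-Net as extra input channels. The helpers `outpaint_canvas` and `blend_known_region` are hypothetical names chosen for illustration. For inpainting the mask would come from the user; for outpainting it marks the newly added border.

```python
import numpy as np


def outpaint_canvas(image: np.ndarray, pad: int) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: extend an H x W x C image by `pad` pixels per side.

    Returns the enlarged canvas (new pixels set to 0) and a mask that is
    1.0 where content must be generated and 0.0 where the original is kept.
    """
    h, w, c = image.shape
    canvas = np.zeros((h + 2 * pad, w + 2 * pad, c), dtype=image.dtype)
    canvas[pad:pad + h, pad:pad + w] = image
    mask = np.ones((h + 2 * pad, w + 2 * pad, 1), dtype=np.float32)
    mask[pad:pad + h, pad:pad + w] = 0.0
    return canvas, mask


def blend_known_region(generated: np.ndarray, known: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: keep model output inside the mask, known content outside.

    In a real diffusion loop, `known` would be the original content noised to
    the current timestep, so both regions share the same noise level.
    """
    return mask * generated + (1.0 - mask) * known


# Usage: outpaint a 4 x 4 RGB image by 2 pixels on each side.
image = np.full((4, 4, 3), 0.5, dtype=np.float32)
canvas, mask = outpaint_canvas(image, pad=2)
fake_model_output = np.ones_like(canvas)  # stand-in for one denoising step's output
result = blend_known_region(fake_model_output, canvas, mask)
print(canvas.shape, mask.sum())
```

Because the mask has a trailing channel axis of size 1, NumPy broadcasting applies the same mask value to every color (or latent) channel.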
Example(s):
- Stable Diffusion v1, Stable Diffusion v2, Stable Diffusion v2.1 (~2022-12-07) [1].
- SDXL v1.0 (~2023-07-26).
- ...
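
The listed versions differ notably in native output resolution, and because each one denoises in a compressed latent space, the U-Net operates on far smaller tensors than the final image. A minimal sketch of that arithmetic, assuming an autoencoder downsampling factor of 8 with 4 latent channels, and native resolutions of 512x512 (v1), 768x768 (the 768-pixel v2/v2.1 checkpoints; 512-pixel base variants also exist), and 1024x1024 (SDXL). `latent_shape` and `compression_ratio` are hypothetical helpers.

```python
def latent_shape(height: int, width: int, factor: int = 8, channels: int = 4) -> tuple[int, int, int]:
    """Hypothetical helper: shape (C, H, W) of the latent the U-Net denoises."""
    if height % factor or width % factor:
        raise ValueError("image sides must be multiples of the downsampling factor")
    return channels, height // factor, width // factor


def compression_ratio(height: int, width: int, factor: int = 8, channels: int = 4) -> float:
    """Hypothetical helper: how many RGB pixel values map to one latent value."""
    c, h, w = latent_shape(height, width, factor, channels)
    return (3 * height * width) / (c * h * w)


# Assumed native resolutions of the versions listed above (see lead-in).
native_resolutions = {"SD v1": 512, "SD v2/v2.1 (768)": 768, "SDXL v1.0": 1024}
for name, side in native_resolutions.items():
    print(name, latent_shape(side, side), compression_ratio(side, side))
```

Under these assumptions the ratio is the same for every resolution (3 x 8 x 8 / 4), so the savings come from working in latent space rather than from any particular image size.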
Counter-Example(s):
- A PixelCNN model.
- A Variational Autoencoder.

See: Latent Diffusion Techniques, U-Net Architecture, CLIP ViT-L/14 Text Encoder, LAION-5B Dataset, Image-Text Pair, Inpainting, Outpainting, Image-to-Image Translation, Text Description, Latent Diffusion Model, Text-to-Image Model, Prompt Engineering, Diffusion Model.

References

(Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Stable_Diffusion Retrieved:2024-01-22.
- Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. It is considered to be a part of the ongoing AI spring.
  It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt.^[1] Its development involved researchers from the CompVis Group at Ludwig Maximilian University of Munich and Runway with a computational donation by Stability AI and training data from non-profit organizations.^[2]^[3]^[4]^[5]
  Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. Its code and model weights have been open sourced,^[6] and it can run on most consumer hardware equipped with a modest GPU with at least 4 GB VRAM. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney which were accessible only via cloud services.^[7]^[8]
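
Conditioning on a text description is commonly strengthened at sampling time with classifier-free guidance: the U-Net's noise prediction is computed both with and without the prompt embedding, and the difference between the two is amplified. A minimal NumPy sketch, assuming the two noise predictions are already available as arrays; `guided_noise` is a hypothetical helper, the random arrays stand in for real U-Net outputs, and 7.5 is used only as a commonly seen default guidance scale.

```python
import numpy as np


def guided_noise(eps_uncond: np.ndarray, eps_cond: np.ndarray, guidance_scale: float = 7.5) -> np.ndarray:
    """Classifier-free guidance: move the prediction away from the unconditional one.

    guidance_scale = 1.0 returns eps_cond unchanged and 0.0 returns eps_uncond;
    larger values follow the text prompt more strongly, usually at some cost
    in sample diversity.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


# Usage with stand-in predictions shaped like a 4-channel 64 x 64 latent.
rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal((4, 64, 64))
eps_cond = rng.standard_normal((4, 64, 64))
eps = guided_noise(eps_uncond, eps_cond)
print(eps.shape)
```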

(Wikipedia, 2022) ⇒ https://en.wikipedia.org/wiki/Stable_Diffusion Retrieved:2022-12-12.
- Stable Diffusion is a deep learning, text-to-image model released in 2022.