5 changes: 3 additions & 2 deletions docs/source/en/quicktour.mdx
@@ -32,8 +32,9 @@ The quicktour is a simplified version of the introductory 🧨 Diffusers [notebo

Before you begin, make sure you have all the necessary libraries installed:

-```bash
-!pip install --upgrade diffusers accelerate transformers
+```py
+# uncomment to install the necessary libraries in Colab
+#!pip install --upgrade diffusers accelerate transformers
```

- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training.
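Once the libraries are installed, the quicktour builds toward a text-to-image call along these lines. This is a minimal sketch that assumes the `runwayml/stable-diffusion-v1-5` checkpoint; the guide itself may use a different one:

```py
from diffusers import DiffusionPipeline

# download a pretrained pipeline and move it to the GPU
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipeline.to("cuda")

# generate an image from a text prompt
image = pipeline("An image of a squirrel in Picasso style").images[0]
```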
2 changes: 0 additions & 2 deletions docs/source/en/training/dreambooth.mdx
@@ -12,8 +12,6 @@ specific language governing permissions and limitations under the License.

# DreamBooth

-[[open-in-colab]]
-
[DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject. It allows the model to generate contextualized images of the subject in different scenes, poses, and views.

![Dreambooth examples from the project's blog](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg)
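Once a model has been fine-tuned with DreamBooth, inference is a standard pipeline call. The sketch below is only illustrative; the checkpoint path and the rare identifier token (`sks`) are placeholders for whatever was used during training:

```py
import torch
from diffusers import StableDiffusionPipeline

# "path/to/dreambooth-model" is a placeholder for a DreamBooth fine-tuned checkpoint
pipeline = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-model", torch_dtype=torch.float16
).to("cuda")

# the identifier token used during training selects the personalized subject
image = pipeline("a photo of sks dog in a bucket", num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dreambooth-result.png")
```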
2 changes: 0 additions & 2 deletions docs/source/en/training/lora.mdx
@@ -12,8 +12,6 @@ specific language governing permissions and limitations under the License.

# Low-Rank Adaptation of Large Language Models (LoRA)

-[[open-in-colab]]
-
<Tip warning={true}>

Currently, LoRA is only supported for the attention layers of the [`UNet2DConditionalModel`]. We also
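The visible context cuts off here, but as a rough sketch of how LoRA attention weights are typically applied at inference (the weights path is a placeholder for output produced by the LoRA training script):

```py
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# "path/to/lora-weights" is a placeholder for the saved attention processor weights
pipeline.unet.load_attn_procs("path/to/lora-weights")

# scale blends the LoRA weights into the base model: 0 ignores them, 1 applies the full fine-tune
image = pipeline("a pokemon with blue eyes", cross_attention_kwargs={"scale": 0.5}).images[0]
```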
2 changes: 0 additions & 2 deletions docs/source/en/training/text_inversion.mdx
@@ -14,8 +14,6 @@ specific language governing permissions and limitations under the License.

# Textual Inversion

-[[open-in-colab]]
-
[Textual Inversion](https://arxiv.org/abs/2208.01618) is a technique for capturing novel concepts from a small number of example images. While the technique was originally demonstrated with a [latent diffusion model](https://github.com/CompVis/latent-diffusion), it has since been applied to other model variants like [Stable Diffusion](https://huggingface.co/docs/diffusers/main/en/conceptual/stable_diffusion). The learned concepts can be used to better control the images generated from text-to-image pipelines. It learns new "words" in the text encoder's embedding space, which are used within text prompts for personalized image generation.

![Textual Inversion example](https://textual-inversion.github.io/static/images/editing/colorful_teapot.JPG)
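At inference time, a learned embedding is loaded into the pipeline and referenced through its placeholder token. A minimal sketch, assuming the publicly shared `sd-concepts-library/cat-toy` concept:

```py
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load a learned concept; its placeholder token is <cat-toy>
pipeline.load_textual_inversion("sd-concepts-library/cat-toy")

image = pipeline("a backpack in the style of <cat-toy>").images[0]
```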
5 changes: 3 additions & 2 deletions docs/source/en/tutorials/basic_training.mdx
@@ -26,8 +26,9 @@ This tutorial will teach you how to train a [`UNet2DModel`] from scratch on a su

Before you begin, make sure you have 🤗 Datasets installed to load and preprocess image datasets, and 🤗 Accelerate, to simplify training on any number of GPUs. The following command will also install [TensorBoard](https://www.tensorflow.org/tensorboard) to visualize training metrics (you can also use [Weights & Biases](https://docs.wandb.ai/) to track your training).

-```bash
-!pip install diffusers[training]
+```py
+# uncomment to install the necessary libraries in Colab
+#!pip install diffusers[training]
```

We encourage you to share your model with the community, and in order to do that, you'll need to login to your Hugging Face account (create one [here](https://hf.co/join) if you don't already have one!). You can login from a notebook and enter your token when prompted:
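The login step that follows is typically the `notebook_login` helper from `huggingface_hub`:

```py
from huggingface_hub import notebook_login

# prompts for a Hugging Face access token when run in a notebook
notebook_login()
```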
2 changes: 2 additions & 0 deletions docs/source/en/using-diffusers/custom_pipeline_examples.mdx
@@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License.

# Community pipelines

+[[open-in-colab]]
+
> **For more information about community pipelines, please have a look at [this issue](https://github.com/huggingface/diffusers/issues/841).**

**Community** examples consist of both inference and training examples that have been added by the community.
2 changes: 2 additions & 0 deletions docs/source/en/using-diffusers/custom_pipeline_overview.mdx
@@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License.

# Load community pipelines

+[[open-in-colab]]
+
Community pipelines are any [`DiffusionPipeline`] class that are different from the original implementation as specified in their paper (for example, the [`StableDiffusionControlNetPipeline`] corresponds to the [Text-to-Image Generation with ControlNet Conditioning](https://arxiv.org/abs/2302.05543) paper). They provide additional functionality or extend the original implementation of a pipeline.

There are many cool community pipelines like [Speech to Image](https://github.com/huggingface/diffusers/tree/main/examples/community#speech-to-image) or [Composable Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#composable-stable-diffusion), and you can find all the official community pipelines [here](https://github.com/huggingface/diffusers/tree/main/examples/community).
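Loading one of them comes down to passing `custom_pipeline` to `from_pretrained`. A minimal sketch, using the `lpw_stable_diffusion` community pipeline purely as an example:

```py
from diffusers import DiffusionPipeline

# the standard weights are loaded, but the pipeline class comes from the community examples folder
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", custom_pipeline="lpw_stable_diffusion"
)
```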
5 changes: 3 additions & 2 deletions docs/source/en/using-diffusers/img2img.mdx
@@ -18,8 +18,9 @@ The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initia

Before you begin, make sure you have all the necessary libraries installed:

-```bash
-!pip install diffusers transformers ftfy accelerate
+```py
+# uncomment to install the necessary libraries in Colab
+#!pip install diffusers transformers ftfy accelerate
```

Get started by creating a [`StableDiffusionImg2ImgPipeline`] with a pretrained Stable Diffusion model like [`nitrosocke/Ghibli-Diffusion`](https://huggingface.co/nitrosocke/Ghibli-Diffusion).
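Roughly, that looks like the sketch below; the starting image and checkpoint are just examples, and `strength` controls how far the pipeline is allowed to move away from the initial image:

```py
import torch
import requests
from io import BytesIO
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipeline = StableDiffusionImg2ImgPipeline.from_pretrained(
    "nitrosocke/Ghibli-Diffusion", torch_dtype=torch.float16
).to("cuda")

# any RGB image can serve as the starting point
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = Image.open(BytesIO(requests.get(url).content)).convert("RGB").resize((768, 512))

# strength close to 0 keeps the initial image, close to 1 mostly ignores it
image = pipeline(
    prompt="ghibli style, a fantasy landscape with castles", image=init_image, strength=0.75
).images[0]
```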
2 changes: 2 additions & 0 deletions docs/source/en/using-diffusers/loading.mdx
@@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License.

# Load pipelines, models, and schedulers

+[[open-in-colab]]
+
Having an easy way to use a diffusion system for inference is essential to 🧨 Diffusers. Diffusion systems often consist of multiple components like parameterized models, tokenizers, and schedulers that interact in complex ways. That is why we designed the [`DiffusionPipeline`] to wrap the complexity of the entire diffusion system into an easy-to-use API, while remaining flexible enough to be adapted for other use cases, such as loading each component individually as building blocks to assemble your own diffusion system.

Everything you need for inference or training is accessible with the `from_pretrained()` method.
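A minimal sketch of both uses: loading a full pipeline, and loading a single component as a building block:

```py
from diffusers import DiffusionPipeline, UNet2DConditionModel

# load an entire diffusion system in one call
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# or load a single component, here the UNet, from its subfolder in the same repository
unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
```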
7 changes: 5 additions & 2 deletions docs/source/en/using-diffusers/other-formats.mdx
@@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License.

# Load different Stable Diffusion formats

+[[open-in-colab]]
+
Stable Diffusion models are available in different formats depending on the framework they're trained and saved with, and where you download them from. Converting these formats for use in 🤗 Diffusers allows you to use all the features supported by the library, such as [using different schedulers](schedulers) for inference, [building your custom pipeline](write_own_pipeline), and a variety of techniques and methods for [optimizing inference speed](./optimization/opt_overview).

<Tip>
@@ -141,8 +143,9 @@ pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.conf

Download a LoRA checkpoint from Civitai; this example uses the [Howls Moving Castle,Interior/Scenery LoRA (Ghibli Stlye)](https://civitai.com/models/14605?modelVersionId=19998) checkpoint, but feel free to try out any LoRA checkpoint!

-```bash
-!wget https://civitai.com/api/download/models/19998 -O howls_moving_castle.safetensors
+```py
+# uncomment to download the safetensor weights
+#!wget https://civitai.com/api/download/models/19998 -O howls_moving_castle.safetensors
```

Load the LoRA checkpoint into the pipeline with the [`~loaders.LoraLoaderMixin.load_lora_weights`] method:
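A minimal sketch of that step, assuming the downloaded file sits in the current working directory and a standard Stable Diffusion base checkpoint:

```py
import torch
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# point weight_name at the file downloaded above
pipeline.load_lora_weights(".", weight_name="howls_moving_castle.safetensors")
```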
2 changes: 2 additions & 0 deletions docs/source/en/using-diffusers/reproducibility.mdx
@@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License.

# Create reproducible pipelines

+[[open-in-colab]]
+
Reproducibility is important for testing, replicating results, and can even be used to [improve image quality](reusing_seeds). However, the randomness in diffusion models is a desired property because it allows the pipeline to generate different images every time it is run. While you can't expect to get the exact same results across platforms, you can expect results to be reproducible across releases and platforms within a certain tolerance range. Even then, tolerance varies depending on the diffusion pipeline and checkpoint.

This is why it's important to understand how to control sources of randomness in diffusion models or use deterministic algorithms.
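The most common control is a seeded `torch.Generator` passed to the pipeline. A minimal sketch (checkpoint and prompt are only examples):

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

# seeding a Generator pins the random latents, so the same seed reproduces (nearly) the same image
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipeline("a photograph of an astronaut riding a horse", generator=generator).images[0]
```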
2 changes: 2 additions & 0 deletions docs/source/en/using-diffusers/reusing_seeds.mdx
@@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License.

# Improve image quality with deterministic generation

+[[open-in-colab]]
+
A common way to improve the quality of generated images is with *deterministic batch generation*: generate a batch of images and select one image to improve with a more detailed prompt in a second round of inference. The key is to pass a list of [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html#generator)s to the pipeline for batched image generation, and tie each `Generator` to a seed so you can reuse it for an image.
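A minimal sketch of the idea; the checkpoint and prompt are only examples:

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# one seeded Generator per image in the batch; reusing a seed later regenerates that exact image
generators = [torch.Generator(device="cuda").manual_seed(seed) for seed in range(4)]
images = pipeline(prompt=["Labrador in the style of Vermeer"] * 4, generator=generators).images
```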

Let's use [`runwayml/stable-diffusion-v1-5`](runwayml/stable-diffusion-v1-5) for example, and generate several versions of the following prompt:
2 changes: 2 additions & 0 deletions docs/source/en/using-diffusers/schedulers.mdx
@@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License.

# Schedulers

+[[open-in-colab]]
+
Diffusion pipelines are inherently a collection of diffusion models and schedulers that are partly independent from each other. This means that one is able to switch out parts of the pipeline to better customize
a pipeline to one's use case. The best example of this is the [Schedulers](../api/schedulers/overview.mdx).
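Swapping one scheduler for another usually comes down to `from_config`. A minimal sketch, with `EulerDiscreteScheduler` chosen only as an example:

```py
from diffusers import DiffusionPipeline, EulerDiscreteScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# build a compatible scheduler from the existing scheduler's configuration and swap it in
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
```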

@@ -14,9 +14,10 @@ Note that JAX is not exclusive to TPUs, but it shines on that hardware because e

First make sure diffusers is installed.

-```bash
-!pip install jax==0.3.25 jaxlib==0.3.25 flax transformers ftfy
-!pip install diffusers
+```py
+# uncomment to install the necessary libraries in Colab
+#!pip install jax==0.3.25 jaxlib==0.3.25 flax transformers ftfy
+#!pip install diffusers
```

7 changes: 5 additions & 2 deletions docs/source/en/using-diffusers/using_safetensors.mdx
@@ -1,11 +1,14 @@
# Load safetensors

+[[open-in-colab]]
+
[safetensors](https://github.com/huggingface/safetensors) is a safe and fast file format for storing and loading tensors. Typically, PyTorch model weights are saved or *pickled* into a `.bin` file with Python's [`pickle`](https://docs.python.org/3/library/pickle.html) utility. However, `pickle` is not secure and pickled files may contain malicious code that can be executed. safetensors is a secure alternative to `pickle`, making it ideal for sharing model weights.

This guide will show you how you load `.safetensor` files, and how to convert Stable Diffusion model weights stored in other formats to `.safetensor`. Before you start, make sure you have safetensors installed:

-```bash
-!pip install safetensors
+```py
+# uncomment to install the necessary libraries in Colab
+#!pip install safetensors
```

If you look at the [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main) repository, you'll see weights inside the `text_encoder`, `unet` and `vae` subfolders are stored in the `.safetensors` format. By default, 🤗 Diffusers automatically loads these `.safetensors` files from their subfolders if they're available in the model repository.
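To make this explicit, `from_pretrained` also accepts a `use_safetensors` flag. A minimal sketch:

```py
from diffusers import DiffusionPipeline

# with use_safetensors=True, only .safetensors weights are loaded (loading fails if they are missing)
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_safetensors=True)
```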
2 changes: 2 additions & 0 deletions docs/source/en/using-diffusers/weighted_prompts.mdx
@@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License.

# Weighting prompts

+[[open-in-colab]]
+
Text-guided diffusion models generate images based on a given text prompt. The text prompt
can include multiple concepts that the model should generate and it's often desirable to weight
certain parts of the prompt more or less.
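In practice this is usually done by building `prompt_embeds` yourself, for example with the third-party [compel](https://github.com/damian0815/compel) library. A rough sketch, assuming compel's `++` weighting syntax:

```py
from compel import Compel
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
compel_proc = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# "++" upweights "ball"; the resulting embeddings replace the plain text prompt
prompt_embeds = compel_proc("a red cat playing with a ball++")
image = pipeline(prompt_embeds=prompt_embeds).images[0]
```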
78 changes: 39 additions & 39 deletions docs/source/en/using-diffusers/write_own_pipeline.mdx
@@ -42,63 +42,63 @@ To recreate the pipeline with the model and scheduler separately, let's write ou

1. Load the model and scheduler:

    ```py
    >>> from diffusers import DDPMScheduler, UNet2DModel

    >>> scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
    >>> model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
    ```

2. Set the number of timesteps to run the denoising process for:

    ```py
    >>> scheduler.set_timesteps(50)
    ```

3. Setting the scheduler timesteps creates a tensor with evenly spaced elements in it, 50 in this example. Each element corresponds to a timestep at which the model denoises an image. When you create the denoising loop later, you'll iterate over this tensor to denoise an image:

    ```py
    >>> scheduler.timesteps
    tensor([980, 960, 940, 920, 900, 880, 860, 840, 820, 800, 780, 760, 740, 720,
            700, 680, 660, 640, 620, 600, 580, 560, 540, 520, 500, 480, 460, 440,
            420, 400, 380, 360, 340, 320, 300, 280, 260, 240, 220, 200, 180, 160,
            140, 120, 100, 80, 60, 40, 20, 0])
    ```

4. Create some random noise with the same shape as the desired output:

    ```py
    >>> import torch

    >>> sample_size = model.config.sample_size
    >>> noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda")
    ```

5. Now write a loop to iterate over the timesteps. At each timestep, the model does a [`UNet2DModel.forward`] pass and returns the noisy residual. The scheduler's [`~DDPMScheduler.step`] method takes the noisy residual, timestep, and input and it predicts the image at the previous timestep. This output becomes the next input to the model in the denoising loop, and it'll repeat until it reaches the end of the `timesteps` array.

    ```py
    >>> input = noise

    >>> for t in scheduler.timesteps:
    ...     with torch.no_grad():
    ...         noisy_residual = model(input, t).sample
    ...     previous_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample
    ...     input = previous_noisy_sample
    ```

    This is the entire denoising process, and you can use this same pattern to write any diffusion system.

6. The last step is to convert the denoised output into an image:

    ```py
    >>> from PIL import Image
    >>> import numpy as np

    >>> image = (input / 2 + 0.5).clamp(0, 1)
    >>> image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
    >>> image = Image.fromarray((image * 255).round().astype("uint8"))
    >>> image
    ```

In the next section, you'll put your skills to the test and break down the more complex Stable Diffusion pipeline. The steps are more or less the same. You'll initialize the necessary components, and set the number of timesteps to create a `timestep` array. The `timestep` array is used in the denoising loop, and for each element in this array, the model predicts a less noisy image. The denoising loop iterates over the `timesteps`, and at each timestep, it outputs a noisy residual and the scheduler uses it to predict a less noisy image at the previous timestep. This process is repeated until you reach the end of the `timestep` array.