diff --git a/docs/source/en/quicktour.mdx b/docs/source/en/quicktour.mdx index 2a2a5a3ad903..b3ac68ca9611 100644 --- a/docs/source/en/quicktour.mdx +++ b/docs/source/en/quicktour.mdx @@ -32,8 +32,9 @@ The quicktour is a simplified version of the introductory ๐Ÿงจ Diffusers [notebo Before you begin, make sure you have all the necessary libraries installed: -```bash -!pip install --upgrade diffusers accelerate transformers +```py +# uncomment to install the necessary libraries in Colab +#!pip install --upgrade diffusers accelerate transformers ``` - [๐Ÿค— Accelerate](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training. diff --git a/docs/source/en/training/dreambooth.mdx b/docs/source/en/training/dreambooth.mdx index c26762d4a75d..6ca9c4531b82 100644 --- a/docs/source/en/training/dreambooth.mdx +++ b/docs/source/en/training/dreambooth.mdx @@ -12,8 +12,6 @@ specific language governing permissions and limitations under the License. # DreamBooth -[[open-in-colab]] - [DreamBooth](https://arxiv.org/abs/2208.12242) is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject. It allows the model to generate contextualized images of the subject in different scenes, poses, and views. ![Dreambooth examples from the project's blog](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg) diff --git a/docs/source/en/training/lora.mdx b/docs/source/en/training/lora.mdx index 1208178810a5..dfb31c7ef87a 100644 --- a/docs/source/en/training/lora.mdx +++ b/docs/source/en/training/lora.mdx @@ -12,8 +12,6 @@ specific language governing permissions and limitations under the License. # Low-Rank Adaptation of Large Language Models (LoRA) -[[open-in-colab]] - Currently, LoRA is only supported for the attention layers of the [`UNet2DConditionalModel`]. We also diff --git a/docs/source/en/training/text_inversion.mdx b/docs/source/en/training/text_inversion.mdx index a4fe4c2c4e5b..050b0ca3d403 100644 --- a/docs/source/en/training/text_inversion.mdx +++ b/docs/source/en/training/text_inversion.mdx @@ -14,8 +14,6 @@ specific language governing permissions and limitations under the License. # Textual Inversion -[[open-in-colab]] - [Textual Inversion](https://arxiv.org/abs/2208.01618) is a technique for capturing novel concepts from a small number of example images. While the technique was originally demonstrated with a [latent diffusion model](https://github.com/CompVis/latent-diffusion), it has since been applied to other model variants like [Stable Diffusion](https://huggingface.co/docs/diffusers/main/en/conceptual/stable_diffusion). The learned concepts can be used to better control the images generated from text-to-image pipelines. It learns new "words" in the text encoder's embedding space, which are used within text prompts for personalized image generation. ![Textual Inversion example](https://textual-inversion.github.io/static/images/editing/colorful_teapot.JPG) diff --git a/docs/source/en/tutorials/basic_training.mdx b/docs/source/en/tutorials/basic_training.mdx index 99221274f745..c8f5c7fac780 100644 --- a/docs/source/en/tutorials/basic_training.mdx +++ b/docs/source/en/tutorials/basic_training.mdx @@ -26,8 +26,9 @@ This tutorial will teach you how to train a [`UNet2DModel`] from scratch on a su Before you begin, make sure you have ๐Ÿค— Datasets installed to load and preprocess image datasets, and ๐Ÿค— Accelerate, to simplify training on any number of GPUs. 
The following command will also install [TensorBoard](https://www.tensorflow.org/tensorboard) to visualize training metrics (you can also use [Weights & Biases](https://docs.wandb.ai/) to track your training). -```bash -!pip install diffusers[training] +```py +# uncomment to install the necessary libraries in Colab +#!pip install diffusers[training] ``` We encourage you to share your model with the community, and in order to do that, you'll need to login to your Hugging Face account (create one [here](https://hf.co/join) if you don't already have one!). You can login from a notebook and enter your token when prompted: diff --git a/docs/source/en/using-diffusers/custom_pipeline_examples.mdx b/docs/source/en/using-diffusers/custom_pipeline_examples.mdx index 93ac6d1f782c..f97a9ad09ac5 100644 --- a/docs/source/en/using-diffusers/custom_pipeline_examples.mdx +++ b/docs/source/en/using-diffusers/custom_pipeline_examples.mdx @@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License. # Community pipelines +[[open-in-colab]] + > **For more information about community pipelines, please have a look at [this issue](https://github.com/huggingface/diffusers/issues/841).** **Community** examples consist of both inference and training examples that have been added by the community. diff --git a/docs/source/en/using-diffusers/custom_pipeline_overview.mdx b/docs/source/en/using-diffusers/custom_pipeline_overview.mdx index 3c5df7c0dd6e..78a64b6bcb96 100644 --- a/docs/source/en/using-diffusers/custom_pipeline_overview.mdx +++ b/docs/source/en/using-diffusers/custom_pipeline_overview.mdx @@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License. # Load community pipelines +[[open-in-colab]] + Community pipelines are any [`DiffusionPipeline`] class that are different from the original implementation as specified in their paper (for example, the [`StableDiffusionControlNetPipeline`] corresponds to the [Text-to-Image Generation with ControlNet Conditioning](https://arxiv.org/abs/2302.05543) paper). They provide additional functionality or extend the original implementation of a pipeline. There are many cool community pipelines like [Speech to Image](https://github.com/huggingface/diffusers/tree/main/examples/community#speech-to-image) or [Composable Stable Diffusion](https://github.com/huggingface/diffusers/tree/main/examples/community#composable-stable-diffusion), and you can find all the official community pipelines [here](https://github.com/huggingface/diffusers/tree/main/examples/community). diff --git a/docs/source/en/using-diffusers/img2img.mdx b/docs/source/en/using-diffusers/img2img.mdx index 71540fbf5dd9..5b881b311a6a 100644 --- a/docs/source/en/using-diffusers/img2img.mdx +++ b/docs/source/en/using-diffusers/img2img.mdx @@ -18,8 +18,9 @@ The [`StableDiffusionImg2ImgPipeline`] lets you pass a text prompt and an initia Before you begin, make sure you have all the necessary libraries installed: -```bash -!pip install diffusers transformers ftfy accelerate +```py +# uncomment to install the necessary libraries in Colab +#!pip install diffusers transformers ftfy accelerate ``` Get started by creating a [`StableDiffusionImg2ImgPipeline`] with a pretrained Stable Diffusion model like [`nitrosocke/Ghibli-Diffusion`](https://huggingface.co/nitrosocke/Ghibli-Diffusion). 
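For readers skimming this hunk, a minimal sketch of the step it introduces follows; it is not part of the patch. It assumes a recent diffusers release, a CUDA GPU, a placeholder input image path, and illustrative `strength`/`guidance_scale` values.

```py
# Hedged sketch (not part of this diff): roughly how the pipeline introduced above is used.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

# load the Ghibli-style checkpoint mentioned in the guide
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "nitrosocke/Ghibli-Diffusion", torch_dtype=torch.float16
).to("cuda")

# any RGB image works as the starting point; this local path is a placeholder
init_image = load_image("sketch-mountains-input.jpg").resize((768, 512))

# the initial image is partially noised (controlled by `strength`) and denoised toward the prompt
prompt = "ghibli style, a fantasy landscape with castles"
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("ghibli_landscape.png")
```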
diff --git a/docs/source/en/using-diffusers/loading.mdx b/docs/source/en/using-diffusers/loading.mdx index 24dd1dd04cd1..8ebd3569e4b0 100644 --- a/docs/source/en/using-diffusers/loading.mdx +++ b/docs/source/en/using-diffusers/loading.mdx @@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License. # Load pipelines, models, and schedulers +[[open-in-colab]] + Having an easy way to use a diffusion system for inference is essential to ๐Ÿงจ Diffusers. Diffusion systems often consist of multiple components like parameterized models, tokenizers, and schedulers that interact in complex ways. That is why we designed the [`DiffusionPipeline`] to wrap the complexity of the entire diffusion system into an easy-to-use API, while remaining flexible enough to be adapted for other use cases, such as loading each component individually as building blocks to assemble your own diffusion system. Everything you need for inference or training is accessible with the `from_pretrained()` method. diff --git a/docs/source/en/using-diffusers/other-formats.mdx b/docs/source/en/using-diffusers/other-formats.mdx index 8e606f13469d..2aeb9f3ae204 100644 --- a/docs/source/en/using-diffusers/other-formats.mdx +++ b/docs/source/en/using-diffusers/other-formats.mdx @@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License. # Load different Stable Diffusion formats +[[open-in-colab]] + Stable Diffusion models are available in different formats depending on the framework they're trained and saved with, and where you download them from. Converting these formats for use in ๐Ÿค— Diffusers allows you to use all the features supported by the library, such as [using different schedulers](schedulers) for inference, [building your custom pipeline](write_own_pipeline), and a variety of techniques and methods for [optimizing inference speed](./optimization/opt_overview). @@ -141,8 +143,9 @@ pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.conf Download a LoRA checkpoint from Civitai; this example uses the [Howls Moving Castle,Interior/Scenery LoRA (Ghibli Stlye)](https://civitai.com/models/14605?modelVersionId=19998) checkpoint, but feel free to try out any LoRA checkpoint! -```bash -!wget https://civitai.com/api/download/models/19998 -O howls_moving_castle.safetensors +```py +# uncomment to download the safetensor weights +#!wget https://civitai.com/api/download/models/19998 -O howls_moving_castle.safetensors ``` Load the LoRA checkpoint into the pipeline with the [`~loaders.LoraLoaderMixin.load_lora_weights`] method: diff --git a/docs/source/en/using-diffusers/reproducibility.mdx b/docs/source/en/using-diffusers/reproducibility.mdx index b666dac72cbf..1594e967c847 100644 --- a/docs/source/en/using-diffusers/reproducibility.mdx +++ b/docs/source/en/using-diffusers/reproducibility.mdx @@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License. # Create reproducible pipelines +[[open-in-colab]] + Reproducibility is important for testing, replicating results, and can even be used to [improve image quality](reusing_seeds). However, the randomness in diffusion models is a desired property because it allows the pipeline to generate different images every time it is run. While you can't expect to get the exact same results across platforms, you can expect results to be reproducible across releases and platforms within a certain tolerance range. 
Even then, tolerance varies depending on the diffusion pipeline and checkpoint. This is why it's important to understand how to control sources of randomness in diffusion models or use deterministic algorithms. diff --git a/docs/source/en/using-diffusers/reusing_seeds.mdx b/docs/source/en/using-diffusers/reusing_seeds.mdx index eea0fd7e3e9d..1ff84f02596e 100644 --- a/docs/source/en/using-diffusers/reusing_seeds.mdx +++ b/docs/source/en/using-diffusers/reusing_seeds.mdx @@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License. # Improve image quality with deterministic generation +[[open-in-colab]] + A common way to improve the quality of generated images is with *deterministic batch generation*, generate a batch of images and select one image to improve with a more detailed prompt in a second round of inference. The key is to pass a list of [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html#generator)'s to the pipeline for batched image generation, and tie each `Generator` to a seed so you can reuse it for an image. Let's use [`runwayml/stable-diffusion-v1-5`](runwayml/stable-diffusion-v1-5) for example, and generate several versions of the following prompt: diff --git a/docs/source/en/using-diffusers/schedulers.mdx b/docs/source/en/using-diffusers/schedulers.mdx index 741d92bdd90d..c2395c106c15 100644 --- a/docs/source/en/using-diffusers/schedulers.mdx +++ b/docs/source/en/using-diffusers/schedulers.mdx @@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License. # Schedulers +[[open-in-colab]] + Diffusion pipelines are inherently a collection of diffusion models and schedulers that are partly independent from each other. This means that one is able to switch out parts of the pipeline to better customize a pipeline to one's use case. The best example of this is the [Schedulers](../api/schedulers/overview.mdx). diff --git a/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.mdx b/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.mdx index e0332fdc6496..2150f2f769fd 100644 --- a/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.mdx +++ b/docs/source/en/using-diffusers/stable_diffusion_jax_how_to.mdx @@ -14,9 +14,10 @@ Note that JAX is not exclusive to TPUs, but it shines on that hardware because e First make sure diffusers is installed. -```bash -!pip install jax==0.3.25 jaxlib==0.3.25 flax transformers ftfy -!pip install diffusers +```py +# uncomment to install the necessary libraries in Colab +#!pip install jax==0.3.25 jaxlib==0.3.25 flax transformers ftfy +#!pip install diffusers ``` ```python diff --git a/docs/source/en/using-diffusers/using_safetensors.mdx b/docs/source/en/using-diffusers/using_safetensors.mdx index 2015f2faf85a..c312ab597075 100644 --- a/docs/source/en/using-diffusers/using_safetensors.mdx +++ b/docs/source/en/using-diffusers/using_safetensors.mdx @@ -1,11 +1,14 @@ # Load safetensors +[[open-in-colab]] + [safetensors](https://github.com/huggingface/safetensors) is a safe and fast file format for storing and loading tensors. Typically, PyTorch model weights are saved or *pickled* into a `.bin` file with Python's [`pickle`](https://docs.python.org/3/library/pickle.html) utility. However, `pickle` is not secure and pickled files may contain malicious code that can be executed. safetensors is a secure alternative to `pickle`, making it ideal for sharing model weights. 
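To make the `pickle` vs. safetensors contrast above concrete, here is a small sketch (not part of this patch) of saving and loading a tensor dictionary with the safetensors library; the file and key names are arbitrary.

```py
# Hedged illustration of the safetensors format itself (not part of this diff).
import torch
from safetensors.torch import load_file, save_file

# a plain dict of tensors is saved -- no arbitrary Python objects, so nothing executable is stored
weights = {"linear.weight": torch.randn(4, 4), "linear.bias": torch.zeros(4)}
save_file(weights, "toy_weights.safetensors")

# loading only deserializes tensors, unlike torch.load on a pickled .bin file
state_dict = load_file("toy_weights.safetensors")
print(state_dict["linear.weight"].shape)
```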
This guide will show you how you load `.safetensor` files, and how to convert Stable Diffusion model weights stored in other formats to `.safetensor`. Before you start, make sure you have safetensors installed: -```bash -!pip install safetensors +```py +# uncomment to install the necessary libraries in Colab +#!pip install safetensors ``` If you look at the [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main) repository, you'll see weights inside the `text_encoder`, `unet` and `vae` subfolders are stored in the `.safetensors` format. By default, ๐Ÿค— Diffusers automatically loads these `.safetensors` files from their subfolders if they're available in the model repository. diff --git a/docs/source/en/using-diffusers/weighted_prompts.mdx b/docs/source/en/using-diffusers/weighted_prompts.mdx index 58e670fbafe9..5e6371d0116a 100644 --- a/docs/source/en/using-diffusers/weighted_prompts.mdx +++ b/docs/source/en/using-diffusers/weighted_prompts.mdx @@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License. # Weighting prompts +[[open-in-colab]] + Text-guided diffusion models generate images based on a given text prompt. The text prompt can include multiple concepts that the model should generate and it's often desirable to weight certain parts of the prompt more or less. diff --git a/docs/source/en/using-diffusers/write_own_pipeline.mdx b/docs/source/en/using-diffusers/write_own_pipeline.mdx index be92980118b1..c7e257f4fa36 100644 --- a/docs/source/en/using-diffusers/write_own_pipeline.mdx +++ b/docs/source/en/using-diffusers/write_own_pipeline.mdx @@ -42,63 +42,63 @@ To recreate the pipeline with the model and scheduler separately, let's write ou 1. Load the model and scheduler: - ```py - >>> from diffusers import DDPMScheduler, UNet2DModel +```py +>>> from diffusers import DDPMScheduler, UNet2DModel - >>> scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256") - >>> model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda") - ``` +>>> scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256") +>>> model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda") +``` 2. Set the number of timesteps to run the denoising process for: - ```py - >>> scheduler.set_timesteps(50) - ``` +```py +>>> scheduler.set_timesteps(50) +``` 3. Setting the scheduler timesteps creates a tensor with evenly spaced elements in it, 50 in this example. Each element corresponds to a timestep at which the model denoises an image. When you create the denoising loop later, you'll iterate over this tensor to denoise an image: - ```py - >>> scheduler.timesteps - tensor([980, 960, 940, 920, 900, 880, 860, 840, 820, 800, 780, 760, 740, 720, - 700, 680, 660, 640, 620, 600, 580, 560, 540, 520, 500, 480, 460, 440, - 420, 400, 380, 360, 340, 320, 300, 280, 260, 240, 220, 200, 180, 160, - 140, 120, 100, 80, 60, 40, 20, 0]) - ``` +```py +>>> scheduler.timesteps +tensor([980, 960, 940, 920, 900, 880, 860, 840, 820, 800, 780, 760, 740, 720, + 700, 680, 660, 640, 620, 600, 580, 560, 540, 520, 500, 480, 460, 440, + 420, 400, 380, 360, 340, 320, 300, 280, 260, 240, 220, 200, 180, 160, + 140, 120, 100, 80, 60, 40, 20, 0]) +``` 4. 
Create some random noise with the same shape as the desired output: - ```py - >>> import torch +```py +>>> import torch - >>> sample_size = model.config.sample_size - >>> noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda") - ``` +>>> sample_size = model.config.sample_size +>>> noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda") +``` -4. Now write a loop to iterate over the timesteps. At each timestep, the model does a [`UNet2DModel.forward`] pass and returns the noisy residual. The scheduler's [`~DDPMScheduler.step`] method takes the noisy residual, timestep, and input and it predicts the image at the previous timestep. This output becomes the next input to the model in the denoising loop, and it'll repeat until it reaches the end of the `timesteps` array. +5. Now write a loop to iterate over the timesteps. At each timestep, the model does a [`UNet2DModel.forward`] pass and returns the noisy residual. The scheduler's [`~DDPMScheduler.step`] method takes the noisy residual, timestep, and input and it predicts the image at the previous timestep. This output becomes the next input to the model in the denoising loop, and it'll repeat until it reaches the end of the `timesteps` array. - ```py - >>> input = noise +```py +>>> input = noise - >>> for t in scheduler.timesteps: - ... with torch.no_grad(): - ... noisy_residual = model(input, t).sample - ... previous_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample - ... input = previous_noisy_sample - ``` +>>> for t in scheduler.timesteps: +... with torch.no_grad(): +... noisy_residual = model(input, t).sample +... previous_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample +... input = previous_noisy_sample +``` - This is the entire denoising process, and you can use this same pattern to write any diffusion system. +This is the entire denoising process, and you can use this same pattern to write any diffusion system. -5. The last step is to convert the denoised output into an image: +6. The last step is to convert the denoised output into an image: - ```py - >>> from PIL import Image - >>> import numpy as np +```py +>>> from PIL import Image +>>> import numpy as np - >>> image = (input / 2 + 0.5).clamp(0, 1) - >>> image = image.cpu().permute(0, 2, 3, 1).numpy()[0] - >>> image = Image.fromarray((image * 255).round().astype("uint8")) - >>> image - ``` +>>> image = (input / 2 + 0.5).clamp(0, 1) +>>> image = image.cpu().permute(0, 2, 3, 1).numpy()[0] +>>> image = Image.fromarray((image * 255).round().astype("uint8")) +>>> image +``` In the next section, you'll put your skills to the test and breakdown the more complex Stable Diffusion pipeline. The steps are more or less the same. You'll initialize the necessary components, and set the number of timesteps to create a `timestep` array. The `timestep` array is used in the denoising loop, and for each element in this array, the model predicts a less noisy image. The denoising loop iterates over the `timestep`'s, and at each timestep, it outputs a noisy residual and the scheduler uses it to predict a less noisy image at the previous timestep. This process is repeated until you reach the end of the `timestep` array.
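As a pointer for the Stable Diffusion breakdown the closing paragraph refers to, component initialization roughly follows the pattern below. This sketch is not introduced by this diff; the checkpoint and scheduler choice are illustrative, and it assumes the standard Stable Diffusion repository layout with `vae`, `tokenizer`, `text_encoder`, `unet`, and `scheduler` subfolders.

```py
# Hedged sketch of the component setup described above (not part of this diff).
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler

repo_id = "CompVis/stable-diffusion-v1-4"  # illustrative checkpoint

# each component lives in its own subfolder of the repository
vae = AutoencoderKL.from_pretrained(repo_id, subfolder="vae")
tokenizer = CLIPTokenizer.from_pretrained(repo_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(repo_id, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(repo_id, subfolder="unet")
scheduler = PNDMScheduler.from_pretrained(repo_id, subfolder="scheduler")

# setting the timesteps creates the `timestep` array the denoising loop iterates over
scheduler.set_timesteps(25)
print(len(scheduler.timesteps))
```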