Commit 065d1d8

Merge remote-tracking branch 'origin/main' into fix-lms-mps

2 parents dc936fa + 4b9f589

110 files changed: +4880 −733 lines


.github/workflows/pr_tests.yml

Lines changed: 1 addition & 1 deletion

@@ -89,7 +89,7 @@ jobs:
       - name: Run all fast tests on MPS
         shell: arch -arch arm64 bash {0}
         run: |
-          ${CONDA_RUN} python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=tests_torch_mps tests/
+          ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps tests/
 
       - name: Failure short reports
         if: ${{ failure() }}

README.md

Lines changed: 96 additions & 17 deletions
@@ -64,44 +64,54 @@ In order to get started, we recommend taking a look at two notebooks:
 - The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffusion models training methods. This notebook takes a step-by-step approach to training your
 diffusion models on an image dataset, with explanatory graphics.
 
-## **New** Stable Diffusion is now fully compatible with `diffusers`!
+## Stable Diffusion is fully compatible with `diffusers`!
 
-Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
+Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [LAION](https://laion.ai/) and [RunwayML](https://runwayml.com/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 4GB VRAM.
 See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information.
 
-You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4), read the license and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation.
+You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/runwayml/stable-diffusion-v1-5), read the license carefully and tick the checkbox if you agree. You have to be a registered user in 🤗 Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation.
 
 
 ### Text-to-Image generation with Stable Diffusion
 
+First let's install
+```bash
+pip install --upgrade diffusers transformers scipy
+```
+
+Run this command to log in with your HF Hub token if you haven't before (you can skip this step if you prefer to run the model locally, follow [this](#running-the-model-locally) instead)
+```bash
+huggingface-cli login
+```
+
 We recommend using the model in [half-precision (`fp16`)](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) as it gives almost always the same results as full
 precision while being roughly twice as fast and requiring half the amount of GPU RAM.
 
 ```python
-# make sure you're logged in with `huggingface-cli login`
 from diffusers import StableDiffusionPipeline
 
-pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_type=torch.float16, revision="fp16")
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, revision="fp16")
 pipe = pipe.to("cuda")
 
 prompt = "a photo of an astronaut riding a horse on mars"
 image = pipe(prompt).images[0]
 ```
 
-**Note**: If you don't want to use the token, you can also simply download the model weights
-(after having [accepted the license](https://huggingface.co/CompVis/stable-diffusion-v1-4)) and pass
+#### Running the model locally
+If you don't want to login to Hugging Face, you can also simply download the model folder
+(after having [accepted the license](https://huggingface.co/runwayml/stable-diffusion-v1-5)) and pass
 the path to the local folder to the `StableDiffusionPipeline`.
 
 ```
 git lfs install
-git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
+git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
 ```
 
-Assuming the folder is stored locally under `./stable-diffusion-v1-4`, you can also run stable diffusion
+Assuming the folder is stored locally under `./stable-diffusion-v1-5`, you can also run stable diffusion
 without requiring an authentication token:
 
 ```python
-pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-4")
+pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
 pipe = pipe.to("cuda")
 
 prompt = "a photo of an astronaut riding a horse on mars"
@@ -114,7 +124,7 @@ The following snippet should result in less than 4GB VRAM.
 
 ```python
 pipe = StableDiffusionPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4",
+    "runwayml/stable-diffusion-v1-5",
     revision="fp16",
     torch_dtype=torch.float16,
 )
@@ -125,7 +135,7 @@ pipe.enable_attention_slicing()
 image = pipe(prompt).images[0]
 ```
 
-If you wish to use a different scheduler, you can simply instantiate
+If you wish to use a different scheduler (e.g.: DDIM, LMS, PNDM/PLMS), you can instantiate
 it before the pipeline and pass it to `from_pretrained`.
 
 ```python
@@ -138,7 +148,7 @@ lms = LMSDiscreteScheduler(
 )
 
 pipe = StableDiffusionPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4",
+    "runwayml/stable-diffusion-v1-5",
     revision="fp16",
     torch_dtype=torch.float16,
     scheduler=lms,
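
The three hunks above update the low-VRAM snippet and the scheduler swap. A consolidated sketch of how they read after this commit (the `LMSDiscreteScheduler` arguments are elided by the diff, so the beta values below are an assumption based on the schedule commonly used with Stable Diffusion):

```python
import torch
from diffusers import LMSDiscreteScheduler, StableDiffusionPipeline

# assumed arguments: the scaled-linear beta schedule typically used for Stable Diffusion
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16,
    scheduler=lms,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()  # compute attention in slices to stay under ~4GB VRAM

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```
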
@@ -158,7 +168,7 @@ please run the model in the default *full-precision* setting:
 # make sure you're logged in with `huggingface-cli login`
 from diffusers import StableDiffusionPipeline
 
-pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
 
 # disable the following line if you run on CPU
 pipe = pipe.to("cuda")
@@ -169,6 +179,75 @@ image = pipe(prompt).images[0]
 image.save("astronaut_rides_horse.png")
 ```
 
+### JAX/Flax
+
+To use StableDiffusion on TPUs and GPUs for faster inference you can leverage JAX/Flax.
+
+Running the pipeline with default PNDMScheduler
+
+```python
+import jax
+import numpy as np
+from flax.jax_utils import replicate
+from flax.training.common_utils import shard
+
+from diffusers import FlaxStableDiffusionPipeline
+
+pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
+    "runwayml/stable-diffusion-v1-5", revision="flax", dtype=jax.numpy.bfloat16
+)
+
+prompt = "a photo of an astronaut riding a horse on mars"
+
+prng_seed = jax.random.PRNGKey(0)
+num_inference_steps = 50
+
+num_samples = jax.device_count()
+prompt = num_samples * [prompt]
+prompt_ids = pipeline.prepare_inputs(prompt)
+
+# shard inputs and rng
+params = replicate(params)
+prng_seed = jax.random.split(prng_seed, jax.device_count())
+prompt_ids = shard(prompt_ids)
+
+images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
+images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
+```
+
+**Note**:
+If you are limited by TPU memory, please make sure to load the `FlaxStableDiffusionPipeline` in `bfloat16` precision instead of the default `float32` precision as done above. You can do so by telling diffusers to load the weights from "bf16" branch.
+
+```python
+import jax
+import numpy as np
+from flax.jax_utils import replicate
+from flax.training.common_utils import shard
+
+from diffusers import FlaxStableDiffusionPipeline
+
+pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
+    "runwayml/stable-diffusion-v1-5", revision="bf16", dtype=jax.numpy.bfloat16
+)
+
+prompt = "a photo of an astronaut riding a horse on mars"
+
+prng_seed = jax.random.PRNGKey(0)
+num_inference_steps = 50
+
+num_samples = jax.device_count()
+prompt = num_samples * [prompt]
+prompt_ids = pipeline.prepare_inputs(prompt)
+
+# shard inputs and rng
+params = replicate(params)
+prng_seed = jax.random.split(prng_seed, jax.device_count())
+prompt_ids = shard(prompt_ids)
+
+images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
+images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
+```
+
 ### Image-to-Image text-guided generation with Stable Diffusion
 
 The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.
@@ -183,14 +262,14 @@ from diffusers import StableDiffusionImg2ImgPipeline
 
 # load the pipeline
 device = "cuda"
-model_id_or_path = "CompVis/stable-diffusion-v1-4"
+model_id_or_path = "runwayml/stable-diffusion-v1-5"
 pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
     model_id_or_path,
     revision="fp16",
     torch_dtype=torch.float16,
 )
-# or download via git clone https://huggingface.co/CompVis/stable-diffusion-v1-4
-# and pass `model_id_or_path="./stable-diffusion-v1-4"`.
+# or download via git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
+# and pass `model_id_or_path="./stable-diffusion-v1-5"`.
 pipe = pipe.to(device)
 
 # let's download an initial image
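
The hunk stops at the initial-image download. A sketch of how the img2img snippet typically continues (the example URL, prompt, and `strength`/`guidance_scale` values are illustrative assumptions; `init_image` is the image-to-image argument name in this version of the API):

```python
import torch
import requests
from io import BytesIO
from PIL import Image

from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16
).to(device)

# assumed example image; any RGB image resized to the model resolution works
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
init_image = Image.open(BytesIO(requests.get(url).content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"

# strength controls how far generation may drift from the initial image (0 = keep, 1 = ignore)
images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5).images
images[0].save("fantasy_landscape.png")
```
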

docs/source/_toctree.yml

Lines changed: 10 additions & 6 deletions

@@ -12,9 +12,9 @@
     title: "Loading Pipelines, Models, and Schedulers"
   - local: using-diffusers/configuration
     title: "Configuring Pipelines, Models, and Schedulers"
-  - local: using-diffusers/custom_pipelines
-    title: "Loading and Creating Custom Pipelines"
-  title: "Loading"
+  - local: using-diffusers/custom_pipeline_overview
+    title: "Loading and Adding Custom Pipelines"
+  title: "Loading & Hub"
 - sections:
   - local: using-diffusers/unconditional_image_generation
     title: "Unconditional Image Generation"
@@ -24,8 +24,10 @@
     title: "Text-Guided Image-to-Image"
   - local: using-diffusers/inpaint
     title: "Text-Guided Image-Inpainting"
-  - local: using-diffusers/custom
-    title: "Create a custom pipeline"
+  - local: using-diffusers/custom_pipeline_examples
+    title: "Community Pipelines"
+  - local: using-diffusers/contribute_pipeline
+    title: "How to contribute a Pipeline"
   title: "Pipelines for Inference"
 title: "Using Diffusers"
 - sections:
@@ -34,7 +36,7 @@
   - local: optimization/onnx
     title: "ONNX"
   - local: optimization/open_vino
-    title: "Open Vino"
+    title: "OpenVINO"
   - local: optimization/mps
     title: "MPS"
   title: "Optimization/Special Hardware"
@@ -90,5 +92,7 @@
     title: "Stable Diffusion"
   - local: api/pipelines/stochastic_karras_ve
     title: "Stochastic Karras VE"
+  - local: api/pipelines/dance_diffusion
+    title: "Dance Diffusion"
   title: "Pipelines"
 title: "API"

docs/source/api/models.mdx

Lines changed: 3 additions & 0 deletions

@@ -22,6 +22,9 @@ The models are built on the base class ['ModelMixin'] that is a `torch.nn.module
 ## UNet2DOutput
 [[autodoc]] models.unet_2d.UNet2DOutput
 
+## UNet1DModel
+[[autodoc]] UNet1DModel
+
 ## UNet2DModel
 [[autodoc]] UNet2DModel
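
Since `UNet1DModel` is newly exposed in the docs here, a minimal smoke-test sketch may be useful; the constructor defaults and the `(batch, channels, sample_length)` input layout are assumptions about the API, not content of this diff:

```python
import torch
from diffusers import UNet1DModel

# randomly initialised 1D UNet; relying on (assumed) constructor defaults
model = UNet1DModel()

# 1D UNets denoise waveform-like tensors shaped (batch, channels, sample_length)
sample = torch.randn(1, model.config.in_channels, model.config.sample_size)
timestep = torch.tensor([10])

with torch.no_grad():
    denoised = model(sample, timestep).sample
print(denoised.shape)
```
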
docs/source/api/pipelines/dance_diffusion.mdx (new file; path inferred from the `api/pipelines/dance_diffusion` entry added to _toctree.yml)

Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Dance Diffusion
+
+## Overview
+
+[Dance Diffusion](https://github.com/Harmonai-org/sample-generator) by Zach Evans.
+
+Dance Diffusion is the first in a suite of generative audio tools for producers and musicians to be released by Harmonai.
+For more info or to get involved in the development of these tools, please visit https://harmonai.org and fill out the form on the front page.
+
+The original codebase of this implementation can be found [here](https://github.com/Harmonai-org/sample-generator).
+
+## Available Pipelines:
+
+| Pipeline | Tasks | Colab
+|---|---|:---:|
+| [pipeline_dance_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/dance_diffusion/pipeline_dance_diffusion.py) | *Unconditional Audio Generation* | - |
+
+
+## DanceDiffusionPipeline
+[[autodoc]] DanceDiffusionPipeline
+    - __call__
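
A minimal usage sketch of the new pipeline follows; the `harmonai/maestro-150k` checkpoint id and the `audio_length_in_s` argument are assumptions based on the Harmonai release rather than content of this commit:

```python
import torch
from diffusers import DanceDiffusionPipeline

# assumed checkpoint id from the Harmonai release
pipe = DanceDiffusionPipeline.from_pretrained("harmonai/maestro-150k")
pipe = pipe.to("cuda")

# unconditional audio generation: returns a batch of raw waveforms
output = pipe(audio_length_in_s=4.0)
audio = output.audios[0]  # numpy array, shape (channels, samples)
```
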

docs/source/api/pipelines/ddim.mdx

Lines changed: 12 additions & 0 deletions

@@ -1,3 +1,15 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # DDIM
 
 ## Overview
docs/source/api/pipelines/ddpm.mdx

Lines changed: 12 additions & 0 deletions

@@ -1,3 +1,15 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # DDPM
 
 ## Overview
docs/source/api/pipelines/latent_diffusion.mdx

Lines changed: 12 additions & 0 deletions

@@ -1,3 +1,15 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # Latent Diffusion
 
 ## Overview
docs/source/api/pipelines/latent_diffusion_uncond.mdx

Lines changed: 12 additions & 0 deletions

@@ -1,3 +1,15 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
 # Unconditional Latent Diffusion
 
 ## Overview
docs/source/api/pipelines/overview.mdx

Lines changed: 4 additions & 4 deletions

@@ -67,8 +67,8 @@ Diffusion models often consist of multiple independently-trained models or other
 Each model has been trained independently on a different task and the scheduler can easily be swapped out and replaced with a different one.
 During inference, we however want to be able to easily load all components and use them in inference - even if one component, *e.g.* CLIP's text encoder, originates from a different library, such as [Transformers](https://github.com/huggingface/transformers). To that end, all pipelines provide the following functionality:
 
-- [`from_pretrained` method](../diffusion_pipeline) that accepts a Hugging Face Hub repository id, *e.g.* [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) or a path to a local directory, *e.g.*
-"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [CompVis/stable-diffusion-v1-4/model_index.json](https://huggingface.co/CompVis/stable-diffusion-v1-4/blob/main/model_index.json), which defines all components that should be
+- [`from_pretrained` method](../diffusion_pipeline) that accepts a Hugging Face Hub repository id, *e.g.* [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) or a path to a local directory, *e.g.*
+"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [runwayml/stable-diffusion-v1-5/model_index.json](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json), which defines all components that should be
 loaded into the pipelines. More specifically, for each model/component one needs to define the format `<name>: ["<library>", "<class name>"]`. `<name>` is the attribute name given to the loaded instance of `<class name>` which can be found in the library or pipeline folder called `"<library>"`.
 - [`save_pretrained`](../diffusion_pipeline) that accepts a local path, *e.g.* `./stable-diffusion` under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`.
 In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json` so that the complete pipeline can again be instantiated
@@ -100,7 +100,7 @@ logic including pre-processing, an unrolled diffusion loop, and post-processing
 # make sure you're logged in with `huggingface-cli login`
 from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler
 
-pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
+pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
 pipe = pipe.to("cuda")
 
 prompt = "a photo of an astronaut riding a horse on mars"
@@ -123,7 +123,7 @@ from diffusers import StableDiffusionImg2ImgPipeline
 # load the pipeline
 device = "cuda"
 pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
-    "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16
+    "runwayml/stable-diffusion-v1-5", revision="fp16", torch_dtype=torch.float16
 ).to(device)
 
 # let's download an initial image
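
To make the `from_pretrained`/`save_pretrained` round trip described in this page concrete, a minimal sketch (the local path is just an example):

```python
from diffusers import StableDiffusionPipeline

# download from the Hub; model_index.json declares each component as
# <name>: ["<library>", "<class name>"] so the right classes are loaded
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# writes model_index.json plus one sub-folder per component (unet, vae, ...)
pipe.save_pretrained("./stable-diffusion")

# the complete pipeline can then be re-instantiated from the local copy
pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion")
```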
