
Commit 717e453

update more sections
1 parent b741e3e

File tree

7 files changed: +129, -193 lines

docs/source/en/optimization/fp16.md

Lines changed: 3 additions & 3 deletions
@@ -20,9 +20,9 @@ In many cases, optimizing for speed or memory leads to improved performance in t

</Tip>

-The results below are obtained from generating a single 512x512 image from the prompt `a photo of an astronaut riding a horse on mars` with 50 DDIM steps on a Nvidia Titan RTX, demonstrating the speed up you can expect.
+The results below are obtained from generating a single 512x512 image from the prompt `a photo of an astronaut riding a horse on mars` with 50 DDIM steps on a Nvidia Titan RTX, demonstrating the speed-up you can expect.

-| | Latency | Speedup |
+| | latency | speed-up |
| ---------------- | ------- | ------- |
| original | 9.50s | x1 |
| fp16 | 3.61s | x2.63 |

@@ -32,7 +32,7 @@ The results below are obtained from generating a single 512x512 image from the p

## Use TensorFloat-32

-On Ampere and later CUDA devices, matrix multiplications and convolutions can use the [TensorFloat-32 (TF32)](https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/) mode for faster, but slightly less accurate computations. By default, PyTorch enables TF32 mode for convolutions but not matrix multiplications. Unless your network requires full float32 precision, we recommend enabling TF32 for matrix multiplications. It can significantly speed up computations with typically negligible loss in numerical accuracy.
+On Ampere and later CUDA devices, matrix multiplications and convolutions can use the [TensorFloat-32 (TF32)](https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/) mode for faster, but slightly less accurate computations. By default, PyTorch enables TF32 mode for convolutions but not matrix multiplications. Unless your network requires full float32 precision, we recommend enabling TF32 for matrix multiplications. It can significantly speed up computations with a typically negligible loss in numerical accuracy.

```python
import torch
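# The hunk's context ends at the line above; the following flag is a sketch of
# how TF32 is typically enabled for matrix multiplications in PyTorch (a
# standard backend setting), added here as an assumption rather than as part
# of the original file.
torch.backends.cuda.matmul.allow_tf32 = True
```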

docs/source/en/optimization/habana.md

Lines changed: 19 additions & 21 deletions
@@ -10,25 +10,22 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->

-# How to use Stable Diffusion on Habana Gaudi
+# Habana Gaudi

-🤗 Diffusers is compatible with Habana Gaudi through 🤗 [Optimum Habana](https://huggingface.co/docs/optimum/habana/usage_guides/stable_diffusion).
+🤗 Diffusers is compatible with Habana Gaudi through 🤗 [Optimum](https://huggingface.co/docs/optimum/habana/usage_guides/stable_diffusion). Follow the [installation](https://docs.habana.ai/en/latest/Installation_Guide/index.html) guide to install the SynapseAI and Gaudi drivers, and then install Optimum Habana:

-## Requirements
+```bash
+python -m pip install --upgrade-strategy eager optimum[habana]
+```

-- Optimum Habana 1.6 or later, [here](https://huggingface.co/docs/optimum/habana/installation) is how to install it.
-- SynapseAI 1.10.
+To generate images with Stable Diffusion 1 and 2 on Gaudi, you need to instantiate two instances:

+- [`~optimum.habana.diffusers.GaudiStableDiffusionPipeline`], a pipeline for text-to-image generation.
+- [`~optimum.habana.diffusers.GaudiDDIMScheduler`], a Gaudi-optimized scheduler.

-## Inference Pipeline
+When you initialize the pipeline, you have to specify `use_habana=True` to deploy it on HPUs, and to get the fastest possible generation, you should enable **HPU graphs** with `use_hpu_graphs=True`.

-To generate images with Stable Diffusion 1 and 2 on Gaudi, you need to instantiate two instances:
-- A pipeline with [`GaudiStableDiffusionPipeline`](https://huggingface.co/docs/optimum/habana/package_reference/stable_diffusion_pipeline). This pipeline supports *text-to-image generation*.
-- A scheduler with [`GaudiDDIMScheduler`](https://huggingface.co/docs/optimum/habana/package_reference/stable_diffusion_pipeline#optimum.habana.diffusers.GaudiDDIMScheduler). This scheduler has been optimized for Habana Gaudi.
-
-When initializing the pipeline, you have to specify `use_habana=True` to deploy it on HPUs.
-Furthermore, in order to get the fastest possible generations you should enable **HPU graphs** with `use_hpu_graphs=True`.
-Finally, you will need to specify a [Gaudi configuration](https://huggingface.co/docs/optimum/habana/package_reference/gaudi_config) which can be downloaded from the [Hugging Face Hub](https://huggingface.co/Habana).
+Finally, specify a [`~optimum.habana.GaudiConfig`] which can be downloaded from the [Habana](https://huggingface.co/Habana) organization on the Hub.

```python
from optimum.habana import GaudiConfig

@@ -45,7 +42,8 @@ pipeline = GaudiStableDiffusionPipeline.from_pretrained(
)
```

-You can then call the pipeline to generate images by batches from one or several prompts:
+Now you can call the pipeline to generate images by batches from one or several prompts:
+
```python
outputs = pipeline(
prompt=[

@@ -57,21 +55,21 @@ outputs = pipeline(
)
```

-For more information, check out Optimum Habana's [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/stable_diffusion) and the [example](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion) provided in the official Github repository.
+For more information, check out 🤗 Optimum Habana's [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/stable_diffusion) and the [example](https://github.com/huggingface/optimum-habana/tree/main/examples/stable-diffusion) provided in the official GitHub repository.


## Benchmark

-Here are the latencies for Habana first-generation Gaudi and Gaudi2 with the [Habana/stable-diffusion](https://huggingface.co/Habana/stable-diffusion) and [Habana/stable-diffusion-2](https://huggingface.co/Habana/stable-diffusion-2) Gaudi configurations (mixed precision bf16/fp32):
+We benchmarked Habana's first-generation Gaudi and Gaudi2 with the [Habana/stable-diffusion](https://huggingface.co/Habana/stable-diffusion) and [Habana/stable-diffusion-2](https://huggingface.co/Habana/stable-diffusion-2) Gaudi configurations (mixed precision bf16/fp32) to demonstrate their performance.

-- [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) (512x512 resolution):
+For [Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) on 512x512 images:

-| | Latency (batch size = 1) | Throughput (batch size = 8) |
+| | Latency (batch size = 1) | Throughput |
| ---------------------- |:------------------------:|:---------------------------:|
-| first-generation Gaudi | 3.80s | 0.308 images/s |
-| Gaudi2 | 1.33s | 1.081 images/s |
+| first-generation Gaudi | 3.80s | 0.308 images/s (batch size = 8) |
+| Gaudi2 | 1.33s | 1.081 images/s (batch size = 8) |

-- [Stable Diffusion v2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) (768x768 resolution):
+For [Stable Diffusion v2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) on 768x768 images:

| | Latency (batch size = 1) | Throughput |
| ---------------------- |:------------------------:|:-------------------------------:|
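Putting the pieces from the hunks above together, a minimal sketch of a complete Gaudi pipeline call might look like the following. The `use_habana`, `use_hpu_graphs`, and `gaudi_config` arguments come from the prose above; the checkpoint name, prompts, and scheduler-loading pattern are illustrative assumptions based on the Optimum Habana documentation rather than lines in this diff.

```python
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "runwayml/stable-diffusion-v1-5"  # example checkpoint (assumption)

# Gaudi-optimized scheduler loaded from the same checkpoint
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,                         # deploy on HPUs
    use_hpu_graphs=True,                     # enable HPU graphs for faster generation
    gaudi_config="Habana/stable-diffusion",  # Gaudi configuration from the Habana org on the Hub
)

# Generate images from a batch of prompts
outputs = pipeline(
    prompt=[
        "High quality photo of an astronaut riding a horse in space",
        "Face of a yellow cat, high resolution, sitting on a park bench",
    ],
)
outputs.images[0].save("astronaut.png")
```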

docs/source/en/optimization/memory.md

Lines changed: 2 additions & 2 deletions
@@ -333,14 +333,14 @@ The table below details the speed-ups from a few different Nvidia GPUs when runn
| A100-SXM4-40GB | 18.6it/s | 29.it/s |
| A100-SXM-80GB | 18.7it/s | 29.5it/s |

-To use Flash Attention, install the following:
-
<Tip warning={true}>

If you have PyTorch 2.0 installed, you shouldn't use xFormers!

</Tip>

+To use Flash Attention, install the following:
+
- PyTorch > 1.12
- CUDA available
- [xFormers](xformers)
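Once those requirements are installed, the page goes on to enable xFormers on the pipeline; a minimal sketch of that step is shown below. The checkpoint name is an illustrative assumption, while `enable_xformers_memory_efficient_attention` is the standard 🤗 Diffusers call for switching to xFormers' memory-efficient (Flash) attention.

```python
import torch
from diffusers import DiffusionPipeline

# Load a pipeline on a CUDA device and switch to xFormers' memory-efficient attention
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipeline.enable_xformers_memory_efficient_attention()
```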

docs/source/en/optimization/mps.md

Lines changed: 34 additions & 30 deletions
@@ -10,29 +10,16 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
specific language governing permissions and limitations under the License.
-->

-# How to use Stable Diffusion in Apple Silicon (M1/M2)
+# Metal Performance Shaders (MPS)

-🤗 Diffusers is compatible with Apple silicon for Stable Diffusion inference, using the PyTorch `mps` device. These are the steps you need to follow to use your M1 or M2 computer with Stable Diffusion.
+🤗 Diffusers is compatible with Apple silicon (M1/M2 chips) using the PyTorch [`mps`](https://pytorch.org/docs/stable/notes/mps.html) device, which uses the Metal framework to leverage the GPU on macOS devices. You'll need to have:

-## Requirements
+- macOS computer with Apple silicon (M1/M2) hardware
+- macOS 12.6 or later (13.0 or later recommended)
+- arm64 version of Python
+- [PyTorch 2.0](https://pytorch.org/get-started/locally/) (recommended) or 1.13 (minimum version supported for `mps`)

-- Mac computer with Apple silicon (M1/M2) hardware.
-- macOS 12.6 or later (13.0 or later recommended).
-- arm64 version of Python.
-- PyTorch 2.0 (recommended) or 1.13 (minimum version supported for `mps`). You can install it with `pip` or `conda` using the instructions in https://pytorch.org/get-started/locally/.

-## Inference Pipeline

-The snippet below demonstrates how to use the `mps` backend using the familiar `to()` interface to move the Stable Diffusion pipeline to your M1 or M2 device.

-<Tip warning={true}>

-**If you are using PyTorch 1.13** you need to "prime" the pipeline using an additional one-time pass through it. This is a temporary workaround for a weird issue we detected: the first inference pass produces slightly different results than subsequent ones. You only need to do this pass once, and it's ok to use just one inference step and discard the result.

-</Tip>

-We strongly recommend you use PyTorch 2 or better, as it solves a number of problems like the one described in the previous tip.
+The `mps` backend uses PyTorch's `.to()` interface to move the Stable Diffusion pipeline onto your M1 or M2 device:

```python
from diffusers import DiffusionPipeline

@@ -44,24 +31,41 @@ pipe = pipe.to("mps")
pipe.enable_attention_slicing()

prompt = "a photo of an astronaut riding a horse on mars"
+```
+
+<Tip warning={true}>
+
+Generating multiple prompts in a batch can [crash](https://github.com/huggingface/diffusers/issues/363) or fail to work reliably. We believe this is related to the [`mps`](https://github.com/pytorch/pytorch/issues/84039) backend in PyTorch. While this is being investigated, you should iterate instead of batching.
+
+</Tip>
+
+If you're using **PyTorch 1.13**, you need to "prime" the pipeline with an additional one-time pass through it. This is a temporary workaround for an issue where the first inference pass produces slightly different results than subsequent ones. You only need to do this pass once, and after just one inference step you can discard the result.
+
+```diff
+from diffusers import DiffusionPipeline

-# First-time "warmup" pass if PyTorch version is 1.13 (see explanation above)
-_ = pipe(prompt, num_inference_steps=1)
+pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("mps")
+pipe.enable_attention_slicing()
+
+prompt = "a photo of an astronaut riding a horse on mars"
+# First-time "warmup" pass if PyTorch version is 1.13
++ _ = pipe(prompt, num_inference_steps=1)

# Results match those from the CPU device after the warmup pass.
-image = pipe(prompt).images[0]
+image = pipe(prompt).images[0]
```

-## Performance Recommendations
+## Recommendation

-M1/M2 performance is very sensitive to memory pressure. The system will automatically swap if it needs to, but performance will degrade significantly when it does.
+M1/M2 performance is very sensitive to memory pressure. When memory pressure occurs, the system automatically starts swapping, which significantly degrades performance.

-We recommend you use _attention slicing_ to reduce memory pressure during inference and prevent swapping, particularly if your computer has less than 64 GB of system RAM, or if you generate images at non-standard resolutions larger than 512 × 512 pixels. Attention slicing performs the costly attention operation in multiple steps instead of all at once. It usually has a performance impact of ~20% in computers without universal memory, but we have observed _better performance_ in most Apple Silicon computers, unless you have 64 GB or more.
+To prevent this from happening, we recommend *attention slicing* to reduce memory pressure during inference and prevent swapping. This is especially relevant if your computer has less than 64GB of system RAM, or if you generate images at non-standard resolutions larger than 512×512 pixels. Call the [`~DiffusionPipeline.enable_attention_slicing`] function on your pipeline:

-```python
+```py
+import torch
+from diffusers import DiffusionPipeline
+
+pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True).to("mps")
pipeline.enable_attention_slicing()
```

-## Known Issues
-
-- Generating multiple prompts in a batch [crashes or doesn't work reliably](https://github.com/huggingface/diffusers/issues/363). We believe this is related to the [`mps` backend in PyTorch](https://github.com/pytorch/pytorch/issues/84039). This is being resolved, but for now we recommend to iterate instead of batching.
+Attention slicing performs the costly attention operation in multiple steps instead of all at once. It usually has a performance penalty of ~20% in computers without universal memory, but we've observed *better performance* in most Apple silicon computers unless you have 64GB of RAM or more.
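Taken together, the new mps.md walks through moving the pipeline to `mps`, enabling attention slicing, and priming the pipeline on PyTorch 1.13. Below is a sketch that combines those steps into one script; the explicit version check is an illustrative assumption rather than something the page prescribes.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("mps")
pipe.enable_attention_slicing()  # reduce memory pressure and avoid swapping

prompt = "a photo of an astronaut riding a horse on mars"

# One-time "warmup" pass, only needed on PyTorch 1.13; the result is discarded
if torch.__version__.startswith("1.13"):
    _ = pipe(prompt, num_inference_steps=1)

image = pipe(prompt).images[0]
image.save("astronaut.png")
```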

docs/source/en/optimization/onnx.md

Lines changed: 22 additions & 44 deletions
@@ -11,23 +11,19 @@ specific language governing permissions and limitations under the License.
-->


-# How to use ONNX Runtime for inference
+# ONNX Runtime

-🤗 [Optimum](https://github.com/huggingface/optimum) provides a Stable Diffusion pipeline compatible with ONNX Runtime.
+🤗 [Optimum](https://github.com/huggingface/optimum) provides a Stable Diffusion pipeline compatible with ONNX Runtime. You'll need to install 🤗 Optimum with the following command for ONNX Runtime support:

-## Installation
-
-Install 🤗 Optimum with the following command for ONNX Runtime support:
-
-```
+```bash
pip install optimum["onnxruntime"]
```

-## Stable Diffusion
+This guide will show you how to use the Stable Diffusion and Stable Diffusion XL (SDXL) pipelines with ONNX Runtime.

-### Inference
+## Stable Diffusion

-To load an ONNX model and run inference with ONNX Runtime, you need to replace [`StableDiffusionPipeline`] with `ORTStableDiffusionPipeline`. In case you want to load a PyTorch model and convert it to the ONNX format on-the-fly, you can set `export=True`.
+To load and run inference, use the [`~optimum.onnxruntime.ORTStableDiffusionPipeline`]. If you want to load a PyTorch model and convert it to the ONNX format on-the-fly, set `export=True`:

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

@@ -39,14 +35,20 @@ image = pipeline(prompt).images[0]
pipeline.save_pretrained("./onnx-stable-diffusion-v1-5")
```

-If you want to export the pipeline in the ONNX format offline and later use it for inference,
-you can use the [`optimum-cli export`](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) command:
+<Tip warning={true}>
+
+Generating multiple prompts in a batch seems to take too much memory. While we look into it, you may need to iterate instead of batching.
+
+</Tip>
+
+To export the pipeline in the ONNX format offline and use it later for inference,
+use the [`optimum-cli export`](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) command:

```bash
optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/
```

-Then perform inference:
+Then to perform inference (you don't have to specify `export=True` again):

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

@@ -57,36 +59,15 @@ prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]
```

-Notice that we didn't have to specify `export=True` above.
-
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/onnxruntime/stable_diffusion_v1_5_ort_sail_boat.png">
</div>

-You can find more examples in [optimum documentation](https://huggingface.co/docs/optimum/).
-
-### Supported tasks
-
-| Task | Loading Class |
-|--------------------------------------|--------------------------------------|
-| `text-to-image` | `ORTStableDiffusionPipeline` |
-| `image-to-image` | `ORTStableDiffusionImg2ImgPipeline` |
-| `inpaint` | `ORTStableDiffusionInpaintPipeline` |
+You can find more examples in 🤗 Optimum [documentation](https://huggingface.co/docs/optimum/), and Stable Diffusion is supported for text-to-image, image-to-image, and inpainting.

## Stable Diffusion XL

-### Export
-
-To export your model to ONNX, you can use the [Optimum CLI](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) as follows :
-
-```bash
-optimum-cli export onnx --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl sd_xl_onnx/
-```
-
-### Inference
-
-Here is an example of how you can load a SDXL ONNX model from [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and run inference with ONNX Runtime :
+To load and run inference with SDXL, use the [`~optimum.onnxruntime.ORTStableDiffusionXLPipeline`]:

```python
from optimum.onnxruntime import ORTStableDiffusionXLPipeline

@@ -97,13 +78,10 @@ prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]
```

-### Supported tasks
-
-| Task | Loading Class |
-|--------------------------------------|--------------------------------------|
-| `text-to-image` | `ORTStableDiffusionXLPipeline` |
-| `image-to-image` | `ORTStableDiffusionXLImg2ImgPipeline`|
+To export the pipeline in the ONNX format and use it later for inference, use the [`optimum-cli export`](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) command:

-## Known Issues
+```bash
+optimum-cli export onnx --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl sd_xl_onnx/
+```

-- Generating multiple prompts in a batch seems to take too much memory. While we look into it, you may need to iterate instead of batching.
+SDXL is supported for text-to-image and image-to-image.
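For reference, a sketch of the full ONNX Runtime flow the hunks above describe, combining the on-the-fly export, saving, and reloading steps; the checkpoint, prompt, and local directory name mirror the ones that appear in this diff.

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

# Convert the PyTorch checkpoint to ONNX on-the-fly
pipeline = ORTStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", export=True
)

prompt = "sailing ship in storm by Leonardo da Vinci"
image = pipeline(prompt).images[0]

# Save the exported pipeline and reload it later without export=True
pipeline.save_pretrained("./onnx-stable-diffusion-v1-5")
pipeline = ORTStableDiffusionPipeline.from_pretrained("./onnx-stable-diffusion-v1-5")
image = pipeline(prompt).images[0]
```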
