
Commit 48d0123

add AudioDiffusionPipeline and LatentAudioDiffusionPipeline #1334 (#1426)
* add AudioDiffusionPipeline and LatentAudioDiffusionPipeline
* add docs to toc
* fix tests (several iterations)
* Update pr_tests.yml: fix tests
* (rebase leftovers: parent 499ff34, author/committer teticio <[email protected]>)
* add colab notebook
* commit messages carried in from the rebase onto main:
  * [Flax] Fix loading scheduler from subfolder (#1319)
  * Fix/Enable all schedulers for in-painting (#1331): inpaint fix for k-LMS, onnx as well
  * Correct path to scheduler (#1322): [Examples] correct path
  * Avoid nested fix-copies (#1332): avoid nested `# Copied from` statements during `make fix-copies`; style
  * Fix img2img speed with LMS-Discrete Scheduler (#896): casting `self.sigmas` into a different dtype (the one of original_samples) is not advisable; in the img2img pipeline it makes the later `integrate.quad` call more than 10x slower. Co-authored-by: Anton Lozhkov <[email protected]>
  * Fix the order of casts for onnx inpainting (#1338)
  * Legacy Inpainting Pipeline for Onnx Models (#1237): add legacy inpainting pipeline compatibility for onnx, add onnx legacy inpainting test, styling fixes, dummy object, refactor common prompt encoding pattern, update tests to permanent repository home, support all available schedulers until ONNX IO binding is available. Co-authored-by: Anton Lozhkov <[email protected]>
  * Jax infer support negative prompt (#1337): support negative prompts in the SD Jax pipeline, pass batched neg_prompt, only encode when negative prompt is None. Co-authored-by: Juan Acevedo <[email protected]>
  * Update README.md (#1347): minor change to the Imagic code snippet; the missing dir caused an error when running the example code
  * change the sample model (#1352): update alt_diffusion.mdx
  * Add bit diffusion [WIP] (#971): create bit_diffusion.py based on the paper arXiv:2208.04202 (Chen2022AnalogBG), add to README. Co-authored-by: Patrick von Platen <[email protected]>
* move Mel to module in pipeline construction, make librosa optional
* fix imports
* fix copy & paste error in comment
* fix style
* add missing register_to_config
* fix class docstrings
* tweak docstrings
* update slow test
* put trailing commas back
* respect alphabetical order
* remove LatentAudioDiffusion, make vqvae optional
* move Mel from models back to pipelines :-)
* allow loading of pretrained audiodiffusion models
* fix tests
* fix dummies
* remove reference to latent_audio_diffusion in docs
* remove unused import
* inherit from SchedulerMixin to make loadable
* Apply suggestions from code review

Co-authored-by: Patrick von Platen <[email protected]>
1 parent 459b8ca commit 48d0123
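
For orientation, a minimal usage sketch assembled from the documentation added later in this diff; the checkpoint name `teticio/audio-diffusion-256` and the `pipe.mel` helper come from that docs page, nothing beyond this commit is assumed.

```python
import torch
from diffusers import DiffusionPipeline

# Load the pipeline added by this commit (checkpoint name from the docs page below).
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)

# Unconditional generation: the output carries both the mel spectrogram image
# and the decoded waveform.
output = pipe()
spectrogram = output.images[0]   # PIL image of the mel spectrogram
waveform = output.audios[0]      # numpy array with the decoded audio
print(waveform.shape, pipe.mel.get_sample_rate())
```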

File tree: 25 files changed (+781 −5 lines)


.github/workflows/pr_tests.yml

Lines changed: 1 addition & 0 deletions
@@ -57,6 +57,7 @@ jobs:

     - name: Install dependencies
       run: |
+        apt-get update && apt-get install libsndfile1-dev -y
         python -m pip install -e .[quality,test]
         python -m pip install git+https://github.com/huggingface/accelerate
         python -m pip install -U git+https://github.com/huggingface/transformers
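
The `libsndfile1-dev` system package (together with the `librosa` pip dependency added to the Docker images below) backs the audio I/O used by the new pipeline. A rough sanity check, assuming `librosa` and its `soundfile` dependency are installed, might look like:

```python
import numpy as np
import librosa
import soundfile as sf  # soundfile is the Python binding that needs libsndfile

sr = 22050
silence = np.zeros(sr, dtype=np.float32)   # one second of silence
sf.write("silence.wav", silence, sr)       # exercises libsndfile via soundfile
audio, _ = librosa.load("silence.wav", sr=sr)
mel = librosa.feature.melspectrogram(y=audio, sr=sr)
print(audio.shape, mel.shape)
```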

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -165,4 +165,4 @@ tags
 # DS_Store (MacOS)
 .DS_Store
 # RL pipelines may produce mp4 outputs
-*.mp4
+*.mp4

docker/diffusers-flax-cpu/Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@ RUN apt update && \
     git-lfs \
     curl \
     ca-certificates \
+    libsndfile1-dev \
     python3.8 \
     python3-pip \
     python3.8-venv && \
@@ -33,6 +34,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
     datasets \
     hf-doc-builder \
     huggingface-hub \
+    librosa \
     modelcards \
     numpy \
     scipy \

docker/diffusers-flax-tpu/Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@ RUN apt update && \
     git-lfs \
     curl \
     ca-certificates \
+    libsndfile1-dev \
     python3.8 \
     python3-pip \
     python3.8-venv && \
@@ -35,6 +36,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
     datasets \
     hf-doc-builder \
     huggingface-hub \
+    librosa \
     modelcards \
     numpy \
     scipy \

docker/diffusers-onnxruntime-cpu/Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@ RUN apt update && \
     git-lfs \
     curl \
     ca-certificates \
+    libsndfile1-dev \
     python3.8 \
     python3-pip \
     python3.8-venv && \
@@ -33,6 +34,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
     datasets \
     hf-doc-builder \
     huggingface-hub \
+    librosa \
     modelcards \
     numpy \
     scipy \

docker/diffusers-onnxruntime-cuda/Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@ RUN apt update && \
     git-lfs \
     curl \
     ca-certificates \
+    libsndfile1-dev \
     python3.8 \
     python3-pip \
     python3.8-venv && \
@@ -33,6 +34,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
     datasets \
     hf-doc-builder \
     huggingface-hub \
+    librosa \
     modelcards \
     numpy \
     scipy \

docker/diffusers-pytorch-cpu/Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@ RUN apt update && \
     git-lfs \
     curl \
     ca-certificates \
+    libsndfile1-dev \
     python3.8 \
     python3-pip \
     python3.8-venv && \
@@ -32,6 +33,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
     datasets \
     hf-doc-builder \
     huggingface-hub \
+    librosa \
     modelcards \
     numpy \
     scipy \

docker/diffusers-pytorch-cuda/Dockerfile

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,7 @@ RUN apt update && \
     git-lfs \
     curl \
     ca-certificates \
+    libsndfile1-dev \
     python3.8 \
     python3-pip \
     python3.8-venv && \
@@ -32,6 +33,7 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip && \
     datasets \
     hf-doc-builder \
     huggingface-hub \
+    librosa \
     modelcards \
     numpy \
     scipy \

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -122,6 +122,8 @@
     title: "VQ Diffusion"
   - local: api/pipelines/repaint
     title: "RePaint"
+  - local: api/pipelines/audio_diffusion
+    title: "Audio Diffusion"
   title: "Pipelines"
 - sections:
   - local: api/experimental/rl
docs/source/api/pipelines/audio_diffusion.mdx

Lines changed: 102 additions & 0 deletions (new file)
@@ -0,0 +1,102 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Audio Diffusion

## Overview

[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith.

Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to
and from mel spectrogram images.

The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including
training scripts and example notebooks.

## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) |

## Examples:

### Audio Diffusion

```python
import torch
from IPython.display import Audio
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Latent Audio Diffusion

```python
import torch
from IPython.display import Audio
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Audio Diffusion with DDIM (faster)

```python
import torch
from IPython.display import Audio
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

### Variations, in-painting, out-painting etc.

```python
output = pipe(
    raw_audio=output.audios[0, 0],
    start_step=int(pipe.get_default_steps() / 2),
    mask_start_secs=1,
    mask_end_secs=1,
)
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```

## AudioDiffusionPipeline
[[autodoc]] AudioDiffusionPipeline
    - __call__
    - encode
    - slerp

## Mel
[[autodoc]] Mel
    - audio_slice_to_image
    - image_to_audio
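
The Overview above describes converting audio to and from mel spectrogram images. As a rough illustration (not part of the committed docs), the `Mel` helper listed in the autodoc can turn a generated spectrogram image back into a waveform; the exact signature of `image_to_audio` is assumed here.

```python
from diffusers import DiffusionPipeline

# Regenerate audio from the spectrogram image returned by the pipeline,
# using only the Mel methods listed in the autodoc above.
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256")
output = pipe()

spectrogram = output.images[0]                    # PIL image of the mel spectrogram
waveform = pipe.mel.image_to_audio(spectrogram)   # assumed to accept a PIL image
print(waveform.shape, pipe.mel.get_sample_rate())
```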
