From c4adefe54ec1a2246194cae20c0ff4e6e07ebb04 Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 07:55:17 -0700
Subject: [PATCH 01/15] adding manifesto to readme.

---
 README.md | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 51e5bbafa4..0bf4ae101b 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
 torchaudio: an audio library for PyTorch
-================================================
+========================================
 
 [![Build Status](https://travis-ci.org/pytorch/audio.svg?branch=master)](https://travis-ci.org/pytorch/audio)
 
@@ -54,6 +54,30 @@ torchaudio.save('foo_save.mp3', sound, sample_rate) # saves tensor to file
 ```
 
 API Reference
------------
+-------------
 
 API Reference is located here: http://pytorch.org/audio/
+
+Conventions
+-----------
+
+Torchaudio is standardized around the following conventions. The following variables are used with their corresponding definitions.
+
+* waveform: a tensor of audio samples with shape (channels, time)
+* sample_rate: the rate of audio samples (samples per second)
+* specgram: a tensor of spectrogram with shape (channels, frequency, time)
+* mel_specgram: a mel spectrogram with shape (channels, frequency, time)
+* hop_length: the number of samples between the starts of consecutive frames
+* n_freqs: the number of bins in a linear spectrogram
+* min_freq: the lowest frequency of the lowest band in a spectrogram
+* max_freq: the highest frequency of the highest band in a spectrogram
+* n_fft: the number of fourier bins
+* n_mfcc, n_mels: to be consistent with other similarly named variables, with shape (channel, n_mfcc, time) and (channel, n_mels, times)
+* win_length: the length of the STFT window
+* window_fn: for functions that create windows, e.g. torch.hann_window
+
+A spectrogram can be converted to DB scale or Mel scale, using AmplitudeToDB and AmplitudetoMel.
+
+The input (Spectrogram, MFCC, MelSpectrogram, Resample, etc.) of all transforms and functions assumes channel first. The output of STFT is (channel, frequency, time, 2).
+
+The Kaldi compliance interface follow Kaldi's interface.

From 117b43b4664a88fe8b8927ea77a652dd9d69070f Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 08:40:03 -0700
Subject: [PATCH 02/15] shape of transforms

---
 README.md | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

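Note on the channel-first convention: a waveform is indexed as (channel, time), so audio that arrives interleaved ([L0, R0, L1, R1, ...]) or frame-major (time, channel) has to be rearranged before it is passed to a transform. The sketch below is plain Python for illustration only. `deinterleave` and `to_channel_first` are hypothetical helpers, not torchaudio functions; with a 2-D PyTorch tensor the frame-major case is just a transpose such as `tensor.t()`.

```python
from typing import List, Sequence


def deinterleave(samples: Sequence[float], num_channels: int) -> List[List[float]]:
    """[L0, R0, L1, R1, ...] -> nested list with dimensions (channel, time)."""
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    if len(samples) % num_channels != 0:
        raise ValueError("sample count is not a multiple of num_channels")
    # Channel c holds every num_channels-th sample, starting at offset c.
    return [list(samples[c::num_channels]) for c in range(num_channels)]


def to_channel_first(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Transpose frame-major (time, channel) data into (channel, time)."""
    return [list(channel) for channel in zip(*frames)]


waveform = deinterleave([0.1, -0.1, 0.2, -0.2, 0.3, -0.3], num_channels=2)
print(len(waveform), len(waveform[0]))  # 2 3
```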
diff --git a/README.md b/README.md
index 0bf4ae101b..58be8722fd 100644
--- a/README.md
+++ b/README.md
@@ -61,12 +61,12 @@ API Reference is located here: http://pytorch.org/audio/
 Conventions
 -----------
 
-Torchaudio is standardized around the following conventions. The following variables are used with their corresponding definitions.
+Torchaudio is standardized around the following naming conventions.
 
 * waveform: a tensor of audio samples with shape (channels, time)
 * sample_rate: the rate of audio samples (samples per second)
-* specgram: a tensor of spectrogram with shape (channels, frequency, time)
-* mel_specgram: a mel spectrogram with shape (channels, frequency, time)
+* specgram: a tensor of spectrogram with shape (channels, time)
+* mel_specgram: a mel spectrogram with shape (channels, time)
 * hop_length: the number of samples between the starts of consecutive frames
 * n_freqs: the number of bins in a linear spectrogram
 * min_freq: the lowest frequency of the lowest band in a spectrogram
@@ -76,8 +76,17 @@ Torchaudio is standardized around the following conventions. The following varia
 * win_length: the length of the STFT window
 * window_fn: for functions that create windows, e.g. torch.hann_window
 
-A spectrogram can be converted to DB scale or Mel scale, using AmplitudeToDB and AmplitudetoMel.
+Transforms expect the following shapes. In particular, the input of all transforms and functions assumes channel first.
 
-The input (Spectrogram, MFCC, MelSpectrogram, Resample, etc.) of all transforms and functions assumes channel first. The output of STFT is (channel, frequency, time, 2).
+* Spectrogram: (channel, time) -> (channel, frequency, time, 2)
+* MelScale: (channel, time) -> (channel, n_mels, time)
+* MFCC: (channel, time) -> (channel, n_mfcc, time)
+* MuLawEncode: (channel, time) -> (channel, n_mulaw, time)
+* MuLawDecode: (channel, n_mulaw, time) -> (channel, time)
+* Resample: (channel, time) -> (channel, time)
+* STFT: (channel, time) -> (channel, frequency, time, 2).
+* ISTFT: (channel, frequency, time) -> (channel, time, 2).
+
+A spectrogram can be converted to DB scale or Mel scale, using AmplitudeToDB and AmplitudeToMel.
 
 The Kaldi compliance interface follow Kaldi's interface.

From fa72cf4a310482a3a418f67535276475284b8962 Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 08:46:10 -0700
Subject: [PATCH 03/15] typo.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 58be8722fd..1c2cfb9ae7 100644
--- a/README.md
+++ b/README.md
@@ -89,4 +89,4 @@ Transforms expect the following shapes. In particular, the input of all transfor
 
 A spectrogram can be converted to DB scale or Mel scale, using AmplitudeToDB and AmplitudeToMel.
 
-The Kaldi compliance interface follow Kaldi's interface.
+The Kaldi compliance interface follows Kaldi's interface.

From 416dfafd5832fb1f1b25252ca00ccca63cecc1dd Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 09:00:07 -0700
Subject: [PATCH 04/15] complex input too.

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 1c2cfb9ae7..44e95f9177 100644
--- a/README.md
+++ b/README.md
@@ -84,8 +84,8 @@ Transforms expect the following shapes. In particular, the input of all transfor
 * MuLawEncode: (channel, time) -> (channel, n_mulaw, time)
 * MuLawDecode: (channel, n_mulaw, time) -> (channel, time)
 * Resample: (channel, time) -> (channel, time)
-* STFT: (channel, time) -> (channel, frequency, time, 2).
-* ISTFT: (channel, frequency, time) -> (channel, time, 2).
+* STFT: (channel, time, 2) -> (channel, frequency, time, 2).
+* ISTFT: (channel, frequency, time, 2) -> (channel, time, 2).
 
 A spectrogram can be converted to DB scale or Mel scale, using AmplitudeToDB and AmplitudeToMel.
 

From d9de34688bb53e155f87355b71f754cfbfb270fe Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 09:29:05 -0700
Subject: [PATCH 05/15] listing kaldi's function.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 44e95f9177..c26fb36bde 100644
--- a/README.md
+++ b/README.md
@@ -89,4 +89,4 @@ Transforms expect the following shapes. In particular, the input of all transfor
 
 A spectrogram can be converted to DB scale or Mel scale, using AmplitudeToDB and AmplitudeToMel.
 
-The Kaldi compliance interface follows Kaldi's interface.
+The Kaldi compliance interface follows Kaldi's interface, and provides access to: Kaldi's `fbank`, `spectrogram`, and `resample_waveform`.

From 81ce38409a609a8a97906d904ba86ab2572e1a3c Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 11:59:23 -0700
Subject: [PATCH 06/15] mulaw shape.

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

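Note on why MuLawEncode and MuLawDecode keep (channel, time): mu-law companding maps each sample independently, so it changes values but not dimensions. A minimal sketch of the standard mu-law formula, assuming samples in [-1, 1] and 256 quantization channels. The function names are hypothetical and this is not torchaudio's implementation.

```python
import math
from typing import List


def mu_law_encode(waveform: List[List[float]], quantization_channels: int = 256) -> List[List[int]]:
    """(channel, time) floats in [-1, 1] -> (channel, time) ints in [0, quantization_channels - 1]."""
    mu = quantization_channels - 1.0

    def encode(x: float) -> int:
        # Logarithmic compression that keeps the sign of the sample.
        y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
        # Shift [-1, 1] onto [0, mu] and round to the nearest integer.
        return int((y + 1.0) / 2.0 * mu + 0.5)

    return [[encode(x) for x in channel] for channel in waveform]


def mu_law_decode(encoded: List[List[int]], quantization_channels: int = 256) -> List[List[float]]:
    """(channel, time) ints -> (channel, time) floats in [-1, 1] (approximate inverse)."""
    mu = quantization_channels - 1.0

    def decode(q: int) -> float:
        y = q / mu * 2.0 - 1.0  # back onto [-1, 1]
        # Expand the compressed amplitude, restoring the sign.
        return math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1.0) / mu, y)

    return [[decode(q) for q in channel] for channel in encoded]


encoded = mu_law_encode([[0.0, 1.0, -1.0]])
print(encoded)  # [[128, 255, 0]]
```

Decoding returns an approximation of the input with the same (channel, time) dimensions; the error grows with amplitude because the quantization steps are logarithmic.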
diff --git a/README.md b/README.md
index c26fb36bde..8b7f858cb4 100644
--- a/README.md
+++ b/README.md
@@ -81,8 +81,8 @@ Transforms expect the following shapes. In particular, the input of all transfor
 * Spectrogram: (channel, time) -> (channel, frequency, time, 2)
 * MelScale: (channel, time) -> (channel, n_mels, time)
 * MFCC: (channel, time) -> (channel, n_mfcc, time)
-* MuLawEncode: (channel, time) -> (channel, n_mulaw, time)
-* MuLawDecode: (channel, n_mulaw, time) -> (channel, time)
+* MuLawEncode: (channel, time) -> (channel, time)
+* MuLawDecode: (channel, time) -> (channel, time)
 * Resample: (channel, time) -> (channel, time)
 * STFT: (channel, time, 2) -> (channel, frequency, time, 2).
 * ISTFT: (channel, frequency, time, 2) -> (channel, time, 2).

From 9cecc6d8012822c39eef73bc6a0045b564637d73 Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 12:12:29 -0700
Subject: [PATCH 07/15] +AmplitudeToDB -Kaldi.

---
 README.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 8b7f858cb4..ec20b8d41f 100644
--- a/README.md
+++ b/README.md
@@ -79,6 +79,7 @@ Torchaudio is standardized around the following naming conventions.
 Transforms expect the following shapes. In particular, the input of all transforms and functions assumes channel first.
 
 * Spectrogram: (channel, time) -> (channel, frequency, time, 2)
+* AmplitudeToDB: (channel, frequency, time, 2) -> (channel, frequency, time, 2)
 * MelScale: (channel, time) -> (channel, n_mels, time)
 * MFCC: (channel, time) -> (channel, n_mfcc, time)
 * MuLawEncode: (channel, time) -> (channel, time)
@@ -86,7 +87,3 @@ Transforms expect the following shapes. In particular, the input of all transfor
 * Resample: (channel, time) -> (channel, time)
 * STFT: (channel, time, 2) -> (channel, frequency, time, 2).
 * ISTFT: (channel, frequency, time, 2) -> (channel, time, 2).
-
-A spectrogram can be converted to DB scale or Mel scale, using AmplitudeToDB and AmplitudeToMel.
-
-The Kaldi compliance interface follows Kaldi's interface, and provides access to: Kaldi's `fbank`, `spectrogram`, and `resample_waveform`.

From 0602d6c234ff28bdb7205d526dc7ea233dbb440f Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 12:29:01 -0700
Subject: [PATCH 08/15] Shape of spectrogram.

---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index ec20b8d41f..6ee665a792 100644
--- a/README.md
+++ b/README.md
@@ -63,10 +63,10 @@ Conventions
 
 Torchaudio is standardized around the following naming conventions.
 
-* waveform: a tensor of audio samples with shape (channels, time)
+* waveform: a tensor of audio samples with shape (channel, time)
 * sample_rate: the rate of audio samples (samples per second)
-* specgram: a tensor of spectrogram with shape (channels, time)
-* mel_specgram: a mel spectrogram with shape (channels, time)
+* specgram: a tensor of spectrogram with shape (channel, frequency, time)
+* mel_specgram: a mel spectrogram with shape (channel, frequency, time)
 * hop_length: the number of samples between the starts of consecutive frames
 * n_freqs: the number of bins in a linear spectrogram
 * min_freq: the lowest frequency of the lowest band in a spectrogram

From 5dbbad1350d6ab276bde4909bf2ebc59c91f16cd Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 12:48:50 -0700
Subject: [PATCH 09/15] Fourier, n_mels, n_freqs.

---
 README.md | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 6ee665a792..4161992834 100644
--- a/README.md
+++ b/README.md
@@ -65,25 +65,26 @@ Torchaudio is standardized around the following naming conventions.
 
 * waveform: a tensor of audio samples with shape (channel, time)
 * sample_rate: the rate of audio samples (samples per second)
-* specgram: a tensor of spectrogram with shape (channel, frequency, time)
-* mel_specgram: a mel spectrogram with shape (channel, frequency, time)
+* specgram: a tensor of spectrogram with shape (channel, n_freqs, time)
+* mel_specgram: a mel spectrogram with shape (channel, n_mels, time)
 * hop_length: the number of samples between the starts of consecutive frames
 * n_freqs: the number of bins in a linear spectrogram
 * min_freq: the lowest frequency of the lowest band in a spectrogram
 * max_freq: the highest frequency of the highest band in a spectrogram
-* n_fft: the number of fourier bins
+* n_fft: the number of Fourier bins
 * n_mfcc, n_mels: to be consistent with other similarly named variables, with shape (channel, n_mfcc, time) and (channel, n_mels, times)
 * win_length: the length of the STFT window
 * window_fn: for functions that create windows, e.g. torch.hann_window
 
 Transforms expect the following shapes. In particular, the input of all transforms and functions assumes channel first.
 
-* Spectrogram: (channel, time) -> (channel, frequency, time, 2)
-* AmplitudeToDB: (channel, frequency, time, 2) -> (channel, frequency, time, 2)
+* Spectrogram: (channel, time) -> (channel, n_freqs, time, 2)
+* AmplitudeToDB: (channel, n_freqs, time, 2) -> (channel, n_freqs, time, 2)
 * MelScale: (channel, time) -> (channel, n_mels, time)
+* MelSpectrogram: (channel, time) -> (channel, n_mels, time, 2)
 * MFCC: (channel, time) -> (channel, n_mfcc, time)
 * MuLawEncode: (channel, time) -> (channel, time)
 * MuLawDecode: (channel, time) -> (channel, time)
 * Resample: (channel, time) -> (channel, time)
-* STFT: (channel, time, 2) -> (channel, frequency, time, 2).
-* ISTFT: (channel, frequency, time, 2) -> (channel, time, 2).
+* STFT: (channel, time, 2) -> (channel, n_freqs, time, 2).
+* ISTFT: (channel, n_freqs, time, 2) -> (channel, time, 2).

From cf3cfab436ec172aad6f1327c07efd3355327039 Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 12:51:45 -0700
Subject: [PATCH 10/15] time.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4161992834..43f81a241c 100644
--- a/README.md
+++ b/README.md
@@ -72,7 +72,7 @@ Torchaudio is standardized around the following naming conventions.
 * min_freq: the lowest frequency of the lowest band in a spectrogram
 * max_freq: the highest frequency of the highest band in a spectrogram
 * n_fft: the number of Fourier bins
-* n_mfcc, n_mels: to be consistent with other similarly named variables, with shape (channel, n_mfcc, time) and (channel, n_mels, times)
+* n_mfcc, n_mels: to be consistent with other similarly named variables, with shape (channel, n_mfcc, time) and (channel, n_mels, time)
 * win_length: the length of the STFT window
 * window_fn: for functions that create windows, e.g. torch.hann_window
 

From 17b5667cac2eaed426b205e9713c77bdd7e78742 Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 14:10:49 -0700
Subject: [PATCH 11/15] dimensions (or dimension names) vs number of them.

---
 README.md | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

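Note on names versus sizes: `freq` names a dimension, while `n_freq` is its size. A rough sketch of how the sizes of a (channel, freq, time) spectrogram follow from `n_fft` and `hop_length`. It assumes a one-sided STFT of a real signal and, when `center=True`, padding of `n_fft // 2` samples on each side as `torch.stft` does. `spectrogram_dims` is a hypothetical helper, not part of torchaudio.

```python
from typing import Tuple


def spectrogram_dims(channel: int, time: int, n_fft: int, hop_length: int,
                     center: bool = True) -> Tuple[int, int, int]:
    """Sizes of a (channel, freq, time) spectrogram computed from a (channel, time) waveform."""
    if n_fft <= 0 or hop_length <= 0:
        raise ValueError("n_fft and hop_length must be positive")
    # A real signal has a one-sided spectrum: n_fft // 2 + 1 frequency bins.
    n_freq = n_fft // 2 + 1
    # center=True pads n_fft // 2 samples on both sides before framing.
    padded = time + 2 * (n_fft // 2) if center else time
    if padded < n_fft:
        raise ValueError("waveform is shorter than one frame")
    # Frames start every hop_length samples; each frame needs n_fft samples.
    n_frames = 1 + (padded - n_fft) // hop_length
    return channel, n_freq, n_frames


print(spectrogram_dims(2, 16000, n_fft=400, hop_length=200))  # (2, 201, 81)
```

With `center=False` the same call gives 79 frames instead of 81, since no padding is added.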
diff --git a/README.md b/README.md
index 43f81a241c..b44ca2655d 100644
--- a/README.md
+++ b/README.md
@@ -63,28 +63,28 @@ Conventions
 
 Torchaudio is standardized around the following naming conventions.
 
-* waveform: a tensor of audio samples with shape (channel, time)
-* sample_rate: the rate of audio samples (samples per second)
-* specgram: a tensor of spectrogram with shape (channel, n_freqs, time)
-* mel_specgram: a mel spectrogram with shape (channel, n_mels, time)
+* waveform: a tensor of audio samples with dimensions (channel, time)
+* sample_rate: the rate of audio samples (samples per second)
+* specgram: a tensor of spectrogram with dimensions (channel, freq, time)
+* mel_specgram: a mel spectrogram with dimensions (channel, freq, time)
 * hop_length: the number of samples between the starts of consecutive frames
-* n_freqs: the number of bins in a linear spectrogram
+* n_fft: the number of Fourier bins
+* n_mfcc, n_mel: the number of mel and MFCC bins,
+* n_freq: the number of bins in a linear spectrogram
 * min_freq: the lowest frequency of the lowest band in a spectrogram
 * max_freq: the highest frequency of the highest band in a spectrogram
-* n_fft: the number of Fourier bins
-* n_mfcc, n_mels: to be consistent with other similarly named variables, with shape (channel, n_mfcc, time) and (channel, n_mels, time)
 * win_length: the length of the STFT window
 * window_fn: for functions that create windows, e.g. torch.hann_window
 
-Transforms expect the following shapes. In particular, the input of all transforms and functions assumes channel first.
+Transforms expect the following dimensions. In particular, the input of all transforms and functions assumes channel first.
 
-* Spectrogram: (channel, time) -> (channel, n_freqs, time, 2)
-* AmplitudeToDB: (channel, n_freqs, time, 2) -> (channel, n_freqs, time, 2)
-* MelScale: (channel, time) -> (channel, n_mels, time)
-* MelSpectrogram: (channel, time) -> (channel, n_mels, time, 2)
-* MFCC: (channel, time) -> (channel, n_mfcc, time)
+* Spectrogram: (channel, time) -> (channel, freq, time, 2)
+* AmplitudeToDB: (channel, freq, time, 2) -> (channel, freq, time, 2)
+* MelScale: (channel, freq, time) -> (channel, mel, time)
+* MelSpectrogram: (channel, time) -> (channel, mel, time, 2)
+* MFCC: (channel, time) -> (channel, mfcc, time)
 * MuLawEncode: (channel, time) -> (channel, time)
 * MuLawDecode: (channel, time) -> (channel, time)
 * Resample: (channel, time) -> (channel, time)
-* STFT: (channel, time, 2) -> (channel, n_freqs, time, 2).
-* ISTFT: (channel, n_freqs, time, 2) -> (channel, time, 2).
+* STFT: (channel, time, 2) -> (channel, freq, time, 2)
+* ISTFT: (channel, freq, time, 2) -> (channel, time, 2)

From ac29e50a009bb4164a7e7be3c87bd5401df62a76 Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 14:14:30 -0700
Subject: [PATCH 12/15] typo.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b44ca2655d..31e1490124 100644
--- a/README.md
+++ b/README.md
@@ -69,7 +69,7 @@ Torchaudio is standardized around the following naming conventions.
 * mel_specgram: a mel spectrogram with dimensions (channel, freq, time)
 * hop_length: the number of samples between the starts of consecutive frames
 * n_fft: the number of Fourier bins
-* n_mfcc, n_mel: the number of mel and MFCC bins,
+* n_mfcc, n_mel: the number of mel and MFCC bins
 * n_freq: the number of bins in a linear spectrogram
 * min_freq: the lowest frequency of the lowest band in a spectrogram
 * max_freq: the highest frequency of the highest band in a spectrogram

From c36ed5162b22ced92bf5cddab1ddc331c311c25c Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 14:15:25 -0700
Subject: [PATCH 13/15] mel.

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 31e1490124..199f50c0ed 100644
--- a/README.md
+++ b/README.md
@@ -66,7 +66,7 @@ Torchaudio is standardized around the following naming conventions.
 * waveform: a tensor of audio samples with dimensions (channel, time)
 * sample_rate: the rate of audio samples (samples per second)
 * specgram: a tensor of spectrogram with dimensions (channel, freq, time)
-* mel_specgram: a mel spectrogram with dimensions (channel, freq, time)
+* mel_specgram: a mel spectrogram with dimensions (channel, mel, time)
 * hop_length: the number of samples between the starts of consecutive frames
 * n_fft: the number of Fourier bins
 * n_mfcc, n_mel: the number of mel and MFCC bins

From 9a52757df9fd5c053085e39f729229977f22d63a Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 14:36:27 -0700
Subject: [PATCH 14/15] order, and complex.

---
 README.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

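Note on the complex dimension: as this patch describes, a complex STFT value is stored as two real numbers (real, imaginary) in a trailing dimension of size 2. Taking the magnitude raised to a power removes that dimension, which is how a power spectrogram ends up with dimensions (channel, freq, time). A sketch assuming a power spectrogram; `to_real_pairs` and `collapse_complex` are hypothetical helpers written in plain Python.

```python
from typing import List

Complex4D = List[List[List[List[float]]]]  # (channel, freq, time, complex=2)
Real3D = List[List[List[float]]]           # (channel, freq, time)


def to_real_pairs(spec: List[List[List[complex]]]) -> Complex4D:
    """Store each complex value as [real, imag] in a trailing axis of size 2."""
    return [[[[z.real, z.imag] for z in row] for row in channel] for channel in spec]


def collapse_complex(spec: Complex4D, power: float = 2.0) -> Real3D:
    """(channel, freq, time, 2) -> (channel, freq, time) via |z| ** power."""
    return [
        [[(real * real + imag * imag) ** (power / 2.0) for real, imag in row] for row in channel]
        for channel in spec
    ]


spec = to_real_pairs([[[3 + 4j, 1j]]])    # 1 channel, 1 freq bin, 2 frames
print(collapse_complex(spec))             # [[[25.0, 1.0]]]
print(collapse_complex(spec, power=1.0))  # [[[5.0, 1.0]]]
```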
diff --git a/README.md b/README.md
index 199f50c0ed..0eb23bb9a1 100644
--- a/README.md
+++ b/README.md
@@ -69,7 +69,7 @@ Torchaudio is standardized around the following naming conventions.
 * mel_specgram: a mel spectrogram with dimensions (channel, mel, time)
 * hop_length: the number of samples between the starts of consecutive frames
 * n_fft: the number of Fourier bins
-* n_mfcc, n_mel: the number of mel and MFCC bins
+* n_mel, n_mfcc: the number of mel and MFCC bins
 * n_freq: the number of bins in a linear spectrogram
 * min_freq: the lowest frequency of the lowest band in a spectrogram
 * max_freq: the highest frequency of the highest band in a spectrogram
@@ -78,13 +78,15 @@ Torchaudio is standardized around the following naming conventions.
 
 Transforms expect the following dimensions. In particular, the input of all transforms and functions assumes channel first.
 
-* Spectrogram: (channel, time) -> (channel, freq, time, 2)
-* AmplitudeToDB: (channel, freq, time, 2) -> (channel, freq, time, 2)
+* Spectrogram: (channel, time) -> (channel, freq, time, complex)
+* AmplitudeToDB: (channel, freq, time, complex) -> (channel, freq, time, complex)
 * MelScale: (channel, freq, time) -> (channel, mel, time)
-* MelSpectrogram: (channel, time) -> (channel, mel, time, 2)
+* MelSpectrogram: (channel, time) -> (channel, mel, time, complex)
 * MFCC: (channel, time) -> (channel, mfcc, time)
 * MuLawEncode: (channel, time) -> (channel, time)
 * MuLawDecode: (channel, time) -> (channel, time)
 * Resample: (channel, time) -> (channel, time)
-* STFT: (channel, time, 2) -> (channel, freq, time, 2)
-* ISTFT: (channel, freq, time, 2) -> (channel, time, 2)
+* STFT: (channel, time, complex) -> (channel, freq, time, complex)
+* ISTFT: (channel, freq, time, complex) -> (channel, time, complex)
+
+where complex refers to the 2 dimensions required to represent a complex number using real numbers.

From da66e1d73f31dc0d5f3f79837d6aa15f2b6c186d Mon Sep 17 00:00:00 2001
From: Vincent Quenneville-Belair <vincentqb@gmail.com>
Date: Fri, 26 Jul 2019 14:40:13 -0700
Subject: [PATCH 15/15] no complex in transforms.

---
 README.md | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 0eb23bb9a1..cbbfa87853 100644
--- a/README.md
+++ b/README.md
@@ -78,15 +78,11 @@ Torchaudio is standardized around the following naming conventions.
 
 Transforms expect the following dimensions. In particular, the input of all transforms and functions assumes channel first.
 
-* Spectrogram: (channel, time) -> (channel, freq, time, complex)
-* AmplitudeToDB: (channel, freq, time, complex) -> (channel, freq, time, complex)
+* Spectrogram: (channel, time) -> (channel, freq, time)
+* AmplitudeToDB: (channel, freq, time) -> (channel, freq, time)
 * MelScale: (channel, freq, time) -> (channel, mel, time)
-* MelSpectrogram: (channel, time) -> (channel, mel, time, complex)
+* MelSpectrogram: (channel, time) -> (channel, mel, time)
 * MFCC: (channel, time) -> (channel, mfcc, time)
 * MuLawEncode: (channel, time) -> (channel, time)
 * MuLawDecode: (channel, time) -> (channel, time)
 * Resample: (channel, time) -> (channel, time)
-* STFT: (channel, time, complex) -> (channel, freq, time, complex)
-* ISTFT: (channel, freq, time, complex) -> (channel, time, complex)
-
-where complex refers to the 2 dimensions required to represent a complex number using real numbers.