@@ -195,16 +195,16 @@ Conventions
 
 With torchaudio being a machine learning library and built on top of PyTorch,
 torchaudio is standardized around the following naming conventions. Tensors are
-assumed to have channels as the first dimension and time as the last
+assumed to have "channel" as the first dimension and time as the last
 dimension (when applicable). This makes it consistent with PyTorch's dimensions.
 For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`, `n_mel`)")
 whereas dimension names do not have this prefix (e.g. "a tensor of
-dimension (channels, time)")
+dimension (channel, time)")
 
-* `waveform`: a tensor of audio samples with dimensions (channels, time)
+* `waveform`: a tensor of audio samples with dimensions (channel, time)
 * `sample_rate`: the rate of audio dimensions (samples per second)
-* `specgram`: a tensor of spectrogram with dimensions (channels, freq, time)
-* `mel_specgram`: a mel spectrogram with dimensions (channels, mel, time)
+* `specgram`: a tensor of spectrogram with dimensions (channel, freq, time)
+* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
 * `hop_length`: the number of samples between the starts of consecutive frames
 * `n_fft`: the number of Fourier bins
 * `n_mel`, `n_mfcc`: the number of mel and MFCC bins
@@ -216,16 +216,16 @@ dimension (channels, time)")
 
 Transforms expect and return the following dimensions.
 
-* `Spectrogram`: (channels, time) -> (channels, freq, time)
-* `AmplitudeToDB`: (channels, freq, time) -> (channels, freq, time)
-* `MelScale`: (channels, freq, time) -> (channels, mel, time)
-* `MelSpectrogram`: (channels, time) -> (channels, mel, time)
-* `MFCC`: (channels, time) -> (channel, mfcc, time)
-* `MuLawEncode`: (channels, time) -> (channels, time)
-* `MuLawDecode`: (channels, time) -> (channels, time)
-* `Resample`: (channels, time) -> (channels, time)
-* `Fade`: (channels, time) -> (channels, time)
-* `Vol`: (channels, time) -> (channels, time)
+* `Spectrogram`: (channel, time) -> (channel, freq, time)
+* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
+* `MelScale`: (channel, freq, time) -> (channel, mel, time)
+* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
+* `MFCC`: (channel, time) -> (channel, mfcc, time)
+* `MuLawEncode`: (channel, time) -> (channel, time)
+* `MuLawDecode`: (channel, time) -> (channel, time)
+* `Resample`: (channel, time) -> (channel, time)
+* `Fade`: (channel, time) -> (channel, time)
+* `Vol`: (channel, time) -> (channel, time)
 
 Complex numbers are supported via tensors of dimension (..., 2), and torchaudio provides `complex_norm` and `angle` to convert such a tensor into its magnitude and phase. Here, and in the documentation, we use an ellipsis "..." as a placeholder for the rest of the dimensions of a tensor, e.g. optional batching and channel dimensions.
 
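The (..., 2) layout for complex numbers described in the README text can be illustrated without torchaudio. The sketch below is a pure-Python stand-in that works on nested lists; `magnitude_and_phase` is a hypothetical helper written for this illustration and is not the implementation of `complex_norm` or `angle`. It assumes the trailing pair is ordered (real, imaginary).

```python
import math


def magnitude_and_phase(x):
    """Split a nested list with a trailing [real, imag] pair into
    (magnitude, phase) nested lists with that trailing dimension removed.

    Hypothetical helper for illustration only, not a torchaudio function.
    """
    # Base case: the trailing dimension of size 2 holds one complex number.
    if len(x) == 2 and all(isinstance(v, (int, float)) for v in x):
        real, imag = x
        return math.hypot(real, imag), math.atan2(imag, real)
    # Recursive case: walk the leading "..." dimensions (batch, channel, time).
    results = [magnitude_and_phase(item) for item in x]
    magnitudes = [m for m, _ in results]
    phases = [p for _, p in results]
    return magnitudes, phases


# A toy tensor of dimension (channel, time, 2): 2 channels, 2 time steps.
spec = [
    [[3.0, 4.0], [0.0, 1.0]],
    [[-1.0, 0.0], [1.0, 1.0]],
]
magnitude, phase = magnitude_and_phase(spec)  # each has dimension (channel, time)
```

The leading dimensions pass through unchanged, which is what the ellipsis in (..., 2) expresses: only the final pair is consumed.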
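The dimension signatures listed in the diff, such as (channel, time) -> (channel, freq, time), can be traced with a small shape calculator. This is a minimal sketch in plain Python, not torchaudio code: the function names are hypothetical, and the frame and frequency counts assume a one-sided spectrum with centered (padded) framing, which is how `torch.stft` counts frames when `center=True`. The parameter names `n_fft`, `hop_length`, and `n_mel` follow the README's size-name convention.

```python
def spectrogram_shape(waveform_shape, n_fft, hop_length):
    """(channel, time) -> (channel, freq, time). Hypothetical helper."""
    channel, time = waveform_shape
    n_freq = n_fft // 2 + 1           # assumption: one-sided spectrum
    n_frame = time // hop_length + 1  # assumption: centered (padded) framing
    return (channel, n_freq, n_frame)


def mel_scale_shape(specgram_shape, n_mel):
    """(channel, freq, time) -> (channel, mel, time). Hypothetical helper."""
    channel, _n_freq, n_frame = specgram_shape
    return (channel, n_mel, n_frame)


def mel_spectrogram_shape(waveform_shape, n_fft, hop_length, n_mel):
    """(channel, time) -> (channel, mel, time), composed from the two steps above."""
    specgram_shape = spectrogram_shape(waveform_shape, n_fft, hop_length)
    return mel_scale_shape(specgram_shape, n_mel)


# One second of stereo audio at 16 kHz: dimension (channel, time).
waveform_shape = (2, 16000)
specgram_shape = spectrogram_shape(waveform_shape, n_fft=400, hop_length=200)
mel_shape = mel_spectrogram_shape(waveform_shape, n_fft=400, hop_length=200, n_mel=128)
```

In every step the channel dimension stays first and the time dimension stays last, which is the convention the renamed "channel" label documents.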