diff --git a/README.md b/README.md
index c60b2662f7..1a21b49a97 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,15 @@ torchaudio: an audio library for PyTorch
 
 [![Build Status](https://travis-ci.org/pytorch/audio.svg?branch=master)](https://travis-ci.org/pytorch/audio)
 
+The aim of torchaudio is to apply [PyTorch](https://github.com/pytorch/pytorch) to
+the audio domain. By supporting PyTorch, torchaudio will follow the same philosophy
+of providing strong GPU acceleration, having a focus on trainable features through
+the autograd system, and having consistent style (tensor names and dimension names).
+Therefore, it will be primarily a machine learning library and not a general signal
+processing library. The benefits of Pytorch will be seen in torchaudio through
+having all the computations be through Pytorch operations which makes it easy
+to use and feel like a natural extension.
+
 - [Support audio I/O (Load files, Save files)](http://pytorch.org/audio/)
   - Load the following formats into a torch Tensor
     - mp3, wav, aac, ogg, flac, avr, cdda, cvs/vms,
@@ -63,28 +72,47 @@ API Reference is located here: http://pytorch.org/audio/
 Conventions
 -----------
 
-Torchaudio is standardized around the following naming conventions.
-
-* waveform: a tensor of audio samples with dimensions (channel, time)
-* sample_rate: the rate of audio dimensions (samples per second)
-* specgram: a tensor of spectrogram with dimensions (channel, freq, time)
-* mel_specgram: a mel spectrogram with dimensions (channel, mel, time)
-* hop_length: the number of samples between the starts of consecutive frames
-* n_fft: the number of Fourier bins
-* n_mel, n_mfcc: the number of mel and MFCC bins
-* n_freq: the number of bins in a linear spectrogram
-* min_freq: the lowest frequency of the lowest band in a spectrogram
-* max_freq: the highest frequency of the highest band in a spectrogram
-* win_length: the length of the STFT window
-* window_fn: for functions that creates windows e.g. torch.hann_window
-
-Transforms expect the following dimensions. In particular, the input of all transforms and functions assumes channel first.
-
-* Spectrogram: (channel, time) -> (channel, freq, time)
-* AmplitudeToDB: (channel, freq, time) -> (channel, freq, time)
-* MelScale: (channel, time) -> (channel, mel, time)
-* MelSpectrogram: (channel, time) -> (channel, mel, time)
-* MFCC: (channel, time) -> (channel, mfcc, time)
-* MuLawEncode: (channel, time) -> (channel, time)
-* MuLawDecode: (channel, time) -> (channel, time)
-* Resample: (channel, time) -> (channel, time)
+With torchaudio being a machine learning library and built on top of PyTorch,
+torchaudio is standardized around the following naming conventions. In particular,
+tensors are assumed to have channel as the first dimension and time as the last
+dimension (when applicable). This makes it consistent with PyTorch's dimensions.
+For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`, `n_mel`)")
+whereas dimension names do not have this prefix (e.g. "a tensor of
+dimension (channel, time)")
+
+* `waveform`: a tensor of audio samples with dimensions (channel, time)
+* `sample_rate`: the rate of audio dimensions (samples per second)
+* `specgram`: a tensor of spectrogram with dimensions (channel, freq, time)
+* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
+* `hop_length`: the number of samples between the starts of consecutive frames
+* `n_fft`: the number of Fourier bins
+* `n_mel`, `n_mfcc`: the number of mel and MFCC bins
+* `n_freq`: the number of bins in a linear spectrogram
+* `min_freq`: the lowest frequency of the lowest band in a spectrogram
+* `max_freq`: the highest frequency of the highest band in a spectrogram
+* `win_length`: the length of the STFT window
+* `window_fn`: for functions that creates windows e.g. torch.hann_window
+
+Transforms expect the following dimensions.
+
+* `Spectrogram`: (channel, time) -> (channel, freq, time)
+* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
+* `MelScale`: (channel, time) -> (channel, mel, time)
+* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
+* `MFCC`: (channel, time) -> (channel, mfcc, time)
+* `MuLawEncode`: (channel, time) -> (channel, time)
+* `MuLawDecode`: (channel, time) -> (channel, time)
+* `Resample`: (channel, time) -> (channel, time)
+
+Contributing Guidelines
+-----------------------
+
+Please let us know if you encounter a bug by filing an [issue](https://github.com/pytorch/audio/issues).
+
+We appreciate all contributions. If you are planning to contribute back
+bug-fixes, please do so without any further discussion.
+
+If you plan to contribute new features, utility functions or extensions to the
+core, please first open an issue and discuss the feature with us. Sending a PR
+without discussion might end up resulting in a rejected PR, because we might be
+taking the core in a different direction than you might be aware of.
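
The channel-first, time-last convention in the list of transforms can be illustrated with a small shape calculation. This is only a sketch in plain Python, not torchaudio code: the helper names `spectrogram_shape` and `mel_spectrogram_shape` and the default values for `n_fft`, `hop_length` and `n_mel` are hypothetical. It also assumes a one-sided STFT with centered frames, which the README does not state, so that `n_freq = n_fft // 2 + 1` and the number of frames is `time // hop_length + 1`.

```python
# Shape bookkeeping for the (channel, ..., time) convention.
# Assumptions (not taken from the README): one-sided STFT with centered
# frames; the default sizes below are illustrative only.

def spectrogram_shape(waveform_shape, n_fft=400, hop_length=200):
    """Map (channel, time) -> (channel, freq, time)."""
    channel, time = waveform_shape
    n_freq = n_fft // 2 + 1            # bins of a one-sided linear spectrogram
    n_frames = time // hop_length + 1  # frames when the signal is center-padded
    return (channel, n_freq, n_frames)


def mel_spectrogram_shape(waveform_shape, n_mel=128, n_fft=400, hop_length=200):
    """Map (channel, time) -> (channel, mel, time)."""
    channel, _, n_frames = spectrogram_shape(waveform_shape, n_fft, hop_length)
    return (channel, n_mel, n_frames)


if __name__ == "__main__":
    stereo_one_second = (2, 16000)  # (channel, time) at a 16 kHz sample_rate
    print(spectrogram_shape(stereo_one_second))
    print(mel_spectrogram_shape(stereo_one_second))
```

In both helpers the channel dimension stays first and the frame (time) dimension stays last; only the middle dimension changes, from `n_freq` to `n_mel`.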