Commit c4dafdb

Update docstrings and add examples
1 parent 4ef41ad commit c4dafdb

2 files changed, +82 -9 lines changed

docs/source/sox_effects.rst

Lines changed: 11 additions & 0 deletions
@@ -8,6 +8,17 @@ Create SoX effects chain for preprocessing audio.
 
 .. currentmodule:: torchaudio.sox_effects
 
+:hidden:`apply_effects_tensor`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autofunction:: apply_effects_tensor
+
+:hidden:`apply_effects_file`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autofunction:: apply_effects_file
+
+
 :hidden:`SoxEffect`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

torchaudio/sox_effects/sox_effects.py

Lines changed: 71 additions & 9 deletions
@@ -67,14 +67,49 @@ def apply_effects_tensor(
     """Apply sox effects to given Tensor
 
     Args:
-        tensor: Input 2D Tensor.
-        sample_rate: Sample rate
-        effects: List of effects.
-        channels_first: Indicates if the input Tensor's dimension is
+        tensor (torch.Tensor): Input 2D Tensor.
+        sample_rate (int): Sample rate
+        effects (List[List[str]]): List of effects.
+        channels_first (bool): Indicates if the input Tensor's dimension is
             ``[channels, time]`` or ``[time, channels]``
 
+    Returns:
+        Tuple[torch.Tensor, int]: Resulting Tensor and sample rate.
+        The resulting Tensor has the same ``dtype`` as the input Tensor, and
+        the same channel order. The shape of the Tensor can be different based on the
+        effects applied. Sample rate can also be different based on the effects applied.
+
+    Examples:
+        >>> # Define the effects to apply
+        >>> effects = [
+        ...     ['gain', '-n'],  # normalises to 0dB
+        ...     ['pitch', '5'],  # 5 cent pitch shift
+        ...     ['rate', '8000'],  # resample to 8000 Hz
+        ... ]
+        >>> # Generate pseudo wave:
+        >>> # normalized, channels first, 2ch, sampling rate 16000, 1 second
+        >>> sample_rate = 16000
+        >>> waveform = 2 * torch.rand([2, sample_rate * 1]) - 1
+        >>> waveform.shape
+        torch.Size([2, 16000])
+        >>> waveform
+        tensor([[ 0.3138, 0.7620, -0.9019, ..., -0.7495, -0.4935, 0.5442],
+                [-0.0832, 0.0061, 0.8233, ..., -0.5176, -0.9140, -0.2434]])
+        >>> # Apply effects
+        >>> waveform, sample_rate = apply_effects_tensor(
+        ...     waveform, sample_rate, effects, channels_first=True)
+        >>> # The new waveform has a sampling rate of 8000, 1 second.
+        >>> # normalization and channel order are preserved
+        >>> waveform.shape
+        torch.Size([2, 8000])
+        >>> waveform
+        tensor([[ 0.5054, -0.5518, -0.4800, ..., -0.0076, 0.0096, -0.0110],
+                [ 0.1331, 0.0436, -0.3783, ..., -0.0035, 0.0012, 0.0008]])
+        >>> sample_rate
+        8000
+
     Notes:
-        This function works in the way very similar to ```sox``` command, however there are slight
+        This function works in a way very similar to the ``sox`` command; however, there are slight
         differences. For example, the ``sox`` command adds certain effects automatically (such as
         ``rate`` effect after ``speed`` and ``pitch`` and other effects), but this function
         only applies the given effects. (Therefore, to actually apply ``speed`` effect, you also
@@ -95,15 +130,42 @@ def apply_effects_file(
     """Apply sox effects to the audio file and load Tensor
 
     Args:
-        path: Path to the audio file.
-        effects: List of effects.
-        normalize: When ``True``, this function always return ``float32``, and sample values are
+        path (str): Path to the audio file.
+        effects (List[List[str]]): List of effects.
+        normalize (bool): When ``True``, this function always returns ``float32``, and sample values are
             normalized to ``[-1.0, 1.0]``. If input file is integer WAV, giving ``False`` will change
             the resulting Tensor type to integer type. This argument has no effect for formats other
             than integer WAV type.
-        channels_first: When True, the returned Tensor has dimension ``[channel, time]``.
+        channels_first (bool): When True, the returned Tensor has dimension ``[channel, time]``.
             Otherwise, the returned Tensor's dimension is ``[time, channel]``.
 
+    Returns:
+        Tuple[torch.Tensor, int]: Resulting Tensor and sample rate.
+        If ``normalize=True``, the resulting Tensor is always ``float32`` type.
+        If ``normalize=False`` and the input audio file is an integer WAV file, then the
+        resulting Tensor has the corresponding integer type. (Note that the 24-bit integer type is not supported.)
+        If ``channels_first=True``, the resulting Tensor has dimension ``[channel, time]``,
+        otherwise ``[time, channel]``.
+
+    Examples:
+        >>> # Define the effects to apply
+        >>> effects = [
+        ...     ['gain', '-n'],  # normalises to 0dB
+        ...     ['pitch', '5'],  # 5 cent pitch shift
+        ...     ['rate', '8000'],  # resample to 8000 Hz
+        ... ]
+        >>> # Apply effects and load data with channels_first=True
+        >>> waveform, sample_rate = apply_effects_file("data.wav", effects, channels_first=True)
+        >>> waveform.shape
+        torch.Size([2, 8000])
+        >>> waveform
+        tensor([[ 5.1151e-03, 1.8073e-02, 2.2188e-02, ..., 1.0431e-07,
+                 -1.4761e-07, 1.8114e-07],
+                [-2.6924e-03, 2.1860e-03, 1.0650e-02, ..., 6.4122e-07,
+                 -5.6159e-07, 4.8103e-07]])
+        >>> sample_rate
+        8000
+
     Notes:
         This function works in a way very similar to the ``sox`` command; however, there are slight
         differences. For example, the ``sox`` command adds certain effects automatically (such as
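
The Notes in both docstrings say that, unlike the ``sox`` command, these functions do not add a ``rate`` effect automatically after ``speed`` or ``pitch``. A minimal sketch of how a caller might guard against forgetting it is shown here. The helper name ``ensure_rate_effect`` and the ``RATE_CHANGING`` set are hypothetical and not part of torchaudio; the set lists only the two effects the Notes name, while the Notes also mention "other effects" that are not covered.

```python
from typing import List

# Effects the docstring Notes name as needing an explicit ``rate`` afterwards.
# Assumption: this set is illustrative only; the Notes mention "other effects" too.
RATE_CHANGING = {"speed", "pitch"}


def ensure_rate_effect(effects: List[List[str]], sample_rate: int) -> List[List[str]]:
    """Return a copy of ``effects`` with a trailing ``rate`` effect when one is missing.

    Hypothetical helper, not a torchaudio API.
    """
    copied = [list(effect) for effect in effects]
    names = [effect[0] for effect in copied]
    # Index of the last effect that changes the effective sample rate, or -1.
    last_changer = max(
        (i for i, name in enumerate(names) if name in RATE_CHANGING), default=-1
    )
    # Append ``rate`` only if no ``rate`` effect follows that last effect.
    if last_changer >= 0 and "rate" not in names[last_changer + 1:]:
        copied.append(["rate", str(sample_rate)])
    return copied


effects = ensure_rate_effect([["speed", "1.5"]], 16000)
```

The resulting list has the same ``List[List[str]]`` shape that ``apply_effects_tensor`` and ``apply_effects_file`` accept as ``effects``, so it could be passed to either function unchanged.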
