
Commit 9bb0523

Add note about normalize argument (#2449)
Summary: The `load` function has a `normalize` argument, which converts the native sample type to `torch.float32`. This argument is confusing for audio practitioners because it seems to perform [volume normalization](https://en.wikipedia.org/wiki/Audio_normalization). See #2253.

Due to the BC-breaking concern, we cannot easily change the argument name. This commit adds warnings to the documentation.

Fix #2253
Pull Request resolved: #2449
Reviewed By: nateanl
Differential Revision: D36995756
Pulled By: carolineechen
fbshipit-source-id: 0b7db2758a355f6aafe06a2273bc72a1027690bd
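The distinction the summary draws can be shown with plain numbers. The sketch below is illustrative only:

- Plain Python lists stand in for tensors.
- Both function names are hypothetical.
- Dividing int16 samples by 2**15 is the usual PCM convention. It is assumed here, not taken from this commit.

Sample-type conversion divides by a fixed full-scale constant, so a quiet recording stays quiet. Volume (peak) normalization rescales by the signal's own maximum.

```python
# Illustrative only: plain lists stand in for tensors.
INT16_FULL_SCALE = 2 ** 15  # 32768; assumed divisor for 16-bit signed PCM


def convert_int16_to_float(samples):
    """Sample-type conversion: divide every sample by a fixed constant."""
    return [s / INT16_FULL_SCALE for s in samples]


def peak_normalize(samples, target=1.0):
    """Volume (peak) normalization: scale so the loudest sample reaches `target`."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence cannot be rescaled
    return [s * target / peak for s in samples]


quiet = [1024, -2048, 512]  # a quiet 16-bit signal
converted = convert_int16_to_float(quiet)
print(converted)                  # [0.03125, -0.0625, 0.015625]
print(peak_normalize(converted))  # [0.5, -1.0, 0.25]
```

Only the second step changes the loudness. The first step, which is what `normalize=True` refers to, leaves the quiet signal at about 6% of full scale.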
1 parent 68b1127 commit 9bb0523

3 files changed: +49 -32 lines changed


torchaudio/backend/soundfile_backend.py

Lines changed: 21 additions & 13 deletions
@@ -147,19 +147,25 @@ def load(
 * SPHERE

 By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
-``float32`` dtype and the shape of `[channel, time]`.
-The samples are normalized to fit in the range of ``[-1.0, 1.0]``.
+``float32`` dtype, and the shape of `[channel, time]`.

-When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
-signed integer and 8-bit unsigned integer (24-bit signed integer is not supported),
-by providing ``normalize=False``, this function can return integer Tensor, where the samples
-are expressed within the whole range of the corresponding dtype, that is, ``int32`` tensor
-for 32-bit signed PCM, ``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM.
+.. warning::

-``normalize`` parameter has no effect on 32-bit floating-point WAV and other formats, such as
-``flac`` and ``mp3``.
-For these formats, this function always returns ``float32`` Tensor with values normalized to
-``[-1.0, 1.0]``.
+``normalize`` argument does not perform volume normalization.
+It only converts the sample type to `torch.float32` from the native sample
+type.
+
+When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
+signed integer, 24-bit signed integer, and 8-bit unsigned integer, by providing ``normalize=False``,
+this function can return integer Tensor, where the samples are expressed within the whole range
+of the corresponding dtype, that is, ``int32`` tensor for 32-bit signed PCM,
+``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM. Since torch does not
+support ``int24`` dtype, 24-bit signed PCM are converted to ``int32`` tensors.
+
+``normalize`` argument has no effect on 32-bit floating-point WAV and other formats, such as
+``flac`` and ``mp3``.
+
+For these formats, this function always returns ``float32`` Tensor with values.

 Note:
 ``filepath`` argument is intentionally annotated as ``str`` only, even though it accepts
@@ -177,11 +183,13 @@ def load(
 This function may return the less number of frames if there is not enough
 frames in the given file.
 normalize (bool, optional):
-When ``True``, this function always return ``float32``, and sample values are
-normalized to ``[-1.0, 1.0]``.
+When ``True``, this function converts the native sample type to ``float32``.
+Default: ``True``.
+
 If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
 integer type.
 This argument has no effect for formats other than integer WAV type.
+
 channels_first (bool, optional):
 When True, the returned Tensor has dimension `[channel, time]`.
 Otherwise, the returned Tensor's dimension is `[time, channel]`.
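The `normalize=False` dtype mapping in the updated `load()` docstring reduces to a small lookup table. This is a sketch under stated assumptions:

- `native_dtype` is a hypothetical helper, not a torchaudio function.
- Dtypes are plain strings rather than `torch.dtype` objects, so the code runs without torch.
- The table follows the post-commit docstring text, including its statement that 24-bit signed PCM becomes `int32`.

```python
# Hypothetical helper (not a torchaudio API): restates the dtype mapping from the
# updated load() docstring. Dtypes are plain strings so no torch install is needed.
_INTEGER_WAV_DTYPES = {
    (32, True): "int32",  # 32-bit signed PCM
    (24, True): "int32",  # 24-bit signed PCM; torch has no int24 dtype
    (16, True): "int16",  # 16-bit signed PCM
    (8, False): "uint8",  # 8-bit unsigned PCM
}


def native_dtype(bits_per_sample: int, signed: bool, normalize: bool) -> str:
    """Return the dtype name documented for an integer WAV file."""
    if normalize:
        # normalize=True (the default) always yields float32.
        return "float32"
    try:
        return _INTEGER_WAV_DTYPES[(bits_per_sample, signed)]
    except KeyError:
        raise ValueError(
            f"not a documented integer WAV layout: {bits_per_sample}-bit, signed={signed}"
        ) from None


print(native_dtype(16, True, normalize=False))  # int16
print(native_dtype(24, True, normalize=False))  # int32
print(native_dtype(16, True, normalize=True))   # float32
```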

torchaudio/backend/sox_io_backend.py

Lines changed: 22 additions & 15 deletions
@@ -130,20 +130,25 @@ def load(
 and corresponding codec libraries such as ``libmad`` or ``libmp3lame`` etc.

 By default (``normalize=True``, ``channels_first=True``), this function returns Tensor with
-``float32`` dtype and the shape of `[channel, time]`.
-The samples are normalized to fit in the range of ``[-1.0, 1.0]``.
+``float32`` dtype, and the shape of `[channel, time]`.

-When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
-signed integer, 24-bit signed integer, and 8-bit unsigned integer, by providing ``normalize=False``,
-this function can return integer Tensor, where the samples are expressed within the whole range
-of the corresponding dtype, that is, ``int32`` tensor for 32-bit signed PCM,
-``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM. Since torch does not
-support ``int24`` dtype, 24-bit signed PCM are converted to ``int32`` tensors.
+.. warning::

-``normalize`` parameter has no effect on 32-bit floating-point WAV and other formats, such as
-``flac`` and ``mp3``.
-For these formats, this function always returns ``float32`` Tensor with values normalized to
-``[-1.0, 1.0]``.
+``normalize`` argument does not perform volume normalization.
+It only converts the sample type to `torch.float32` from the native sample
+type.
+
+When the input format is WAV with integer type, such as 32-bit signed integer, 16-bit
+signed integer, 24-bit signed integer, and 8-bit unsigned integer, by providing ``normalize=False``,
+this function can return integer Tensor, where the samples are expressed within the whole range
+of the corresponding dtype, that is, ``int32`` tensor for 32-bit signed PCM,
+``int16`` for 16-bit signed PCM and ``uint8`` for 8-bit unsigned PCM. Since torch does not
+support ``int24`` dtype, 24-bit signed PCM are converted to ``int32`` tensors.
+
+``normalize`` argument has no effect on 32-bit floating-point WAV and other formats, such as
+``flac`` and ``mp3``.
+
+For these formats, this function always returns ``float32`` Tensor with values.

 Args:
 filepath (path-like object or file-like object):
@@ -166,11 +171,13 @@ def load(
 This function may return the less number of frames if there is not enough
 frames in the given file.
 normalize (bool, optional):
-When ``True``, this function always return ``float32``, and sample values are
-normalized to ``[-1.0, 1.0]``.
+When ``True``, this function converts the native sample type to ``float32``.
+Default: ``True``.
+
 If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
 integer type.
 This argument has no effect for formats other than integer WAV type.
+
 channels_first (bool, optional):
 When True, the returned Tensor has dimension `[channel, time]`.
 Otherwise, the returned Tensor's dimension is `[time, channel]`.
@@ -181,7 +188,7 @@ def load(

 Returns:
 (torch.Tensor, int): Resulting Tensor and sample rate.
-If the input file has integer wav format and normalization is off, then it has
+If the input file has integer wav format and ``normalize=False``, then it has
 integer type, else ``float32`` type. If ``channels_first=True``, it has
 `[channel, time]` else `[time, channel]`.
 """
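The revised Returns paragraph combines two independent switches: one selects the dtype and the other selects the axis order. The sketch below restates that logic under stated assumptions:

- `expected_output` is a hypothetical helper, not part of torchaudio.
- It reports only a dtype kind and a shape tuple, not an actual tensor.

```python
from typing import Tuple


def expected_output(
    is_integer_wav: bool,
    normalize: bool,
    channels_first: bool,
    num_channels: int,
    num_frames: int,
) -> Tuple[str, Tuple[int, int]]:
    """Hypothetical helper: the dtype kind and shape the Returns section describes."""
    # Integer output only for integer WAV input with normalize=False.
    dtype = "integer" if (is_integer_wav and not normalize) else "float32"
    # channels_first=True -> [channel, time]; otherwise [time, channel].
    if channels_first:
        shape = (num_channels, num_frames)
    else:
        shape = (num_frames, num_channels)
    return dtype, shape


print(expected_output(True, False, True, 2, 16000))  # ('integer', (2, 16000))
print(expected_output(True, True, False, 2, 16000))  # ('float32', (16000, 2))
```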

torchaudio/sox_effects/sox_effects.py

Lines changed: 6 additions & 4 deletions
@@ -192,11 +192,13 @@ def apply_effects_file(
 TorchScript compiler compatibility.
 effects (List[List[str]]): List of effects.
 normalize (bool, optional):
-When ``True``, this function always return ``float32``, and sample values are
-normalized to ``[-1.0, 1.0]``.
+When ``True``, this function converts the native sample type to ``float32``.
+Default: ``True``.
+
 If input file is integer WAV, giving ``False`` will change the resulting Tensor type to
-integer type. This argument has no effect for formats other
-than integer WAV type.
+integer type.
+This argument has no effect for formats other than integer WAV type.
+
 channels_first (bool, optional): When True, the returned Tensor has dimension `[channel, time]`.
 Otherwise, the returned Tensor's dimension is `[time, channel]`.
 format (str or None, optional):
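`apply_effects_file` now uses the same wording: `normalize=True` "converts the native sample type to ``float32``". For integer PCM, such a conversion is typically a division by a fixed full-scale constant. The sketch below assumes the common PCM convention:

- Divide `int32` by 2**31 and `int16` by 2**15.
- Offset unsigned 8-bit samples by 128.

This commit does not show torchaudio's internal arithmetic, so these constants are an assumption. `pcm_to_float` is a hypothetical name.

```python
# Assumed full-scale conventions (common PCM practice), not code from torchaudio.
def pcm_to_float(sample: int, dtype: str) -> float:
    """Map one integer PCM sample to a float by a fixed, signal-independent constant."""
    if dtype == "int32":
        return sample / 2 ** 31
    if dtype == "int16":
        return sample / 2 ** 15
    if dtype == "uint8":
        # 8-bit WAV is unsigned, with its zero level at 128.
        return (sample - 128) / 128
    raise ValueError(f"unsupported dtype: {dtype}")


print(pcm_to_float(-32768, "int16"))  # -1.0
print(pcm_to_float(128, "uint8"))     # 0.0
print(pcm_to_float(255, "uint8"))     # 0.9921875
```

The largest positive value of an n-bit signed integer is 2**(n-1) - 1, so converted values fall in [-1.0, 1.0). Nothing rescales the signal relative to its own peak, which is why the warning says no volume normalization takes place.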
