Description
🐛 Bug
From my understanding (based on e.g. #337), all transforms should be able to operate on tensors with dimensions (batch, channels, ...) or (channels, ...), where ... depends on the type of data being processed, e.g. time for waveform data and freq, time for spectrograms. This way, we can pass multiple chunks of data (waveforms, spectrograms…) at once and expect the same results as if we had passed them one by one.
However, this is not the case for transforms.AmplitudeToDB: as can easily be traced in the source code of the corresponding functional, this transform operates blindly on the passed tensor, without taking its dimensionality or the related semantics into account in any way.
This becomes a problem in the calculation of the cut-off. The purpose of this step is to clamp low dB values to a minimum that lies a fixed number of decibels below the maximum value in the respective spectrogram. However, amplitude_to_DB uses the single global maximum of the passed tensor to calculate the cut-off for all contained spectrograms. Thus, when passing batched data, the result for one spectrogram depends on all the other spectrograms in the same batch, which to my understanding is not the correct behavior. My conclusion is that AmplitudeToDB silently outputs wrong data (wrong in the sense of the transforms' general interface contract) for batched or multi-channel input, which I would consider really dangerous for applications.
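To make the effect concrete, here is a minimal, torch-free sketch of the clamping arithmetic. It is not the TorchAudio implementation: the helper names (to_db, clamp_global, clamp_per_item), the value top_db=40 and the toy two-bin "spectrograms" are assumptions made only for illustration, and the reference-level offset applied by the real functional is left out.

```python
import math

def to_db(power, amin=1e-10):
    # 10 * log10 of a power value, floored at amin to avoid log(0).
    return 10.0 * math.log10(max(power, amin))

def clamp_global(batch, top_db):
    # Mimics the reported behaviour: ONE cut-off, derived from the
    # maximum over the whole batch, is applied to every spectrogram.
    db = [[to_db(p) for p in spec] for spec in batch]
    cutoff = max(max(spec) for spec in db) - top_db
    return [[max(v, cutoff) for v in spec] for spec in db]

def clamp_per_item(batch, top_db):
    # Expected behaviour: each spectrogram gets its own cut-off,
    # derived from its own maximum.
    db = [[to_db(p) for p in spec] for spec in batch]
    return [[max(v, max(spec) - top_db) for v in spec] for spec in db]

# Hypothetical toy data: two flattened "spectrograms" of power values.
loud = [1.0, 1e-3]    # roughly 0 dB and -30 dB
quiet = [1e-4, 1e-9]  # roughly -40 dB and -90 dB

batched = clamp_global([loud, quiet], top_db=40.0)
alone = clamp_global([quiet], top_db=40.0)
per_item = clamp_per_item([loud, quiet], top_db=40.0)

print("quiet, batched with loud:", batched[1])
print("quiet, processed alone:  ", alone[0])
print("quiet, per-item cut-off: ", per_item[1])
```

In this sketch the lowest bin of the quiet spectrogram is clamped to about -40 dB when it shares a batch with the loud one, but to about -80 dB when processed alone or with a per-item cut-off, so the batched result depends on the other batch members.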
Ideally, this should be fixed directly in functional.amplitude_to_DB, so that we can also pass batched data there.
Environment
TorchAudio 0.7.0