transforms.AmplitudeToDB does not handle cut-off correctly for multi-channel or batched data #994

@Stehsegler

Description

🐛 Bug

From my understanding (based on e.g. #337), all transforms should be able to operate on tensors with dimensions (batch, channels, ...) or (channels, ...), where ... depends on the type of data being processed, e.g. time for waveform data and freq, time for spectrograms. This way, we can pass multiple chunks of data (waveforms, spectrograms, ...) at once and expect to get the same results as if we passed them one by one.
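
As a rough illustration of that contract, here is a minimal sketch using a transform that (as far as I can tell) does handle leading batch/channel dimensions, torchaudio.transforms.Spectrogram; the shapes and n_fft value are just example choices:

```python
import torch
import torchaudio

spectrogram = torchaudio.transforms.Spectrogram(n_fft=400)

waveforms = torch.randn(4, 2, 16000)          # (batch, channels, time)
batched = spectrogram(waveforms)              # (batch, channels, freq, time)
per_item = torch.stack([spectrogram(w) for w in waveforms])

# Expected to hold: processing the batch at once matches per-item processing.
assert torch.allclose(batched, per_item)
```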

However, this is not the case for transforms.AmplitudeToDB: as is easily traceable in the source code of the corresponding functional, this transform blindly operates on the passed tensor without taking its dimensionality and the related semantics into account in any way.

This becomes a problem in the calculation of the cut-off. The purpose of this step is to clamp low dB values to a minimum that lies a fixed number of decibels below the maximum value of the respective spectrogram. However, amplitude_to_DB uses the single global maximum of the passed tensor to calculate the cut-off for all contained spectrograms. Thus, when passing batched data, the result for one spectrogram depends on all the other spectrograms in the same batch, which to my understanding is not the correct behavior.

My conclusion is that AmplitudeToDB silently outputs wrong data (in the sense of the transforms' general interface contract) for batched or multi-channel data, which I consider quite dangerous for downstream applications.
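
A minimal reproduction sketch (shapes, levels, and the top_db value are arbitrary example choices): stacking a quiet and a loud power spectrogram into one batch changes the result for the quiet one, because its cut-off is derived from the loud spectrogram's maximum:

```python
import torch
import torchaudio

to_db = torchaudio.transforms.AmplitudeToDB(stype="power", top_db=80.0)

quiet = torch.rand(1, 201, 100) * 1e-6   # (channels, freq, time), very low power
loud = torch.rand(1, 201, 100) * 1e2     # same shape, much higher power
batch = torch.stack([quiet, loud])       # (batch, channels, freq, time)

batched = to_db(batch)
per_item = torch.stack([to_db(quiet), to_db(loud)])

# The quiet spectrogram gets clamped against the loud one's maximum,
# so batched and per-item results differ:
print(torch.allclose(batched, per_item))  # False
```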

Ideally, this should be fixed directly in functional.amplitude_to_DB, so that batched data can also be passed there.
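
For example, one possible approach (only a sketch under my assumptions, not a proposed final implementation; the function name is hypothetical, and whether the cut-off should be per batch element or per individual channel is a design decision) would be to take the maximum over the trailing (freq, time) dimensions instead of over the whole tensor:

```python
import torch

def amplitude_to_DB_batched(x, multiplier, amin, db_multiplier, top_db=None):
    # Hypothetical batch-aware variant; the dB conversion itself mirrors the
    # existing functional, only the cut-off computation changes.
    x_db = multiplier * torch.log10(torch.clamp(x, min=amin))
    x_db -= multiplier * db_multiplier

    if top_db is not None:
        # Maximum over the trailing (freq, time) dims only, so every
        # spectrogram in the input gets its own cut-off.
        max_per_spec = x_db.flatten(start_dim=-2).max(dim=-1).values
        cutoff = (max_per_spec - top_db).unsqueeze(-1).unsqueeze(-1)
        x_db = torch.max(x_db, cutoff)

    return x_db
```

With this kind of change, the batched and per-item results in the reproduction above would match again, and transforms.AmplitudeToDB could simply delegate to the functional as it does now.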

Environment

TorchAudio 0.7.0
