
Consistency between torchvision/torchaudio #29

@bchamand

When we load a sound file with torchaudio, the output is a Tensor of shape (L x C), where L is the number of audio frames and C is the number of channels.
Wouldn't it be better to get a Tensor of shape (C x L)?

With torchvision, when I load an image and apply the ToTensor function, the output tensor has shape (C x H x W), with the channel as the first dimension. Wouldn't it be more consistent if the load function in torchaudio returned a tensor with a similar channel-first shape?

In that case, the functions LC2CL and BLC2CBL would no longer be necessary.
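
To make the layout difference concrete, here is a minimal sketch of the transposition that currently has to happen after loading. It uses only plain `torch` calls; the helper name `lc_to_cl` and the synthetic tensor standing in for a loaded clip are hypothetical and not part of torchaudio.

```python
import torch


def lc_to_cl(waveform: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: turn a (L x C) tensor into a (C x L) tensor."""
    if waveform.dim() != 2:
        raise ValueError("expected a 2-D tensor of shape (L, C)")
    # transpose() returns a view; contiguous() copies it into channel-first memory.
    return waveform.transpose(0, 1).contiguous()


# Stand-in for a loaded stereo clip: 5 frames, 2 channels (L x C).
frames_first = torch.arange(10, dtype=torch.float32).reshape(5, 2)
channels_first = lc_to_cl(frames_first)

print(frames_first.shape)    # torch.Size([5, 2])
print(channels_first.shape)  # torch.Size([2, 5])
```

If load returned (C x L) directly, this extra step would disappear, and `channels_first[0]` would be the first channel, just as `image[0]` is the first channel of a ToTensor output.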
