Consistency between torchvision/torchaudio #29

When we load a sound file with torchaudio, we get an output Tensor of size (L x C), where L is the number of audio frames and C is the number of channels.
Wouldn't it be better to get a Tensor of shape (C x L)?
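For reference, getting from (L x C) to (C x L) is just a swap of the two dimensions. The minimal sketch below uses a random tensor as a hypothetical stand-in for loaded audio (16000 frames, 2 channels); it does not call torchaudio itself.

```python
import torch

# Hypothetical stand-in for audio returned in the (L x C) layout:
# 16000 frames, 2 channels.
num_frames, num_channels = 16000, 2
signal_lc = torch.randn(num_frames, num_channels)

# Swap the two dimensions to get the channels-first (C x L) layout.
# .t() returns a view; .contiguous() copies it into channels-first memory order.
signal_cl = signal_lc.t().contiguous()

print(signal_lc.shape)  # torch.Size([16000, 2])
print(signal_cl.shape)  # torch.Size([2, 16000])
```

Each row of `signal_cl` is now one full channel, so `signal_cl[c]` gives channel `c` directly.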

With torchvision, when I load an image and apply _ToTensor_, the output tensor has shape (C x H x W), with the channels in the first dimension. Wouldn't it be more consistent if the _load_ function in torchaudio returned a tensor with the same channels-first layout as the output of _ToTensor_?
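To make the comparison concrete, the sketch below mimics only the layout and value-range change that _ToTensor_ applies to an 8-bit image: pixels stored as (H x W x C) become a float tensor of shape (C x H x W) scaled to [0, 1]. The image here is a hypothetical random tensor, and this is not torchvision's implementation.

```python
import torch

# Hypothetical 8-bit RGB image in (H x W x C) order, the way image
# libraries usually store pixels.
height, width = 4, 6
image_hwc = torch.randint(0, 256, (height, width, 3), dtype=torch.uint8)

# Move channels to the front and rescale to [0, 1], mirroring the
# layout and value range ToTensor produces for uint8 images.
image_chw = image_hwc.permute(2, 0, 1).float().div(255)

print(image_chw.shape)  # torch.Size([3, 4, 6])
```

The channel axis ends up first, which is the same convention the (C x L) audio layout would follow.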

In that case, the _LC2CL_ and _BLC2CBL_ transforms would no longer be necessary.
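Reading the transform names literally, both amount to a single dimension reordering. The helpers below, `lc2cl` and `blc2cbl`, are hypothetical reimplementations written for illustration, assuming _LC2CL_ maps (L x C) to (C x L) and _BLC2CBL_ maps (B x L x C) to (C x B x L); they are not torchaudio's code.

```python
import torch


def lc2cl(tensor: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: reorder (L x C) to (C x L)."""
    return tensor.transpose(0, 1).contiguous()


def blc2cbl(tensor: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: reorder (B x L x C) to (C x B x L)."""
    # Dimension 2 (C) moves to the front; B and L keep their relative order.
    return tensor.permute(2, 0, 1).contiguous()


x = torch.randn(100, 2)     # (L x C)
y = torch.randn(4, 100, 2)  # (B x L x C)
print(lc2cl(x).shape)    # torch.Size([2, 100])
print(blc2cbl(y).shape)  # torch.Size([2, 4, 100])
```

If the loader returned channels-first data to begin with, neither reordering step would be needed in a processing pipeline.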
