When we load a sound file with torchaudio, the output is a Tensor of size (L x C), where L is the number of audio frames and C is the number of channels.
Wouldn't it be better to get a Tensor of shape (C x L)?
With torchvision, when I load an image and apply ToTensor, the output tensor has shape (C x H x W), with the channels in the first dimension. Wouldn't it be more consistent if the load function in torchaudio returned a similar channel-first shape?
In that case, the functions LC2CL and BLC2CBL would no longer be necessary.
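
To make the point concrete, here is a minimal sketch of the reordering that currently has to happen after loading. It uses a random tensor as a stand-in for a loaded file, and the sizes and variable names are made up for illustration. It is not the actual code of LC2CL or BLC2CBL, only the kind of dimension swap I assume they perform.

```python
import torch

# Stand-in for the (L x C) tensor described above; sizes are arbitrary.
num_frames, num_channels = 16000, 2
sound_lc = torch.randn(num_frames, num_channels)    # (L x C)

# Channel-first layout, analogous to torchvision's (C x H x W) images.
sound_cl = sound_lc.transpose(0, 1).contiguous()    # (C x L)

# Batched case: (B x L x C) -> (B x C x L).
batch_blc = torch.randn(4, num_frames, num_channels)
batch_bcl = batch_blc.transpose(1, 2).contiguous()

print(tuple(sound_lc.shape), tuple(sound_cl.shape), tuple(batch_bcl.shape))
```

If load returned (C x L) directly, this extra step would disappear from every pipeline.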