Open
Description
🐛 Bug
The output of torchaudio's fbank feature computation differs from that of Kaldi.
To Reproduce
Steps to reproduce the behavior:
Call fbank with the following parameters (the default parameters show the same discrepancy):
torchaudio.compliance.kaldi.fbank(waveform, blackman_coeff=0.42, channel=-1, dither=1.0, energy_floor=0.0, frame_length=25.0, frame_shift=10.0, high_freq=0.0, htk_compat=True, low_freq=20.0, min_duration=0.0, num_mel_bins=40, preemphasis_coefficient=0.97, raw_energy=True, remove_dc_offset=True, round_to_power_of_two=True, sample_frequency=16000.0, snip_edges=True, subtract_mean=False, use_energy=False, use_log_fbank=True, use_power=True, vtln_high=-500.0, vtln_low=100.0, vtln_warp=1.0, window_type='hamming')[0]
This produces the following output for the first frame:
tensor([-0.7616, -0.4791, 0.2155, 0.7661, 2.0723, 1.4565, 2.9888, 3.2548,
1.8460, 3.5807, 3.8290, 4.1785, 4.6776, 4.5801, 5.3610, 4.4910,
5.1519, 5.3534, 5.2783, 5.6159, 6.0689, 5.5961, 5.8068, 5.0957,
6.5200, 6.9314, 6.1741, 7.0430, 7.9394, 8.2380, 8.7115, 8.4105,
8.3154, 8.2186, 7.9444, 8.4468, 8.4293, 8.9476, 9.1008, 9.2495])
whereas Kaldi's compute-fbank-feats gives, for the same frame:
tensor([12.9911, 12.9795, 12.9127, 13.6171, 13.7416, 15.1579, 15.1996, 14.9468,
14.1368, 14.8717, 14.8265, 13.8715, 15.2716, 15.0743, 15.2439, 15.3904,
13.9460, 13.5932, 14.0038, 14.8721, 13.9944, 15.8337, 14.8682, 13.8247,
15.0769, 15.1141, 15.1482, 14.7864, 13.6259, 14.4092, 14.1771, 13.6139,
13.8014, 12.5796, 9.1051, 8.3382, 8.3738, 8.7829, 9.2973, 9.4913])
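To make the size of the mismatch easier to see, here is a minimal sketch that subtracts the two vectors above bin by bin. It uses only the numbers quoted in this report; the helper name `per_bin_difference` and the variable names are made up for illustration and are not part of torchaudio or Kaldi.

```python
# First-frame log-mel values copied from this report (40 mel bins each).
torchaudio_fbank = [
    -0.7616, -0.4791, 0.2155, 0.7661, 2.0723, 1.4565, 2.9888, 3.2548,
    1.8460, 3.5807, 3.8290, 4.1785, 4.6776, 4.5801, 5.3610, 4.4910,
    5.1519, 5.3534, 5.2783, 5.6159, 6.0689, 5.5961, 5.8068, 5.0957,
    6.5200, 6.9314, 6.1741, 7.0430, 7.9394, 8.2380, 8.7115, 8.4105,
    8.3154, 8.2186, 7.9444, 8.4468, 8.4293, 8.9476, 9.1008, 9.2495,
]
kaldi_fbank = [
    12.9911, 12.9795, 12.9127, 13.6171, 13.7416, 15.1579, 15.1996, 14.9468,
    14.1368, 14.8717, 14.8265, 13.8715, 15.2716, 15.0743, 15.2439, 15.3904,
    13.9460, 13.5932, 14.0038, 14.8721, 13.9944, 15.8337, 14.8682, 13.8247,
    15.0769, 15.1141, 15.1482, 14.7864, 13.6259, 14.4092, 14.1771, 13.6139,
    13.8014, 12.5796, 9.1051, 8.3382, 8.3738, 8.7829, 9.2973, 9.4913,
]


def per_bin_difference(reference, candidate):
    """Return candidate[i] - reference[i] for every mel bin (hypothetical helper)."""
    if len(reference) != len(candidate):
        raise ValueError("feature vectors must have the same number of bins")
    return [c - r for r, c in zip(reference, candidate)]


# Positive values mean Kaldi reports more log energy than torchaudio.
diff = per_bin_difference(torchaudio_fbank, kaldi_fbank)

# Index of the bin where the two implementations disagree the most.
worst_bin = max(range(len(diff)), key=lambda i: abs(diff[i]))

for i, d in enumerate(diff):
    print(f"bin {i:2d}: kaldi - torchaudio = {d:+.4f}")
print(f"largest gap is at bin {worst_bin}")
```

The gap is largest in the lowest bins and shrinks to well under one log unit in the top five bins. Note also that the call above uses dither=1.0, which adds random noise, so even two runs of the same implementation would not match exactly; setting dither=0.0 on both sides would make the comparison deterministic. This report does not establish the cause of the remaining difference.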