Fbank features are different from Kaldi Fbank #400

@jooan84

🐛 Bug

The output of torchaudio's fbank feature computation differs from that of Kaldi.

To Reproduce

Steps to reproduce the behavior:

Using the following parameters, or even the defaults:

 torchaudio.compliance.kaldi.fbank(
     waveform, blackman_coeff=0.42, channel=-1, dither=1.0, energy_floor=0.0,
     frame_length=25.0, frame_shift=10.0, high_freq=0.0, htk_compat=True,
     low_freq=20.0, min_duration=0.0, num_mel_bins=40,
     preemphasis_coefficient=0.97, raw_energy=True, remove_dc_offset=True,
     round_to_power_of_two=True, sample_frequency=16000.0, snip_edges=True,
     subtract_mean=False, use_energy=False, use_log_fbank=True, use_power=True,
     vtln_high=-500.0, vtln_low=100.0, vtln_warp=1.0, window_type='hamming')[0]

produces this output:

tensor([-0.7616, -0.4791,  0.2155,  0.7661,  2.0723,  1.4565,  2.9888,  3.2548,
         1.8460,  3.5807,  3.8290,  4.1785,  4.6776,  4.5801,  5.3610,  4.4910,
         5.1519,  5.3534,  5.2783,  5.6159,  6.0689,  5.5961,  5.8068,  5.0957,
         6.5200,  6.9314,  6.1741,  7.0430,  7.9394,  8.2380,  8.7115,  8.4105,
         8.3154,  8.2186,  7.9444,  8.4468,  8.4293,  8.9476,  9.1008,  9.2495])

whereas compute_fbank_feats of Kaldi produces:

tensor([12.9911, 12.9795, 12.9127, 13.6171, 13.7416, 15.1579, 15.1996, 14.9468,
        14.1368, 14.8717, 14.8265, 13.8715, 15.2716, 15.0743, 15.2439, 15.3904,
        13.9460, 13.5932, 14.0038, 14.8721, 13.9944, 15.8337, 14.8682, 13.8247,
        15.0769, 15.1141, 15.1482, 14.7864, 13.6259, 14.4092, 14.1771, 13.6139,
        13.8014, 12.5796,  9.1051,  8.3382,  8.3738,  8.7829,  9.2973,  9.4913])
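
To put a number on the mismatch, the two vectors above can be compared bin by bin. This is a minimal sketch in plain Python: the variable names are illustrative, the values are copied from the two outputs above, and it only measures the gap without diagnosing its cause.

```python
# Values copied from the two outputs reported above (one frame, 40 mel bins).
torchaudio_fbank = [
    -0.7616, -0.4791, 0.2155, 0.7661, 2.0723, 1.4565, 2.9888, 3.2548,
    1.8460, 3.5807, 3.8290, 4.1785, 4.6776, 4.5801, 5.3610, 4.4910,
    5.1519, 5.3534, 5.2783, 5.6159, 6.0689, 5.5961, 5.8068, 5.0957,
    6.5200, 6.9314, 6.1741, 7.0430, 7.9394, 8.2380, 8.7115, 8.4105,
    8.3154, 8.2186, 7.9444, 8.4468, 8.4293, 8.9476, 9.1008, 9.2495,
]
kaldi_fbank = [
    12.9911, 12.9795, 12.9127, 13.6171, 13.7416, 15.1579, 15.1996, 14.9468,
    14.1368, 14.8717, 14.8265, 13.8715, 15.2716, 15.0743, 15.2439, 15.3904,
    13.9460, 13.5932, 14.0038, 14.8721, 13.9944, 15.8337, 14.8682, 13.8247,
    15.0769, 15.1141, 15.1482, 14.7864, 13.6259, 14.4092, 14.1771, 13.6139,
    13.8014, 12.5796, 9.1051, 8.3382, 8.3738, 8.7829, 9.2973, 9.4913,
]

# Per-bin difference (Kaldi minus torchaudio) in the log-mel domain.
diff = [k - t for k, t in zip(kaldi_fbank, torchaudio_fbank)]

# Bins with the largest and smallest absolute disagreement.
worst_bin = max(range(len(diff)), key=lambda i: abs(diff[i]))
best_bin = min(range(len(diff)), key=lambda i: abs(diff[i]))

print(f"bins compared: {len(diff)}")
print(f"largest gap:  bin {worst_bin}, diff {diff[worst_bin]:+.4f}")
print(f"smallest gap: bin {best_bin}, diff {diff[best_bin]:+.4f}")
print("last five diffs:", [round(d, 4) for d in diff[-5:]])
```

For these values the first bin differs by about 13.75, while the last five bins agree to within about 0.25, so the mismatch is not a constant offset across bins.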
