add audio doc #5299
.. _cn_overview_callbacks:

paddle.audio
---------------------

paddle.audio provides PaddlePaddle's high-level APIs for the speech and audio domain. They are grouped as follows:

- :ref:`Audio feature APIs <about_features>`
- :ref:`Basic audio processing function APIs <about_functional>`

.. _about_features:

Audio feature APIs
::::::::::::::::::::

.. csv-table::
    :header: "API name", "Description"
    :widths: 10, 30

    " :ref:`LogMelSpectrogram <cn_api_audio_features_LogMelSpectrogram>` ", "Compute the log-mel spectrogram feature"
    " :ref:`MelSpectrogram <cn_api_audio_features_MelSpectrogram>` ", "Compute the mel spectrogram feature"
    " :ref:`MFCC <cn_api_audio_features_MFCC>` ", "Compute the mel-frequency cepstral coefficients (MFCC)"
    " :ref:`Spectrogram <cn_api_audio_features_Spectrogram>` ", "Compute the spectrogram feature"

.. _about_functional:

Basic audio processing function APIs
::::::::::::::::::::::::::::::::::::::::

.. csv-table::
    :header: "API name", "Description"
    :widths: 10, 30

    " :ref:`compute_fbank_matrix <cn_api_audio_functional_compute_fbank_matrix>` ", "Compute the mel filter bank (fbank) matrix"
    " :ref:`create_dct <cn_api_audio_functional_create_dct>` ", "Compute the discrete cosine transform (DCT) matrix"
    " :ref:`fft_frequencies <cn_api_audio_functional_fft_frequencies>` ", "Compute the sample frequencies of the discrete Fourier transform bins"
    " :ref:`hz_to_mel <cn_api_audio_functional_hz_to_mel>` ", "Convert frequencies in Hz to mels"
    " :ref:`mel_to_hz <cn_api_audio_functional_mel_to_hz>` ", "Convert mels to frequencies in Hz"
    " :ref:`mel_frequencies <cn_api_audio_functional_mel_frequencies>` ", "Compute frequencies evenly spaced on the mel scale"
    " :ref:`power_to_db <cn_api_audio_functional_power_to_db>` ", "Convert a power spectrum to decibels"
    " :ref:`get_window <cn_api_audio_functional_get_window>` ", "Get a window function of a given type"
.. _cn_api_audio_features_LogMelSpectrogram:

LogMelSpectrogram
-------------------------------

.. py:class:: paddle.audio.features.LogMelSpectrogram(sr=22050, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', ref_value=1.0, amin=1e-10, top_db=None, dtype='float32')

Compute the log-mel spectrogram of a given signal.

Author reply: I suggest not adding the formula here. These are standard signal-processing features, and reading the source code is more direct than a formula.

SmileGoat marked this conversation as resolved.

Parameters
::::::::::::

- **sr** (int, optional) - Sample rate. Default: 22050.
- **n_fft** (int, optional) - Number of FFT points (size of the frequency window of the discrete Fourier transform). Default: 2048.
- **hop_length** (int, optional) - Hop length (frame shift) in samples. Default: 512.
- **win_length** (int, optional) - Window length of the short-time FFT. Default: None, in which case ``n_fft`` is used.
- **window** (str, optional) - Name of the window function. Default: 'hann'.
- **power** (float, optional) - Exponent applied to the magnitude spectrum. Default: 2.0.
- **center** (bool, optional) - Whether to pad the input signal. If True, frame ``t`` is centered at ``t * hop_length``; if False, frame ``t`` starts at ``t * hop_length``. Default: True.
- **pad_mode** (str, optional) - Padding mode used when ``center`` is True. Default: 'reflect'.
- **n_mels** (int, optional) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 50.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Default: None, in which case ``sr / 2`` is used.
- **htk** (bool, optional) - Whether to use the HTK formula for mel scaling when computing the fbank matrix. Default: False.
- **norm** (Union[str, float], optional) - Type of normalization applied when computing the fbank matrix. Default: 'slaney'. A float such as ``norm=0.5`` selects p-norm normalization instead.
- **ref_value** (float, optional) - Reference value for the dB conversion. Values below 1.0 raise the dB level of the signal and values above 1.0 lower it. Default: 1.0.
- **amin** (float, optional) - Minimum input magnitude; smaller values are clamped to it before the logarithm is taken. Default: 1e-10.
- **top_db** (float, optional) - If set, the output is floored at its peak value minus ``top_db`` (in dB), which limits the dynamic range of the log-mel spectrogram. Default: None.
- **dtype** (str, optional) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes the ``LogMelSpectrogram``.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.features.layers.LogMelSpectrogram
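The last stage of a log-mel spectrogram converts mel power values to decibels, which is where ``ref_value``, ``amin`` and ``top_db`` act. Below is a minimal pure-Python sketch of that step, assuming the widely used librosa-style formula ``10 * log10(max(S, amin)) - 10 * log10(max(amin, ref_value))`` followed by an optional floor at ``peak - top_db``. ``power_to_db_sketch`` is a hypothetical name, not part of paddle.audio, and it works on a flat list rather than a ``paddle.Tensor``.

```python
import math


def power_to_db_sketch(spec, ref_value=1.0, amin=1e-10, top_db=None):
    """Convert a list of power values to dB (librosa-style formula, assumed)."""
    if amin <= 0:
        raise ValueError("amin must be strictly positive")
    # Clamp tiny values to amin so log10 never sees zero.
    db = [10.0 * math.log10(max(amin, s)) for s in spec]
    # Subtract the reference level: ref_value < 1.0 raises dB, > 1.0 lowers it.
    offset = 10.0 * math.log10(max(amin, ref_value))
    db = [d - offset for d in db]
    if top_db is not None:
        if top_db < 0:
            raise ValueError("top_db must be non-negative")
        # Floor everything at (peak - top_db) to limit the dynamic range.
        floor = max(db) - top_db
        db = [max(d, floor) for d in db]
    return db


print(power_to_db_sketch([1.0, 0.1, 0.0]))               # [0.0, -10.0, -100.0]
print(power_to_db_sketch([1.0, 0.1, 0.0], top_db=80.0))  # [0.0, -10.0, -80.0]
```

With ``ref_value=0.1`` every value shifts up by 10 dB, consistent with the parameter description that references below 1.0 raise the dB level.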
.. _cn_api_audio_features_MFCC:

MFCC
-------------------------------

.. py:class:: paddle.audio.features.MFCC(sr=22050, n_mfcc=40, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', ref_value=1.0, amin=1e-10, top_db=None, dtype='float32')

Review comment (Collaborator): There is a formula for this; please add it so that users can understand the method.

Compute the mel-frequency cepstral coefficients (MFCC) of a given signal.

SmileGoat marked this conversation as resolved.

Parameters
::::::::::::

- **sr** (int, optional) - Sample rate. Default: 22050.
- **n_mfcc** (int, optional) - Number of MFCC coefficients, i.e. the MFCC dimension. Default: 40.
- **n_fft** (int, optional) - Number of FFT points (size of the frequency window of the discrete Fourier transform). Default: 2048.
- **hop_length** (int, optional) - Hop length (frame shift) in samples. Default: 512.
- **win_length** (int, optional) - Window length of the short-time FFT. Default: None, in which case ``n_fft`` is used.
- **window** (str, optional) - Name of the window function. Default: 'hann'.
- **power** (float, optional) - Exponent applied to the magnitude spectrum. Default: 2.0.
- **center** (bool, optional) - Whether to pad the input signal. If True, frame ``t`` is centered at ``t * hop_length``; if False, frame ``t`` starts at ``t * hop_length``. Default: True.
- **pad_mode** (str, optional) - Padding mode used when ``center`` is True. Default: 'reflect'.
- **n_mels** (int, optional) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 50.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Default: None, in which case ``sr / 2`` is used.
- **htk** (bool, optional) - Whether to use the HTK formula for mel scaling when computing the fbank matrix. Default: False.
- **norm** (Union[str, float], optional) - Type of normalization applied when computing the fbank matrix. Default: 'slaney'. A float such as ``norm=0.5`` selects p-norm normalization instead.
- **ref_value** (float, optional) - Reference value for the dB conversion. Values below 1.0 raise the dB level of the signal and values above 1.0 lower it. Default: 1.0.
- **amin** (float, optional) - Minimum input magnitude; smaller values are clamped to it before the logarithm is taken. Default: 1e-10.
- **top_db** (float, optional) - If set, the log-mel output is floored at its peak value minus ``top_db`` (in dB), which limits its dynamic range. Default: None.
- **dtype** (str, optional) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes the ``MFCC``.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.features.layers.MFCC
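The collaborator's request for a formula can be addressed with the standard MFCC definition. As a hedged sketch, assuming MFCC follows the usual pipeline of applying an orthonormal DCT-II (the ``create_dct`` matrix with ``norm='ortho'``) to the log-mel spectrogram: let $X_t[m]$, $m = 0, \dots, M-1$, be the log-mel spectrogram at frame $t$ with $M$ = ``n_mels``, and let $K$ = ``n_mfcc``. Then

```latex
\mathrm{MFCC}_t[k] = \alpha_k \sqrt{\frac{2}{M}} \sum_{m=0}^{M-1} X_t[m]\,
  \cos\!\left(\frac{\pi}{M}\left(m + \tfrac{1}{2}\right) k\right),
\qquad k = 0, 1, \dots, K-1,
\qquad
\alpha_k = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
```

Only the first $K$ DCT coefficients of each frame are kept, which is why ``n_mfcc`` should not exceed ``n_mels``.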
.. _cn_api_audio_features_MelSpectrogram:

MelSpectrogram
-------------------------------

.. py:class:: paddle.audio.features.MelSpectrogram(sr=22050, n_fft=2048, hop_length=512, win_length=None, window='hann', power=2.0, center=True, pad_mode='reflect', n_mels=64, f_min=50.0, f_max=None, htk=False, norm='slaney', dtype='float32')

Compute the mel spectrogram of a given signal.

Parameters
::::::::::::

- **sr** (int, optional) - Sample rate. Default: 22050.
- **n_fft** (int, optional) - Number of FFT points (size of the frequency window of the discrete Fourier transform). Default: 2048.
- **hop_length** (int, optional) - Hop length (frame shift) in samples. Default: 512.
- **win_length** (int, optional) - Window length of the short-time FFT. Default: None, in which case ``n_fft`` is used.
- **window** (str, optional) - Name of the window function. Default: 'hann'.
- **power** (float, optional) - Exponent applied to the magnitude spectrum. Default: 2.0.
- **center** (bool, optional) - Whether to pad the input signal. If True, frame ``t`` is centered at ``t * hop_length``; if False, frame ``t`` starts at ``t * hop_length``. Default: True.
- **pad_mode** (str, optional) - Padding mode used when ``center`` is True. Default: 'reflect'.
- **n_mels** (int, optional) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 50.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Default: None, in which case ``sr / 2`` is used.
- **htk** (bool, optional) - Whether to use the HTK formula for mel scaling when computing the fbank matrix. Default: False.
- **norm** (Union[str, float], optional) - Type of normalization applied when computing the fbank matrix. Default: 'slaney'. A float such as ``norm=0.5`` selects p-norm normalization instead.
- **dtype** (str, optional) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes the ``MelSpectrogram``.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.features.MelSpectrogram
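Conceptually, a mel spectrogram takes the power spectrogram (one column of ``n_fft // 2 + 1`` bins per frame) and projects it onto a mel filter bank of shape ``(n_mels, n_fft // 2 + 1)``, giving an ``(n_mels, n_frames)`` result. The sketch below shows only that projection step with plain nested lists. ``apply_fbank_sketch`` is a hypothetical helper, the toy filter-bank values are made up for illustration, and in practice the weights would come from ``compute_fbank_matrix``.

```python
def apply_fbank_sketch(fbank, spec):
    """Project a (n_bins, n_frames) power spectrogram onto a (n_mels, n_bins) filter bank."""
    n_bins = len(spec)
    if any(len(row) != n_bins for row in fbank):
        raise ValueError("each fbank row needs one weight per frequency bin")
    n_frames = len(spec[0])
    # mel[m][t] = sum over k of fbank[m][k] * spec[k][t]
    return [
        [sum(row[k] * spec[k][t] for k in range(n_bins)) for t in range(n_frames)]
        for row in fbank
    ]


# Hypothetical toy example: n_fft = 4 gives 3 frequency bins; 2 frames; 2 mel bands.
fbank = [[1.0, 0.5, 0.0],
         [0.0, 0.5, 1.0]]
spec = [[1.0, 4.0],
        [2.0, 0.0],
        [3.0, 1.0]]
print(apply_fbank_sketch(fbank, spec))  # [[2.0, 4.0], [4.0, 1.0]]
```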
.. _cn_api_audio_features_Spectrogram:

Spectrogram
-------------------------------

.. py:class:: paddle.audio.features.Spectrogram(n_fft=512, hop_length=512, win_length=None, window='hann', power=1.0, center=True, pad_mode='reflect', dtype='float32')

Compute the spectrogram of a given signal via the short-time Fourier transform.

Parameters
::::::::::::

- **n_fft** (int, optional) - Number of FFT points (size of the frequency window of the discrete Fourier transform). Default: 512.
- **hop_length** (int, optional) - Hop length (frame shift) in samples. Default: 512.
- **win_length** (int, optional) - Window length of the short-time FFT. Default: None, in which case ``n_fft`` is used.
- **window** (str, optional) - Name of the window function. Default: 'hann'.
- **power** (float, optional) - Exponent applied to the magnitude spectrum. Default: 1.0.
- **center** (bool, optional) - Whether to pad the input signal. If True, frame ``t`` is centered at ``t * hop_length``; if False, frame ``t`` starts at ``t * hop_length``. Default: True.
- **pad_mode** (str, optional) - Padding mode used when ``center`` is True. Default: 'reflect'.
- **dtype** (str, optional) - Data type of the input and the window. Default: 'float32'.

Returns
:::::::::

A callable object that computes the ``Spectrogram``.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.features.Spectrogram
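Two of the parameters above are easy to misread: how ``center`` changes the number of frames, and what ``power`` does to each bin. The sketch below assumes the common STFT convention of padding ``n_fft // 2`` samples on each side when ``center=True``, and uses a naive O(n^2) DFT on a single, already-windowed frame for clarity. Both function names are hypothetical, and this is not Paddle's implementation.

```python
import cmath
import math


def num_frames_sketch(signal_len, n_fft, hop_length, center=True):
    """STFT frame count, assuming n_fft // 2 samples of padding per side when center=True."""
    padded = signal_len + 2 * (n_fft // 2) if center else signal_len
    if padded < n_fft:
        raise ValueError("signal too short for a single frame")
    return 1 + (padded - n_fft) // hop_length


def frame_spectrum_sketch(frame, power=1.0):
    """|DFT| ** power of one windowed frame, one-sided: len(frame) // 2 + 1 bins."""
    n = len(frame)
    bins = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        bins.append(abs(x_k) ** power)
    return bins


print(num_frames_sketch(22050, n_fft=512, hop_length=512, center=True))   # 44
print(num_frames_sketch(22050, n_fft=512, hop_length=512, center=False))  # 43
print([round(v, 6) for v in frame_spectrum_sketch([1.0, 1.0, 1.0, 1.0])])  # [4.0, 0.0, 0.0]
```

With ``power=1.0`` (the Spectrogram default) the output is a magnitude spectrum; ``power=2.0`` gives a power spectrum.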
.. _cn_api_audio_functional_compute_fbank_matrix:

compute_fbank_matrix
-------------------------------

.. py:function:: paddle.audio.functional.compute_fbank_matrix(sr, n_fft, n_mels=64, f_min=0.0, f_max=None, htk=False, norm='slaney', dtype='float32')

Compute the mel filter bank (fbank) matrix, which maps linear-frequency FFT bins to mel bins.

Parameters
::::::::::::

- **sr** (int) - Sample rate.
- **n_fft** (int) - Number of FFT points; the matrix covers ``n_fft // 2 + 1`` frequency bins.
- **n_mels** (int, optional) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 0.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Default: None, in which case ``sr / 2`` is used.
- **htk** (bool, optional) - Whether to use HTK mel scaling. Default: False.
- **norm** (Union[str, float], optional) - Type of normalization. Default: 'slaney'.
- **dtype** (str, optional) - Data type of the returned matrix. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor`` of shape ``(n_mels, n_fft // 2 + 1)``.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.functional.compute_fbank_matrix
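The entry above documents only the shape of the result. The following minimal pure-Python sketch builds a triangular mel filter bank in the style popularized by librosa, which we assume ``compute_fbank_matrix`` follows: ``n_mels + 2`` edge frequencies spaced evenly on the mel scale define overlapping triangles, and ``norm='slaney'`` scales each triangle by ``2 / (right edge - left edge)``. ``fbank_sketch`` and its helpers are hypothetical names; only ``norm='slaney'`` or ``None`` is handled (no p-norm), bin frequencies assume an even ``n_fft``, and the result is a nested list rather than a ``paddle.Tensor``.

```python
import math


def _hz_to_mel(freq, htk):
    if htk:
        return 2595.0 * math.log10(1.0 + freq / 700.0)
    f_sp, min_log_hz = 200.0 / 3, 1000.0
    if freq < min_log_hz:
        return freq / f_sp  # Slaney scale: linear below 1 kHz
    return min_log_hz / f_sp + math.log(freq / min_log_hz) / (math.log(6.4) / 27.0)


def _mel_to_hz(mel, htk):
    if htk:
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    f_sp, min_log_hz = 200.0 / 3, 1000.0
    min_log_mel = min_log_hz / f_sp
    if mel < min_log_mel:
        return f_sp * mel
    return min_log_hz * math.exp((math.log(6.4) / 27.0) * (mel - min_log_mel))


def fbank_sketch(sr, n_fft, n_mels=64, f_min=0.0, f_max=None, htk=False, norm="slaney"):
    """Triangular mel filter bank as nested lists of shape (n_mels, n_fft // 2 + 1)."""
    if norm not in ("slaney", None):
        raise ValueError("this sketch only supports norm='slaney' or norm=None")
    if f_max is None:
        f_max = sr / 2.0
    fft_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    # n_mels + 2 edge frequencies, evenly spaced in mels, then mapped back to Hz.
    lo, hi = _hz_to_mel(f_min, htk), _hz_to_mel(f_max, htk)
    edges = [_mel_to_hz(lo + (hi - lo) * i / (n_mels + 1), htk) for i in range(n_mels + 2)]
    weights = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        # Triangle: rises from left to center, falls from center to right, zero elsewhere.
        row = [
            max(0.0, min((f - left) / (center - left), (right - f) / (right - center)))
            for f in fft_freqs
        ]
        if norm == "slaney":
            # Scale each triangle by 2 / bandwidth so bands carry comparable energy.
            row = [w * 2.0 / (right - left) for w in row]
        weights.append(row)
    return weights


fb = fbank_sketch(sr=16000, n_fft=512, n_mels=40)
print(len(fb), len(fb[0]))  # 40 257
```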
.. _cn_api_audio_functional_create_dct:

create_dct
-------------------------------

.. py:function:: paddle.audio.functional.create_dct(n_mfcc, n_mels, norm='ortho', dtype='float32')

Compute the discrete cosine transform (DCT) matrix.

Parameters
::::::::::::

- **n_mfcc** (int) - Number of mel cepstral coefficients.
- **n_mels** (int) - Number of mel filterbanks.
- **norm** (str, optional) - Normalization type. Default: 'ortho'.
- **dtype** (str, optional) - Data type of the returned matrix. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor`` of shape ``(n_mels, n_mfcc)``.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.functional.create_dct
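The documented output shape ``(n_mels, n_mfcc)`` means each column is one DCT-II basis vector sampled over the mel bins. Here is a minimal pure-Python sketch with orthonormal (``'ortho'``) scaling, assuming the standard DCT-II definition; ``create_dct_sketch`` is a hypothetical name and supports only ``norm='ortho'``.

```python
import math


def create_dct_sketch(n_mfcc, n_mels, norm="ortho"):
    """Orthonormal DCT-II basis as nested lists of shape (n_mels, n_mfcc)."""
    if norm != "ortho":
        raise ValueError("this sketch only implements norm='ortho'")
    matrix = []
    for m in range(n_mels):
        row = []
        for k in range(n_mfcc):
            value = math.cos(math.pi / n_mels * (m + 0.5) * k)
            # Orthonormal scaling: sqrt(2 / n_mels), with an extra 1/sqrt(2) for k == 0.
            value *= math.sqrt(2.0 / n_mels)
            if k == 0:
                value /= math.sqrt(2.0)
            row.append(value)
        matrix.append(row)
    return matrix


dct = create_dct_sketch(n_mfcc=13, n_mels=40)
print(len(dct), len(dct[0]))  # 40 13
```

Multiplying a log-mel frame of length ``n_mels`` by this matrix yields ``n_mfcc`` cepstral coefficients; with ``'ortho'`` scaling the columns are orthonormal, so the transform preserves energy.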
.. _cn_api_audio_functional_fft_frequencies:

fft_frequencies
-------------------------------

.. py:function:: paddle.audio.functional.fft_frequencies(sr, n_fft, dtype='float32')

Compute the frequencies of the FFT bins.

Parameters
::::::::::::

- **sr** (int) - Sample rate.
- **n_fft** (int) - Number of FFT points.
- **dtype** (str, optional) - Data type of the returned Tensor. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor`` of shape ``(n_fft // 2 + 1,)``.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.functional.fft_frequencies
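The ``n_fft // 2 + 1`` one-sided bins are evenly spaced from 0 Hz up to the Nyquist frequency ``sr / 2``. A minimal pure-Python sketch, assuming an even ``n_fft`` (for which ``k * sr / n_fft`` coincides with an even grid from 0 to ``sr / 2``); ``fft_frequencies_sketch`` is a hypothetical name, not part of paddle.audio.

```python
def fft_frequencies_sketch(sr, n_fft):
    """Frequencies in Hz of the one-sided DFT bins: 0, sr / n_fft, ..., sr / 2."""
    return [k * sr / n_fft for k in range(n_fft // 2 + 1)]


freqs = fft_frequencies_sketch(sr=16000, n_fft=8)
print(freqs)  # [0.0, 2000.0, 4000.0, 6000.0, 8000.0]
```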
.. _cn_api_audio_functional_get_window:

get_window
-------------------------------

.. py:function:: paddle.audio.functional.get_window(window, win_length, fftbins=True, dtype='float64')

Return a window function of the given type and length.

Parameters
::::::::::::

- **window** (str or Tuple[str, float]) - Window type, or a tuple of (window type, window parameter). Supported window types: 'hamming', 'hann', 'kaiser', 'gaussian', 'exponential', 'triang', 'bohman', 'blackman', 'cosine', 'tukey', 'taylor'.
- **win_length** (int) - Number of samples in the window.
- **fftbins** (bool, optional) - If True, return a periodic window; if False, return a symmetric window. Default: True.
- **dtype** (str, optional) - Data type of the returned window. Default: 'float64'.

Returns
:::::::::

``paddle.Tensor``, a 1-D Tensor of length ``win_length`` holding the window.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.functional.get_window
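The ``fftbins`` flag selects between a periodic and a symmetric window. For the Hann window this can be sketched in pure Python as below, assuming the scipy-style convention that a periodic window of length N equals a symmetric window of length N + 1 with its last sample dropped. ``hann_sketch`` is a hypothetical helper that covers only ``'hann'``, not the other supported window types.

```python
import math


def hann_sketch(win_length, fftbins=True):
    """Hann window: periodic if fftbins is True, symmetric otherwise."""
    if win_length == 1:
        return [1.0]
    # Periodic: denominator N (symmetric N + 1 window, last sample dropped).
    # Symmetric: denominator N - 1, so the first and last samples are both 0.
    denom = win_length if fftbins else win_length - 1
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / denom) for n in range(win_length)]


print([round(w, 6) for w in hann_sketch(4, fftbins=True)])   # [0.0, 0.5, 1.0, 0.5]
print([round(w, 6) for w in hann_sketch(4, fftbins=False)])  # [0.0, 0.75, 0.75, 0.0]
```

The periodic form is the usual choice for STFT framing because overlapping copies sum to a constant at suitable hop lengths.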
.. _cn_api_audio_functional_hz_to_mel:

hz_to_mel
-------------------------------

.. py:function:: paddle.audio.functional.hz_to_mel(freq, htk=False)

Convert frequencies in Hz to mels.

Parameters
::::::::::::

- **freq** (Tensor or float) - Input frequency (or frequencies) in Hz.
- **htk** (bool, optional) - Whether to use HTK scaling. Default: False.

Returns
:::::::::

``paddle.Tensor`` or ``float``, the value(s) in mels.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.functional.hz_to_mel
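For reference, the two mel scales selected by ``htk`` can be sketched in pure Python as follows. The constants are those of the widely used HTK formula and of the Slaney (Auditory Toolbox) scale as implemented in librosa; we assume paddle.audio uses the same ones. ``hz_to_mel_sketch`` is a hypothetical, scalar-only helper.

```python
import math


def hz_to_mel_sketch(freq, htk=False):
    """Convert a frequency in Hz to mels on the HTK or Slaney scale."""
    if htk:
        return 2595.0 * math.log10(1.0 + freq / 700.0)
    # Slaney scale: linear (200/3 Hz per mel) below 1 kHz, logarithmic above.
    f_sp = 200.0 / 3
    min_log_hz = 1000.0
    min_log_mel = min_log_hz / f_sp  # about 15 mels
    logstep = math.log(6.4) / 27.0
    if freq < min_log_hz:
        return freq / f_sp
    return min_log_mel + math.log(freq / min_log_hz) / logstep


print(round(hz_to_mel_sketch(1000.0), 6))  # 15.0
```

On the Slaney scale 6400 Hz maps to 42 mels: 15 mels for the linear part plus 27 log steps of ``log(6.4) / 27``.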
.. _cn_api_audio_functional_mel_frequencies:

mel_frequencies
-------------------------------

.. py:function:: paddle.audio.functional.mel_frequencies(n_mels=64, f_min=0.0, f_max=11025, htk=False, dtype='float32')

Compute ``n_mels`` frequencies (in Hz) evenly spaced on the mel scale between ``f_min`` and ``f_max``.

Parameters
::::::::::::

- **n_mels** (int, optional) - Number of mel bins. Default: 64.
- **f_min** (float, optional) - Minimum frequency in Hz. Default: 0.0.
- **f_max** (float, optional) - Maximum frequency in Hz. Default: 11025.0.
- **htk** (bool, optional) - Whether to use HTK scaling. Default: False.
- **dtype** (str, optional) - Data type of the returned Tensor. Default: 'float32'.

Returns
:::::::::

``paddle.Tensor`` of shape ``(n_mels,)``.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.functional.mel_frequencies
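Assuming ``mel_frequencies`` places its points uniformly on the mel scale and maps them back to Hz (as librosa's function of the same name does), the $i$-th returned frequency can be written as follows, with $M$ = ``n_mels`` and $\mathrm{mel}(\cdot)$, $\mathrm{mel}^{-1}(\cdot)$ standing for ``hz_to_mel`` and ``mel_to_hz`` under the chosen ``htk`` setting:

```latex
f_i = \mathrm{mel}^{-1}\!\left(\mathrm{mel}(f_{\min})
      + \frac{i}{M-1}\,\bigl(\mathrm{mel}(f_{\max}) - \mathrm{mel}(f_{\min})\bigr)\right),
\qquad i = 0, 1, \dots, M-1
```

so that $f_0 = f_{\min}$ and $f_{M-1} = f_{\max}$, with the spacing in Hz widening toward higher frequencies.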
.. _cn_api_audio_functional_mel_to_hz:

mel_to_hz
-------------------------------

.. py:function:: paddle.audio.functional.mel_to_hz(mel, htk=False)

Convert mels to frequencies in Hz.

Parameters
::::::::::::

- **mel** (Tensor or float) - Input value(s) in mels.
- **htk** (bool, optional) - Whether to use HTK scaling. Default: False.

Returns
:::::::::

``paddle.Tensor`` or ``float``, the frequency (or frequencies) in Hz.

Code Example
::::::::::::::

COPY-FROM: paddle.audio.functional.mel_to_hz
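The inverse mapping can be sketched in pure Python as below, assuming paddle.audio uses the standard HTK and Slaney (librosa-style) constants. ``mel_to_hz_sketch`` is a hypothetical, scalar-only helper.

```python
import math


def mel_to_hz_sketch(mel, htk=False):
    """Convert mels back to Hz, inverting the HTK or Slaney mel scale."""
    if htk:
        return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
    f_sp = 200.0 / 3
    min_log_hz = 1000.0
    min_log_mel = min_log_hz / f_sp
    logstep = math.log(6.4) / 27.0
    if mel < min_log_mel:
        return f_sp * mel  # linear region below 1 kHz
    return min_log_hz * math.exp(logstep * (mel - min_log_mel))


print(round(mel_to_hz_sketch(15.0), 6))  # 1000.0
```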




Review comment: LogMelSpectrogram has many parameters. All of them need to be documented, together with a link to the source code.

For a class, the documentation should follow this reference format:
In short, the documentation needs to be complete.

Reply: Some parameters have no default value; the defaults that do exist have been added. I am not sure what the problem with the source-code link is.