@@ -3,7 +3,7 @@
FusedMultiHeadAttention
-------------------------------

-.. py:class:: paddle.incubate.nn.FusedMultiHeadAttention(embed_dim, num_heads, dropout_rate=0.5, attn_dropout_rate=0.5, kdim=None, vdim=None, normalize_before=False, need_weights=False, weight_attr=None, bias_attr=None, name=None)
+.. py:class:: paddle.incubate.nn.FusedMultiHeadAttention(embed_dim, num_heads, dropout_rate=0.5, attn_dropout_rate=0.5, kdim=None, vdim=None, normalize_before=False, need_weights=False, weight_attr=None, bias_attr=None, epsilon=1e-5, name=None)



@@ -31,6 +31,7 @@ FusedMultiHeadAttention
- **need_weights** (bool, optional) - Whether to return the attention weights. Default: ``False``.
- **weight_attr** (ParamAttr, optional) - The attribute object for the weight parameters. Default: ``None``, which uses the default weight parameter attribute. See :ref:`cn_api_fluid_ParamAttr` for usage details.
- **bias_attr** (ParamAttr, optional) - The attribute object for the bias parameters. Default: ``None``, which uses the default bias parameter attribute. See :ref:`cn_api_fluid_ParamAttr` for usage details.
- **epsilon** (float, optional) - A value added to the denominator for numerical stability. Default: 1e-05.
- **name** (str, optional) - Name of the operation. Default: ``None``. For more information, see :ref:`api_guide_Name`.
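
Below is a minimal usage sketch, not part of the original document: it assumes a GPU build of Paddle (the fused kernel has no CPU implementation), and the shapes are purely illustrative.

.. code-block:: python

    import paddle

    # input: [batch_size, sequence_length, embed_dim]
    query = paddle.rand((2, 4, 128))
    # self-attention mask: [batch_size, num_heads, query_len, query_len]
    attn_mask = paddle.rand((2, 2, 4, 4))
    multi_head_attn = paddle.incubate.nn.FusedMultiHeadAttention(128, 2)
    output = multi_head_attn(query, attn_mask=attn_mask)  # [2, 4, 128]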

Shape
@@ -3,7 +3,7 @@
fused_feedforward
-------------------------------

-.. py:function:: paddle.incubate.nn.functional.fused_feedforward(x, linear1_weight, linear2_weight, linear1_bias=None, linear2_bias=None, ln1_scale=None, ln1_bias=None, ln2_scale=None, ln2_bias=None, dropout1_rate=0.5, dropout2_rate=0.5, activation="relu", ln1_epsilon=1e-5, ln2_epsilon=1e-5, pre_layer_norm=False, name=None)
+.. py:function:: paddle.incubate.nn.functional.fused_feedforward(x, linear1_weight, linear2_weight, linear1_bias=None, linear2_bias=None, ln1_scale=None, ln1_bias=None, ln2_scale=None, ln2_bias=None, dropout1_rate=0.5, dropout2_rate=0.5, activation="relu", ln1_epsilon=1e-5, ln2_epsilon=1e-5, pre_layer_norm=False, training=True, mode='upscale_in_train', name=None)

This is a fused operator that fuses multiple operators of the feed-forward layer in a Transformer model into one. It only supports running on GPU, and it is functionally equivalent to the following pseudocode:

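(The pseudocode body itself is collapsed in this diff view. As a non-authoritative sketch reconstructed from the parameter names above (linear1/linear2, dropout1/dropout2, ln1/ln2), the fused computation is roughly:

.. code-block:: python

    residual = x
    if pre_layer_norm:
        x = ln1(x)                                 # pre-LayerNorm variant
    x = linear2(dropout1(activation(linear1(x))))  # feed-forward core
    x = residual + dropout2(x)                     # residual connection
    if not pre_layer_norm:
        x = ln2(x)                                 # post-LayerNorm variant

This is a reconstruction, not the collapsed original.)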
@@ -33,6 +33,19 @@ fused_feedforward
- **ln1_epsilon** (float, optional) - A small float added to the denominator by the first layer_norm operator to avoid division by zero. Default: 1e-5.
- **ln2_epsilon** (float, optional) - A small float added to the denominator by the second layer_norm operator to avoid division by zero. Default: 1e-5.
- **pre_layer_norm** (bool, optional) - Whether to apply layer_norm in the pre-processing stage (True) rather than the post-processing stage (False). Default: False.
- **training** (bool, optional) - Whether this is the training phase. Default: True.
- **mode** (str, optional) - The dropout implementation, one of 'upscale_in_train' and 'downscale_in_infer'. Default: 'upscale_in_train'. The two modes compute as follows:

1. upscale_in_train: scale the output up at training time.

   - train: out = input * mask / (1.0 - p)
   - inference: out = input

2. downscale_in_infer: scale the output down at inference time.

   - train: out = input * mask
   - inference: out = input * (1.0 - p)

- **name** (str, optional) - Name of the fused_feedforward operation. Default: None. For more information, see :ref:`api_guide_Name`.
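
A minimal usage sketch (not part of the original doc; it assumes a GPU build of Paddle, and the random weights are purely illustrative):

.. code-block:: python

    import paddle
    import paddle.incubate.nn.functional as F

    x = paddle.randn((1, 8, 8), dtype="float32")            # [batch, seq_len, d_model]
    linear1_weight = paddle.randn((8, 8), dtype="float32")  # [d_model, dim_feedforward]
    linear2_weight = paddle.randn((8, 8), dtype="float32")  # [dim_feedforward, d_model]
    out = F.fused_feedforward(x, linear1_weight, linear2_weight)
    print(out.shape)  # [1, 8, 8]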

Returns
@@ -3,7 +3,7 @@
fused_multi_head_attention
-------------------------------

-.. py:function:: paddle.incubate.nn.functional.fused_multi_head_attention(x, qkv_weight, linear_weight, pre_layer_norm=False, pre_ln_scale=None, pre_ln_bias=None, ln_scale=None, ln_bias=None, pre_ln_epsilon=1e-05, qkv_bias=None, linear_bias=None, attn_mask=None, dropout_rate=0.5, attn_dropout_rate=0.5, ln_epsilon=1e-05, name=None)
+.. py:function:: paddle.incubate.nn.functional.fused_multi_head_attention(x, qkv_weight, linear_weight, pre_layer_norm=False, pre_ln_scale=None, pre_ln_bias=None, ln_scale=None, ln_bias=None, pre_ln_epsilon=1e-05, qkv_bias=None, linear_bias=None, attn_mask=None, dropout_rate=0.5, attn_dropout_rate=0.5, ln_epsilon=1e-05, training=True, mode='upscale_in_train', name=None)

**Multi-Head Attention**

@@ -33,7 +33,10 @@ The fused_multi_head_attention operator currently only supports running on GPU; the operators it contains
out = out * v
out = transpose(out, perm=[0, 2, 1, 3])
out = out_linear(out)
- out = layer_norm(x + dropout(linear_bias + out))
+ if pre_layer_norm:
+     out = x + dropout(linear_bias + out)
+ else:
+     out = layer_norm(x + dropout(linear_bias + out))


Note that in this API the weights of q, k and v are stored together in a single weight tensor, whose shape is `[3, num_heads, head_dim, embed_dim]`,
@@ -57,6 +60,18 @@ The fused_multi_head_attention operator currently only supports running on GPU; the operators it contains
- **dropout_rate** (float, optional) - The dropout ratio of the dropout operator applied after multi-head attention. Default: 0.5.
- **attn_dropout_rate** (float, optional) - The dropout ratio of the dropout operator inside multi-head attention. Default: 0.5.
- **ln_epsilon** (float, optional) - A value added to the denominator for numerical stability of the second ``layer_norm`` in multi-head attention when normalize_before is True (the first one when False). Default: 1e-05.
- **training** (bool, optional) - Whether this is the training phase. Default: True.
- **mode** (str, optional) - The dropout implementation, one of 'upscale_in_train' and 'downscale_in_infer'. Default: 'upscale_in_train'. The two modes compute as follows:

1. upscale_in_train: scale the output up at training time.

   - train: out = input * mask / (1.0 - p)
   - inference: out = input

2. downscale_in_infer: scale the output down at inference time.

   - train: out = input * mask
   - inference: out = input * (1.0 - p)

- **name** (str, optional) - Name of the operation (optional, default is ``None``). For more information, see :ref:`api_guide_Name`.
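
A minimal usage sketch (not from the original doc; it assumes a GPU build of Paddle, the random weights are purely illustrative, and the positional arguments follow the signature above):

.. code-block:: python

    import paddle
    import paddle.incubate.nn.functional as F

    # x: [batch_size, seq_len, embed_dim]
    x = paddle.rand((2, 4, 128))
    # qkv_weight: [3, num_heads, head_dim, embed_dim]
    qkv_weight = paddle.rand((3, 4, 32, 128))
    # qkv_bias: [3, num_heads, head_dim]
    qkv_bias = paddle.rand((3, 4, 32))
    # linear_weight: [embed_dim, embed_dim], linear_bias: [embed_dim]
    linear_weight = paddle.rand((128, 128))
    linear_bias = paddle.rand((128,))
    # attn_mask: [batch_size, num_heads, seq_len, seq_len]
    attn_mask = paddle.rand((2, 4, 4, 4))

    output = F.fused_multi_head_attention(
        x, qkv_weight, linear_weight, False,
        None, None, None, None, 1e-5,
        qkv_bias, linear_bias, attn_mask)  # [2, 4, 128]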

Returns