From c5784d09347422f3707baf7dd87331374830ea5d Mon Sep 17 00:00:00 2001
From: Chen Long <1300851984@qq.com>
Date: Sat, 30 Apr 2022 15:57:51 +0800
Subject: [PATCH 01/11] Update release_note_cn.md

---
 docs/release_note_cn.md | 2931 ++++++++++++++++++++++++++-------------
 1 file changed, 1938 insertions(+), 993 deletions(-)

diff --git a/docs/release_note_cn.md b/docs/release_note_cn.md
index 8de497c26dc..0509180df06 100644
--- a/docs/release_note_cn.md
+++ b/docs/release_note_cn.md
@@ -1,1192 +1,2137 @@
-﻿
-# 2.2.2 Release Note
+
+# 2.3.0-rc0 Release Note
 
 ## 1. 重要更新
 
-我们很高兴的发布飞桨框架2.2.2版本，主要是对2.2.1中一些功能和性能问题的修复，并对部分功能点做了增强。
+我们很高兴地发布飞桨框架 2.3.0-rc0 版本，本版本包含如下重要更新。
 
-## 2. 训练框架（含分布式）
+### API
 
-### （1）新功能
+- 新增 100 多个 API，覆盖自动微分、线性代数、概率分布、稀疏张量、框架性能分析、硬件设备管理、视觉领域等方面。
 
-#### API
+- 新增 4 个自动微分 API，11 个线性代数 API，21 个概率分布类 API，更好地支持科学计算、强化学习等场景。
 
- - 新增`paddle.nn.Mish` 和 `paddle.nn.functional.mish`，支持逐元素计算mish激活函数。 ([#38803](https://github.com/PaddlePaddle/Paddle/pull/38803)) 
+- 新增 11 个稀疏张量计算 API，支持创建 COO、CSR 格式的 Sparse Tensor 以及与 Tensor 互相转换等基础功能。
 
-#### 其他
+- 新增 9 个框架性能分析 API，以`paddle.profiler.Profiler`为核心，提供对训练、推理过程中性能数据的收集、导出和统计的功能。
 
--  `paddle.nn.PReLU` 、 `paddle.nn.functional.prelu`、`paddle.nn.static.prelu` 新增支持 `data_format` 参数，可以设置输入的数据类型。 ([#38495](https://github.com/PaddlePaddle/Paddle/pull/38495)) 
--  `paddle.index_select` 新增支持 `float16` 数据类型。([#38751](https://github.com/PaddlePaddle/Paddle/pull/38751)) 
--  优化 ``paddle.multiplex`` 当``inputs``中张量 `size` 为 0 时的报错信息。([#38757](https://github.com/PaddlePaddle/Paddle/pull/38757)) 
--  `paddle.fluid.contrib.slim.quantization.PostTrainingQuantization` 新增初始化参数`data_loader`，支持传入 `paddle.io.DataLoader` 对象或者`Python Generator` 。([#38729](https://github.com/PaddlePaddle/Paddle/pull/38729)) 
+- 新增 7 个硬件设备管理 API，更好支持硬件相关信息获取。
 
+- 新增多个视觉、文本领域 API，方便复用 MobileNetV3、ResNeXt 等骨干网络，实现快速组网。
 
-### （2）问题修复
+### 飞桨高可复用算子库 PHI
 
-#### API
+- 发布飞桨高可复用算子库 PHI (Paddle HIgh reusability operator library)，支持组合式算子功能复用、Primitive 算子内核复用、插件式硬件加速库复用。针对飞桨框架原算子库存在的算子接口不清晰、算子复用成本较高、调用性能不够快的问题，我们重构了飞桨框架的算子库，设计了灵活、高效的函数式算子库 PHI，可以通过对函数式算子接口组合调用的方式实现新算子。新算子库提供了 200 余个跟 python 开发接口保持一致的 C++ 运算类 API，以及近 500 个可供组合调用的前、反向函数式算子内核 Kernel，可大幅降低框架原生算子和自定义算子的开发成本。新算子库支持 Primitive API 方式开发算子内核，可支持不同硬件（比如 GPU 和 XPU）的算子内核复用。新算子库支持以插件方式接入硬件（比如 NPU）的加速库，实现低成本复用硬件加速库。
 
-- 修复`paddle.max`在输入`x.ndim > 6 and axis < 0`时运行出错的问题。([#38070](https://github.com/PaddlePaddle/Paddle/pull/38070)) 
-- 修复`paddle.max`、`paddle.min`的bug：在CPU设备上，当参数axis是list类型且`len(axis) == x.ndim and axis[i] < 0`时，结果出错。([#38478](https://github.com/PaddlePaddle/Paddle/pull/38478)) 
-- 修复``paddle.nn.functional.unfold``在InferShape计算时不区分compile time和runtime的问题。([#38925](https://github.com/PaddlePaddle/Paddle/pull/38925))
-- 修复`paddle.nn.functional.cross_entropy`在对`labels`进行检查时，存在不必要的GPU与CPU同步的问题。（[#38849](https://github.com/PaddlePaddle/Paddle/pull/38849)）
-- 修复`paddle.distributed.split`在沿列切分FC时，反向计算时得到的输入梯度结果异常的问题。([#38724](https://github.com/PaddlePaddle/Paddle/pull/38724)) 
-- 修复 `paddle.nn.Layer.to` 不支持 `paddle.dtype` 类型的问题。([#38108](https://github.com/PaddlePaddle/Paddle/pull/38108)) 
-- 修复静态图下 ``paddle.linalg.svd`` 当 ``full_matrics=True`` 时，输出tensor的shape在动态图和静态图下不同的问题。([#37744](https://github.com/PaddlePaddle/Paddle/pull/37744)) 
-- 修复`Tensor`切片索引使用多个`None`类型索引时结果维度异常的问题。([#37400](https://github.com/PaddlePaddle/Paddle/pull/37400)) 
-- 修复`Tensor`索引赋值在部分场景下显存泄露的问题。([#38098](https://github.com/PaddlePaddle/Paddle/pull/38098)) 
-- 修复模型使用 `save_inference_model` 导出后，添加反向 pass 做训练，`conv2d` 缺失属性报错的问题。 ([#38832](https://github.com/PaddlePaddle/Paddle/pull/38832)) 
+### 分布式训练
 
-#### IR(Intermediate Representation)
+- 全面升级自适应分布式训练架构，含弹性扩缩容、异步流水执行器、异构通信、自动并行等多个模块，支持了多种异构硬件下自动感知的分布式训练及分布式推理。
 
-- 动态图转静态图
-    - 修复了部分初始化相关 API 动静行为不统一的问题。([#37827](https://github.com/PaddlePaddle/Paddle/pull/37827)) 
-    - 修复动转静代码转写时会将 `paddle` 作为变量的问题。([#37999](https://github.com/PaddlePaddle/Paddle/pull/37999))
-    - 修复动转静代码转写时，突出的代码注释导致转写报错的问题。([#38003](https://github.com/PaddlePaddle/Paddle/pull/38003))
-    - 修复 `for ... zip...` 语句在动转静中死循环的问题。([#37846](https://github.com/PaddlePaddle/Paddle/pull/37846)) 
+- 动态图混合并行下新增 MoE 并行策略、GroupSharded 并行策略、Pure FP16 等，进一步支持了动态图下大模型的高效并行训练。
 
-- 模型量化
-    - 修复动态图量化训练导出模型多余节点问题。([#38122](https://github.com/PaddlePaddle/Paddle/pull/38122)) ([#38025](https://github.com/PaddlePaddle/Paddle/pull/38025)) 
-    - 针对量化模型在Paddle Lite上无法预测的问题，去除量化导出模型的 `clip_extra` 设置。 ([#38343](https://github.com/PaddlePaddle/Paddle/pull/38343)) 
-    - 针对 `flatten_contiguous_range` 算子在量化中输出配置错误的问题，修复 `flatten_contiguous_range` 量化设置。 ([#37741](https://github.com/PaddlePaddle/Paddle/pull/37741)) 
+- 全面升级优化了通用异构参数服务器架构，进行各模块的抽象简化，如通信、存储等，提升了参数服务器的二次开发体验；GPU 参数服务器在千亿参数百亿数据分钟级流式训练下性能提升2.38倍。
 
+### 编译安装
 
-#### 其他
+- 飞桨在 PIP 源上发布的默认安装包 CUDA 版本调整为 11.2，如需安装其他 CUDA 版本的 PaddlePaddle，请移步[飞桨官网-安装](https://www.paddlepaddle.org.cn/install/quick)进行下载安装。
 
-- 自定义OP
-    - 修复了自定义算子在多进程下加载Python API 时，可能因文件不完整导致报错的问题。([#38128](https://github.com/PaddlePaddle/Paddle/pull/38128)) 
-    - 修复了在CentOS平台上编译时，`D_GLIBCXX_USE_CXX11_ABI`未按预期生效导致的编译失败问题。([#37878](https://github.com/PaddlePaddle/Paddle/pull/37878)) 
+- 从 2.3.0-rc0 版本开始，飞桨对框架支持的 GPU 架构种类进行了调整和升级。
 
-- 动态图Inplace策略
-    - 修复了多个inplace op连续执行时，accumulator 报错的问题。([#38406](https://github.com/PaddlePaddle/Paddle/pull/38406))
-    - 修复了 `Tensor` 的 `setitem` 方法，对叶子节点进行inplace操作时，导致反向图构建错误的bug。([#38014](https://github.com/PaddlePaddle/Paddle/pull/38014)) 
+### 推理部署
 
-- NHWC  策略
-    -  修复 batchnorm_op 中，当数据类型为 FP32 ，且数据维度 dims = 2，data_layout = NHWC 时，反向 Op 内中间变量未定义问题。 ([#37020](https://github.com/PaddlePaddle/Paddle/pull/37020)) 
+- 新增 Java API 和 ONNX Runtime CPU 后端。
 
+- 支持 TensorRT 8.0 / 8.2 和结构化稀疏，针对 ERNIE 类结构模型性能深度优化。
 
+### 硬件适配
 
+- 新增自定义新硬件接入：提供一种插件式扩展 PaddlePaddle 硬件后端的方式。
 
-## 3. 部署方向（Paddle Inference）
+- 新增对华为昇腾910 / GraphCore IPU / 寒武纪MLU / 昆仑芯2代多种异构芯片的训练/推理支持。
 
+### 框架架构
 
+- 这个版本中，我们在框架的执行器也做了大量工作，详情请见：[新动态图执行机制](https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/7UhIeLfrn3/0rDW-MD4RXSfkx#anchor-088a55e0-b962-11ec-a8b3-f52dfa102ded) 与 [全新静态图执行器](https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/7UhIeLfrn3/0rDW-MD4RXSfkx#anchor-e81120c0-c233-11ec-a2f2-c9306d79e3c2)。
 
-### （1）功能优化
+### 其他备注（发布时要删除）
 
-#### 框架及API更新
+> 我这面好像rc没有特别重要的了，性能自动优化得正式版才能发，编译器的部分更是不跟版本走，只到develop了。 by 蓝翔
+> 
+> 高阶自动微分的功能是否统一由一位高T来写？
+> 
+> 里面的术语需要统一起来，并且用标准的用法。例如：有的地方用 float16，有的地方用 FP16；有的地方用 TensorRT，有的地方用 tensorrt。
+> 
+> 术语统一（注意大小写与特殊标注）：
+> 
+> Pure FP16、FP32、bfloat16、Tensor、TensorRT、CUDA、cuDNN、GPU、CPU、op（op 名称不需要加反引号）、API（API 名称与参数需要加反引号，如 `paddle.*`、`NHWC`、`axis`）、Kernel、seed、pass、inplace、PaddlePaddle/飞桨、shape、MKLDNN、python、conv、cache、dropout、ERNIE、Windows、Mac、Linux（更多统一标准待补充...）
+> 
+> 中英文之间加空格，句尾加句号；标点符号不要中英文混用，尤其注意中英文的逗号。
 
- - C API支持对c++ std::string的处理。([#38667](https://github.com/PaddlePaddle/Paddle/pull/38667)) 
+## 2. 不兼容升级
 
-#### 后端能力增强
+- `paddle.to_tensor` 将一个 python int scalar 转换为 Tensor 时，在 Windows 上的默认数据类型由 int32 变为 int64，从而与 Linux/Mac 保持对齐。([#39662](https://github.com/PaddlePaddle/Paddle/pull/39662)) 
 
-- GPU 及 TensorRT 子图引擎相关更新
-    - 支持 relu、relu6、tanh、sigmoid、pool2d、concat、batch_norm、split、gelu、scale、swish、prelu、clip、reduce_sum、reduce_mean 算子在静态 shape 且2维输入情况下调用 TensorRT 推理。([#37773](https://github.com/PaddlePaddle/Paddle/pull/37773)) 
-    - 支持mish激活函数调用 TensorRT 推理。 ([#38866](https://github.com/PaddlePaddle/Paddle/pull/38866)) 
+- 为了与 python3 下的除法行为保持一致，除法符号 `/` 从 rounding divide 变成 true divide，计算输出结果的数据类型从 int 切换成 float。 ([#40890](https://github.com/PaddlePaddle/Paddle/pull/40890)) 
 
+### Paddle 2.2 version
 
-### （2）问题修复
+import paddle
 
-#### 框架及API修复
+a = paddle.to_tensor([327])
 
-- 算子修复
+b = paddle.to_tensor([80])
 
-    - 修复roi_align算子在使用 TRT 时不兼容的问题。([#38788](https://github.com/PaddlePaddle/Paddle/pull/38788))
-    - 增加elementwise在维度相同时广播的功能。([#37908](https://github.com/PaddlePaddle/Paddle/pull/37908))
+a / b
 
-- 框架功能修复
-    - 修复动态图转静态图时的模型剪裁逻辑，使得包含 subblock 的算子在动态图转静态图时可以正确剪裁。([#37579](https://github.com/PaddlePaddle/Paddle/pull/37579))
-    - 修复多线程下 CreatePredictor 接口的报错问题，当前的 CreatePredictor 接口允许在多线程中调用而不会导致推理异常。([#37894](https://github.com/PaddlePaddle/Paddle/pull/37894))
-    - 配置config时，对于没有权重的模型，支持 params file 传空字符串。([#38579](https://github.com/PaddlePaddle/Paddle/pull/38579))
-    - 修复Paddle-TRT engine直接输入cpu tensor没有进行gpu数据拷贝的问题。([#37427](https://github.com/PaddlePaddle/Paddle/pull/37427)) 
+'''
 
+Tensor(shape=[1], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
 
-#### 后端能力修复
+ [4])
 
-- TensorRT 子图引擎修复
-    - 修复pool2d在某些参数组合的情况下运行TensorRT出错的问题。([#37929](https://github.com/PaddlePaddle/Paddle/pull/37929))
+'''
 
-- MKLDNN引擎修复
-    - 修复 matmul_v2 的 mkldnn kernel 不支持两个输入的shape长度不同的问题。 ([#38733](https://github.com/PaddlePaddle/Paddle/pull/38733)) 
+### Paddle 2.3.0-rc0 version
 
-#### 其他修复
+import paddle
 
-- 修复ERNIE模型在TRT8下可能出现的hang死问题。([#37839](https://github.com/PaddlePaddle/Paddle/pull/37839))
+a = paddle.to_tensor([327])
 
-# 2.2.1 Release Note
+b = paddle.to_tensor([80])
 
-## 1. 重要更新
+a / b
 
-我们很高兴的发布飞桨框架2.2.1版本，主要是对2.2.0中一些功能和性能问题的修复，并对部分功能点做了增强，重点如下：
+'''
 
-- 新增  ``paddle.linalg.triangular_solve``，用于计算带有三角系数矩阵的线性方程组。
-- 新增 `paddle.device.cuda.graphs.CUDAGraph` API，支持NVIDIA的[CUDA Graph](https://developer.nvidia.com/blog/cuda-graphs/)功能，注意目前该API还处于实验阶段，尚未稳定。
-- 修复了基础API、Tensor 索引中的已知问题。
+Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=True,
 
-## 2. 训练框架（含分布式）
+ [4.08750010])
 
-### （1）新功能
+'''
 
-#### API
+- 修正 ELU 的公式，alpha < 0 时的计算方式与原论文对齐，从而修复小部分情况下的计算结果错误。同时，由于在 alpha < 0 时无法在数学上仅从输出计算反向梯度，因此 `elu_` 在 alpha < 0 时将报错。([#37316](https://github.com/PaddlePaddle/Paddle/pull/37316))
 
-- 新增``paddle.linalg.triangular_solve`` API，用于计算带有三角系数矩阵的线性方程组。([#36714](https://github.com/PaddlePaddle/Paddle/pull/36714))
-- 新增`paddle.device.cuda.graphs.CUDAGraph` API，支持NVIDIA的[CUDA Graph](https://developer.nvidia.com/blog/cuda-graphs/)功能，可以将GPU计算全部捕捉到一张CUDA Graph中，往后多次调用，可以去除框架的额外开销，提升运行性能。注意目前该API还处于实验阶段，尚未稳定。([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))
-- 新增``paddle.incubate.graph_send_recv`` API，主要应用于图学习领域，目的是为了减少在消息传递过程中带来的中间变量显存或内存的损耗，包含 SUM、MEAN、MIN、MAX 共四种更新模式。([#37205](https://github.com/PaddlePaddle/Paddle/pull/37205))
-- 新增`paddle.incubate.operators.ResNetUnit` API，用于 ResNet 网络里的卷积、批归一化、shortcut/bottleneck操作融合。([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))
- 
-### （2）功能优化
+### Paddle 2.2 version
 
-#### API
+# elu(x) = max(0, x) + min(0, α ∗ (e^x − 1))
 
-- `paddle.incubate.FusedTransformerEncoderLayer`，添加 `src_mask=None` 的支持，添加pure fp16的支持。 ([#37229](https://github.com/PaddlePaddle/Paddle/pull/37229))
+>>> import paddle
 
-#### IR(Intermediate Representation)
+>>> x = paddle.to_tensor([-1., 6.])
 
-- 动态图转静态图
-	- 使用`@paddle.jit.to_static`装饰单独的 function 时，提供 `train()、eval()` 函数支持切换到 `train、eval` 模式。([#37383](https://github.com/PaddlePaddle/Paddle/pull/37383))
+>>> m = paddle.nn.ELU(-0.2)
 
+>>> out = m(x)
 
-#### 分布式训练
-- 异构参数服务器完善任意次切图能力，增加流水线训练功能，提升训练吞吐。([#37446](https://github.com/PaddlePaddle/Paddle/pull/37446))
- 
-#### 其他
+>>> out
 
-- 针对 `paddle.scatter` 的 ``index`` 越界导致 core dump 的问题，加强了越界检查，并完善对应的报错信息。([#37431](https://github.com/PaddlePaddle/Paddle/pull/37431))
+Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
 
-### （3）性能优化
+ [ 0.         , -74.48576355])
 
-- 优化 `paddle.top_k`，根据 ``k`` 的大小和 ``input_width`` 大小进行选择不同的实现方案，当 k>=75% input_width 时选择 cub 实现，否则选择手写 kernel 实现。([#37325](https://github.com/PaddlePaddle/Paddle/pull/37325))
-- 优化`paddle.fluid.optimizer.LarsMomentumOptimizer`，通过 optimizer 算子融合 + [CUDA Cooperative Groups](https://developer.nvidia.com/blog/cooperative-groups/)的方式提高OP性能。([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))
+>>> out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
 
-### （4）问题修复
+>>> out
 
-#### API
-- 修复`paddle.nn.ELU` 与 `paddle.nn.functional.elu` 的计算公式，解决 alpha<0 时结果错误的问题；`paddle.nn.functional.elu_`不支持 alpha<0 的场景，在 alpha<0 时会报错。([#37437](https://github.com/PaddlePaddle/Paddle/pull/37437))
-- 修复`paddle.slice`反向执行时出现 `out_of_range` 的问题。([#37584](https://github.com/PaddlePaddle/Paddle/pull/37584))
-- `paddle.shape` 没有反向，显式设置 ``stop_gradient`` 为 ``True``。([#37412](https://github.com/PaddlePaddle/Paddle/pull/37412))
-- `paddle.arange` 没有反向，显式设置 ``stop_gradient`` 为 ``True``。([#37486](https://github.com/PaddlePaddle/Paddle/pull/37486))
-- `paddle.shard_index` 在输入数据的最后一维不为1时进行报错提示。([#37421](https://github.com/PaddlePaddle/Paddle/pull/37421))
-- 修复 ``paddle.matmul`` 使用int8量化，反量化时维度错误的问题。([#36982](https://github.com/PaddlePaddle/Paddle/pull/36982))
-- 修复 `paddle.nn.Dropout` 在 `eval` 模式下不计算梯度的问题。([#37305](https://github.com/PaddlePaddle/Paddle/pull/37305))
-- 修复 `paddle.nn.functional.dropout` 在静态图下输入 `Tenor` 形状中有 -1 并指定 drop 该维时报错的问题。([#37223](https://github.com/PaddlePaddle/Paddle/pull/37223))
-- 修复RNN类API `paddle.nn.LSTM`,`paddle.nn.GRU`, `paddle.nn.SimpleRNN`在CPU训练时多层RNN（dropout设置为0）反向计算出错的问题。([#37086](https://github.com/PaddlePaddle/Paddle/pull/37086))
-- 修复 `paddle.incubate.FusedTransformerEncoderLayer` 反向计算梯度错误、pre_layer_norm 处理不正确、参数处理不正确，漏传参数、 add_bias 计算错误等问题。 ([#37229](https://github.com/PaddlePaddle/Paddle/pull/37229))
-- 修复 `paddle.incubate.fused_multi_head_attention` 不支持 ``bias`` 为`None` 的问题。([#37411](https://github.com/PaddlePaddle/Paddle/pull/37411), [#37566](https://github.com/PaddlePaddle/Paddle/pull/37566))
-- 修复`paddle.vision.datasets.Cifar10`, `paddle.vision.datasets.Cifar100`加载数据没有顺序的问题。 ([#37528](https://github.com/PaddlePaddle/Paddle/pull/37528))
-- 修复一维`Tensor`在使用省略号(...)索引时维度检测异常报错的问题。([#37192](https://github.com/PaddlePaddle/Paddle/pull/37192))
-- 修复`Tensor`索引赋值(`setitem`)梯度属性无法传播的问题，详见[issue](https://github.com/PaddlePaddle/Paddle/issues/36902)。([#37028](https://github.com/PaddlePaddle/Paddle/pull/37028))
+Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
 
-#### IR(Intermediate Representation)
+ [ 0.         , -74.48576355])
 
-- 动态图转静态图
-	- 动转静后的模型调用 `paddle.flops` 能够正确统计模型参数。([#36852](https://github.com/PaddlePaddle/Paddle/pull/36852))
-	- 动转静模块能够正确转换`for i in [1, 2, 3]`循环语句。([#37259](https://github.com/PaddlePaddle/Paddle/pull/37259))
+### Paddle 2.3.0-rc0 version
 
-#### 分布式训练
+# elu(x) = x, if x > 0
 
-  - `fleet.load_model`: 修复参数服务器模式下模型加载API不可用问题。([#37461](https://github.com/PaddlePaddle/Paddle/pull/37461))
-  -  `fleet.save_inference_model`: 修复参数服务器模式下模型保存 dense 参数前，未从 server 端拉取参数的问题。([#37461](https://github.com/PaddlePaddle/Paddle/pull/37461))
- 
+# elu(x) = α ∗ (e^x − 1), if x <= 0
 
-#### 其他
+>>> import paddle
 
-- 修复动态图 inplace 操作的问题：对一个非叶子节点进行 inplace 操作后，立即执行 backward，该节点及更前的节点的梯度计算错误。([#37420](https://github.com/PaddlePaddle/Paddle/pull/37420))
+>>> x = paddle.to_tensor([-1., 6.])
 
-## 4. 部署方向（Paddle Inference）
+>>> m = paddle.nn.ELU(-0.2)
 
-### （1）问题修复
+>>> out = m(x)
 
-- 在明确关闭日志的情况下，进一步去除冗余的调试日志。([#37212](https://github.com/PaddlePaddle/Paddle/pull/37212))
-- 修复内存/显存优化策略，避免因不当的内存/显存优化导致预测结果有误或崩溃。([#37324](https://github.com/PaddlePaddle/Paddle/pull/37324), [#37123](https://github.com/PaddlePaddle/Paddle/pull/37123))
-- 修复 Transformer 模型的 MultiHead 结构中融合后 QkvToContextPluginDynamicscale 的 scale 计算错误问题，这是由于 cuda 函数的 block 和 thread 设置错误引起的。([#37096](https://github.com/PaddlePaddle/Paddle/pull/37096))
-- 将所有的推理OP在int8量化的功能中注册：解决因历史原因有些推理OP没有在int8量化中注册的问题。([#37266](https://github.com/PaddlePaddle/Paddle/pull/37266))
+>>> out
 
+Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
 
-# 2.2.0 Release Note
+ [0.12642412,  6.        ])
 
-## 1. 重要更新
+>>> out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
 
-我们很高兴的发布飞桨框架2.2.0版本，本版本包含如下重要更新。
+Traceback (most recent call last):
 
-### API
+ File "<stdin>", line 1, in <module>
 
-- 新增100+个API，包含24个傅里叶变换API、17个线性代数计算 API 等，更好地支持科学计算类、信号处理类模型。
-- 新增多种索引类型的支持，新增的索引类型包括：省略号（…）、维度扩增（None）、布尔类型数组（Bool Mask）、整数数组(（list)，以及张量（Tensor) ），可以更加方便的对张量（Tensor）进行操作。
-- 新增 `paddle.einsum` API，可以以更加简洁的方式来表达多维张量（Tensor）的计算。
-- 动态图混合精度功能增强，新增整个任务使用半精度（float16）训练的方式，主要任务下的计算效率提升20%左右。												
+ File "/usr/local/lib/python3.7/dist-packages/decorator.py", line 232, in fun
 
-### IR(Intermediate Representation)
+ return caller(func, *(extras + args), **kw)
 
-- 动态图转静态图：进一步扩充了动静转换支持的语法和场景，现在使用混合精度训练的动态图模型也可以通过 `to_static` 接口一键转换为静态图进行训练或推理部署；另外，对转换后训练的性能进行了优化，通过引入缓存和开启 Pass 等策略，转换后的训练性能相对动态图方式有明显提升。
-- Pass 开发：新增 Python 端对静态图IR的改写接口，针对 OP fusion 等子图替换场景可以在 python 中快速完成开发。
-- 对算子 Kernel 实现中的底层代码进行了抽象与功能封装，提供高性能的 Block 级 IO 运算和 Compute 运算（Kernel Primitive API）。使用 Kernel Primitive API 进行 Kernel 开发可以更加专注计算逻辑的实现，在保证性能的同时大幅减少代码量，同时实现了算子计算与硬件解耦。
+ File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
 
-### 分布式
+ return wrapped_func(*args, **kwargs)
 
-- 混合并行：在静态图已有  4D  混合并行的基础上，进行了流水线执行器等性能优化，千亿模型下训练算力利用达到GPU理论性能峰值的51%；动态图支持了 4D 混合并行能力，千亿模型下功能和性能与静态图持平；增加了自动补全、自动切分等基础功能，具备了基于用户标记的半自动并行能力。
-- GPU 参数服务器：千亿模型下，优化数据读取、GPU-PS 构建、SSD 性能，完善流水线等功能，使得整体性能提升一倍，内存占用减少一倍，一台 GPU 机器可替代百台 CPU 机器训练千亿模型。
+ File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/inplace_utils.py", line 34, in __impl__
 
-### 推理部署
-- 推理加速：支持最新的 TensorRT 8.x，适配 Nvidia 的硬件新特性进行加速。
-- 推理易用性：增加 TensorRT 子图中的动态 Shape 配置的自动推导功能，可选从数据推导出 Shape 的范围，无需琐碎的手动配置，简化了动态 Shape 的使用。
+ return func(*args, **kwargs)
 
+ File "/usr/local/lib/python3.7/dist-packages/paddle/nn/functional/activation.py", line 89, in elu_
 
-## 2. 不兼容升级
-
-- 针对 `grad` 在路径(`paddle.autograd,grad`, `paddle.grad`) 公开暴露的问题，推荐使用 `paddle.grad`，移除了 `from paddle.autograd import *` ，然后直接调用 `grad` 的方式。([#35579](https://github.com/PaddlePaddle/Paddle/pull/35579))
-
-<table>
-<tr>
-<th>
-2.1
-</th>
-<th>
-2.2
-</th>
-</tr>
-
-<tr>
-<td>
-<pre>
-
-```python
->>> import paddle
->>> from paddle.autograd import *
->>> x = paddle.ones(shape=[1], dtype='float32')
->>> x.stop_gradient = False
->>> y = x*x
->>> grad(outputs=[y], inputs=[x])
-[Tensor(shape=[1], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
-        [2.])]
-```
-</pre>
-</td>
-
-<td>
-<pre>
-
-```python
->>> import paddle
->>> from paddle.autograd import *
->>> x = paddle.ones(shape=[1], dtype='float32')
->>> x.stop_gradient = False
->>> y = x*x
->>> grad(outputs=[y], inputs=[x])
-NameError: name 'grad' is not defined
->>> paddle.grad(outputs=[y], inputs=[x]) # 改用paddle.grad API
-[Tensor(shape=[1], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
-       [2.])]
-```
-</pre>
-</td>
-</tr>
-</table>
-
-- ``Tensor.__setitem__`` 不再支持非 ``int`` 类型的 slice 索引( ``x[start:stop:step] = value`` )。由于 ``float``类型在作为索引时不具有数学意义（ 如 ``start`` 为 0.5 时如何确定具体索引的位置）且容易导致一些未知行为，所以本次更新中我们把 slice 索引的数据类型限定为 ``int``，使用 ``float`` 的 slice 索引将报错。([#35701](https://github.com/PaddlePaddle/Paddle/pull/35701))
-
-
-
-<table>
-<tr>
-<th>
-2.1
-</th>
-<th>
-2.2
-</th>
-</tr>
-
-<tr>
-<td>
-<pre>
-
-```python
->>> import paddle
->>> x = paddle.to_tensor([1, 2, 3, 4])
->>> start = paddle.zeros([1])
->>> stop = paddle.zeros([1]) + 2
->>> step = paddle.ones([1])
->>> x[start:stop:step] = 0 # start,stop,step 支持float类型Tensor
->>> x 
-```
-</pre>
-</td>
-
-<td>
-<pre>
-
-```python
->>> import paddle
->>> x = paddle.to_tensor([1, 2, 3, 4])
->>> start = paddle.zeros([1])
->>> stop = paddle.zeros([1]) + 2
->>> step = paddle.ones([1])
->>> x[start:stop:step] = 0
-ValueError: (InvalidArgument) Currently, the type of tensor in slice indices only allows int32 and int64, please check the type of index tensor.
-
->>> # 须改为如下代码：
->>> start = paddle.zeros([1], dtype='int32')
->>> stop = paddle.zeros([1], dtype='int32') + 2
->>> step = paddle.ones([1], dtype='int32')
->>> x[start:stop:step] = 0 # start,stop,step 必须为integer类型Tensor
->>> x
-Tensor(shape=[4], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
-       [0, 0, 3, 4])
-```
-</pre>
-</td>
-</tr>
-</table>
-
-
-- 为动态图``Tensor.__setitem__`` 中加入 inplace 调用合法性检测，不满足检测的赋值代码会报错（检测逻辑：当 ``Tensor`` 为叶节点并且`stop_gradient`为 ``False`` 时，``Tensor`` 赋值操作将被拦截并报错）。由于 ``tensor[index]=value``的执行会覆盖 ``Tensor`` 中原来的值，是 ``Tensor`` 的 inplace 操作，如果 ``Tensor`` 是计算图中的一个叶节点并且需要计算梯度时，进行 ``Tensor`` 的赋值操作会使该 ``Tensor`` 反向梯度的计算出现问题，属于非法的 inplace 操作。所以本次更新加入了对这种操作的检测与拦截，当前使用 ``tensor[index]=value`` 方式赋值的代码都会检测是否满足 inplace 操作的要求，不满足将会报错。  ([#35701](https://github.com/PaddlePaddle/Paddle/pull/35701))
-	- 示例：使用` weight[index]=value `方式的参数初始化代码调整，`self.weight`属于叶节点且需要计算梯度，不能使用inplace操作（会影响反向梯度值计算），但初始化赋值本身不需要反向计算过程，所以在明确不需要反向计算时，可以使用`no_grad`关闭梯度计算后再进行赋值。
-
-
-<table>
-<tr>
-<th>
-2.1
-</th>
-<th>
-2.2
-</th>
-</tr>
-
-<tr>
-<td>
-<pre>
-
-```python
->>> import paddle
->>> class MyLayer(paddle.nn.Layer):
-...     def __init__(self):
-...         super(MyLayer, self).__init__()
-...         self.weight = self.create_parameter(...)
-...         self.weight[index] = 1.0
-...
-```
-</pre>
-</td>
-
-<td>
-<pre>
-
-```python
->>> import paddle
-class MyLayer(paddle.nn.Layer):
-...     def __init__(self):
-...         super(MyLayer, self).__init__()
-...         self.weight = self.create_parameter(...)
-...         with paddle.no_grad(): # 关闭梯度计算后可进行赋值
-...             self.weight[index] = 1.0
-```
-</pre>
-</td>
-</tr>
-</table>
-
-- 针对`paddle.sum` 输入类型为 ``bool`` 时，输出类型也为``bool``，行为与``numpy.sum`` 不一致问题，进行了不兼容升级，升级后输出类型为``int64``，与 ``numpy.sum`` 保持一致。([#34313](https://github.com/PaddlePaddle/Paddle/pull/34313))
-
-
-<table>
-<tr>
-<th>
-2.1
-</th>
-<th>
-2.2
-</th>
-</tr>
-
-<tr>
-<td>
-<pre>
-
-```python
->>> import paddle
->>> import numpy as np
->>> np_arr = np.ones((2, 3), dtype='bool')
->>> pd_arr = paddle.to_tensor(np_arr)
->>> pd_sum = pd_arr.sum(0)
->>> pd_sum.dtype
-paddle.bool
-```
-</pre>
-</td>
-
-<td>
-<pre>
-
-```python
->>> import paddle
->>> import numpy as np
->>> np_arr = np.ones((2, 3), dtype='bool')
->>> pd_arr = paddle.to_tensor(np_arr)
->>> pd_sum = pd_arr.sum(0)
->>> pd_sum.dtype
-paddle.int64
-```
-</pre>
-</td>
-</tr>
-</table>
-
-
-- 针对``paddle.to_tensor``在输入 ``data`` 为 ``Tensor`` 时不拷贝 ``Tensor`` 导致 ``stop_gradient`` 属性可能被错误修改的问题，优化了该情况下的 ``Tensor`` 拷贝行为。原实现中，当 ``data`` 为 ``Tensor`` 且 ``dtype`` 和 ``place`` 不改变时，会直接返回 ``data``（即不发生拷贝）并修改 ``data.stop_gradient`` 属性。该行为会导致原来的计算图 ``data`` 的反向传播出现问题。新实现中，上述情况下，``paddle.to_tensor`` 会拷贝一个新的 ``Tensor`` 且返回，不会修改原 ``data`` 的 ``stop_gradient`` 属性。([#33335](https://github.com/PaddlePaddle/Paddle/pull/33335)) 
-
-<table>
-<tr>
-<th>
-2.1
-</th>
-<th>
-2.2
-</th>
-</tr>
-
-<tr>
-<td>
-<pre>
-
-```python
->>> import paddle
->>> x = paddle.rand([2,3])
->>> x.stop_gradient = False
->>> y = paddle.to_tensor(x)
->>> print(id(x) == id(y)) # True
->>> print(x.stop_gradient, y.stop_gradient) # True True
-```
-</pre>
-</td>
-
-<td>
-<pre>
-
-```python
->>> import paddle
->>> x = paddle.rand([2,3])
->>> x.stop_gradient = False
->>> y = paddle.to_tensor(x)
->>> print(id(x) == id(y)) # False
->>> print(x.stop_gradient, y.stop_gradient) # False True
-```
-</pre>
-</td>
-</tr>
-</table>
+ assert alpha >= 0., "elu_ only support alpha >= 0, please use elu instead."
 
+AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 ## 3. 训练框架（含分布式）
 
 ### （1）新功能
+
 #### API
-- 新增线性代数计算API``paddle.linalg.*``
- - 新增 ``paddle.linalg.svd``，支持对多维 ``Tensor`` 进行奇异值分解。([#34953](https://github.com/PaddlePaddle/Paddle/pull/34953)) 
-	- 新增 ``paddle.linalg.cond``，支持根据范数种类 ``p`` 计算一个或一批矩阵的条件数。([#35140](https://github.com/PaddlePaddle/Paddle/pull/35140)) 
-	- 新增 ``paddle.linalg.matrix_rank``，支持计算多维矩阵 ``Tensor``的秩。 ([#34823](https://github.com/PaddlePaddle/Paddle/pull/34823)) 
-	- 新增 ``paddle.linalg.eigvals``，支持计算一般方阵的特征值。 ([#35720](https://github.com/PaddlePaddle/Paddle/pull/35720), [#35909](https://github.com/PaddlePaddle/Paddle/pull/35720))
-	- 新增 ``paddle.linalg.eigh``，支持计算复数厄米特矩阵或者实数对称矩阵的特征值和特征向量。([#34990](https://github.com/PaddlePaddle/Paddle/pull/34990), [#35916](https://github.com/PaddlePaddle/Paddle/pull/35916), [#35812](https://github.com/PaddlePaddle/Paddle/pull/35812), [#36091](https://github.com/PaddlePaddle/Paddle/pull/36091),[#35919](https://github.com/PaddlePaddle/Paddle/pull/35919)) 
-	- 新增 ``paddle.linalg.det``， 支持计算多维矩阵的行列式值。([#34992](https://github.com/PaddlePaddle/Paddle/pull/34992)) 
-	- 新增 ``paddle.linalg.slogdet``，支持计算多维矩阵行列式值的符号值与自然对数值。([#34992](https://github.com/PaddlePaddle/Paddle/pull/34992))
-	- 新增 ``paddle.linalg.pinv``，支持计算多维矩阵 ``Tensor`` 的伪逆矩阵。([#35804](https://github.com/PaddlePaddle/Paddle/pull/35804))
-	- 新增 ``paddle.linalg.multi_dot``，支持多个矩阵连乘的计算。([#35224](https://github.com/PaddlePaddle/Paddle/pull/35224))
-	- 新增 ``paddle.linalg.solve``，支持计算线性方程组的解。([#35715](https://github.com/PaddlePaddle/Paddle/pull/35715))
-	- 新增``paddle.linalg.matrix_power``，支持矩阵的幂运算操作。([#34667](https://github.com/PaddlePaddle/Paddle/pull/34667))
-	-  新增`paddle.linalg.eigvalsh`，用于计算厄米特矩阵或者实数对称矩阵的特征值。([#36680](https://github.com/PaddlePaddle/Paddle/pull/36680))
-	- 新增`paddle.linalg.eig`，用于计算一般方阵的特征值和特征向量。([#35674](https://github.com/PaddlePaddle/Paddle/pull/35674))
-	- 新增`paddle.linalg.qr`，用于计算矩阵的QR分解（暂不支持反向）。([#36627](https://github.com/PaddlePaddle/Paddle/pull/36627))
-- 新增傅里叶变换相关API ([#35665](https://github.com/PaddlePaddle/Paddle/pull/35665)) 
-    - 新增快速傅立叶变换系列函数
-        - 可微分的 1d 到 nd 复数到复数快速傅里叶变换。(``paddle.fft.fft``, ``paddle.fft.fft2``, ``paddle.fft.fftn``, ``paddle.fft.ifft``, ``paddle.fft.ifft2``, ``paddle.fft.ifftn``)
-        - 可微分的 1d 到 nd 实数到复数快速傅里叶变换。(``paddle.fft.rfft``, ``paddle.fft.rfft2``, ``paddle.fft.rfftn``, ``paddle.fft.ihfft``, ``paddle.fft.ihfft2``, ``paddle.fft.ihfftn``)
-        - 可微分的 1d 到 nd 复数到实数快速傅里叶变换。 (``paddle.fft.hfft``, ``paddle.fft.hfft2``, ``paddle.fft.hfftn``, ``paddle.fft.irfft``, ``paddle.fft.irfft2``, ``paddle.fft.irfftn``)
-        - fft 相关的辅助函数。(``paddle.fft.fftfreq``, ``paddle.fft.rfftfreq``, ``paddle.fft.fftshift``, ``paddle.fft.ifftshift``)
-
-    - 新增短时傅里叶变换相关函数
-        - 短时傅里叶变换。(``paddle.signal.stft``)
-        - 短时傅里叶逆变换。(``paddle.signal.istft``)
-
-- 新增高层API
-	- 新增 ``paddle.vision.ops.roi_pool`` 和 ``paddle.vision.ops.RoIPool``，支持检测任务中 RoI 区域池化操作。 ([#36154](https://github.com/PaddlePaddle/Paddle/pull/36154))
-	-  新增 ``paddle.vision.ops.roi_align`` 和 ``paddle.vision.ops.RoIAlign``，支持检测任务中 RoI 区域 Align 操作。([#36207](https://github.com/PaddlePaddle/Paddle/pull/36207))
-	- 新增 ``paddle.vision.ops.psroi_pool`` 和 ``paddle.vision.ops.PSRoIPool``，支持检测任务中位置敏感的 RoI 区域池化操作。 ([#36111](https://github.com/PaddlePaddle/Paddle/pull/36111))
-	- 新增 ``paddle.vision.models.vgg19`` 预训练权重。 ([#35788](https://github.com/PaddlePaddle/Paddle/pull/35788))
-	- 新增 ``paddle.vision.datasets.*`` 中数据集 API 下载进度条。([#33302](https://github.com/PaddlePaddle/Paddle/pull/33302))
-	- 新增 ``paddle.Model.predict`` 参数 ``verbose``，支持是否显示日志。([#33405](https://github.com/PaddlePaddle/Paddle/pull/33405))
-	- 新增 ``paddle.hub`` 下载选项 `wget` 方式。([#33379](https://github.com/PaddlePaddle/Paddle/pull/33379))
-	-  新增 ``paddle.Model`` 动态图模式下梯度累加功能。([#32702](https://github.com/PaddlePaddle/Paddle/pull/32702))
-	- 新增 ``paddle.Model.fit`` 和 ``paddle.Model.evaluate``  动态图模式下 ``num_iters`` 参数，控制训练迭代轮数。([#33986](https://github.com/PaddlePaddle/Paddle/pull/33986))
-	- 新增 ``paddle.vision.ops.yolo_box`` 参数 ``iou_aware`` 和 ``iou_aware_factor``，支持 YoloBox 使用预测的 IOU 作为置信度的因子。([#33400](https://github.com/PaddlePaddle/Paddle/pull/33400))
-	- 新增 ``paddle.summary`` 参数``input``，支持给定输入。([#34165](https://github.com/PaddlePaddle/Paddle/pull/34165))
-	- 新增`paddle.text.viterbi_decode`，支持动态图下CPU、GPU的Viterbi解码功能。([#35778](https://github.com/PaddlePaddle/Paddle/pull/35778))
+
+- 新增 4 个自动微分类 API，支持科学计算需求，具体列表如下：([#40692](https://github.com/PaddlePaddle/Paddle/pull/40692))
+  
+  - `paddle.incubate.autograd.vjp`，计算向量-雅可比矩阵乘积。
+  
+  - `paddle.incubate.autograd.jvp`，计算雅可比矩阵-向量乘积。
+  
+  - `paddle.incubate.autograd.Jacobian`，计算雅可比矩阵。
+  
+  - `paddle.incubate.autograd.Hessian`，计算海森矩阵。
+
+- 新增线性代数类 API
+  
+  - 新增 `paddle.linalg.triangular_solve`，计算具有唯一解的三角系数线性方程组。([#36714](https://github.com/PaddlePaddle/Paddle/pull/36714)) 
+  
+  - 新增 `paddle.linalg.eig`，计算一般方阵的特征分解。([#35764](https://github.com/PaddlePaddle/Paddle/pull/35764)) 
+  
+  - 新增 `paddle.linalg.solve`，计算线性方程组的解。([#35715](https://github.com/PaddlePaddle/Paddle/pull/35715))
+  
+  - 新增 `paddle.linalg.lstsq`，计算线性方程组的最小二乘解。([#38585](https://github.com/PaddlePaddle/Paddle/pull/38585), [#38621](https://github.com/PaddlePaddle/Paddle/pull/38621)) 
+  
+  - 新增 `paddle.linalg.qr`，计算矩阵的 QR 分解。([#35742](https://github.com/PaddlePaddle/Paddle/pull/35742), [#38824](https://github.com/PaddlePaddle/Paddle/pull/38824))
+  
+  - 新增 `paddle.inner`，计算矩阵内积。([#37706](https://github.com/PaddlePaddle/Paddle/pull/37706)) 
+  
+  - 新增 `paddle.outer`，计算矩阵外积。([#37706](https://github.com/PaddlePaddle/Paddle/pull/37706)) 
+  
+  - 新增 `paddle.linalg.cov`，计算向量间协方差。([#38392](https://github.com/PaddlePaddle/Paddle/pull/38392)) 
+  
+  - 新增 `paddle.linalg.cholesky_solve`，基于 Cholesky 分解计算线性方程组的解。([#38167](https://github.com/PaddlePaddle/Paddle/pull/38167))
+  
+  - 新增 `paddle.linalg.lu`、`paddle.linalg.lu_unpack`，计算矩阵 LU 分解、解压缩 LU 矩阵。([#38617](https://github.com/PaddlePaddle/Paddle/pull/38617), [#38559](https://github.com/PaddlePaddle/Paddle/pull/38559), [#38616](https://github.com/PaddlePaddle/Paddle/pull/38616))
+
+- 新增 21 个概率分布类 API，包括 6 个随机变量分布、13 个随机变量变换、2 个 KL 散度计算，用于强化学习、变分推断、科学计算等场景，具体列表如下：([#40536](https://github.com/PaddlePaddle/Paddle/pull/40536), [#38820](https://github.com/PaddlePaddle/Paddle/pull/38820), [#38558](https://github.com/PaddlePaddle/Paddle/pull/38558), [#38445](https://github.com/PaddlePaddle/Paddle/pull/38445), [#38244](https://github.com/PaddlePaddle/Paddle/pull/38244), [#38047](https://github.com/PaddlePaddle/Paddle/pull/38047))
+  
+  - `paddle.distribution.ExponentialFamily`，指数分布族基类。
+  
+  - `paddle.distribution.Beta`，`Beta` 分布。
+  
+  - `paddle.distribution.Dirichlet`，`Dirichlet` 分布。
+  
+  - `paddle.distribution.Independent`，独立分布，用于创建高阶分布。
+  
+  - `paddle.distribution.TransformedDistribution`，变换分布，用于通过基础分布及一系列变换生成高阶分布。
+  
+  - `paddle.distribution.Multinomial`，多项分布。
+  
+  - `paddle.distribution.Transform`，随机变量变换的基类。
+  
+  - `paddle.distribution.AbsTransform`，取绝对值变换。
+  
+  - `paddle.distribution.AffineTransform`，仿射变换。
+  
+  - `paddle.distribution.ChainTransform`，变换的链式组合。
+  
+  - `paddle.distribution.ExpTransform`，指数变换。
+  
+  - `paddle.distribution.IndependentTransform`，独立变换，用于扩展变换定义域的 `event_dim`。
+  
+  - `paddle.distribution.PowerTransform`，幂变换。
+  
+  - `paddle.distribution.ReshapeTransform`，`reshape` 变换。
+  
+  - `paddle.distribution.SigmoidTransform`，`sigmoid` 变换。
+  
+  - `paddle.distribution.SoftmaxTransform`，`softmax` 变换。
+  
+  - `paddle.distribution.StackTransform`，`stack` 变换，用于以 `stack` 方式组合多个变换。
+  
+  - `paddle.distribution.StickBreakingTransform`，`stickbreaking` 变换。
+  
+  - `paddle.distribution.TanhTransform`，`tanh` 变换。
+  
+  - `paddle.distribution.kl_divergence`，计算 KL 散度。
+  
+  - `paddle.distribution.register_kl`，注册用户自定义 KL 散度计算函数。
+
+- 新增高层 API
+  
+  - 新增 `paddle.vision.models.AlexNet`、`paddle.vision.models.alexnet`，支持直接使用 AlexNet 模型。([#36058](https://github.com/PaddlePaddle/Paddle/pull/36058)) 
+  
+  - 新增 `paddle.vision.models.DenseNet`、 `paddle.vision.models.densenet121`、 `paddle.vision.models.densenet161`、 `paddle.vision.models.densenet169`、 `paddle.vision.models.densenet201`、 `paddle.vision.models.densenet264`，支持直接使用 DenseNet 模型。([#36069](https://github.com/PaddlePaddle/Paddle/pull/36069)) 
+  
+  - 新增 `paddle.vision.models.GoogLeNet`、`paddle.vision.models.googlenet`，支持直接使用 GoogLeNet 模型。([#36034](https://github.com/PaddlePaddle/Paddle/pull/36034)) 
+  
+  - 新增 `paddle.vision.models.InceptionV3`、`paddle.vision.models.inception_v3`，支持直接使用 InceptionV3 模型。([#36064](https://github.com/PaddlePaddle/Paddle/pull/36064)) 
+  
+  - 新增 `paddle.vision.models.MobileNetV3Small`、 `paddle.vision.models.MobileNetV3Large`、`paddle.vision.models.mobilenet_v3_small`、`paddle.vision.models.mobilenet_v3_large`，支持直接使用 MobileNetV3 模型。([#38653](https://github.com/PaddlePaddle/Paddle/pull/38653)) 
+  
+  - 新增 `paddle.vision.models.ResNeXt`、 `paddle.vision.models.resnext50_32x4d`、 `paddle.vision.models.resnext50_64x4d`、`paddle.vision.models.resnext101_32x4d`、`paddle.vision.models.resnext101_64x4d`、`paddle.vision.models.resnext152_32x4d`、`paddle.vision.models.resnext152_64x4d`，支持直接使用 ResNeXt 模型。([#36070](https://github.com/PaddlePaddle/Paddle/pull/36070)) 
+  
+  - 新增 `paddle.vision.models.ShuffleNetV2`、 `paddle.vision.models.shufflenet_v2_x0_25`、`paddle.vision.models.shufflenet_v2_x0_33`、`paddle.vision.models.shufflenet_v2_x0_5`、`paddle.vision.models.shufflenet_v2_x1_0`、`paddle.vision.models.shufflenet_v2_x1_5`、`paddle.vision.models.shufflenet_v2_x2_0`、`paddle.vision.models.shufflenet_v2_swish`，支持直接使用 ShuffleNetV2 模型。([#36067](https://github.com/PaddlePaddle/Paddle/pull/36067)) 
+  
+  - 新增 `paddle.vision.models.SqueezeNet`、 `paddle.vision.models.squeezenet1_0`、`paddle.vision.models.squeezenet1_1`，支持直接使用 SqueezeNet 模型。([#36066](https://github.com/PaddlePaddle/Paddle/pull/36066)) 
+  
+  - 新增 `paddle.vision.models.wide_resnet50_2`、`paddle.vision.models.wide_resnet101_2`，支持直接使用 WideResNet 模型。([#36952](https://github.com/PaddlePaddle/Paddle/pull/36952))
+  
+  - 新增 `paddle.vision.ops.nms` API，支持单类别和多类别非极大抑制（non-maximum suppression, NMS）算法，用于目标检测预测任务加速。([#40962](https://github.com/PaddlePaddle/Paddle/pull/40962)) 
+  
+  - 新增`paddle.vision.ops.roi_pool` 和 `paddle.vision.ops.RoIPool`，支持检测任务中 RoI 区域池化操作。 ([#36154](https://github.com/PaddlePaddle/Paddle/pull/36154)) 
+  
+  - 新增 `paddle.vision.ops.roi_align` 和 `paddle.vision.ops.RoIAlign`，支持检测任务中 RoI Align 操作。([#35102](https://github.com/PaddlePaddle/Paddle/pull/35102)) 
+  
+  - 新增 `paddle.text.ViterbiDecoder`、`paddle.text.viterbi_decode` Viterbi 解码 API，主要用于序列标注模型的预测。 ([#35778](https://github.com/PaddlePaddle/Paddle/pull/35778)) 
+
+- 新增 11 个 Sparse 类 API，支持创建 COO、CSR 格式的 Sparse Tensor，以及与 Tensor 互相转换等基础功能：
+  
+  - `paddle.sparse.sparse_coo_tensor`，创建 COO 格式的 Sparse Tensor。([#40780](https://github.com/PaddlePaddle/Paddle/pull/40780))
+  
+  - `paddle.sparse.sparse_csr_tensor`，创建 CSR 格式的 Sparse Tensor。([#40780](https://github.com/PaddlePaddle/Paddle/pull/40780))
+  
+  - `paddle.sparse.ReLU`，支持 SparseCooTensor 的 ReLU 激活层。([#40959](https://github.com/PaddlePaddle/Paddle/pull/40959))
+  
+  - `paddle.sparse.functional.relu`，支持 SparseCooTensor 的 ReLU 函数。([#40959](https://github.com/PaddlePaddle/Paddle/pull/40959))
+  
+  - `Tensor.values()`，获取 SparseCooTensor 或者 SparseCsrTensor 的非零元素方法。（[#40608](https://github.com/PaddlePaddle/Paddle/pull/40608)）
+  
+  - `Tensor.indices()`，获取 SparseCooTensor 的坐标信息的方法。（[#40608](https://github.com/PaddlePaddle/Paddle/pull/40608)）
+  
+  - `Tensor.crows()`，获取 SparseCsrTensor 的压缩行信息的方法。（[#40608](https://github.com/PaddlePaddle/Paddle/pull/40608)）
+  
+  - `Tensor.cols()`，获取 SparseCsrTensor 的列信息的方法。（[#40608](https://github.com/PaddlePaddle/Paddle/pull/40608)）
+  
+  - `Tensor.to_sparse_coo()`，将 DenseTensor 或者 SparseCsrTensor 转换为 SparseCooTensor。([#40780](https://github.com/PaddlePaddle/Paddle/pull/40780))
+  
+  - `Tensor.to_sparse_csr()`，将 DenseTensor 或者 SparseCooTensor 转换为 SparseCsrTensor。([#40780](https://github.com/PaddlePaddle/Paddle/pull/40780))
+  
+  - `Tensor.to_dense()`，将 SparseCooTensor 或者 SparseCsrTensor 转换为 DenseTensor。([#40780](https://github.com/PaddlePaddle/Paddle/pull/40780))
+
+- 新增硬件相关 API
+  
+  - 新增 `paddle.device.cuda.max_memory_allocated`、`paddle.device.cuda.max_memory_reserved`、 `paddle.device.cuda.memory_allocated` 和 `paddle.device.cuda.memory_reserved` 四个 GPU 显存监测相关 API，方便实时查看和分析模型显存占用指标。([#38657](https://github.com/PaddlePaddle/Paddle/pull/38657)) 
+  
+  - 新增 `paddle.device.cuda.get_device_properties`，支持返回 CUDA 设备属性信息。([#35661](https://github.com/PaddlePaddle/Paddle/pull/35661)) 
+  
+  - 新增 `paddle.device.cuda.get_device_name` 和 `paddle.device.cuda.get_device_capability`，支持返回 GPU 设备名称信息和计算能力的主要和次要修订号。([#35672](https://github.com/PaddlePaddle/Paddle/pull/35672)) 
+
+- 新增 Tensor 操作 API
+  
+  - 新增 `paddle.nansum`，沿 `axis` 对输入 Tensor 求和，且忽略 `NaN` 值。([#38137](https://github.com/PaddlePaddle/Paddle/pull/38137)) 
+  
+  - 新增 `paddle.nanmean`，沿 `axis` 对输入 Tensor 求平均，且忽略 `NaN` 值。([#40472](https://github.com/PaddlePaddle/Paddle/pull/40472))
+  
+  - 新增 `paddle.clone`，返回输入 Tensor 的拷贝，并且提供梯度计算。([#38020](https://github.com/PaddlePaddle/Paddle/pull/38020)) 
+  
+  - 新增 `paddle.Tensor.element_size`，返回 Tensor 中的单个元素在计算机中所分配的 bytes 数量。([#38020](https://github.com/PaddlePaddle/Paddle/pull/38020)) 
+  
+  - 新增 `paddle.Tensor.to_uva_tensor`，支持将 numpy 对象转换为实际存储在 CPU，但可作为 CUDA 对象进行虚拟地址访问的功能。([#39146](https://github.com/PaddlePaddle/Paddle/pull/39146), [#38950](https://github.com/PaddlePaddle/Paddle/pull/38950)) 
+  
+  - 新增`paddle.rot90`，沿 `axes` 指定的平面将 n 维 Tensor 旋转90度。([#37634](https://github.com/PaddlePaddle/Paddle/pull/37634)) 
+  
+  - 新增`paddle.logit` 和 `paddle.Tensor.logit`，计算输入 Tensor 的 logit 函数值。([#37844](https://github.com/PaddlePaddle/Paddle/pull/37844)) 
+  
+  - 新增 `paddle.repeat_interleave`，沿着指定轴对输入进行复制，创建并返回一个新的 Tensor。([#37981](https://github.com/PaddlePaddle/Paddle/pull/37981)) 
+  
+  - 新增 `paddle.renorm`，把 Tensor 在指定的 `axis` 切分成多块后分别进行 p norm 操作。([#38130](https://github.com/PaddlePaddle/Paddle/pull/38130), [#38459](https://github.com/PaddlePaddle/Paddle/pull/38459)) 
+  
+  - 新增 `paddle.mode` 和 `paddle.Tensor.mode`，沿指定轴查找输入 Tensor 的众数及对应的索引。([#38446](https://github.com/PaddlePaddle/Paddle/pull/38446)) 
+  
+  - 新增 `paddle.quantile` 和 `paddle.Tensor.quantile`，沿指定轴计算 Tensor 的 q 分位数。([#38567](https://github.com/PaddlePaddle/Paddle/pull/38567)) 
+  
+  - 新增 `paddle.kthvalue` 和 `paddle.Tensor.kthvalue`，查找 Tensor 中指定轴上第 k 小的数及对应的索引。([#38386](https://github.com/PaddlePaddle/Paddle/pull/38386)) 
+  
+  - 新增 `paddle.is_floating_point` 和 `paddle.Tensor.is_floating_point`，判断输入 Tensor 是否为浮点类型。([#37885](https://github.com/PaddlePaddle/Paddle/pull/37885)) 
+  
+  - 新增 `paddle.erfinv` 和 `paddle.Tensor.erfinv`，计算输入 Tensor 的逆误差函数。([#38295](https://github.com/PaddlePaddle/Paddle/pull/38295)) 
+  
+  - 新增 `paddle.lerp` 和 `paddle.Tensor.lerp`，根据给定权重计算输入Tensor间的线性插值。([#37253](https://github.com/PaddlePaddle/Paddle/pull/37253)) 
+  
+  - 新增 `paddle.angle`，用于计算复数 Tensor 的相位角。 ([#37689](https://github.com/PaddlePaddle/Paddle/pull/37689)) 
+  
+  - 新增 `paddle.rad2deg` 和 `paddle.Tensor.rad2deg`，将元素从弧度转换为角度。([#37598](https://github.com/PaddlePaddle/Paddle/pull/37598)) 
+  
+  - 新增 `paddle.deg2rad` 和 `paddle.Tensor.deg2rad`，将元素从角度转换为弧度。([#37598](https://github.com/PaddlePaddle/Paddle/pull/37598)) 
+  
+  - 新增 `paddle.gcd` 和 `paddle.Tensor.gcd`，逐元素计算两个输入绝对值的最大公约数。([#37819](https://github.com/PaddlePaddle/Paddle/pull/37819)) 
+  
+  - 新增 `paddle.lcm` 和 `paddle.Tensor.lcm`，逐元素计算两个输入绝对值的最小公倍数。([#37819](https://github.com/PaddlePaddle/Paddle/pull/37819)) 
+  
+  - 新增`paddle.amax`和`paddle.Tensor.amax`，对指定维度上的 Tensor 元素求最大值，正向结果和 max 一样，有多个相等的最大值时，反向的梯度平均分到这多个值的位置上。([#38417](https://github.com/PaddlePaddle/Paddle/pull/38417)) 
+  
+  - 新增`paddle.amin`和`paddle.Tensor.amin`，对指定维度上的 Tensor 元素求最小值，正向结果和 min 一样，有多个相等的最小值时，反向的梯度平均分到这多个值的位置上。([#38417](https://github.com/PaddlePaddle/Paddle/pull/38417)) 
+  
+  - 新增`paddle.isclose`，用于判断两个 Tensor 的每个元素是否接近。([#37135](https://github.com/PaddlePaddle/Paddle/pull/37135)) 
+  
+  - 新增`paddle.put_along_axis` 和`paddle.take_along_axis`，用于提取或放置指定索引下标的元素。([#38608](https://github.com/PaddlePaddle/Paddle/pull/38608)) 
+  
+  - 新增 `paddle.bincount` 和 `paddle.Tensor.bincount`，用于统计 Tensor 中每个元素出现的次数。([#36317](https://github.com/PaddlePaddle/Paddle/pull/36317)) 
+  
+  - 新增 `paddle.fmax`、 `paddle.fmin`，扩展了max/min的功能，支持比较的两个 Tensor 中有 NaN 值的情况，即如果对应位置上有1个 NaN 值，则返回那个非 NaN 值；如果对应位置上有2个 NaN 值，则返回 NaN 值。([#37826](https://github.com/PaddlePaddle/Paddle/pull/37826)) 
+  
+  - 新增 `paddle.diff`，用于沿给定维度计算 n 阶前向差分，目前仅支持 n=1。([#37441](https://github.com/PaddlePaddle/Paddle/pull/37441)) 
+  
+  - 新增 `paddle.asinh`、`paddle.acosh`、`paddle.atanh` 反双曲函数类 API。 ([#37076](https://github.com/PaddlePaddle/Paddle/pull/37076)) 
+  
+  - 新增 `paddle.as_real`，`paddle.as_complex` 用于实数 Tensor 和复数 Tensor 之间的转换。 ([#37784](https://github.com/PaddlePaddle/Paddle/pull/37784)) 
+  
+  - 新增 `paddle.complex` 用于给定实部和虚部构造复数 Tensor。 ([#37918](https://github.com/PaddlePaddle/Paddle/pull/37918), [#38272](https://github.com/PaddlePaddle/Paddle/pull/38272)) 
+  
+  - 新增 `paddle.det` 与 `paddle.slogdet`，分别用于计算矩阵的行列式，以及行列式的符号和其绝对值的自然对数。 ([#34992](https://github.com/PaddlePaddle/Paddle/pull/34992)) 
+  
+  - 新增`paddle.nn.utils.parameters_to_vector`，可以将输入的多个 parameter 展平并连接为1个1-D Tensor。([#38020](https://github.com/PaddlePaddle/Paddle/pull/38020)) 
+  
+  - 新增`paddle.nn.utils.vector_to_parameters`，将1个1-D Tensor按顺序切分给输入的多个 parameter。([#38020](https://github.com/PaddlePaddle/Paddle/pull/38020)) 
 
 - 新增组网类 API
-	- 新增`paddle.nn.functional.sparse_attention`，用于计算稀疏的Transformer Attention模块。([#35757](https://github.com/PaddlePaddle/Paddle/pull/35757))
-	- 新增 ``paddle.nn.MaxUnPool2D`` 和 ``paddle.nn.functional.max_unpool2d``，支持根据输入的input和最大值位置计算出池化的逆结果。([#35056](https://github.com/PaddlePaddle/Paddle/pull/35056))
-	- 新增 ``paddle.nn.functional.gumbel_softmax``，支持 ``gumbel softmax`` 采样。([#35506](https://github.com/PaddlePaddle/Paddle/pull/35506), [#36065](https://github.com/PaddlePaddle/Paddle/pull/36065), [#36094](https://github.com/PaddlePaddle/Paddle/pull/36094))
-	- 新增 ``paddle.nn.functional.class_center_sample``，支持 PartialFC 类中心采样功能。([#34106](https://github.com/PaddlePaddle/Paddle/pull/34106))
-	- 新增 ``paddle.nn.functional.margin_cross_entropy``，支持 ArcFace，CosFace，SphereFace 等 MarginLoss 功能。([#34247](https://github.com/PaddlePaddle/Paddle/pull/34247))
-	- ``paddle.nn.AvgPool2D``支持二阶导数。([#35388](https://github.com/PaddlePaddle/Paddle/pull/35388))
-	- ``paddle.nn.Linear、paddle.matmul、paddle.mm`` 支持二阶导数。[#35428](https://github.com/PaddlePaddle/Paddle/pull/35428)
-	- ``paddle.nn.GroupNorm``支持 (N, C, *) 形式的输入。([#34773](https://github.com/PaddlePaddle/Paddle/pull/34773))
-	- 新增 ``paddle.nn.BatchNorm1D/2D/3D`` 在 ``x.stop_gradient=True`` 的条件下计算反向。([#34102](https://github.com/PaddlePaddle/Paddle/pull/34102))
-	- 新增 ``paddle.nn.Dropout, paddle,nn.Dropout2D/3D`` 在 ``model.eval``模式下计算反向 。([#35122](https://github.com/PaddlePaddle/Paddle/pull/35122))
-
-- 新增硬件相关API
-	- 新增`paddle.device.cuda.Stream`,`paddle.device.cuda.Event`,`paddle.device.cuda.current_stream`,`paddle.device.cuda.synchronize` ， 支持在Python端对CUDA的event和 stream进行同步操作。([#32460](https://github.com/PaddlePaddle/Paddle/pull/32460))
-	- 新增 ``paddle.device.cuda.device_count``，支持返回当前可用GPU数量。([#34811](https://github.com/PaddlePaddle/Paddle/pull/34811))
-	- 新增 ``paddle.device.cuda.empty_cache``，支持清理空闲的显存。([#35427](https://github.com/PaddlePaddle/Paddle/pull/35427))
-	- 新增 ``paddle.device.cuda.get_device_properties``，支持返回给定的设备属性。([#35875](https://github.com/PaddlePaddle/Paddle/pull/35875))
-	- 新增 ``paddle.device.cuda.stream_guard``，用于动态图下 CUDA Stream的灵活切换。([#35623](https://github.com/PaddlePaddle/Paddle/pull/35623))
-	- 新增`paddle.device.cuda.get_device_name`，支持返回给定设备的名称。([#36172](https://github.com/PaddlePaddle/Paddle/pull/36172))
-	- 新增`paddle.device.cuda.get_device_capability`，支持返回给定设备计算能力的版本号。([#36172](https://github.com/PaddlePaddle/Paddle/pull/36172))
-	- 新增`paddle.framework.core.async_read`和`paddle.framework.core.async_write`，可支持非默认 CUDA `Stream`下`CUDAPinnedPlace` 和 `CUDAPlace` 的 `Tensor` 数据异步读写。([#36501](https://github.com/PaddlePaddle/Paddle/pull/36501))
-
-- 新增Tensor操作API
- - 新增`paddle.tensordot`，支持对高维张量做缩并(Tensor Contraction)运算。([#36454](https://github.com/PaddlePaddle/Paddle/pull/36454))
- - 新增`paddle.bincount`，支持对一维张量内元素进行计数。([#36709](https://github.com/PaddlePaddle/Paddle/pull/36709))
- - 新增 `paddle.broadcast_tensors` ，支持对一组 `Tensor` 进行广播操作。([#33294](https://github.com/PaddlePaddle/Paddle/pull/33294), [#34874](https://github.com/PaddlePaddle/Paddle/pull/34874))
- - 新增 `paddle.einsum` 。([#33821](https://github.com/PaddlePaddle/Paddle/pull/34874))
- - 增强``paddle.tensor.gradient``接口，支持sigmoid_op的二阶求导算子。([#32971](https://github.com/PaddlePaddle/Paddle/pull/32971))
- - 新增 ``paddle.searchsorted``，支持在有序``Tensor``中查找给定值的索引。([#35159](https://github.com/PaddlePaddle/Paddle/pull/35159))
- - 新增 ``paddle.unique_consecutive`` ，支持将 ``Tensor`` 中连续重复的元素进行去重，返回连续不重复的``Tensor``。([#34334](https://github.com/PaddlePaddle/Paddle/pull/34334))
- - 新增  ``paddle.diagflat``，支持返回以输入 ``Tensor`` 的元素为对角线的对角矩阵。([#33334](https://github.com/PaddlePaddle/Paddle/pull/33334))
- - 新增 ``paddle.lgamma``，支持逐元素计算 ``Tensor`` 的 ``lgamma`` 函数值。([#33913](https://github.com/PaddlePaddle/Paddle/pull/32913))
- - 新增 ``paddle.digamma``，支持逐元素计算 ``Tensor`` 的 ``digamma`` 函数值。([#33278](https://github.com/PaddlePaddle/Paddle/pull/33278))
- - 新增 ``paddle.neg``，支持逐元素计算 ``Tensor`` 的相反数值。([#33248](https://github.com/PaddlePaddle/Paddle/pull/33248))
- - 新增 ``paddle.cumprod``，支持根据给定维度计算 ``Tensor`` 累乘。([#35185](https://github.com/PaddlePaddle/Paddle/pull/35185))
- - 新增 ``paddle.atan2`` ，支持逐元素的 ``arctangent`` 运算，通过符号确定象限。([#33067](https://github.com/PaddlePaddle/Paddle/pull/33067))
- - 新增 ``paddle.expm1``，支持逐元素进行以 ``exp(x)-1`` 运算。 ([#33066](https://github.com/PaddlePaddle/Paddle/pull/33066))
- - 新增 ``paddle.trunc``，支持对输入的 ``Tensor`` 进行截断整数值。([#33371](https://github.com/PaddlePaddle/Paddle/pull/33371))
- - 新增 ``paddle.diagonal``，支持提取输入的 ``Tensor`` 的对角线元素。 ([#33586](https://github.com/PaddlePaddle/Paddle/pull/33586)) 
- - 新增``paddle.utils.dlpack``，包含： ``paddle.utils.dlpack.to_dlpack`` 和 ``paddle.utils.dlpack.from_dlpack``，利用 ``DLPack`` 支持不同框架间的 ``Tensor`` 传输。([#35067](https://github.com/PaddlePaddle/Paddle/pull/35067))
- - 新增 ``paddle.Tensor.uniform_``, 支持使用服从均匀分布的随机数原地填充一个``Tensor``。([#33394](https://github.com/PaddlePaddle/Paddle/pull/33934))
- - 新增 ``paddle.Tensor.T``，对 N-D Tensor 会进行转置，返回一个与原 Tensor 的shape相反的Tensor。([#35379](https://github.com/PaddlePaddle/Paddle/pull/35379)) 
- - 新增 ``paddle.Tensor`` 魔法操作符：&（按位与）、| （按位或）、^ （按位异或）、~（按位取反）。 ([#33524](https://github.com/PaddlePaddle/Paddle/pull/33524))
- - 新增 `paddle.Tensor.fill_`、`paddle.Tensor.zero_`，原地修改Tensor中的值，分别使用固定值填充、使用全零填充。([#33829](https://github.com/PaddlePaddle/Paddle/pull/33829)) 
- - 新增 `paddle.Tensor.fill_diagonal`、`paddle.Tensor.fill_diagonal` ,修改Tensor对角线元素值。([#34460](https://github.com/PaddlePaddle/Paddle/pull/34460)) 
- - 新增 `paddle.Tensor.fill_diagonal_tensor_`，对Tensor两个指定坐标轴的对角线与其他坐标轴形成的子Tensor进行整体修改。([#34515](https://github.com/PaddlePaddle/Paddle/pull/34515)) 
- - 动静态图 ``Tensor`` 新增多种索引类型的支持，包括：省略号（...）、维度扩增（None）、布尔类型数组（Bool Mask）、整数数组（list）以及张量（Tensor）。
-    - 省略号（...）索引：`X[..., 0]` 。([#34267](https://github.com/PaddlePaddle/Paddle/pull/34267), [#32876](https://github.com/PaddlePaddle/Paddle/pull/32876))
-    - 维度扩增（None）索引： `X[None, :]` 。([#34338](https://github.com/PaddlePaddle/Paddle/pull/34338), [#34442](https://github.com/PaddlePaddle/Paddle/pull/34442),  [#34877](https://github.com/PaddlePaddle/Paddle/pull/34877),  [#34911](https://github.com/PaddlePaddle/Paddle/pull/34911),  [#33001](https://github.com/PaddlePaddle/Paddle/pull/33001))
-	 - 布尔类型数组（Bool Mask）索引：`X[X > 0] = 0` 。 ([#35026](https://github.com/PaddlePaddle/Paddle/pull/35026),  [#35133](https://github.com/PaddlePaddle/Paddle/pull/35133),  [#33298](https://github.com/PaddlePaddle/Paddle/pull/33298))
-	 - 整数数组（list）索引：`X[[1, 0], [0]]` 。([#34824](https://github.com/PaddlePaddle/Paddle/pull/34824), [#33000](https://github.com/PaddlePaddle/Paddle/pull/33000),  [#35404](https://github.com/PaddlePaddle/Paddle/pull/35404))
-	 - 张量（Tensor）索引：`X[panddle.to_tensor([0, 1], [1, 0])]` 。([#34824](https://github.com/PaddlePaddle/Paddle/pull/34824))
-
-- 新增分布式相关API
-    - 新增 ``paddle.distributed.utils.global_scatter`` 和 `paddle.distributed.utils.global_gather`，支持 MOE 有条件分发数据，`global_scatter`会根据条件将数据分发到所有卡上，然后`global_gather`则会将数据根据条件从所有 GPU 卡上收集数据。([#35546](https://github.com/PaddlePaddle/Paddle/pull/35546))
-
-- 新增其他的API
-    -  新增 ``paddle.disable_signal_handler`` ，支持关闭PaddlePaddle中信号捕捉机制，从而使得用户可以同时使用Paddle与TVM。([#34577](https://github.com/PaddlePaddle/Paddle/pull/34577))
-    - 新增  ``paddle.incubate.softmax_mask_fuse ``，支持加速 Transformer 架构的 softmax 与 mask 的运算速度。([#33841](https://github.com/PaddlePaddle/Paddle/pull/33841))
-    - 新增  ``paddle.incubate.softmax_mask_fuse_upper_triangle ``，支持加速 GPT 版本的 Transformer 架构的 softmax 与 mask 的运算速度。([#33981](https://github.com/PaddlePaddle/Paddle/pull/33981))
-    - 新增  ``paddle.static.ExponentialMovingAverage``，支持用指数衰减计算参数的滑动平均值。([#35673](https://github.com/PaddlePaddle/Paddle/pull/35673))
-    - 新增 `` paddle::Tensor::slice`` C++ API， 支持 slice 操作，允许用户对外部 Tensor 切片操作。([#34227](https://github.com/PaddlePaddle/Paddle/pull/34227))
-    - 新增``paddle.incubate.segment_*``系列API，包含 ``paddle.incubate.segment_sum, paddle.incubate.segment_mean,  paddle.incubate.segment_max, paddle.incubate.segment_min``。支持对`Tensor`按照分段求和、求均值、求最大值、求最小值。 ([#35759](https://github.com/PaddlePaddle/Paddle/pull/35759))
-    - 新增`paddle.version.cuda`和`paddle.version.cudnn`，用于获取 paddle 安装包所使用的 `CUDA`和 `cuDNN`的版本号。([#36556](https://github.com/PaddlePaddle/Paddle/pull/36556))
+  
+  - 新增 `paddle.nn.Fold`、`paddle.nn.functional.fold`，支持将提取出的滑动局部区域块还原成 batch 的 Tensor。([#38613](https://github.com/PaddlePaddle/Paddle/pull/38613)) 
+  
+  - 新增 `paddle.nn.CELU`、`paddle.nn.functional.celu`，支持 CELU 激活层。([#36088](https://github.com/PaddlePaddle/Paddle/pull/36088)) 
+  
+  - 新增 `paddle.nn.HingeEmbeddingLoss`，增加计算 hinge embedding 损失的方式，通常用于学习 nonlinear embedding 或半监督学习。([#37540](https://github.com/PaddlePaddle/Paddle/pull/37540))
+  
+  - 新增 `paddle.nn.ZeroPad2D` API，按照 padding 属性对输入进行零填充。([#37151](https://github.com/PaddlePaddle/Paddle/pull/37151))
+  
+  - 新增 `paddle.nn.MaxUnPool3D` 和 `paddle.nn.MaxUnPool1D`，用于计算 3D 最大反池化和 1D 最大反池化。([#38716](https://github.com/PaddlePaddle/Paddle/pull/38716)) 
+  
+  - 新增 `paddle.incubate.graph_khop_sampler`、`paddle.incubate.graph_sample_neighbors`、 `paddle.incubate.graph_reindex` API，支持图多阶邻居采样和图编号重索引操作，主要用于图神经网络模型训练。([#39146](https://github.com/PaddlePaddle/Paddle/pull/39146), [#40809](https://github.com/PaddlePaddle/Paddle/pull/40809)) 
+
+- 新增随机数类 API
+  
+  - 新增 `paddle.poisson`，以输入 Tensor 为泊松分布的 lambda 参数，生成一个泊松分布的随机数 Tensor。([#38117](https://github.com/PaddlePaddle/Paddle/pull/38117)) 
+  
+  - 新增 `paddle.randint_like` API，支持新建服从均匀分布的、范围在[low, high) 的随机 Tensor，输出的形状与输入的形状一致。([#36169](https://github.com/PaddlePaddle/Paddle/pull/36169)) 
+  
+  - 新增 `paddle.Tensor.exponential_`，为 inplace 式 API，通过指数分布随机数来填充输入 Tensor。([#38256](https://github.com/PaddlePaddle/Paddle/pull/38256)) 
+
+- 新增参数初始化类 API
+  
+  - 新增 `paddle.nn.initializer.Dirac`，通过狄拉克 delta 函数来初始化 3D/4D/5D 参数，常用于卷积层 Conv1D/Conv2D/Conv3D 的参数初始化。([#37389](https://github.com/PaddlePaddle/Paddle/pull/37389)) 
+  
+  - 新增`paddle.nn.initializer.Orthogonal`，正交矩阵初始化，被初始化后的参数是（半）正交向量。([#37163](https://github.com/PaddlePaddle/Paddle/pull/37163)) 
+  
+  - 新增`paddle.nn.initializer.calculate_gain`，获取激活函数的推荐增益值，增益值可用于设置某些初始化 API，以调整初始化范围。([#37163](https://github.com/PaddlePaddle/Paddle/pull/37163)) 
+
+- 新增学习率类 API
+  
+  - 新增 `paddle.optimizer.lr.MultiplicativeDecay`，提供 `lambda` 函数设置学习率的策略。([#38250](https://github.com/PaddlePaddle/Paddle/pull/38250))
+
+- 新增分布式相关 API
+  
+  - 新增 `paddle.incubate.optimizer.DistributedFusedLamb`，使得 Lamb 优化器可分布式更新参数。([#40011](https://github.com/PaddlePaddle/Paddle/pull/40011), [#39972](https://github.com/PaddlePaddle/Paddle/pull/39972), [#39900](https://github.com/PaddlePaddle/Paddle/pull/39900), [#39747](https://github.com/PaddlePaddle/Paddle/pull/39747), [#39148](https://github.com/PaddlePaddle/Paddle/pull/39148), [#39416](https://github.com/PaddlePaddle/Paddle/pull/39416))
+
+- 新增优化器相关 API([#40710](https://github.com/PaddlePaddle/Paddle/pull/40710))
+  
+  - `paddle.incubate.optimizer.functional.minimize_bfgs`，增加二阶优化器 BFGS。
+  
+  - `paddle.incubate.optimizer.functional.minimize_lbfgs`，增加二阶优化器 L-BFGS。
+
+- 新增 `paddle.incubate.multiprocessing`模块，支持 Tensor（CPU/GPU）在 python 进程间传输。([#37302](https://github.com/PaddlePaddle/Paddle/pull/37302), [#41339](https://github.com/PaddlePaddle/Paddle/pull/41339)) 
 
 #### IR(Intermediate Representation)
-- 动态图转静态图 
-    - 新增动转静转写报错类型识别，并给出修改建议。 ([#35648](https://github.com/PaddlePaddle/Paddle/pull/35648)) 
-    - 新增对混合精度训练功能支持，``@to_static`` c支持一键转为静态图混合精度训练模式。 ([#34562](https://github.com/PaddlePaddle/Paddle/pull/34562))
-	- ``@to_static`` 中新增 ``build_strategy`` 参数，支持动转静后自定义开启相关 `Pass` 优化策略加速模型训练，如算子融合等。 ([#34347](https://github.com/PaddlePaddle/Paddle/pull/34347))
-	- 增加`a, b = static_variable` 的支持。([#33499](https://github.com/PaddlePaddle/Paddle/pull/33499))
-	- 新增二阶导能力支持。([#33110](https://github.com/PaddlePaddle/Paddle/pull/33110))
-
-- Program和Graph互转 ：``Program`` 和 ``Graph``是 飞桨框架底层用来表达计算的中间表示，对于飞桨的开发者而言，有时需要将 ``Program`` 和 ``Graph``互相转化来进行计算处理。本功能添加了 ``Program`` 和 ``Graph`` 互转相关能力。
-    - 开发完善 ``Program`` 和 ``Graph`` 相互转换功能。 ([#33949](https://github.com/PaddlePaddle/Paddle/pull/33949))
-    - 为了支持 `while` 等控制流节点，飞桨框架的 `Program` 中除了主 `block` 外，还可能包含多个子 `block`。之前 `Program` 转 `Graph` 的过程中，只将主 `block` 转化为 `Graph`，这里改进 `Graph`，支持表达子 `block`，实现完整的 `Program` 转 `Graph`。([#33320](https://github.com/PaddlePaddle/Paddle/pull/33320))
-    - 提供分析 `Program` 中控制流需要的依赖辅助函数。 ([#33439](https://github.com/PaddlePaddle/Paddle/pull/33439))
-    - `Program` 和 `Graph` 相互转换后保留训练所需要的 `stop_gradient` ,  `persistable` 属性值。([#33771](https://github.com/PaddlePaddle/Paddle/pull/33771)) 
-    - 原 `Pass` 只处理主`Graph`，忽略子图，现`Pass` 支持处理主 `Graph`及其所有子图。 ([#34158](https://github.com/PaddlePaddle/Paddle/pull/34158)) 
-    - 处理了在预测情况下 `Program` 和 `Graph` 互转的一些拓扑排序问题。([#34121](https://github.com/PaddlePaddle/Paddle/pull/34121), [#34521](https://github.com/PaddlePaddle/Paddle/pull/34521))
 
-- Pass开发
-    - 新增 Python 侧针对 fusion 等子图替换场景下的 Pass 开发方式。([#35708](https://github.com/PaddlePaddle/Paddle/pull/35708), [#35602](https://github.com/PaddlePaddle/Paddle/pull/35602))
+- 动态图转静态图
+  
+  - 变量类型 StaticAnalysis 模块新增支持类似 `a, b = paddle.shape(x)` 的类型标记。([#39245](https://github.com/PaddlePaddle/Paddle/pull/39245)) 
+  
+  - 新增支持 `InputSpec.name` 作为 Program 缓存 hash key 的计算字段。([#38273](https://github.com/PaddlePaddle/Paddle/pull/38273)) 
+  
+  - 新增支持 `dict['key'] = x.shape` 语法。([#40611](https://github.com/PaddlePaddle/Paddle/pull/40611)) 
+  
+  - 新增支持 Pure FP16 训练。([#36944](https://github.com/PaddlePaddle/Paddle/pull/36944)) 
+  
+  - 新增支持 `for i in [x,y,z]` 语法。([#37259](https://github.com/PaddlePaddle/Paddle/pull/37259)) 
+  
+  - 新增支持 python3 的 type hint 语法。([#36544](https://github.com/PaddlePaddle/Paddle/pull/36544)) 
 
-- Kernel Primitive API	
-    - 对算子 Kernel 实现中的底层代码进行了抽象与功能封装，提供高性能的 Block 级 IO 运算和 Compute 运算。使用 Kernel Primitive API 进行 Kernel 开发可以更加专注计算逻辑的实现，在保证性能的同时大幅减少代码量，同时实现了算子计算与硬件解耦。([#34672](https://github.com/PaddlePaddle/Paddle/pull/34672),  [#35075](https://github.com/PaddlePaddle/Paddle/pull/35075),  [#34456](https://github.com/PaddlePaddle/Paddle/pull/34456),  [#35282](https://github.com/PaddlePaddle/Paddle/pull/35282),  [#35743](https://github.com/PaddlePaddle/Paddle/pull/35743),  [#34208](https://github.com/PaddlePaddle/Paddle/pull/34208))
-    - 在 Kernel Primitive API中添加一元和二元计算Functor共13个。 ([#36418](https://github.com/PaddlePaddle/Paddle/pull/36418))
-    - 修改 Kernel Primitive API 中 ReadData 实现方式，修复`NX !=1`访存越界的问题。 ([#36373](https://github.com/PaddlePaddle/Paddle/pull/36373))
+- Pass开发
+  
+  - 新增基于 NVIDIA cuBLASLt Epilogue 的 FC + [relu|gelu] 的前向与反向融合。([#39437](https://github.com/PaddlePaddle/Paddle/pull/39437))
+
+- Kernel Primitive API
+  
+  - 新增 GPU 平台 KP 算子，包括 cast、scale、clip、bce_loss、abs_grad、reduce_sum_grad、reduce_mean_grad、full、full_like、distribution、random、masked_select_kernel、where_index、masked_select_grad、dropout、sigmoid、where。 ([#36203](https://github.com/PaddlePaddle/Paddle/pull/36203), [#36423](https://github.com/PaddlePaddle/Paddle/pull/36423), [#39390](https://github.com/PaddlePaddle/Paddle/pull/39390), [#39734](https://github.com/PaddlePaddle/Paddle/pull/39734), [#38500](https://github.com/PaddlePaddle/Paddle/pull/38500), [#38959](https://github.com/PaddlePaddle/Paddle/pull/38959), [#39197](https://github.com/PaddlePaddle/Paddle/pull/39197/), [#39563](https://github.com/PaddlePaddle/Paddle/pull/39563), [#39666](https://github.com/PaddlePaddle/Paddle/pull/39666), [#40517](https://github.com/PaddlePaddle/Paddle/pull/40517), [#40617](https://github.com/PaddlePaddle/Paddle/pull/40617), [#40766](https://github.com/PaddlePaddle/Paddle/pull/40766), [#39898](https://github.com/PaddlePaddle/Paddle/pull/39898), [#39609](https://github.com/PaddlePaddle/Paddle/pull/39609)) 
+  
+  - 新增支持 XPU2 源码编译模式。([#37254](https://github.com/PaddlePaddle/Paddle/pull/37254), [#40397](https://github.com/PaddlePaddle/Paddle/pull/40397), [#38455](https://github.com/PaddlePaddle/Paddle/pull/38455)) 
+  
+  - 新增支持 KP 算子在 XPU2 和 GPU 中复用，包括 reduce、broadcast、elementwise_add、`exp、log、relu、sigmoid、leaky_relu、softplus、hard_swish、reciprocal`。([#36904](https://github.com/PaddlePaddle/Paddle/pull/36904), [#37226](https://github.com/PaddlePaddle/Paddle/pull/37226), [#38918](https://github.com/PaddlePaddle/Paddle/pull/38918), [#40560](https://github.com/PaddlePaddle/Paddle/pull/40560/), [#39787](https://github.com/PaddlePaddle/Paddle/pull/39787), [#39917](https://github.com/PaddlePaddle/Paddle/pull/39917), [#40002](https://github.com/PaddlePaddle/Paddle/pull/40002), [#40364](https://github.com/PaddlePaddle/Paddle/pull/40364))
+  
+  - 新增 XPU2 平台 KP 算子单测，包括 `brelu、ceil、celu、elu、floor、hard_shrink、hard_sigmoid、log1p、logsigmoid、relu6、silu、soft_relu、softsign、sqrt、square、swish、thresholded_relu、softshrink`。([#40448](https://github.com/PaddlePaddle/Paddle/pull/40448), [#40524](https://github.com/PaddlePaddle/Paddle/pull/40524)) 
+  
+  - 新增 XPU2 KP 模型支持，包括 resnet50、deepfm、wide_deep、yolov3-darknet53、det_mv3_db、bert、transformer、mobilenet_v3、GPT2。
 
 #### 混合精度训练
-- 动态图混合精度功能增强，新增整个任务使用半精度（float16）训练的方式，主要任务下的计算效率提升20%左右。 ([#35521](https://github.com/PaddlePaddle/Paddle/pull/35521))
-- 动态图混合精度 ``paddle.amp.GradScaler`` 新增 ``get`` 和 ``set`` 方法，方便用户设置。([#33835](https://github.com/PaddlePaddle/Paddle/pull/33835))
-- 动态图混合精度 ``paddle.amp.GradScaler`` 新增 ``state_dict`` 和 ``load_state_dict`` 方法。 ([#34300](https://github.com/PaddlePaddle/Paddle/pull/34300))
-- 动态图混合精度拆分 ``minimize``为 ``step`` + ``update`` ；并新增 ``unscale``方法。 ([#35927](https://github.com/PaddlePaddle/Paddle/pull/35927))
-- 动态图混合精度训练支持 param group。([#34899](https://github.com/PaddlePaddle/Paddle/pull/34899))
-- 静态图混合精度训练支持梯度裁剪。 ([#33565](https://github.com/PaddlePaddle/Paddle/pull/33565))
 
+- 从混合精度训练 `paddle.amp.GradScaler` 的 `minimize` 中拆分出 `paddle.amp.GradScaler.unscale_` 方法，提供恢复 loss 的独立接口。([#35825](https://github.com/PaddlePaddle/Paddle/pull/35825)) 
+
+- 为 `paddle.nn.ClipByGlobalNorm` 动态图模式添加 FP16 支持，为clip op 添加 FP16 Kernel，使`clip`相关操作支持 FP16。([#36198](https://github.com/PaddlePaddle/Paddle/pull/36198), [#36577](https://github.com/PaddlePaddle/Paddle/pull/36577))
+
+- 支持 `paddle.amp.decorate` 传入的`optimizer`参数为 None。([#37541](https://github.com/PaddlePaddle/Paddle/pull/37541)) 
+
+- 为 merged_momentum op 添加支持输入多学习率、支持 use_nesterov 策略的计算、支持 regularization 计算。([#37527](https://github.com/PaddlePaddle/Paddle/pull/37527))
+
+- 为 `paddle.optimizer.Momentum` 优化器添加 multi_tensor 策略，为 `Optimizer` 类的 `clear_grad` 添加 `set_to_zero` 分支。([#37564](https://github.com/PaddlePaddle/Paddle/pull/37564)) 
+
+- 为`paddle.optimizer.Adam`优化器添加 multi_tensor 策略。([#38010](https://github.com/PaddlePaddle/Paddle/pull/38010)) 
+
+- 为`paddle.optimizer.SGD`优化器添加 multi_precision 策略。([#38231](https://github.com/PaddlePaddle/Paddle/pull/38231)) 
+
+- 为优化器 `state_dict` 方法添加存储 `master weight` 参数。([#39121](https://github.com/PaddlePaddle/Paddle/pull/39121)) 
+
+- 添加支持 op CUDA bfloat16 混合精度训练，支持 O1、O2 模式，通过 `paddle.amp.auto_cast`可开启上述训练模式。([#39029](https://github.com/PaddlePaddle/Paddle/pull/39029), [#39815](https://github.com/PaddlePaddle/Paddle/pull/39815))
+
+- 为如下 ops 添加 bfloat16 CUDA Kernel：matmul、concat、split、dropout、reshape、slice、squeeze、stack、transpose、unbind、elementwise_max、elementwise_add、elementwise_mul、elementwise_sub、scale、sum、layer_norm、p_norm、reduce_sum、softmax、log_softmax、sigmoid、sqrt、softplus、square、gaussian_random、fill_constant、fill_any_like。([#39485](https://github.com/PaddlePaddle/Paddle/pull/39485), [#39380](https://github.com/PaddlePaddle/Paddle/pull/39380), [#39395](https://github.com/PaddlePaddle/Paddle/pull/39395), [#39402](https://github.com/PaddlePaddle/Paddle/pull/39402), [#39457](https://github.com/PaddlePaddle/Paddle/pull/39457), [#39461](https://github.com/PaddlePaddle/Paddle/pull/39461), [#39602](https://github.com/PaddlePaddle/Paddle/pull/39602), [#39716](https://github.com/PaddlePaddle/Paddle/pull/39716), [#39683](https://github.com/PaddlePaddle/Paddle/pull/39683), [#39843](https://github.com/PaddlePaddle/Paddle/pull/39843), [#39999](https://github.com/PaddlePaddle/Paddle/pull/39999), [#40004](https://github.com/PaddlePaddle/Paddle/pull/40004), [#40027](https://github.com/PaddlePaddle/Paddle/pull/40027)) 
+
+- 为如下 ops 添加 bfloat16 CPU Kernel：dropout、reshape、slice、squeeze、unsqueeze、stack、transpose、unbind、elementwise_max、elementwise_mul、elementwise_sub、gather。 ([#39380](https://github.com/PaddlePaddle/Paddle/pull/39380), [#39395](https://github.com/PaddlePaddle/Paddle/pull/39395), [#39402](https://github.com/PaddlePaddle/Paddle/pull/39402), [#39457](https://github.com/PaddlePaddle/Paddle/pull/39457), [#39461](https://github.com/PaddlePaddle/Paddle/pull/39461), [#39602](https://github.com/PaddlePaddle/Paddle/pull/39602), [#39716](https://github.com/PaddlePaddle/Paddle/pull/39716), [#39683](https://github.com/PaddlePaddle/Paddle/pull/39683)) 
+
+- 支持打印 bfloat16 类型的 Tensor。([#39375](https://github.com/PaddlePaddle/Paddle/pull/39375), [#39370](https://github.com/PaddlePaddle/Paddle/pull/39370))
+
+- 为 `p_norm`、`elementwise_max`、`fill_constant_batch_size_like`、`scatter` 增加 FP16 计算支持。([#35888](https://github.com/PaddlePaddle/Paddle/pull/35888), [#39907](https://github.com/PaddlePaddle/Paddle/pull/39907), [#38136](https://github.com/PaddlePaddle/Paddle/pull/38136), [#38499](https://github.com/PaddlePaddle/Paddle/pull/38499))
+
+- 为如下 ops 增加 int16_t 支持：cumsum、less_than、less_equal、greater_than、greater_equal、equal、not_equal、fill_any_like、gather_nd、reduce_sum、where_index、reshape、unsqueeze。([#39636](https://github.com/PaddlePaddle/Paddle/pull/39636)) 
+
+- 为 cross_entropy op 增加 int16_t label 类型的支持。([#39409](https://github.com/PaddlePaddle/Paddle/pull/39409)) 
+
+- 为 embedding op 增加 int16_t id 类型的支持。([#39381](https://github.com/PaddlePaddle/Paddle/pull/39381))
+
+- 为 reduce_mean op 增加 FP16 类型的支持。([#38289](https://github.com/PaddlePaddle/Paddle/pull/38289))
+
+- 为 elementwise_min op 增加 FP16 类型的支持。([#38123](https://github.com/PaddlePaddle/Paddle/pull/38123))
+
+- 更新 bfloat16 AMP oneDNN 默认支持列表。([#39304](https://github.com/PaddlePaddle/Paddle/pull/39304)) 
+
+#### 飞桨高可复用算子库 PHI
+
+针对飞桨框架原算子库存在的算子接口不清晰、算子复用成本较高、调用性能不够快的问题，我们重构了飞桨框架的算子库，设计了灵活、高效的函数式算子库 PHI，可以通过对函数式算子接口组合调用的方式实现新算子。新算子库提供了 200 余个跟 python 开发接口保持一致的 C++ 运算类 API，以及近500个可供组合调用的前、反向函数式算子内核 Kernel，可大幅降低框架原生算子和自定义算子的开发成本。新算子库支持Primitive API方式开发算子内核，可支持不同硬件（比如GPU和XPU）的算子内核复用。新算子库支持以插件方式接入硬件（比如NPU）的加速库，实现低成本复用硬件加速库。主要可分为以下几部分工作：
+
+- **算子库基础架构、核心组件与机制实现**：合理规划新算子库的目录结构，设计实现了新算子库的公共基础数据结构、新的函数式 InferMeta 和 Kernel 开发范式以及相应的注册和管理组件，并且支持 Kernel 文件的自动化编译对象生成及编译依赖关系生成，使开发者仅需关注函数式 Kernel 的实现，开发范式简洁清晰。([#34425](https://github.com/PaddlePaddle/Paddle/pull/34425), [#37107](https://github.com/PaddlePaddle/Paddle/pull/37107), [#36946](https://github.com/PaddlePaddle/Paddle/pull/36946), [#36948](https://github.com/PaddlePaddle/Paddle/pull/36948), [#37876](https://github.com/PaddlePaddle/Paddle/pull/37876), [#37916](https://github.com/PaddlePaddle/Paddle/pull/37916), [#37977](https://github.com/PaddlePaddle/Paddle/pull/37977), [#38078](https://github.com/PaddlePaddle/Paddle/pull/38078), [#38861](https://github.com/PaddlePaddle/Paddle/pull/38861), [#39123](https://github.com/PaddlePaddle/Paddle/pull/39123), [#39131](https://github.com/PaddlePaddle/Paddle/pull/39131), [#39748](https://github.com/PaddlePaddle/Paddle/pull/39748), [#39790](https://github.com/PaddlePaddle/Paddle/pull/39790), [#39941](https://github.com/PaddlePaddle/Paddle/pull/39941), [#40239](https://github.com/PaddlePaddle/Paddle/pull/40239), [#40635](https://github.com/PaddlePaddle/Paddle/pull/40635), [#41091](https://github.com/PaddlePaddle/Paddle/pull/41091), [#37409](https://github.com/PaddlePaddle/Paddle/pull/37409), [#37942](https://github.com/PaddlePaddle/Paddle/pull/37942), [#39002](https://github.com/PaddlePaddle/Paddle/pull/39002), [#38109](https://github.com/PaddlePaddle/Paddle/pull/38109), [#37881](https://github.com/PaddlePaddle/Paddle/pull/37881), [#37517](https://github.com/PaddlePaddle/Paddle/pull/37517), [#39870](https://github.com/PaddlePaddle/Paddle/pull/39870), [#40975](https://github.com/PaddlePaddle/Paddle/pull/40975), [#39475](https://github.com/PaddlePaddle/Paddle/pull/39475), [#37304](https://github.com/PaddlePaddle/Paddle/pull/37304), #36910, #37120, #37146, #37215, #37255, #37369, #38258, #38257, #38355, #38853, #38937, #38977, #38946, #39085, #39153, #39228, #38301, #38275, #38506, #38607, #38473, #38632, #38811, #38880, #38996, #38914, #39101)
+
+- **C++ API system for the operator library**: We designed and implemented a YAML-configuration-based operator definition paradigm and automatically generated more than 200 C++ operation APIs for internal and external developers to reuse, reducing the cost of repeatedly developing basic operations. ([#37668](https://github.com/PaddlePaddle/Paddle/pull/37668), [#36938](https://github.com/PaddlePaddle/Paddle/pull/36938), [#38172](https://github.com/PaddlePaddle/Paddle/pull/38172), [#38182](https://github.com/PaddlePaddle/Paddle/pull/38182), [#38311](https://github.com/PaddlePaddle/Paddle/pull/38311), [#38438](https://github.com/PaddlePaddle/Paddle/pull/38438), [#39057](https://github.com/PaddlePaddle/Paddle/pull/39057), [#39229](https://github.com/PaddlePaddle/Paddle/pull/39229), [#39281](https://github.com/PaddlePaddle/Paddle/pull/39281), [#39263](https://github.com/PaddlePaddle/Paddle/pull/39263), [#39408](https://github.com/PaddlePaddle/Paddle/pull/39408), [#39436](https://github.com/PaddlePaddle/Paddle/pull/39436), [#39482](https://github.com/PaddlePaddle/Paddle/pull/39482), [#39497](https://github.com/PaddlePaddle/Paddle/pull/39497), [#39651](https://github.com/PaddlePaddle/Paddle/pull/39651), [#39521](https://github.com/PaddlePaddle/Paddle/pull/39521), [#39760](https://github.com/PaddlePaddle/Paddle/pull/39760), [#40060](https://github.com/PaddlePaddle/Paddle/pull/40060), [#40196](https://github.com/PaddlePaddle/Paddle/pull/40196), [#40218](https://github.com/PaddlePaddle/Paddle/pull/40218), [#40640](https://github.com/PaddlePaddle/Paddle/pull/40640), [#40732](https://github.com/PaddlePaddle/Paddle/pull/40732), [#40729](https://github.com/PaddlePaddle/Paddle/pull/40729), [#40840](https://github.com/PaddlePaddle/Paddle/pull/40840), [#40867](https://github.com/PaddlePaddle/Paddle/pull/40867), [#41025](https://github.com/PaddlePaddle/Paddle/pull/41025), [#41368](https://github.com/PaddlePaddle/Paddle/pull/41368))
+
+- **Compatibility of the operator library with each execution system**: The new InferMeta and Kernel are connected to the original dynamic and static graph execution systems, and the original OpKernel registrations can be safely removed and migrated to the new Kernel form. ([#34425](https://github.com/PaddlePaddle/Paddle/pull/34425), [#38825](https://github.com/PaddlePaddle/Paddle/pull/38825), [#38837](https://github.com/PaddlePaddle/Paddle/pull/38837), [#38842](https://github.com/PaddlePaddle/Paddle/pull/38842), [#38976](https://github.com/PaddlePaddle/Paddle/pull/38976), [#39134](https://github.com/PaddlePaddle/Paddle/pull/39134), [#39140](https://github.com/PaddlePaddle/Paddle/pull/39140), [#39135](https://github.com/PaddlePaddle/Paddle/pull/39135), [#39252](https://github.com/PaddlePaddle/Paddle/pull/39252), [#39222](https://github.com/PaddlePaddle/Paddle/pull/39222), [#39351](https://github.com/PaddlePaddle/Paddle/pull/39351))
+
+- **Decoupling the operator library's underlying data structures and utility functions from the framework**: Phi no longer depends on the framework for its core data structures, which lays the foundation for compiling Phi independently and supports infrt, custom Kernels, and other work built on Phi. ([#38583](https://github.com/PaddlePaddle/Paddle/pull/38583), [#39188](https://github.com/PaddlePaddle/Paddle/pull/39188), [#39560](https://github.com/PaddlePaddle/Paddle/pull/39560), [#39931](https://github.com/PaddlePaddle/Paddle/pull/39931), [#39169](https://github.com/PaddlePaddle/Paddle/pull/39169), [#38951](https://github.com/PaddlePaddle/Paddle/pull/38951), [#38898](https://github.com/PaddlePaddle/Paddle/pull/38898), [#38873](https://github.com/PaddlePaddle/Paddle/pull/38873), [#38696](https://github.com/PaddlePaddle/Paddle/pull/38696), [#38651](https://github.com/PaddlePaddle/Paddle/pull/38651), [#39359](https://github.com/PaddlePaddle/Paddle/pull/39359), [#39305](https://github.com/PaddlePaddle/Paddle/pull/39305), [#39234](https://github.com/PaddlePaddle/Paddle/pull/39234), [#39098](https://github.com/PaddlePaddle/Paddle/pull/39098), [#39120](https://github.com/PaddlePaddle/Paddle/pull/39120), [#38979](https://github.com/PaddlePaddle/Paddle/pull/38979), [#38899](https://github.com/PaddlePaddle/Paddle/pull/38899), [#38844](https://github.com/PaddlePaddle/Paddle/pull/38844), [#39714](https://github.com/PaddlePaddle/Paddle/pull/39714), [#39729](https://github.com/PaddlePaddle/Paddle/pull/39729), [#39889](https://github.com/PaddlePaddle/Paddle/pull/39889), [#39587](https://github.com/PaddlePaddle/Paddle/pull/39587), [#39558](https://github.com/PaddlePaddle/Paddle/pull/39558), [#39514](https://github.com/PaddlePaddle/Paddle/pull/39514), [#39502](https://github.com/PaddlePaddle/Paddle/pull/39502), [#39300](https://github.com/PaddlePaddle/Paddle/pull/39300), [#39246](https://github.com/PaddlePaddle/Paddle/pull/39246), [#39124](https://github.com/PaddlePaddle/Paddle/pull/39124))
+
+- **Integrating the custom operator mechanism with Phi and improving it**: Custom operators can now call the more than 200 C++ operation APIs automatically generated by Phi, which lowers the cost of developing custom operators. A series of bugs were also fixed. ([#37122](https://github.com/PaddlePaddle/Paddle/pull/37122), [#37276](https://github.com/PaddlePaddle/Paddle/pull/37276), [#37281](https://github.com/PaddlePaddle/Paddle/pull/37281), [#37262](https://github.com/PaddlePaddle/Paddle/pull/37281), [#37415](https://github.com/PaddlePaddle/Paddle/pull/37415), [#37423](https://github.com/PaddlePaddle/Paddle/pull/37423), [#37583](https://github.com/PaddlePaddle/Paddle/pull/37683), [#38776](https://github.com/PaddlePaddle/Paddle/pull/38776), [#39353](https://github.com/PaddlePaddle/Paddle/pull/39353), [#41072](https://github.com/PaddlePaddle/Paddle/pull/41072))
+
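+- **Large-scale operator migration and rewriting**: The forward and backward Kernels of about 250 high-frequency operators were migrated to the new operator library and rewritten in functional form, so that high-performance operators can be assembled quickly on the C++ side by wrapping calls to several basic Kernel functions. Corresponding YAML operator definitions were also added and connected to the new dynamic graph execution system, improving Python API scheduling performance. The migrated and rewritten operators include: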
+  
+  - sqrt （[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
+  
+  - square（[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
+  
+  - sin ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+  
+  - sinh ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+  
+  - elementwise_fmax（[#40140](https://github.com/PaddlePaddle/Paddle/pull/40140)）
+  
+  - elementwise_fmin（[#40140](https://github.com/PaddlePaddle/Paddle/pull/40140)）
+  
+  - pool2d（[#40208](https://github.com/PaddlePaddle/Paddle/pull/40208), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053)）
+  
+  - max_pool2d_with_index（[#40208](https://github.com/PaddlePaddle/Paddle/pull/40208), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053)）
+  
+  - pool3d（[#40208](https://github.com/PaddlePaddle/Paddle/pull/40208), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053)）
+  
+  - max_pool3d_with_index（[#40208](https://github.com/PaddlePaddle/Paddle/pull/40208), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053)）
+  
+  - fill_constant ([#36930](https://github.com/PaddlePaddle/Paddle/pull/36930), [#39465](https://github.com/PaddlePaddle/Paddle/pull/39465))
+  
+  - p_norm ([#40819](https://github.com/PaddlePaddle/Paddle/pull/40819))
+  
+  - fill_constant_batch_size_like ([#40784](https://github.com/PaddlePaddle/Paddle/pull/40784))
+  
+  - conv2d（[#39354](https://github.com/PaddlePaddle/Paddle/pull/39354)）
+  
+  - conv2d_transpose（[#40675](https://github.com/PaddlePaddle/Paddle/pull/40675), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053)）
+  
+  - conv3d（[#39354](https://github.com/PaddlePaddle/Paddle/pull/39354)）
+  
+  - conv3d_transpose（[#40675](https://github.com/PaddlePaddle/Paddle/pull/40675), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053)）
+  
+  - mish（[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
+  
+  - gather_nd ([#40090](https://github.com/PaddlePaddle/Paddle/pull/40090), [#40043](https://github.com/PaddlePaddle/Paddle/pull/40043))
+  
+  - gather ([#40500](https://github.com/PaddlePaddle/Paddle/pull/40500))
+  
+  - scatter ([#40090](https://github.com/PaddlePaddle/Paddle/pull/40090), [#40043](https://github.com/PaddlePaddle/Paddle/pull/40043))
+  
+  - scatter_nd_add ([#40090](https://github.com/PaddlePaddle/Paddle/pull/40090), [#40043](https://github.com/PaddlePaddle/Paddle/pull/40043))
+  
+  - sgd（[#40045](https://github.com/PaddlePaddle/Paddle/pull/40045)）
+  
+  - momentum ([#41319](https://github.com/PaddlePaddle/Paddle/pull/41319))
+  
+  - rmsprop（[#40994](https://github.com/PaddlePaddle/Paddle/pull/40994)）
+  
+  - index_sample（[#38130](https://github.com/PaddlePaddle/Paddle/pull/38130), [#38459](https://github.com/PaddlePaddle/Paddle/pull/38459), [#39905](https://github.com/PaddlePaddle/Paddle/pull/39905)）
+  
+  - adam ([#40351](https://github.com/PaddlePaddle/Paddle/pull/40351))
+  
+  - layer_norm（[#40193](https://github.com/PaddlePaddle/Paddle/pull/40193)）
+  
+  - adagrad（[#40994](https://github.com/PaddlePaddle/Paddle/pull/40994)）
+  
+  - adamax ([#40173](https://github.com/PaddlePaddle/Paddle/pull/40173))
+  
+  - adadelta ([#40173](https://github.com/PaddlePaddle/Paddle/pull/40173))
+  
+  - clip（[#40602](https://github.com/PaddlePaddle/Paddle/pull/40602), [#41661](https://github.com/PaddlePaddle/Paddle/pull/41661), [#41675](https://github.com/PaddlePaddle/Paddle/pull/41675)）
+  
+  - ceil ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+  
+  - cos ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+  
+  - atan ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+  
+  - cosh ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+  
+  - erf（[#40388](https://github.com/PaddlePaddle/Paddle/pull/40388)）
+  
+  - asin ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+  
+  - acos ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+  
+  - scale ([#39278](https://github.com/PaddlePaddle/Paddle/pull/39278))
+  
+  - elementwise_pow ([#40993](https://github.com/PaddlePaddle/Paddle/pull/40993))
+  
+  - elementwise_sub ([#39225](https://github.com/PaddlePaddle/Paddle/pull/39225), [#37260](https://github.com/PaddlePaddle/Paddle/pull/37260))
+  
+  - round ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+  
+  - floor ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+  
+  - pow ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+  
+  - elementwise_floordiv ([#40993](https://github.com/PaddlePaddle/Paddle/pull/40993))
+  
+  - reciprocal（[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
+  
+  - log1p ([#40785](https://github.com/PaddlePaddle/Paddle/pull/40785))
+  
+  - allclose ([#40469](https://github.com/PaddlePaddle/Paddle/pull/40469))
+  
+  - mul ([#40833](https://github.com/PaddlePaddle/Paddle/pull/40833))
+  
+  - elementwise_max ([#40590](https://github.com/PaddlePaddle/Paddle/pull/40590))
+  
+  - elementwise_min ([#40590](https://github.com/PaddlePaddle/Paddle/pull/40590))
+  
+  - elementwise_mod ([#40590](https://github.com/PaddlePaddle/Paddle/pull/40590))
+  
+  - elementwise_add ([#39048](https://github.com/PaddlePaddle/Paddle/pull/39048), [#37043](https://github.com/PaddlePaddle/Paddle/pull/37043))
+  
+  - matmul_v2 ([#36844](https://github.com/PaddlePaddle/Paddle/pull/36844), [#38713](https://github.com/PaddlePaddle/Paddle/pull/38713))
+  
+  - elementwise_mul ([#41042](https://github.com/PaddlePaddle/Paddle/pull/41042), [#40252](https://github.com/PaddlePaddle/Paddle/pull/40252), [#37471](https://github.com/PaddlePaddle/Paddle/pull/37471))
+  
+  - elementwise_div ([#40172](https://github.com/PaddlePaddle/Paddle/pull/40172), [#40039](https://github.com/PaddlePaddle/Paddle/pull/40039), [#37418](https://github.com/PaddlePaddle/Paddle/pull/37418))
+  
+  - SelectedRows ([#39037](https://github.com/PaddlePaddle/Paddle/pull/39037), [#39087](https://github.com/PaddlePaddle/Paddle/pull/39087), [#39128](https://github.com/PaddlePaddle/Paddle/pull/39128), [#39162](https://github.com/PaddlePaddle/Paddle/pull/39162), [#39236](https://github.com/PaddlePaddle/Paddle/pull/39236)) 
+  
+  - fill_any_like ([#39807](https://github.com/PaddlePaddle/Paddle/pull/39807))
+  
+  - dot（[#38359](https://github.com/PaddlePaddle/Paddle/pull/38359)）
+  
+  - sum ([#40873](https://github.com/PaddlePaddle/Paddle/pull/40873))
+  
+  - cumsum ([#39976](https://github.com/PaddlePaddle/Paddle/pull/39976), [#40200](https://github.com/PaddlePaddle/Paddle/pull/40200))
+  
+  - diag_v2 ([#39914](https://github.com/PaddlePaddle/Paddle/pull/39914))
+  
+  - auc ([#39976](https://github.com/PaddlePaddle/Paddle/pull/39976), [#40200](https://github.com/PaddlePaddle/Paddle/pull/40200))
+  
+  - log_loss ([#39976](https://github.com/PaddlePaddle/Paddle/pull/39976), [#40200](https://github.com/PaddlePaddle/Paddle/pull/40200))
+  
+  - one_hot_v2（[#39876](https://github.com/PaddlePaddle/Paddle/pull/39876)）
+  
+  - sigmoid_cross_entropy_with_logits ([#39976](https://github.com/PaddlePaddle/Paddle/pull/39976), [#40200](https://github.com/PaddlePaddle/Paddle/pull/40200))
+  
+  - bce_loss ([#39868](https://github.com/PaddlePaddle/Paddle/pull/39868))
+  
+  - argsort ([#40151](https://github.com/PaddlePaddle/Paddle/pull/40151))
+  
+  - arg_max ([#40222](https://github.com/PaddlePaddle/Paddle/pull/40222))
+  
+  - arg_min ([#40222](https://github.com/PaddlePaddle/Paddle/pull/40222))
+  
+  - segment_pool ([#40099](https://github.com/PaddlePaddle/Paddle/pull/40099))
+  
+  - frobenius_norm（[#40707](https://github.com/PaddlePaddle/Paddle/pull/40707), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053)）
+  
+  - dist ([#40178](https://github.com/PaddlePaddle/Paddle/pull/40178))
+  
+  - isnan_v2 ([#40076](https://github.com/PaddlePaddle/Paddle/pull/40076))
+  
+  - logical_and ([#39942](https://github.com/PaddlePaddle/Paddle/pull/39942))
+  
+  - logical_not ([#39942](https://github.com/PaddlePaddle/Paddle/pull/39942))
+  
+  - isfinite_v2 ([#40076](https://github.com/PaddlePaddle/Paddle/pull/40076))
+  
+  - logical_or ([#39942](https://github.com/PaddlePaddle/Paddle/pull/39942))
+  
+  - isinf_v2 ([#40076](https://github.com/PaddlePaddle/Paddle/pull/40076))
+  
+  - is_empty ([#39919](https://github.com/PaddlePaddle/Paddle/pull/39919))
+  
+  - logical_xor ([#39942](https://github.com/PaddlePaddle/Paddle/pull/39942))
+  
+  - less_than（[#39970](https://github.com/PaddlePaddle/Paddle/pull/39970)）
+  
+  - not_equal（[#39970](https://github.com/PaddlePaddle/Paddle/pull/39970)）
+  
+  - equal（[#39970](https://github.com/PaddlePaddle/Paddle/pull/39970)）
+  
+  - less_equal（[#39970](https://github.com/PaddlePaddle/Paddle/pull/39970)）
+  
+  - equal_all（[#39970](https://github.com/PaddlePaddle/Paddle/pull/39970)）
+  
+  - uniform_random ([#39937](https://github.com/PaddlePaddle/Paddle/pull/39937))
+  
+  - randint ([#39876](https://github.com/PaddlePaddle/Paddle/pull/39876), [#41375](https://github.com/PaddlePaddle/Paddle/pull/41375))
+  
+  - randperm ([#41265](https://github.com/PaddlePaddle/Paddle/pull/41265))
+  
+  - unbind ([#39789](https://github.com/PaddlePaddle/Paddle/pull/39789))
+  
+  - bernoulli ([#39590](https://github.com/PaddlePaddle/Paddle/pull/39590))
+  
+  - increment ([#39858](https://github.com/PaddlePaddle/Paddle/pull/39858), [#39913](https://github.com/PaddlePaddle/Paddle/pull/39913))
+  
+  - multinomial ([#39858](https://github.com/PaddlePaddle/Paddle/pull/39858), [#39913](https://github.com/PaddlePaddle/Paddle/pull/39913))
+  
+  - addmm ([#39858](https://github.com/PaddlePaddle/Paddle/pull/39858), [#39913](https://github.com/PaddlePaddle/Paddle/pull/39913))
+  
+  - cholesky ([#39858](https://github.com/PaddlePaddle/Paddle/pull/39858), [#39913](https://github.com/PaddlePaddle/Paddle/pull/39913))
+  
+  - where ([#39811](https://github.com/PaddlePaddle/Paddle/pull/39811))
+  
+  - log10 ([#40785](https://github.com/PaddlePaddle/Paddle/pull/40785))
+  
+  - log2 ([#40785](https://github.com/PaddlePaddle/Paddle/pull/40785))
+  
+  - expm1（[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
+  
+  - atan2 ([#39806](https://github.com/PaddlePaddle/Paddle/pull/39806))
+  
+  - gaussian_random ([#39932](https://github.com/PaddlePaddle/Paddle/pull/39932), [#40122](https://github.com/PaddlePaddle/Paddle/pull/40122), [#40191](https://github.com/PaddlePaddle/Paddle/pull/40191))
+  
+  - empty ([#38334](https://github.com/PaddlePaddle/Paddle/pull/38334))
+  
+  - truncated_gaussian_random ([#39971](https://github.com/PaddlePaddle/Paddle/pull/39971), [#40191](https://github.com/PaddlePaddle/Paddle/pull/40191))
+  
+  - mv ([#39861](https://github.com/PaddlePaddle/Paddle/pull/39861), [#39954](https://github.com/PaddlePaddle/Paddle/pull/39954))
+  
+  - tan ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+  
+  - set_value ([#40195](https://github.com/PaddlePaddle/Paddle/pull/40195), [#40478](https://github.com/PaddlePaddle/Paddle/pull/40478), [#40636](https://github.com/PaddlePaddle/Paddle/pull/40636))
+  
+  - bitwise_and （[#40031](https://github.com/PaddlePaddle/Paddle/pull/40031)）
+  
+  - bitwise_not（[#40031](https://github.com/PaddlePaddle/Paddle/pull/40031)）
+  
+  - bitwise_or（[#40031](https://github.com/PaddlePaddle/Paddle/pull/40031)）
+  
+  - poisson（[#39814](https://github.com/PaddlePaddle/Paddle/pull/39814)）
+  
+  - cholesky_solve（[#40387](https://github.com/PaddlePaddle/Paddle/pull/40387)）
+  
+  - bitwise_xor（[#40031](https://github.com/PaddlePaddle/Paddle/pull/40031)）
+  
+  - triangular_solve（[#40417](https://github.com/PaddlePaddle/Paddle/pull/40417)）
+  
+  - sigmoid ([#40626](https://github.com/PaddlePaddle/Paddle/pull/40626))
+  
+  - atanh ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+  
+  - softsign（[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
+  
+  - thresholded_relu ([#40385](https://github.com/PaddlePaddle/Paddle/pull/40385))
+  
+  - tanh_shrink ([#40565](https://github.com/PaddlePaddle/Paddle/pull/40565))
+  
+  - stanh（[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
+  
+  - reduce_mean ([#37559](https://github.com/PaddlePaddle/Paddle/pull/37559))
+  
+  - reduce_max（[#40225](https://github.com/PaddlePaddle/Paddle/pull/40225)）
+  
+  - reduce_min ([#40374](https://github.com/PaddlePaddle/Paddle/pull/40374))
+  
+  - mean ([#40872](https://github.com/PaddlePaddle/Paddle/pull/40872), [#41319](https://github.com/PaddlePaddle/Paddle/pull/41319))
+  
+  - reduce_all ([#40374](https://github.com/PaddlePaddle/Paddle/pull/40374))
+  
+  - reduce_any ([#40374](https://github.com/PaddlePaddle/Paddle/pull/40374))
+  
+  - logsumexp ([#40790](https://github.com/PaddlePaddle/Paddle/pull/40790))
+  
+  - softshrink（[#40565](https://github.com/PaddlePaddle/Paddle/pull/40565)）
+  
+  - range ([#41265](https://github.com/PaddlePaddle/Paddle/pull/41265), [#40581](https://github.com/PaddlePaddle/Paddle/pull/40851))
+  
+  - stack（[#40581](https://github.com/PaddlePaddle/Paddle/pull/40851)）
+  
+  - tile ([#40371](https://github.com/PaddlePaddle/Paddle/pull/40371))
+  
+  - unique（[#40581](https://github.com/PaddlePaddle/Paddle/pull/40851)）
+  
+  - unstack（[#40581](https://github.com/PaddlePaddle/Paddle/pull/40851)）
+  
+  - slice（[#40736](https://github.com/PaddlePaddle/Paddle/pull/40736)）
+  
+  - transpose2（[#39327](https://github.com/PaddlePaddle/Paddle/pull/39327)）
+  
+  - unsqueeze2（ [#40596](https://github.com/PaddlePaddle/Paddle/pull/40596)）
+  
+  - squeeze2（ [#40596](https://github.com/PaddlePaddle/Paddle/pull/40596)）
+  
+  - strided_slice ([#40708](https://github.com/PaddlePaddle/Paddle/pull/40708))
+  
+  - softmax ([#39547](https://github.com/PaddlePaddle/Paddle/pull/39547))
+  
+  - leaky_relu ([#40385](https://github.com/PaddlePaddle/Paddle/pull/40385))
+  
+  - gelu ([#40393](https://github.com/PaddlePaddle/Paddle/pull/40393))
+  
+  - prelu ([#40393](https://github.com/PaddlePaddle/Paddle/pull/40393))
+  
+  - log_softmax ([#40393](https://github.com/PaddlePaddle/Paddle/pull/40393))
+  
+  - elu ([#40565](https://github.com/PaddlePaddle/Paddle/pull/40565))
+  
+  - logsigmoid ([#40626](https://github.com/PaddlePaddle/Paddle/pull/40626))
+  
+  - psroi_pool ([#40353](https://github.com/PaddlePaddle/Paddle/pull/40353), [#41173](https://github.com/PaddlePaddle/Paddle/pull/41173))
+  
+  - kthvalue（[#40575](https://github.com/PaddlePaddle/Paddle/pull/40575)）
+  
+  - mode ([#40571](https://github.com/PaddlePaddle/Paddle/pull/40571))
+  
+  - yolo_box（[#40112](https://github.com/PaddlePaddle/Paddle/pull/40112)）
+  
+  - yolov3_loss ([#40944](https://github.com/PaddlePaddle/Paddle/pull/40944))
+  
+  - temporal_shift（[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
+  
+  - depthwise_conv2d（[#39354](https://github.com/PaddlePaddle/Paddle/pull/39354)）
+  
+  - pad3d ([#40701](https://github.com/PaddlePaddle/Paddle/pull/40701))
+  
+  - pad（ [#40012](https://github.com/PaddlePaddle/Paddle/pull/40012)）
+  
+  - greater_equal（[#39970](https://github.com/PaddlePaddle/Paddle/pull/39970)）
+  
+  - kldiv_loss ([#39770](https://github.com/PaddlePaddle/Paddle/pull/39770))
+  
+  - isclose ([#39770](https://github.com/PaddlePaddle/Paddle/pull/39770))
+  
+  - silu ([#40565](https://github.com/PaddlePaddle/Paddle/pull/40565))
+  
+  - unfold ([#39778](https://github.com/PaddlePaddle/Paddle/pull/39778))
+  
+  - batch_norm（[#39347](https://github.com/PaddlePaddle/Paddle/pull/39347)）
+  
+  - norm（[#39324](https://github.com/PaddlePaddle/Paddle/pull/39324)）
+  
+  - roi_pool ([#40574](https://github.com/PaddlePaddle/Paddle/pull/40574), [#40682](https://github.com/PaddlePaddle/Paddle/pull/40682), [#41173](https://github.com/PaddlePaddle/Paddle/pull/41173))
+  
+  - roi_align ([#40382](https://github.com/PaddlePaddle/Paddle/pull/40382), [#40556](https://github.com/PaddlePaddle/Paddle/pull/40556), [#41402](https://github.com/PaddlePaddle/Paddle/pull/41402))
+  
+  - deformable_conv ([#40700](https://github.com/PaddlePaddle/Paddle/pull/40700), [#40794](https://github.com/PaddlePaddle/Paddle/pull/40794), [#41644](https://github.com/PaddlePaddle/Paddle/pull/41644))
+  
+  - deformable_conv_v1 ([#40794](https://github.com/PaddlePaddle/Paddle/pull/40794), [#41644](https://github.com/PaddlePaddle/Paddle/pull/41644))
+  
+  - label_smooth ([#39796](https://github.com/PaddlePaddle/Paddle/pull/39796))
+  
+  - grid_sampler ([#40585](https://github.com/PaddlePaddle/Paddle/pull/40585))
+  
+  - greater_than（[#39970](https://github.com/PaddlePaddle/Paddle/pull/39970)）
+  
+  - pixel_shuffle ([#39949](https://github.com/PaddlePaddle/Paddle/pull/39949), [#39712](https://github.com/PaddlePaddle/Paddle/pull/39712))
+  
+  - nearest_interp_v2 ([#40855](https://github.com/PaddlePaddle/Paddle/pull/40855))
+  
+  - bilinear_interp_v2 ([#40855](https://github.com/PaddlePaddle/Paddle/pull/40855))
+  
+  - softmax_with_cross_entropy ([#40832](https://github.com/PaddlePaddle/Paddle/pull/40832))
+  
+  - rnn ([#41007](https://github.com/PaddlePaddle/Paddle/pull/41007))
+  
+  - reverse ([#40791](https://github.com/PaddlePaddle/Paddle/pull/40791))
+  
+  - trace ([#39510](https://github.com/PaddlePaddle/Paddle/pull/39510))
+  
+  - kron（[#40427](https://github.com/PaddlePaddle/Paddle/pull/40427)）
+  
+  - accuracy（[#39982](https://github.com/PaddlePaddle/Paddle/pull/39982)）
+  
+  - gather_tree ([#40082](https://github.com/PaddlePaddle/Paddle/pull/40082), [#39844](https://github.com/PaddlePaddle/Paddle/pull/39844))
+  
+  - dropout（[#40148](https://github.com/PaddlePaddle/Paddle/pull/40148)）
+  
+  - bincount ([#39947](https://github.com/PaddlePaddle/Paddle/pull/39947))
+  
+  - warpctc ([#41389](https://github.com/PaddlePaddle/Paddle/pull/41389), [#40023](https://github.com/PaddlePaddle/Paddle/pull/40023))
+  
+  - multiplex（[#40007](https://github.com/PaddlePaddle/Paddle/pull/40007), [#40102](https://github.com/PaddlePaddle/Paddle/pull/40102)）
+  
+  - qr（[#40007](https://github.com/PaddlePaddle/Paddle/pull/40007)）
+  
+  - assign_value ([#40967](https://github.com/PaddlePaddle/Paddle/pull/40967))
+  
+  - assign ([#40022](https://github.com/PaddlePaddle/Paddle/pull/40022))
+  
+  - cast ([#37610](https://github.com/PaddlePaddle/Paddle/pull/37610))
+  
+  - tril_triu（[#40007](https://github.com/PaddlePaddle/Paddle/pull/40007), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053)）
+  
+  - where_index ([#40255](https://github.com/PaddlePaddle/Paddle/pull/40255))
+  
+  - index_select ([#40260](https://github.com/PaddlePaddle/Paddle/pull/40260), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+  
+  - roll ([#40257](https://github.com/PaddlePaddle/Paddle/pull/40257), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+  
+  - cumprod ([#39770](https://github.com/PaddlePaddle/Paddle/pull/39770))
+  
+  - shard_index ([#40254](https://github.com/PaddlePaddle/Paddle/pull/40254))
+  
+  - reshape2 ([#40914](https://github.com/PaddlePaddle/Paddle/pull/40914), [#39631](https://github.com/PaddlePaddle/Paddle/pull/39631), [#38833](https://github.com/PaddlePaddle/Paddle/pull/38833), [#37164](https://github.com/PaddlePaddle/Paddle/pull/37164))
+  
+  - flip ([#39822](https://github.com/PaddlePaddle/Paddle/pull/39822), [#40974](https://github.com/PaddlePaddle/Paddle/pull/40974))
+  
+  - eye ([#39712](https://github.com/PaddlePaddle/Paddle/pull/39712), [#40105](https://github.com/PaddlePaddle/Paddle/pull/40105), [#41476](https://github.com/PaddlePaddle/Paddle/pull/41476))
+  
+  - lookup_table_v2（[#39901](https://github.com/PaddlePaddle/Paddle/pull/39901)）
+  
+  - searchsorted（[#40520](https://github.com/PaddlePaddle/Paddle/pull/40520), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053)）
+  
+  - adamw ([#40351](https://github.com/PaddlePaddle/Paddle/pull/40351))
+  
+  - tanh ([#40385](https://github.com/PaddlePaddle/Paddle/pull/40385))
+  
+  - cross ([#39829](https://github.com/PaddlePaddle/Paddle/pull/39829))
+  
+  - concat ([#38955](https://github.com/PaddlePaddle/Paddle/pull/38955), [#41112](https://github.com/PaddlePaddle/Paddle/pull/41112))
+  
+  - split ([#39060](https://github.com/PaddlePaddle/Paddle/pull/39060))
+  
+  - linspace ([#40124](https://github.com/PaddlePaddle/Paddle/pull/40124))
+  
+  - huber_loss ([#39761](https://github.com/PaddlePaddle/Paddle/pull/39761))
+  
+  - hierarchical_sigmoid（[#40553](https://github.com/PaddlePaddle/Paddle/pull/40553)）
+  
+  - nll_loss ([#39936](https://github.com/PaddlePaddle/Paddle/pull/39936))
+  
+  - graph_send_recv ([#40092](https://github.com/PaddlePaddle/Paddle/pull/40092), [#40320](https://github.com/PaddlePaddle/Paddle/pull/40320))
+  
+  - abs（[#39492](https://github.com/PaddlePaddle/Paddle/pull/39492), [#39762](https://github.com/PaddlePaddle/Paddle/pull/39762)）
+  
+  - exp（[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
+  
+  - rsqrt（[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
+  
+  - viterbi_decode ([#40186](https://github.com/PaddlePaddle/Paddle/pull/40186))
+  
+  - conj ([#38247](https://github.com/PaddlePaddle/Paddle/pull/38247))
+  
+  - real ([#39777](https://github.com/PaddlePaddle/Paddle/pull/39777), [#41173](https://github.com/PaddlePaddle/Paddle/pull/41173))
+  
+  - imag ([#39777](https://github.com/PaddlePaddle/Paddle/pull/39777), [#41173](https://github.com/PaddlePaddle/Paddle/pull/41173))
+  
+  - take_along_axis ([#39959](https://github.com/PaddlePaddle/Paddle/pull/39959), [#40270](https://github.com/PaddlePaddle/Paddle/pull/40270), [#40974](https://github.com/PaddlePaddle/Paddle/pull/40974))
+  
+  - put_along_axis ([#39959](https://github.com/PaddlePaddle/Paddle/pull/39959), [#40974](https://github.com/PaddlePaddle/Paddle/pull/40974))
+  
+  - lgamma ([#39770](https://github.com/PaddlePaddle/Paddle/pull/39770))
+  
+  - relu ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+  
+  - maxout ([#39959](https://github.com/PaddlePaddle/Paddle/pull/39959), [#40974](https://github.com/PaddlePaddle/Paddle/pull/40974))
+  
+  - log ([#40785](https://github.com/PaddlePaddle/Paddle/pull/40785))
+  
+  - bilinear_tensor_product（[#39903](https://github.com/PaddlePaddle/Paddle/pull/39903)）
+  
+  - flatten_contiguous_range ([#38712](https://github.com/PaddlePaddle/Paddle/pull/38712), [#36957](https://github.com/PaddlePaddle/Paddle/pull/36957), [#41345](https://github.com/PaddlePaddle/Paddle/pull/41345))
+  
+  - matrix_rank ([#40074](https://github.com/PaddlePaddle/Paddle/pull/40074), [#40519](https://github.com/PaddlePaddle/Paddle/pull/40519), [#41466](https://github.com/PaddlePaddle/Paddle/pull/41466))
+  
+  - logit ([#37844](https://github.com/PaddlePaddle/Paddle/pull/37844))
+  
+  - lerp ([#40105](https://github.com/PaddlePaddle/Paddle/pull/40105), [#39524](https://github.com/PaddlePaddle/Paddle/pull/39524))
+  
+  - erfinv ([#39949](https://github.com/PaddlePaddle/Paddle/pull/39949), [#39712](https://github.com/PaddlePaddle/Paddle/pull/39712))
+  
+  - broadcast_tensors（[#40047](https://github.com/PaddlePaddle/Paddle/pull/40047)）
+  
+  - gumbel_softmax（[#39873](https://github.com/PaddlePaddle/Paddle/pull/39873)）
+  
+  - diagonal （[#39575](https://github.com/PaddlePaddle/Paddle/pull/39575)）
+  
+  - trunc ([#39543](https://github.com/PaddlePaddle/Paddle/pull/39543), [#39772](https://github.com/PaddlePaddle/Paddle/pull/39772))
+  
+  - multi_dot ([#40038](https://github.com/PaddlePaddle/Paddle/pull/40038))
+  
+  - matrix_power ([#40231](https://github.com/PaddlePaddle/Paddle/pull/40231))
+  
+  - digamma（[#39240](https://github.com/PaddlePaddle/Paddle/pull/39240)）
+  
+  - masked_select（[#39193](https://github.com/PaddlePaddle/Paddle/pull/39193)）
+  
+  - determinant ([#40539](https://github.com/PaddlePaddle/Paddle/pull/40539))
+  
+  - eigh ([#40213](https://github.com/PaddlePaddle/Paddle/pull/40213))
+  
+  - size ([#39949](https://github.com/PaddlePaddle/Paddle/pull/39949), [#39712](https://github.com/PaddlePaddle/Paddle/pull/39712))
+  
+  - shape ([#40248](https://github.com/PaddlePaddle/Paddle/pull/40248))
+  
+  - reduce_sum（[#37559](https://github.com/PaddlePaddle/Paddle/pull/37559), [#41295](https://github.com/PaddlePaddle/Paddle/pull/41295)）
+  
+  - reduce_prod ([#39844](https://github.com/PaddlePaddle/Paddle/pull/39844))
+  
+  - histogram（[#39496](https://github.com/PaddlePaddle/Paddle/pull/39496)）
+  
+  - meshgrid ([#41411](https://github.com/PaddlePaddle/Paddle/pull/41411))
+  
+  - brelu ([#40385](https://github.com/PaddlePaddle/Paddle/pull/40385))
+  
+  - hard_swish ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+  
+  - hard_shrink ([#40565](https://github.com/PaddlePaddle/Paddle/pull/40565))
+  
+  - selu ([#39819](https://github.com/PaddlePaddle/Paddle/pull/39819))
+  
+  - expand_v2 ([#39471](https://github.com/PaddlePaddle/Paddle/pull/39471))
+  
+  - top_k_v2（[#40064](https://github.com/PaddlePaddle/Paddle/pull/40064)）
+  
+  - expand_as_v2（[#40373](https://github.com/PaddlePaddle/Paddle/pull/40373)）
+  
+  - swish ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+  
+  - hard_sigmoid ([#40626](https://github.com/PaddlePaddle/Paddle/pull/40626))
+
+#### New dynamic graph execution mechanism
+
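+To address the poor scheduling performance and limited extensibility of PaddlePaddle's original dynamic graph execution mechanism, we refactored the underlying execution mechanism of the dynamic graph. The new call-and-execute path works with the Phi operator library for efficient runtime execution, so operators supported by Phi see a considerable improvement in scheduling performance after switching to the new dynamic graph mode. However, upgrading the execution mechanism of the whole framework is a very large effort and is tightly coupled with work on the Phi operator library, so this mode is still not enabled by default in this version. To try it, set the environment variable `FLAGS_enable_eager_mode=1`. The work includes the following: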
+
+- **Infrastructure, core components, and mechanisms of the new dynamic graph execution mechanism**: The execution code related to the dynamic graph is made static, turning the originally homogeneous operator construction into calls specialized for each Phi API, which greatly reduces scheduling overhead. ([#36059](https://github.com/PaddlePaddle/Paddle/pull/36059), [#37323](https://github.com/PaddlePaddle/Paddle/pull/37323), [#37556](https://github.com/PaddlePaddle/Paddle/pull/37556), [#37555](https://github.com/PaddlePaddle/Paddle/pull/37555), [#37478](https://github.com/PaddlePaddle/Paddle/pull/37478), [#37458](https://github.com/PaddlePaddle/Paddle/pull/37458), [#37479](https://github.com/PaddlePaddle/Paddle/pull/37479), [#37599](https://github.com/PaddlePaddle/Paddle/pull/37599), [#37659](https://github.com/PaddlePaddle/Paddle/pull/37659), [#37654](https://github.com/PaddlePaddle/Paddle/pull/37654), [#39200](https://github.com/PaddlePaddle/Paddle/pull/39200), [#39309](https://github.com/PaddlePaddle/Paddle/pull/39309), [#39319](https://github.com/PaddlePaddle/Paddle/pull/39319), [#39414](https://github.com/PaddlePaddle/Paddle/pull/39414), [#39504](https://github.com/PaddlePaddle/Paddle/pull/39504), [#39526](https://github.com/PaddlePaddle/Paddle/pull/39526), [#39878](https://github.com/PaddlePaddle/Paddle/pull/39878), [#39963](https://github.com/PaddlePaddle/Paddle/pull/39963))
+
+- **Sub-features of the new dynamic graph execution mechanism**: more flexible and more complete dynamic graph sub-features are supported, such as hook, PyLayer, double_grad, inplace, and AMP. ([#41396](https://github.com/PaddlePaddle/Paddle/pull/41396), [#40400](https://github.com/PaddlePaddle/Paddle/pull/40400), [#40695](https://github.com/PaddlePaddle/Paddle/pull/40695), [#41043](https://github.com/PaddlePaddle/Paddle/pull/41043), [#40915](https://github.com/PaddlePaddle/Paddle/pull/40915), [#41104](https://github.com/PaddlePaddle/Paddle/pull/41104), [#41350](https://github.com/PaddlePaddle/Paddle/pull/41350), [#41209](https://github.com/PaddlePaddle/Paddle/pull/41209), [#40830](https://github.com/PaddlePaddle/Paddle/pull/40830), [#40891](https://github.com/PaddlePaddle/Paddle/pull/40891), [#36814](https://github.com/PaddlePaddle/Paddle/pull/36814), [#37377](https://github.com/PaddlePaddle/Paddle/pull/37377), [#37193](https://github.com/PaddlePaddle/Paddle/pull/37193), [#36965](https://github.com/PaddlePaddle/Paddle/pull/36965), [#37810](https://github.com/PaddlePaddle/Paddle/pull/37810), [#36837](https://github.com/PaddlePaddle/Paddle/pull/36837), [#38488](https://github.com/PaddlePaddle/Paddle/pull/38488), [#39282](https://github.com/PaddlePaddle/Paddle/pull/39282), [#39449](https://github.com/PaddlePaddle/Paddle/pull/39449), [#39531](https://github.com/PaddlePaddle/Paddle/pull/39531), [#39638](https://github.com/PaddlePaddle/Paddle/pull/39638), [#39674](https://github.com/PaddlePaddle/Paddle/pull/39674), [#39893](https://github.com/PaddlePaddle/Paddle/pull/39893), [#40170](https://github.com/PaddlePaddle/Paddle/pull/40170), [#40693](https://github.com/PaddlePaddle/Paddle/pull/40693), [#40937](https://github.com/PaddlePaddle/Paddle/pull/40937), [#41016](https://github.com/PaddlePaddle/Paddle/pull/41016), [#41051](https://github.com/PaddlePaddle/Paddle/pull/41051), [#41121](https://github.com/PaddlePaddle/Paddle/pull/41121), [#41198](https://github.com/PaddlePaddle/Paddle/pull/41198), [#41287](https://github.com/PaddlePaddle/Paddle/pull/41287), [#41380](https://github.com/PaddlePaddle/Paddle/pull/41380), [#41306](https://github.com/PaddlePaddle/Paddle/pull/41306), [#41387](https://github.com/PaddlePaddle/Paddle/pull/41387), [#40623](https://github.com/PaddlePaddle/Paddle/pull/40623), [#40945](https://github.com/PaddlePaddle/Paddle/pull/40945))
+
+- **Automatic code generation for the new dynamic graph execution mechanism**: splitting the computation and scheduling logic of a large number of uniform operators into operator-specific scheduling logic by hand would be an enormous amount of work, so we introduced a new automatic code generation mechanism that generates this code and simplifies the dynamic graph runtime logic. To stay compatible with the various runtime logics already present in the framework, we also use some fairly complex compilation techniques to obtain information at runtime and generate more accurate scheduling code. ([#37574](https://github.com/PaddlePaddle/Paddle/pull/37574), [#37575](https://github.com/PaddlePaddle/Paddle/pull/37575), [#37639](https://github.com/PaddlePaddle/Paddle/pull/37639), [#37723](https://github.com/PaddlePaddle/Paddle/pull/37723), [#37753](https://github.com/PaddlePaddle/Paddle/pull/37753), [#37812](https://github.com/PaddlePaddle/Paddle/pull/37812), [#37837](https://github.com/PaddlePaddle/Paddle/pull/37837), [#37910](https://github.com/PaddlePaddle/Paddle/pull/37910), [#37943](https://github.com/PaddlePaddle/Paddle/pull/37943), [#37992](https://github.com/PaddlePaddle/Paddle/pull/37992), [#37959](https://github.com/PaddlePaddle/Paddle/pull/37959), [#38017](https://github.com/PaddlePaddle/Paddle/pull/38017), [#37969](https://github.com/PaddlePaddle/Paddle/pull/37969), [#38160](https://github.com/PaddlePaddle/Paddle/pull/38160), [#38085](https://github.com/PaddlePaddle/Paddle/pull/38085), [#38562](https://github.com/PaddlePaddle/Paddle/pull/38562), [#38573](https://github.com/PaddlePaddle/Paddle/pull/38573), [#39192](https://github.com/PaddlePaddle/Paddle/pull/39192), [#39215](https://github.com/PaddlePaddle/Paddle/pull/39215), [#39355](https://github.com/PaddlePaddle/Paddle/pull/39355), [#39358](https://github.com/PaddlePaddle/Paddle/pull/39358), [#39328](https://github.com/PaddlePaddle/Paddle/pull/39328), [#39233](https://github.com/PaddlePaddle/Paddle/pull/39233), [#39628](https://github.com/PaddlePaddle/Paddle/pull/39628), [#39767](https://github.com/PaddlePaddle/Paddle/pull/39767), [#39743](https://github.com/PaddlePaddle/Paddle/pull/39743), [#39897](https://github.com/PaddlePaddle/Paddle/pull/39897), [#39797](https://github.com/PaddlePaddle/Paddle/pull/39797), [#39997](https://github.com/PaddlePaddle/Paddle/pull/39997), [#40058](https://github.com/PaddlePaddle/Paddle/pull/40058), [#40080](https://github.com/PaddlePaddle/Paddle/pull/40080), [#40107](https://github.com/PaddlePaddle/Paddle/pull/40107), [#39962](https://github.com/PaddlePaddle/Paddle/pull/39962), [#40132](https://github.com/PaddlePaddle/Paddle/pull/40132), [#40276](https://github.com/PaddlePaddle/Paddle/pull/40276), [#40266](https://github.com/PaddlePaddle/Paddle/pull/40266), [#40480](https://github.com/PaddlePaddle/Paddle/pull/40480), [#40482](https://github.com/PaddlePaddle/Paddle/pull/40482), [#40368](https://github.com/PaddlePaddle/Paddle/pull/40368), [#40650](https://github.com/PaddlePaddle/Paddle/pull/40650), [#40815](https://github.com/PaddlePaddle/Paddle/pull/40815), [#40907](https://github.com/PaddlePaddle/Paddle/pull/40907), [#40935](https://github.com/PaddlePaddle/Paddle/pull/40935), [#41089](https://github.com/PaddlePaddle/Paddle/pull/41089))
+
+- **Integration of the new dynamic graph execution mechanism into the main framework and joint debugging**: environment variables are currently used to distinguish static graph mode from dynamic graph mode (both the new and the legacy dynamic graph modes). Most dynamic graph logic has been adapted in these modes, but a large number of issues are still being fixed. ([#37638](https://github.com/PaddlePaddle/Paddle/pull/37638), [#37643](https://github.com/PaddlePaddle/Paddle/pull/37643), [#37653](https://github.com/PaddlePaddle/Paddle/pull/37653), [#38314](https://github.com/PaddlePaddle/Paddle/pull/38314), [#38337](https://github.com/PaddlePaddle/Paddle/pull/38337), [#38338](https://github.com/PaddlePaddle/Paddle/pull/38338), [#39164](https://github.com/PaddlePaddle/Paddle/pull/39164), [#39326](https://github.com/PaddlePaddle/Paddle/pull/39326), [#40391](https://github.com/PaddlePaddle/Paddle/pull/40391), [#40201](https://github.com/PaddlePaddle/Paddle/pull/40201), [#40854](https://github.com/PaddlePaddle/Paddle/pull/40854), [#40887](https://github.com/PaddlePaddle/Paddle/pull/40887))
+
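+- **Updated some mode-checking logic under dynamic graph to support a fast dynamic graph execution path in the compatibility state** ([#40786](https://github.com/PaddlePaddle/Paddle/pull/40786)):
+  
+  - Non-static-graph mode (the current transitional approach): `_non_static_mode()`.
+  
+  - In dynamic graph mode and in the new dynamic graph (the recommended check): `_in_dygraph_mode()`.
+  
+  - In dynamic graph mode and in the legacy dynamic graph (not recommended; will be deprecated in a future release): `_in_legacy_dygraph()`.
+  
+  - Enable the legacy dynamic graph and disable the new one in dynamic graph mode: `_enable_legacy_dygraph()`, or exit `_test_eager_guard()`.
+  
+  - Enable the new dynamic graph and disable the legacy one in dynamic graph mode: `_disable_legacy_dygraph()`, or `with _test_eager_guard()`.
+  
+  - Check for the new dynamic graph in either static graph or dynamic graph mode: `_in_eager_without_dygraph_check()`.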
+
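+- **Inplace strategy supported after the dynamic graph refactoring**: the input and output are the same Tensor.
+  
+  - Adapt the inplace strategy for the intermediate state of the dynamic graph refactoring. ([#40400](https://github.com/PaddlePaddle/Paddle/pull/40400))
+  
+  - Adapt the inplace strategy for the final state of the dynamic graph refactoring. ([#40695](https://github.com/PaddlePaddle/Paddle/pull/40695))
+  
+  - After the refactoring, add the inplace strategy to the PyLayer feature. ([#41043](https://github.com/PaddlePaddle/Paddle/pull/41043))
+  
+  - After the refactoring, add the inplace strategy to Tensor's setitem feature. ([#40915](https://github.com/PaddlePaddle/Paddle/pull/40915))
+  
+  - After the refactoring, add the `_reset_grad_inplace_version` interface, which sets the inplace version of a Tensor's gradient to 0. ([#41101](https://github.com/PaddlePaddle/Paddle/pull/41101))
+  
+  - If the backward computation does not need the value of a forward Tensor (the no need buffer attribute), the inplace version of that Tensor does not need to be checked, so the inplace version check is skipped for Tensors marked no_need_buffer. ([#41350](https://github.com/PaddlePaddle/Paddle/pull/41350))
+  
+  - Unify the error messages of the inplace version check before and after the dynamic graph refactoring. ([#41209](https://github.com/PaddlePaddle/Paddle/pull/41209))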
+
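+- **View strategy supported after the dynamic graph refactoring**: the input and output Tensors share the underlying data.
+  
+  - Adapt the view mechanism for the intermediate state of the dynamic graph refactoring, covering the `reshape`, `squeeze`, `unsqueeze`, and `flatten` APIs. ([#40830](https://github.com/PaddlePaddle/Paddle/pull/40830))
+  
+  - Adapt the view mechanism for the final state of the dynamic graph refactoring, covering the `reshape` API. ([#40891](https://github.com/PaddlePaddle/Paddle/pull/40891))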
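The inplace version check described in the inplace items above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Paddle's implementation: `ToyTensor` and `SavedForBackward` are hypothetical names, and the only behavior taken from the release note is that an in-place change invalidates a saved forward Tensor unless that Tensor is marked no_need_buffer.

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor that counts its in-place modifications."""

    def __init__(self, value):
        self.value = value
        self.inplace_version = 0

    def add_(self, delta):
        # An in-place op mutates the data and bumps the version counter.
        self.value += delta
        self.inplace_version += 1
        return self


class SavedForBackward:
    """Hypothetical record of a forward tensor and the version seen at save time."""

    def __init__(self, tensor, no_need_buffer=False):
        self.tensor = tensor
        self.no_need_buffer = no_need_buffer
        self.saved_version = tensor.inplace_version

    def get(self):
        # The check is skipped when backward does not need the forward value.
        if not self.no_need_buffer and self.tensor.inplace_version != self.saved_version:
            raise RuntimeError(
                f"tensor was modified in place: saved version {self.saved_version}, "
                f"current version {self.tensor.inplace_version}"
            )
        return self.tensor


x = ToyTensor(2.0)
checked = SavedForBackward(x)
unchecked = SavedForBackward(x, no_need_buffer=True)
x.add_(1.0)  # in-place update after the tensor was saved

try:
    checked.get()
    error = None
except RuntimeError as exc:
    error = str(exc)

print(error)
print(unchecked.get().value)
```

The version counter makes the check cheap: backward only compares two integers instead of comparing tensor contents.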
+
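To try the new dynamic graph mode described in this section, the flag `FLAGS_enable_eager_mode=1` has to be present in the environment of the training process. A minimal sketch, assuming the flag is read when the framework starts up: the helper `run_with_flags` is a hypothetical name, and the child command below is a stand-in that only echoes the flag instead of running a real training script.

```python
import os
import subprocess
import sys


def run_with_flags(argv, flags):
    """Run a command with extra FLAGS_* variables added to a copy of the environment."""
    env = dict(os.environ)  # copy, so the parent environment is left untouched
    env.update(flags)
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Stand-in for `python train.py`: the child prints the flag value it sees.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ['FLAGS_enable_eager_mode'])",
]
result = run_with_flags(child, {"FLAGS_enable_eager_mode": "1"})
print(result.stdout.strip())
```

Passing the flag through the child's environment keeps the switch local to one run, which is convenient while the mode is still experimental.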
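+#### New Static Graph Executor
+
+To address the less-than-ideal scheduling performance of PaddlePaddle's original static graph executor in some scenarios and the difficulty of extending it to multiple streams, we have implemented a new static graph executor that performs better and is easier to extend, making full use of asynchronous scheduling across multiple streams and multiple threads. The new executor is a compatible upgrade of the original one. It is already used by default in single-machine, single-GPU scenarios and takes effect automatically without any change to training code. An interface for switching back is also provided: set the environment variable `FLAGS_USE_STANDALONE_EXECUTOR=false` to return to the original executor. ([#41179](https://github.com/PaddlePaddle/Paddle/pull/41179)) The main contents are as follows: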
+
+- Basic components: a high-performance thread pool for multi-threaded operator scheduling in the executor ([#35470](https://github.com/PaddlePaddle/Paddle/pull/35470), [#35930](https://github.com/PaddlePaddle/Paddle/pull/35930), [#36030](https://github.com/PaddlePaddle/Paddle/pull/36030), [#36480](https://github.com/PaddlePaddle/Paddle/pull/36480), [#36688](https://github.com/PaddlePaddle/Paddle/pull/36688), [#36740](https://github.com/PaddlePaddle/Paddle/pull/36740), [#38335](https://github.com/PaddlePaddle/Paddle/pull/38335), [#40770](https://github.com/PaddlePaddle/Paddle/pull/40770)) and thread coordination components ([#38779](https://github.com/PaddlePaddle/Paddle/pull/38779), [#40876](https://github.com/PaddlePaddle/Paddle/pull/40876), [#40912](https://github.com/PaddlePaddle/Paddle/pull/40912)), timely GPU memory reclamation after operator execution ([#37642](https://github.com/PaddlePaddle/Paddle/pull/37642), [#39617](https://github.com/PaddlePaddle/Paddle/pull/39617), [#40859](https://github.com/PaddlePaddle/Paddle/pull/40859)), and a new dependency analysis algorithm for the parallel executor ([#37231](https://github.com/PaddlePaddle/Paddle/pull/37231)), among others.
+
+- Scheduling logic: optimize how operators are scheduled in the executor, support multi-stream, multi-threaded asynchronous scheduling, turn data type, device, and layout transforms into scheduled operators to improve performance, support caching of operator Kernel selection, and support selecting the new Phi operators. ([#35024](https://github.com/PaddlePaddle/Paddle/pull/35024), [#34922](https://github.com/PaddlePaddle/Paddle/pull/34922), [#35711](https://github.com/PaddlePaddle/Paddle/pull/35711), [#35928](https://github.com/PaddlePaddle/Paddle/pull/35928), [#39458](https://github.com/PaddlePaddle/Paddle/pull/39458), [#36899](https://github.com/PaddlePaddle/Paddle/pull/36899))
+
+- Interface compatibility: stay compatible with the user interfaces and functionality of the original executor, such as aligning with `Executor.run()` on the Python side and supporting management of Tensors in Scope, so that users can switch to the new executor without noticing. ([#37278](https://github.com/PaddlePaddle/Paddle/pull/37278), [#37379](https://github.com/PaddlePaddle/Paddle/pull/37379), [#37445](https://github.com/PaddlePaddle/Paddle/pull/37445), [#37510](https://github.com/PaddlePaddle/Paddle/pull/37510), [#40955](https://github.com/PaddlePaddle/Paddle/pull/40955), [#41778](https://github.com/PaddlePaddle/Paddle/pull/41178), [#41058](https://github.com/PaddlePaddle/Paddle/pull/41058), [#38584](https://github.com/PaddlePaddle/Paddle/pull/38584), [#37957](https://github.com/PaddlePaddle/Paddle/pull/37957), [#37672](https://github.com/PaddlePaddle/Paddle/pull/37672), [#37474](https://github.com/PaddlePaddle/Paddle/pull/37474), [#37085](https://github.com/PaddlePaddle/Paddle/pull/37085), [#37061](https://github.com/PaddlePaddle/Paddle/pull/37061), [#36945](https://github.com/PaddlePaddle/Paddle/pull/36945))
+
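+- Enhance debugging and error reporting in multi-threaded scenarios: errors raised in sub-threads are captured and rethrown uniformly from the main thread to improve the user experience. ([#36692](https://github.com/PaddlePaddle/Paddle/pull/36692), [#36802](https://github.com/PaddlePaddle/Paddle/pull/36802))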
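The last item, capturing sub-thread errors and raising them from the main thread, follows a general pattern that can be sketched with Python's standard thread pool. This is an illustration of the pattern only, not the executor's C++ implementation; `run_op` and `run_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_op(name):
    """Hypothetical operator body; one operator is made to fail on purpose."""
    if name == "bad_op":
        raise ValueError(f"operator {name!r} failed")
    return f"{name}: ok"


def run_all(op_names, num_threads=4):
    """Run operators on worker threads and surface any failure in the caller."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(run_op, name) for name in op_names]
        # Future.result() re-raises, in the calling thread, the exception
        # that was raised inside the worker thread.
        return [future.result() for future in futures]


caught = None
try:
    run_all(["matmul", "bad_op", "relu"])
except ValueError as exc:
    caught = str(exc)

print("caught in main thread:", caught)
```

With this shape the user sees one exception at the call site instead of a message printed from an anonymous worker thread.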
 
 #### Distributed Training
-- 分布式训练基础功能
-    - 新增 `paddle.DataParallel.no_sync`，实现动态图数据并行下暂停多卡通信和梯度同步。([#34740](https://github.com/PaddlePaddle/Paddle/pull/34740)) 
-    - 新增 `paddle.distributed.launch` 启动方式对容错的支持，实现 `collective` 模式下的节点容错功能。 ([#33369](https://github.com/PaddlePaddle/Paddle/pull/33369),  [#34572](https://github.com/PaddlePaddle/Paddle/pull/34572))
-	- 分布式训练API `paddle.static.Executor.train_from_dataset, paddle.static.Executor.infer_from_dataset` 新增dump功能训练过程中模型的参数和中间变量的功能。[#34457](https://github.com/PaddlePaddle/Paddle/pull/34457) 
-	- 混合并行支持模型并行与数据并行的组合。([#34377](https://github.com/PaddlePaddle/Paddle/pull/34377))
-	- 新增分布式策略`gradient scale`选项，用户可以指定`gradient scale`的方式：`avg`、`sum`或者自定义。([#33862](https://github.com/PaddlePaddle/Paddle/pull/33862))
-	- 新增 `paddle.distributed.parallel_with_gloo`，支持 CPU barrier 操作。([#34671](https://github.com/PaddlePaddle/Paddle/pull/34671))
-	- GPU 参数服务器新增训练 profiler 功能。([#32640](https://github.com/PaddlePaddle/Paddle/pull/32640))
-	- GPU 参数服务器新增流水线功能，训练性能提升可40%。[#33159](https://github.com/PaddlePaddle/Paddle/pull/33159)  
-	- 静态图混合并行添加 `dp_as_optimizer_sharding` 实验性功能，可将数据并行作为优化器参数分片并行，节约优化器状态显存占用。([#35593](https://github.com/PaddlePaddle/Paddle/pull/35593))
-	- 静态图流水线并行执行器支持 `LRScheduler`。([#34402](https://github.com/PaddlePaddle/Paddle/pull/34402))
-	- 新增`paddle.fluid.core.GraphPyClient.set_node_feat`,支持用户在图引擎客户端设置图节点特征,支持多种类型特征存储。([#34994](https://github.com/PaddlePaddle/Paddle/pull/34994))
-	- 提升图引擎图节点邻居采样算法的性能，优化图游走算法的执行。([#34088](https://github.com/PaddlePaddle/Paddle/pull/34088))
-	- 模型并行接口`paddle.distributed.fleet.meta_parallel.ColumnParallelLinear`、`paddle.distributed.fleet.meta_parallel.RowParallelLinear`、`paddle.distributed.fleet.meta_parallel.VocabParallelEmbedding`、`paddle.distributed.fleet.meta_parallel.ParallelCrossEntropy`实现动静统一。([#33700](https://github.com/PaddlePaddle/Paddle/pull/33700),  [#33411](https://github.com/PaddlePaddle/Paddle/pull/33411))
-	- 新增分布式模型并行cpu `c_embedding` op。([#35467](https://github.com/PaddlePaddle/Paddle/pull/35467))
-	- 已修改为新增分布式通信初始化阶段gen_comm_id时得到 gethostbyname 的重试机制。([#34855](https://github.com/PaddlePaddle/Paddle/pull/34855))
-	- 新增 `fleet` 梯度更新时的开关配置 `scale_sparse_gradient_with_batch_size`，决定梯度是否乘以 `batch_size`。  ([#34893](https://github.com/PaddlePaddle/Paddle/pull/34893))
+
+- Basic functions of collective-communication multi-machine, multi-GPU training
+  
+  - Add elastic training (covering node failure, scale-out, and scale-in) to improve the fault tolerance of distributed training. ([#36684](https://github.com/PaddlePaddle/Paddle/pull/36684), [#37177](https://github.com/PaddlePaddle/Paddle/pull/37177), [#37781](https://github.com/PaddlePaddle/Paddle/pull/37781))
+  
+  - Refactor the Launch module and add `master` coordination and the node count definition `nnodes` to make distributed launching easier to use. ([#40086](https://github.com/PaddlePaddle/Paddle/pull/40086), [#40568](https://github.com/PaddlePaddle/Paddle/pull/40568), [#40782](https://github.com/PaddlePaddle/Paddle/pull/40782), [#40844](https://github.com/PaddlePaddle/Paddle/pull/40844), [#40936](https://github.com/PaddlePaddle/Paddle/pull/40936), [#41190](https://github.com/PaddlePaddle/Paddle/pull/41190), [#41314](https://github.com/PaddlePaddle/Paddle/pull/41314))
+  
+  - Add support for heterogeneous training across GPU/NPU/XPU hardware. ([#37613](https://github.com/PaddlePaddle/Paddle/pull/37613), [#37998](https://github.com/PaddlePaddle/Paddle/pull/37998))
+  
+  - Add fleet_executor, an asynchronous pipeline executor. ([#36966](https://github.com/PaddlePaddle/Paddle/pull/36966), [#37049](https://github.com/PaddlePaddle/Paddle/pull/37049), [#37087](https://github.com/PaddlePaddle/Paddle/pull/37087), [#37126](https://github.com/PaddlePaddle/Paddle/pull/37126), [#37150](https://github.com/PaddlePaddle/Paddle/pull/37150), [#37203](https://github.com/PaddlePaddle/Paddle/pull/37203), [#37167](https://github.com/PaddlePaddle/Paddle/pull/37167), [#37282](https://github.com/PaddlePaddle/Paddle/pull/37282), [#37319](https://github.com/PaddlePaddle/Paddle/pull/37319), [#37462](https://github.com/PaddlePaddle/Paddle/pull/37462), [#37507](https://github.com/PaddlePaddle/Paddle/pull/37507), [#37533](https://github.com/PaddlePaddle/Paddle/pull/37533), [#37576](https://github.com/PaddlePaddle/Paddle/pull/37576), [#37605](https://github.com/PaddlePaddle/Paddle/pull/37605), [#37691](https://github.com/PaddlePaddle/Paddle/pull/37691), [#37742](https://github.com/PaddlePaddle/Paddle/pull/37742), [#37783](https://github.com/PaddlePaddle/Paddle/pull/37783), [#37809](https://github.com/PaddlePaddle/Paddle/pull/37809), [#37862](https://github.com/PaddlePaddle/Paddle/pull/37862), [#37882](https://github.com/PaddlePaddle/Paddle/pull/37882), [#37934](https://github.com/PaddlePaddle/Paddle/pull/37934), [#38024](https://github.com/PaddlePaddle/Paddle/pull/38024), [#38083](https://github.com/PaddlePaddle/Paddle/pull/38083), [#38164](https://github.com/PaddlePaddle/Paddle/pull/38164), [#38261](https://github.com/PaddlePaddle/Paddle/pull/38261), [#38290](https://github.com/PaddlePaddle/Paddle/pull/38290), [#40607](https://github.com/PaddlePaddle/Paddle/pull/40607), [#37093](https://github.com/PaddlePaddle/Paddle/pull/37093), [#37106](https://github.com/PaddlePaddle/Paddle/pull/37106), [#37143](https://github.com/PaddlePaddle/Paddle/pull/37143), [#37338](https://github.com/PaddlePaddle/Paddle/pull/37338), [#37376](https://github.com/PaddlePaddle/Paddle/pull/37376), [#37485](https://github.com/PaddlePaddle/Paddle/pull/37485), [#37531](https://github.com/PaddlePaddle/Paddle/pull/37531), [#37623](https://github.com/PaddlePaddle/Paddle/pull/37623), [#37693](https://github.com/PaddlePaddle/Paddle/pull/37693), [#37755](https://github.com/PaddlePaddle/Paddle/pull/37755), [#37807](https://github.com/PaddlePaddle/Paddle/pull/37807), [#37889](https://github.com/PaddlePaddle/Paddle/pull/37889), [#38420](https://github.com/PaddlePaddle/Paddle/pull/38420), [#38539](https://github.com/PaddlePaddle/Paddle/pull/38539), [#36892](https://github.com/PaddlePaddle/Paddle/pull/36892), [#37084](https://github.com/PaddlePaddle/Paddle/pull/37084), [#37158](https://github.com/PaddlePaddle/Paddle/pull/37158), [#37361](https://github.com/PaddlePaddle/Paddle/pull/37361), [#37509](https://github.com/PaddlePaddle/Paddle/pull/37509), [#37603](https://github.com/PaddlePaddle/Paddle/pull/37603), [#37703](https://github.com/PaddlePaddle/Paddle/pull/37703), [#37824](https://github.com/PaddlePaddle/Paddle/pull/37824), [#38114](https://github.com/PaddlePaddle/Paddle/pull/38114), [#38322](https://github.com/PaddlePaddle/Paddle/pull/38322), [#38535](https://github.com/PaddlePaddle/Paddle/pull/38535), [#38650](https://github.com/PaddlePaddle/Paddle/pull/38650), [#38709](https://github.com/PaddlePaddle/Paddle/pull/38709), [#38799](https://github.com/PaddlePaddle/Paddle/pull/38799), [#38839](https://github.com/PaddlePaddle/Paddle/pull/38839), [#38904](https://github.com/PaddlePaddle/Paddle/pull/38904))
+  
+  - Add distributed inference for large models. ([#38795](https://github.com/PaddlePaddle/Paddle/pull/38795), [#39012](https://github.com/PaddlePaddle/Paddle/pull/39012), [#39032](https://github.com/PaddlePaddle/Paddle/pull/39032), [#39076](https://github.com/PaddlePaddle/Paddle/pull/39076), [#39194](https://github.com/PaddlePaddle/Paddle/pull/39194), [#39207](https://github.com/PaddlePaddle/Paddle/pull/39207), [#39241](https://github.com/PaddlePaddle/Paddle/pull/39241), [#39603](https://github.com/PaddlePaddle/Paddle/pull/39603), [#39758](https://github.com/PaddlePaddle/Paddle/pull/39758), [#39992](https://github.com/PaddlePaddle/Paddle/pull/39992))
 
 - Dynamic graph hybrid parallelism
-    - 在动态图分布式数据并行场景下，新增 `paddle.distributed.fleet.dygraph_optimizer.DygraphShardingOptimizer` 接口，通过在不同卡间切分优化器状态优化显存占用，支持更大的模型或batch size。 ([#33633](https://github.com/PaddlePaddle/Paddle/pull/33633))
-    - 动态图 Sharding 支持 MP-PP-DP， 实现动态图 4D 混合并行。([#35580](https://github.com/PaddlePaddle/Paddle/pull/35580))
-    - 动态图 Recompute 支持混合精度计算。([#33251](https://github.com/PaddlePaddle/Paddle/pull/33251))
-    - 流水线并行支持 1f1b 调度策略，用于节约运行期显存。([#34483](https://github.com/PaddlePaddle/Paddle/pull/34483))
-    - 动态图3D混合并行支持 recompute 策略，支持offload功能。 ([#34607](https://github.com/PaddlePaddle/Paddle/pull/34607) [#35588](https://github.com/PaddlePaddle/Paddle/pull/35588))
-    - 动态图3D混合并行支持模型保存和加载。 ([#34768](https://github.com/PaddlePaddle/Paddle/pull/34768))
-    - 针对模型并行+流水线并行场景，新增scatter-gather方案，优化跨机通信性能。 ([#34130](https://github.com/PaddlePaddle/Paddle/pull/34130))
-    - 流水线并行支持根据 Layer 数量的切分方式，保证切分更加均衡。 ([#34207](https://github.com/PaddlePaddle/Paddle/pull/34207))
-    - 流水线并行支持自动混合精度。([#33951](https://github.com/PaddlePaddle/Paddle/pull/33951))
-    - 流水线并行添加`paddle.distributed.fleet.meta_parallel.SharedLayerDesc`的组网描述， 用于支持参数共享的组网方式。([#33578](https://github.com/PaddlePaddle/Paddle/pull/33578))
-    - 张量并行添加 `paddle.distributed.fleet.meta_parallel.ParallelCrossEntropy`，支持交叉熵Loss的张量并行计算方式。([#33401](https://github.com/PaddlePaddle/Paddle/pull/33401))
-    - `paddle.DataParallel`添加`find_unused_parameters`接口，用于数据并行模式下，支持模型中使用控制流的情况。([#32826](https://github.com/PaddlePaddle/Paddle/pull/32826))
-    - 数据并行模式添加端口等待功能，解决端口冲突问题。([#34207](https://github.com/PaddlePaddle/Paddle/pull/34207))
+  
+  - Refactor `paddle.distributed.fleet.utils.recompute` to support the new dynamic graph. ([#41396](https://github.com/PaddlePaddle/Paddle/pull/41396))
+  
+  - Support Pure FP16 training. ([#36420](https://github.com/PaddlePaddle/Paddle/pull/36420))
+  
+  - Add the MoE (Mixture of Experts) parallel strategy to support training of very large MoE models. ([#41092](https://github.com/PaddlePaddle/Paddle/pull/41092), [#40895](https://github.com/PaddlePaddle/Paddle/pull/40895), [#40850](https://github.com/PaddlePaddle/Paddle/pull/40580), [#39224](https://github.com/PaddlePaddle/Paddle/pull/39224))
+  
+  - Add the GroupSharded parallel strategy, which supports sharded training of model states in three stages (stage1, stage2, and stage3), supports both synchronous and asynchronous communication, and can be combined with basic features such as Recompute, AMP O1/O2, Offload, GroupShardedClipGrad, and GroupShardedScaler. ([#37489](https://github.com/PaddlePaddle/Paddle/pull/37489), [#37568](https://github.com/PaddlePaddle/Paddle/pull/37568), [#37707](https://github.com/PaddlePaddle/Paddle/pull/37707), [#37836](https://github.com/PaddlePaddle/Paddle/pull/37836), [#37947](https://github.com/PaddlePaddle/Paddle/pull/37947), [#38151](https://github.com/PaddlePaddle/Paddle/pull/38151), [#38407](https://github.com/PaddlePaddle/Paddle/pull/38407), [#38052](https://github.com/PaddlePaddle/Paddle/pull/38052), [#39112](https://github.com/PaddlePaddle/Paddle/pull/39112), [#38989](https://github.com/PaddlePaddle/Paddle/pull/38989), [#39171](https://github.com/PaddlePaddle/Paddle/pull/39171), [#39285](https://github.com/PaddlePaddle/Paddle/pull/39285), [#39334](https://github.com/PaddlePaddle/Paddle/pull/39334), [#39397](https://github.com/PaddlePaddle/Paddle/pull/39397), [#39581](https://github.com/PaddlePaddle/Paddle/pull/39581), [#39668](https://github.com/PaddlePaddle/Paddle/pull/39668), [#40129](https://github.com/PaddlePaddle/Paddle/pull/40129), [#40396](https://github.com/PaddlePaddle/Paddle/pull/40396), [#40488](https://github.com/PaddlePaddle/Paddle/pull/40488), [#40601](https://github.com/PaddlePaddle/Paddle/pull/40601), [#37725](https://github.com/PaddlePaddle/Paddle/pull/37725), [#37904](https://github.com/PaddlePaddle/Paddle/pull/37904), [#38064](https://github.com/PaddlePaddle/Paddle/pull/38064))
 
 - Static graph hybrid parallelism
-    - 支持流水线并行下 fuse grad merge 的功能，通过 `distributed_strategy.fuse_grad_merge` 开关控制，性能提升约5%。([#35004](https://github.com/PaddlePaddle/Paddle/pull/35004))
-    - 支持混合并行开启 dp 下 fuse allreduce sum功能，性能提升约3%。([#34480](https://github.com/PaddlePaddle/Paddle/pull/34480))
-	
-- 自动并行
-    - 新增自动并行 `shard_tensor`，`shard_op` 接口。([#33804](https://github.com/PaddlePaddle/Paddle/pull/33804), [#35765](https://github.com/PaddlePaddle/Paddle/pull/35765))，支持基于用户标记的半自动并行。
-    - 新增自动补全分布式属性功能，支持基于用户已标记的分布式属性补全所有未标记的分布式属性。 ([#34813](https://github.com/PaddlePaddle/Paddle/pull/34813))
-    - 新增自动切分串行 `Program` 功能。([#35117](https://github.com/PaddlePaddle/Paddle/pull/35117))
-    - 实现自动并行对 Fleet API 的适配。([#35483](https://github.com/PaddlePaddle/Paddle/pull/35483))
+  
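+  - Add the `scale_gradient` flag to `gradient_scale_configs` to control where gradients are averaged during gradient aggregation under pipeline parallelism. ([#36384](https://github.com/PaddlePaddle/Paddle/pull/36384))
+  
+  - Under tensor model parallelism, dropout supports a deterministic random seed generator, which keeps randomness consistent for non-distributed variables and independent for distributed variables. ([#36228](https://github.com/PaddlePaddle/Paddle/pull/36228))
+  
+  - NPU hybrid parallelism supports Offload, which can save 40% of device memory. ([#37224](https://github.com/PaddlePaddle/Paddle/pull/37224))
+  
+  - Add an optional `force_cpu` parameter to the seed op so that dropout can read the seed value directly from the CPU. ([#35820](https://github.com/PaddlePaddle/Paddle/pull/35820))
+  
+  - Improve the Automatic Sparsity (ASP) sharding strategy to support selecting the sharding strategy according to the program. ([#40028](https://github.com/PaddlePaddle/Paddle/pull/40028))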
 
+- Auto parallel
+  
+  - Add process relaunch after the automatic mapping between logical processes and physical devices. ([#37523](https://github.com/PaddlePaddle/Paddle/pull/37523), [#37326](https://github.com/PaddlePaddle/Paddle/pull/37326))
+  
+  - Improve the underlying mechanisms and interfaces of auto parallel to make it easier to unify modules and add optimization passes. ([#36617](https://github.com/PaddlePaddle/Paddle/pull/36617), [#38132](https://github.com/PaddlePaddle/Paddle/pull/38132))
+  
+  - Add a unified resource representation to support automatic mapping between logical processes and physical devices. ([#37091](https://github.com/PaddlePaddle/Paddle/pull/37091), [#37482](https://github.com/PaddlePaddle/Paddle/pull/37482), [#37094](https://github.com/PaddlePaddle/Paddle/pull/37094))
+  
+  - Improve distributed attribute completion for the backward and update parts of the auto parallel computation graph. ([#36744](https://github.com/PaddlePaddle/Paddle/pull/36744))
+  
+  - Add data sharding. ([#36055](https://github.com/PaddlePaddle/Paddle/pull/36055))
+  
+  - Add tensor resharding, which re-partitions tensors according to the distributed attributes of tensors and operators. ([#40865](https://github.com/PaddlePaddle/Paddle/pull/40865), [#41106](https://github.com/PaddlePaddle/Paddle/pull/41106))
+  
+  - Add automatic conversion of distributed parameters when the number of resources or the parallel strategy changes. ([#40434](https://github.com/PaddlePaddle/Paddle/pull/40434))
+  
+  - Add gradient accumulation (GradientMerge) to reduce the number of communications and improve training efficiency. ([#38259](https://github.com/PaddlePaddle/Paddle/pull/38259), [#40737](https://github.com/PaddlePaddle/Paddle/pull/40737))
+  
+  - Add recomputation (Recompute) to optimize GPU memory. ([#38920](https://github.com/PaddlePaddle/Paddle/pull/38920))
+  
+  - Add a Sharding optimization pass that supports sharding optimization in three stages: p-g-os. ([#38502](https://github.com/PaddlePaddle/Paddle/pull/38502))
+  
+  - Add an AMP + FP16 optimization pass. ([#38764](https://github.com/PaddlePaddle/Paddle/pull/38764), [#40615](https://github.com/PaddlePaddle/Paddle/pull/40615))
+  
+  - Add fused QKV partitioning for Transformer-like models. ([#39080](https://github.com/PaddlePaddle/Paddle/pull/39080))
+  
+  - Add distributed attribute inference for the while op, ensuring that the iterative inference algorithm converges. ([#39939](https://github.com/PaddlePaddle/Paddle/pull/39939), [#39086](https://github.com/PaddlePaddle/Paddle/pull/39086), [#39014](https://github.com/PaddlePaddle/Paddle/pull/39014))
+  
+  - Support training and inference with sub-blocks and while op control flow. ([#39612](https://github.com/PaddlePaddle/Paddle/pull/39612), [#39895](https://github.com/PaddlePaddle/Paddle/pull/39895), [#40077](https://github.com/PaddlePaddle/Paddle/pull/40077))
+
+- Parameter server
+  
+  - Under GPUPS, add a NaN/Inf value checking tool. ([#38131](https://github.com/PaddlePaddle/Paddle/pull/38131))
+  
+  - Under GPUPS, add the set_date interface to support incremental training. ([#36194](https://github.com/PaddlePaddle/Paddle/pull/36194))
+  
+  - Under GPUPS, add asynchronous release of datasets. ([#37790](https://github.com/PaddlePaddle/Paddle/pull/37790))
+  
+  - Under GPUPS, support dumping parameters and intermediate layers. ([#36157](https://github.com/PaddlePaddle/Paddle/pull/36157))
+  
+  - Under GPUPS, support optimizer parameter configuration. ([#39783](https://github.com/PaddlePaddle/Paddle/pull/39783), [#39849](https://github.com/PaddlePaddle/Paddle/pull/39849))
+  
+  - Under the unified parameter server, refactor the base classes of the communication, storage, and other modules to make each module easier to extend. ([#41207](https://github.com/PaddlePaddle/Paddle/pull/41207), [#41022](https://github.com/PaddlePaddle/Paddle/pull/41022), [#40702](https://github.com/PaddlePaddle/Paddle/pull/40702), [#39341](https://github.com/PaddlePaddle/Paddle/pull/39341), [#39377](https://github.com/PaddlePaddle/Paddle/pull/39377), [#39191](https://github.com/PaddlePaddle/Paddle/pull/39191), [#39064](https://github.com/PaddlePaddle/Paddle/pull/39064))
+  
+  - Under the unified parameter server, add an evaluation metrics module that supports computing metrics such as AUC/WuAUC/MaskAuc and can be extended with custom metrics. ([#38789](https://github.com/PaddlePaddle/Paddle/pull/38789))
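The deterministic seed generator for dropout under tensor model parallelism, listed under static graph hybrid parallelism above, can be illustrated with a small sketch. It assumes one simple seeding scheme (a shared seed for replicated variables, a rank-dependent seed for sharded ones); `GLOBAL_SEED`, `seed_for`, and `dropout_mask` are hypothetical names, and Paddle's actual scheme may differ.

```python
import random

GLOBAL_SEED = 2022  # hypothetical base seed shared by all ranks


def seed_for(rank, distributed):
    """Replicated variables share one seed; sharded variables get one seed per rank."""
    return GLOBAL_SEED + 1 + rank if distributed else GLOBAL_SEED


def dropout_mask(seed, size, drop_prob):
    """Build a 0/1 keep mask from a private generator, so the result is reproducible."""
    rng = random.Random(seed)
    return [0 if rng.random() < drop_prob else 1 for _ in range(size)]


ranks = range(2)
replicated = [dropout_mask(seed_for(r, False), 64, 0.5) for r in ranks]
sharded = [dropout_mask(seed_for(r, True), 64, 0.5) for r in ranks]

print("replicated masks identical:", replicated[0] == replicated[1])
print("sharded masks identical:", sharded[0] == sharded[1])
```

Replicated activations must see the same mask on every rank to stay consistent, while each shard of a partitioned activation should draw its own mask so that the full tensor is not dropped in a repeating pattern.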
+
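The gradient accumulation (GradientMerge) item in the auto parallel list above relies on a simple identity: averaging the gradients of k equal micro-batches gives the same update as one step on the full batch, so gradients only need to be synchronized once per k micro-batches. A minimal sketch with a one-parameter least-squares model; the function names and data are made up for illustration.

```python
def grad_mse(w, batch):
    """Gradient of mean((w * x - y) ** 2) over the batch with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def step_full_batch(w, batch, lr):
    """One SGD step using the whole batch at once."""
    return w - lr * grad_mse(w, batch)


def step_gradient_merge(w, batch, lr, k):
    """Accumulate gradients over k equal micro-batches, then update once."""
    size = len(batch) // k  # assumes len(batch) is divisible by k
    accumulated = 0.0
    for i in range(k):
        accumulated += grad_mse(w, batch[i * size:(i + 1) * size])
    return w - lr * (accumulated / k)


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w_full = step_full_batch(0.5, data, lr=0.01)
w_merged = step_gradient_merge(0.5, data, lr=0.01, k=2)
print(w_full, w_merged)
```

The two results agree up to floating-point rounding; what changes is that the merged version holds only one micro-batch in memory at a time and, in a distributed run, communicates once per update instead of once per micro-batch.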
+#### Profiler
+
+- Add the performance analysis module `paddle.profiler` at the Python layer, which collects, exports, and summarizes performance data during training and inference. ([#40065](https://github.com/PaddlePaddle/Paddle/pull/40065), [#40357](https://github.com/PaddlePaddle/Paddle/pull/40357), [#40888](https://github.com/PaddlePaddle/Paddle/pull/40888))
+  
+  - `paddle.profiler.Profiler`: the performance analyzer and the interface users interact with. ([#41029](https://github.com/PaddlePaddle/Paddle/pull/41029), [#41524](https://github.com/PaddlePaddle/Paddle/pull/41524), [#41157](https://github.com/PaddlePaddle/Paddle/pull/41157), [#40249](https://github.com/PaddlePaddle/Paddle/pull/40249), [#40111](https://github.com/PaddlePaddle/Paddle/pull/40111), [#39964](https://github.com/PaddlePaddle/Paddle/pull/39964), [#40133](https://github.com/PaddlePaddle/Paddle/pull/40133))
+  
+  - `paddle.profiler.RecordEvent`: provides custom instrumentation points for recording time. ([#39693](https://github.com/PaddlePaddle/Paddle/pull/39693), [#39694](https://github.com/PaddlePaddle/Paddle/pull/39694), [#39695](https://github.com/PaddlePaddle/Paddle/pull/39695), [#39675](https://github.com/PaddlePaddle/Paddle/pull/39675), [#41445](https://github.com/PaddlePaddle/Paddle/pull/41445), [#41132](https://github.com/PaddlePaddle/Paddle/pull/41132))
+  
+  - `paddle.profiler.ProfilerTarget`: specifies the target device for profiling.
+  
+  - `paddle.profiler.ProfilerState`: represents the state of the profiler.
+  
+  - `paddle.profiler.SortedKeys`: specifies how data in the summary tables is sorted.
+  
+  - `paddle.profiler.make_scheduler`: generates a scheduler for profiler states, enabling periodic control of the collection range.
+  
+  - `paddle.profiler.export_chrome_tracing`: saves performance data to a Google Chrome tracing file that can be viewed with the chrome://tracing plugin. ([#39316](https://github.com/PaddlePaddle/Paddle/pull/39316), [#39984](https://github.com/PaddlePaddle/Paddle/pull/39984), [#41029](https://github.com/PaddlePaddle/Paddle/pull/41029))
+  
+  - `paddle.profiler.export_protobuf`: saves performance data to a protobuf file in the internal structure representation. ([#39519](https://github.com/PaddlePaddle/Paddle/pull/39519), [#39109](https://github.com/PaddlePaddle/Paddle/pull/39109), [#39474](https://github.com/PaddlePaddle/Paddle/pull/39474))
+  
+  - `paddle.profiler.load_profiler_result`: loads performance data saved in a protobuf file.
+  
+  - `paddle.profiler.Profiler` with the `timer_only` parameter collects statistics on data loading, step cost, and throughput of the model. ([#40386](https://github.com/PaddlePaddle/Paddle/pull/40386))
+
+- Refactor the underlying Profiler infrastructure at the C++ layer
+  
+  - Refactor the Profiler controller architecture. ([#38826](https://github.com/PaddlePaddle/Paddle/pull/38826), [#39230](https://github.com/PaddlePaddle/Paddle/pull/39230), [#39779](https://github.com/PaddlePaddle/Paddle/pull/39779))
+  
+  - Add Host Tracer to collect host-side performance metrics. ([#37629](https://github.com/PaddlePaddle/Paddle/pull/39629), [#37766](https://github.com/PaddlePaddle/Paddle/pull/37766), [#37944](https://github.com/PaddlePaddle/Paddle/pull/37944), [#38280](https://github.com/PaddlePaddle/Paddle/pull/38280), [#39975](https://github.com/PaddlePaddle/Paddle/pull/39975), [#40460](https://github.com/PaddlePaddle/Paddle/pull/40460))
+  
+  - Add CUDA Tracer to collect device-side performance metrics. ([#39488](https://github.com/PaddlePaddle/Paddle/pull/39488))
+  
+  - Add level support to the Profiler. ([#39926](https://github.com/PaddlePaddle/Paddle/pull/39926))
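The "periodic control of the collection range" provided by `paddle.profiler.make_scheduler` can be pictured as a function from step number to profiler state. The sketch below is a pure-Python model written for illustration, not the library's code: the state names are assumed to mirror `paddle.profiler.ProfilerState`, and the parameter names (`closed`, `ready`, `record`, `repeat`, `skip_first`) are assumptions about the scheduler's configuration.

```python
from enum import Enum


class State(Enum):
    """Toy profiler states (assumed to mirror paddle.profiler.ProfilerState)."""
    CLOSED = 0
    READY = 1
    RECORD = 2
    RECORD_AND_RETURN = 3


def make_toy_scheduler(closed, ready, record, repeat=0, skip_first=0):
    """Return a function mapping a step number to a profiler state."""
    period = closed + ready + record

    def scheduler(step):
        if step < skip_first:
            return State.CLOSED
        step -= skip_first
        if repeat > 0 and step // period >= repeat:
            return State.CLOSED  # all requested cycles are finished
        position = step % period
        if position < closed:
            return State.CLOSED
        if position < closed + ready:
            return State.READY
        if position < period - 1:
            return State.RECORD
        return State.RECORD_AND_RETURN  # last recorded step of the cycle

    return scheduler


toy = make_toy_scheduler(closed=1, ready=1, record=3, repeat=1)
timeline = [toy(step).name for step in range(7)]
print(timeline)
```

In this model one cycle is a warm-up gap, a ready step, and a recording window whose last step also hands the collected data back; `repeat` bounds how many cycles run before the profiler stays closed.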
 
 #### Others
 
 - Model quantization
-    - 新增动态图离线量化功能。([#33445](https://github.com/PaddlePaddle/Paddle/pull/33445),  [#33898](https://github.com/PaddlePaddle/Paddle/pull/33898), [#33962](https://github.com/PaddlePaddle/Paddle/pull/33962),  [#35015](https://github.com/PaddlePaddle/Paddle/pull/35015))
-    - 重构动态图量化训练功能中统计输出量化信息模块，和预测端打通，提升鲁棒性。 ([#31680](https://github.com/PaddlePaddle/Paddle/pull/31680), [#31710](https://github.com/PaddlePaddle/Paddle/pull/31710), [#31861](https://github.com/PaddlePaddle/Paddle/pull/31861))
-    - 动态图量化训练支持和混合精度训练结合使用。([#33484](https://github.com/PaddlePaddle/Paddle/pull/33484))
-    - 动态图量化训练功能支持对Function类API进行量化。([#33162](https://github.com/PaddlePaddle/Paddle/pull/33162), [#33871](https://github.com/PaddlePaddle/Paddle/pull/33871))
-    - 支持静态图模式下分布式量化训练。 ([#33781](https://github.com/PaddlePaddle/Paddle/pull/33781))
-    - 支持动态图模式下conv2d_transpose的量化。([#34547](https://github.com/PaddlePaddle/Paddle/pull/34547))
+  
+  - Upgrade the quantization storage format and unify the quantization format of dynamic and static graphs. ([#41041](https://github.com/PaddlePaddle/Paddle/pull/41041))
+  
+  - Add new post-training quantization methods: EMD and Adaround. ([#40421](https://github.com/PaddlePaddle/Paddle/pull/40421), [#38460](https://github.com/PaddlePaddle/Paddle/pull/38460))
+  
+  - Adapt more ops for model quantization. ([#40083](https://github.com/PaddlePaddle/Paddle/pull/40083))
+  
+  - Support quantization of ops inside control flow. ([#37498](https://github.com/PaddlePaddle/Paddle/pull/37498))
+  
+  - Add quantization support for the matmul_v2 op. ([#36469](https://github.com/PaddlePaddle/Paddle/pull/36469))
+  
+  - Add support for inference of quantized matmul_v2 on TensorRT. ([#36594](https://github.com/PaddlePaddle/Paddle/pull/36594))
+
+- GPU memory optimization
+  
+  - Implement a multi-stream safe Allocator to support safe and efficient use of GPU memory in multi-stream asynchronous computing. ([#37290](https://github.com/PaddlePaddle/Paddle/pull/37290))
+  
+  - Add a runtime GPU memory monitoring module (`paddle.device.cuda.max_memory_allocated`, `paddle.device.cuda.max_memory_reserved`, `paddle.device.cuda.memory_allocated`, and `paddle.device.cuda.memory_reserved`) that supports high-performance, real-time GPU memory statistics. ([#38657](https://github.com/PaddlePaddle/Paddle/pull/38657))
+  
+  - Implement CPU-GPU unified memory addressing (CUDA Managed Memory) to support training very large models when GPU memory is limited. ([#39075](https://github.com/PaddlePaddle/Paddle/pull/39075))
+  
+  - Add the GetBasePtr interface at the C++ layer to obtain the device address created by the device API CUDAMalloc. ([#37978](https://github.com/PaddlePaddle/Paddle/pull/37978))
+  
+  - Reduce the number of free blocks in the AutoGrowth Allocator to improve GPU memory allocation performance. ([#35732](https://github.com/PaddlePaddle/Paddle/pull/35732))
+  
+  - For FP16 Tensors initialized with `initializer.Normal` and `initializer.Constant`, remove the redundant float32 temporary Tensor and the cast, saving 2x GPU memory. ([#38818](https://github.com/PaddlePaddle/Paddle/pull/38818))
+
+- 动态图高阶导数组网测试 
+  
+  - 为动态图增加三阶导数组网测试，以及Broadcast情况的测试。 ([#36814](https://github.com/PaddlePaddle/Paddle/pull/36814) , [#37377](https://github.com/PaddlePaddle/Paddle/pull/37377))
+
+- 自定义 op：支持 ROCm(HIP) 平台进行自定义 op 注册。 ([#36771](https://github.com/PaddlePaddle/Paddle/pull/36771)) 
+
+- Cost Model：增加基于运行 Profile 的 Cost Model。 ([#35774](https://github.com/PaddlePaddle/Paddle/pull/35774))
+
+- 提供定制化层 (nn.Layer)的自动稀疏训练支持，让用戶可根据自定义的Prune函数来对其设计的层进行稀疏剪枝。([#40253](https://github.com/PaddlePaddle/Paddle/pull/40253)) 
+
+- 新增字符串张量底层数据结构表示，使框架具备字符串张量表示和计算的能力。([#39830](https://github.com/PaddlePaddle/Paddle/pull/39830), [#40992](https://github.com/PaddlePaddle/Paddle/pull/40992)) 
+
+- 新增或者升级 oneDNN FP32/int8/bfloat16 Kernel，包括：
+  
+  - ELU ([#37149](https://github.com/PaddlePaddle/Paddle/pull/37149))
+  
+  - exp ([#38624](https://github.com/PaddlePaddle/Paddle/pull/38624))
+  
+  - stack ([#37002](https://github.com/PaddlePaddle/Paddle/pull/37002))
+  
+  - softplus ([#36382](https://github.com/PaddlePaddle/Paddle/pull/36382))
+  
+  - round ([#39653](https://github.com/PaddlePaddle/Paddle/pull/39653))
+  
+  - shape ([#36033](https://github.com/PaddlePaddle/Paddle/pull/36033))
+  
+  - flatten and flatten2 ([#35892](https://github.com/PaddlePaddle/Paddle/pull/35892))
+  
+  - slice ([#37630](https://github.com/PaddlePaddle/Paddle/pull/37630))
+  
+  - elementwise_mul ([#40546](https://github.com/PaddlePaddle/Paddle/pull/40546))
+  
+  - elementwise_add ([#38176](https://github.com/PaddlePaddle/Paddle/pull/38176))
+  
+  - elementwise_div ([#36158](https://github.com/PaddlePaddle/Paddle/pull/36158))
+  
+  - elementwise_sub ([#35662](https://github.com/PaddlePaddle/Paddle/pull/35662))
+  
+  - roi_align ([#37848](https://github.com/PaddlePaddle/Paddle/pull/37848))
+  
+  - nearest_interp and nearest_interp_v2 ([#37985](https://github.com/PaddlePaddle/Paddle/pull/37985), [#38622](https://github.com/PaddlePaddle/Paddle/pull/38622), [#39490](https://github.com/PaddlePaddle/Paddle/pull/39490))
+  
+  - assembly optimized Adam ([#39158](https://github.com/PaddlePaddle/Paddle/pull/39158))
+  
+  - logsoftmax ([#39793](https://github.com/PaddlePaddle/Paddle/pull/39793))
+  
+  - activation ([#40721](https://github.com/PaddlePaddle/Paddle/pull/40721))
+  
+  - mul ([#38552](https://github.com/PaddlePaddle/Paddle/pull/38552))
+  
+  - mean ([#37104](https://github.com/PaddlePaddle/Paddle/pull/37104))
+  
+  - relu ([#36265](https://github.com/PaddlePaddle/Paddle/pull/36265))
+  
+  - pool2d ([#37081](https://github.com/PaddlePaddle/Paddle/pull/37081))
+  
+  - concat ([#35889](https://github.com/PaddlePaddle/Paddle/pull/35889))
+  
+  - conv2d ([#38507](https://github.com/PaddlePaddle/Paddle/pull/38507), [#38938](https://github.com/PaddlePaddle/Paddle/pull/38938), [#36284](https://github.com/PaddlePaddle/Paddle/pull/36284))
+  
+  - LayerNorm ([#40418](https://github.com/PaddlePaddle/Paddle/pull/40418))
+
+### (2) Function optimization
 
-- 自定义OP
-    - 新增自定义算子 DCU 后端支持。([#34050](https://github.com/PaddlePaddle/Paddle/pull/34050))
-	
-- Cost Model
-    - 新增 Paddle CostModel，实现通过 Profiler 获取 op 时间 cost 的方法。 ([#35774](https://github.com/PaddlePaddle/Paddle/pull/35774)) 
+#### API
 
-- 模型保存与载入 
-    - 新增通过 ``paddle.jit.save`` 接口直接将 Layer 的非 forward 成员方法及相关参数保存为推理模型的功能。 ([#34070](https://github.com/PaddlePaddle/Paddle/pull/34070))
+- Add support for the mixed precision training O2 mode to `paddle.Model`, i.e., the Pure FP16 training mode previously available in dynamic/static graph mode. ([#36441](https://github.com/PaddlePaddle/Paddle/pull/36441))
 
+- Support self chain calls for `paddle.nn.Layer`. ([#36609](https://github.com/PaddlePaddle/Paddle/pull/36609))
 
-- ONNX Exporter 
-    - 新增8个算子适配： `softplus`、`elementwise_mod`、 `elementwise_floordiv`、`p_norm`、`depthwise_transpose`、`group_norm`、`pixel_shuffle`、`top_k`。([Paddle2ONNX#252](https://github.com/PaddlePaddle/Paddle2ONNX/pull/252),  [Paddle2ONNX#261](https://github.com/PaddlePaddle/Paddle2ONNX/pull/261),  [Paddle2ONNX#293](https://github.com/PaddlePaddle/Paddle2ONNX/pull/293))
-    - 新增8个检测模型导出：PPYOLO、PPYOLOv2、PPYOLO-Tiny、TTFNet、PAFNet、FCOS、SSD。 ([Paddle2ONNX#252](https://github.com/PaddlePaddle/Paddle2ONNX/pull/252))
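+- Add setting of the `is_distributed` attribute to the `to` method of `paddle.nn.Layer`, so that the distributed attribute of network parameters stays consistent before and after conversion. ([#36221](https://github.com/PaddlePaddle/Paddle/pull/36221))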
 
-### （2）功能优化
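+- Improve the parameter conversion logic of the `to` method of `paddle.nn.Layer`, lowering the peak GPU memory used during conversion and raising the conversion success rate. ([#36862](https://github.com/PaddlePaddle/Paddle/pull/36862))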
 
-#### API
-- `paddle.slice` 增加对`bool`类型Tensor的支持以及优化了报错信息。([#35586](https://github.com/PaddlePaddle/Paddle/pull/35586), [#35179](https://github.com/PaddlePaddle/Paddle/pull/35179))
-- `paddle.strided_slice`新增对`TensorArray`类型输入的支持，调整了`step<0`时的输出结果，调整后的结果与`numpy`保持一致。([#34205](https://github.com/PaddlePaddle/Paddle/pull/34205), [#34172](https://github.com/PaddlePaddle/Paddle/pull/34172))
-- ``paddle.multiply`` 支持 ``bool`` 数据类型的运算。([#35551](https://github.com/PaddlePaddle/Paddle/pull/35551))
-- 逻辑运算(``paddle.logical_not, paddle.logical_and, paddle.logical_or, paddle.logical_xor``)支持非 ``bool`` 数据类型(``int8, int16, int32, int64, float, double``)。([#34141](https://github.com/PaddlePaddle/Paddle/pull/34141))
-- ``paddle.transpose`` 支持 ``bool`` 类型运算。([#35886](https://github.com/PaddlePaddle/Paddle/pull/35886))
-- ``paddle.strided_slice`` 支持 ``bool`` 类型运算。([#33373](https://github.com/PaddlePaddle/Paddle/pull/33373))
-- ``paddle.set_printoptions`` 支持设置 ``linewidth`` 来打印 ``Tensor`` 。([#35175](https://github.com/PaddlePaddle/Paddle/pull/35175))
-- ``paddle.to_tensor`` 支持 ``LoDTensor`` 。([#33027](https://github.com/PaddlePaddle/Paddle/pull/33027))
-- ``paddle.linalg.det`` 和 ``paddle.linalg.slogdet`` 支持反向运算。([#36013](https://github.com/PaddlePaddle/Paddle/pull/36013))
-- ``paddle.nn.functional.pad`` 支持全维度pad时，tuple类型pad参数的输入。 ([35985](https://github.com/PaddlePaddle/Paddle/pull/35985))
-- 优化``paddle.nn.functional.pad`` 输入异常时的报错信息。 ([34979](https://github.com/PaddlePaddle/Paddle/pull/34979))
-- 静态图支持对部分 ``program``，生成相应的反向``program``。([#34395](https://github.com/PaddlePaddle/Paddle/pull/34395))
-- oneDNN 功能优化
-    - 新增多个算子的oneDNN kernels支持，包括新增对 ``clip``、``slice``、``split``、``cast``、 ``scale``、``expand_v2``、``sigmoid``、``matmul_v2``、``PRelu`` 的前向和反向 oneDNN FP32 和 oneNheN BF16的支持。([#35601](https://github.com/PaddlePaddle/Paddle/pull/35601), [#34332](https://github.com/PaddlePaddle/Paddle/pull/34332), [#34284](https://github.com/PaddlePaddle/Paddle/pull/34284), [#34216](https://github.com/PaddlePaddle/Paddle/pull/34216), [#34192](https://github.com/PaddlePaddle/Paddle/pull/34192),  [#33878](https://github.com/PaddlePaddle/Paddle/pull/33878), [#33584](https://github.com/PaddlePaddle/Paddle/pull/33584), [#33056](https://github.com/PaddlePaddle/Paddle/pull/33056), [#32975](https://github.com/PaddlePaddle/Paddle/pull/32975))
-  - 新增SGD算子中 Selected rows 使用 oneDNN AXPY 的实现。([33632](https://github.com/PaddlePaddle/Paddle/pull/33632))
-- Ampere 架构的GPU上支持 ``bfloat16`` 数据类型。([#31232](https://github.com/PaddlePaddle/Paddle/pull/32132), [#32221](https://github.com/PaddlePaddle/Paddle/pull/32221), [#32542](https://github.com/PaddlePaddle/Paddle/pull/32542))
-- Ampere 架构的GPU上 ``Conv`` 算子设置使用 Tensor Core 。([#34409](https://github.com/PaddlePaddle/Paddle/pull/34409))
-- 支持 ``paddle.device.cuda.current_stream().cuda_stream`` 获取裸指针。([#35813](https://github.com/PaddlePaddle/Paddle/pull/35813))
-- 新增``paddle.optimizer.AdamW`` GPU fuse kernel 实现，并支持 layerwise learning rate 功能。([#35020](https://github.com/PaddlePaddle/Paddle/pull/35020), [#35569](https://github.com/PaddlePaddle/Paddle/pull/35569))
-- 支持在 paddle 中使用Nvidia的cusparse库函数。([#35675](https://github.com/PaddlePaddle/Paddle/pull/35675))
-- 新增 ``paddle.full`` 对 ``int16`` 类型的支持。([#35619](https://github.com/PaddlePaddle/Paddle/pull/35619))
-- 优化 ``paddle.nn.ClipGradByGlobalNorm`` 的显存占用。([#34586](https://github.com/PaddlePaddle/Paddle/pull/34586))
-- `reduce_sum` 算子支持float16类型([#32966](https://github.com/PaddlePaddle/Paddle/pull/32966))
-- `paddle.nn.CTCLoss` 新增两种 grad norm 方法`norm_by_total_logits_len` 和 `norm_by_batchsize` 。([#34729](https://github.com/PaddlePaddle/Paddle/pull/34729/)) 
-- 新增各路径下公开API推荐使用路径。([#33313](https://github.com/PaddlePaddle/Paddle/pull/33313), [#33308](https://github.com/PaddlePaddle/Paddle/pull/33308), [#32759](https://github.com/PaddlePaddle/Paddle/pull/32759), [#32695](https://github.com/PaddlePaddle/Paddle/pull/32695), [#32643](https://github.com/PaddlePaddle/Paddle/pull/32643), [#31912](https://github.com/PaddlePaddle/Paddle/pull/31912), [#32650](https://github.com/PaddlePaddle/Paddle/pull/32650), [#32034](https://github.com/PaddlePaddle/Paddle/pull/32034), [#33897](https://github.com/PaddlePaddle/Paddle/pull/33897)) 
-- 恢复 `paddle.vision` 路径下原API可访问性。([#34432](https://github.com/PaddlePaddle/Paddle/pull/34432))
--  `paddle.vision.ops.deform_conv2d, paddle.vision.ops.DeformConv2D` 新增 double 输入类型支持。 ([#35330](https://github.com/PaddlePaddle/Paddle/pull/35330))
-- `paddle.fluid.contrib.layers.shuffle_batch` 新增 GPU Kernel实现。[#33938](https://github.com/PaddlePaddle/Paddle/pull/33938) 
-- 已有API新增公开调用路径 `paddle.linalg.cholesky`, `paddle.linalg.norm`, `paddle.linalg.inv`。([#33420](https://github.com/PaddlePaddle/Paddle/pull/33420)) 
-- `paddle.reshape` 支持将空 `Tensor` 形变成另一个形状的空 `Tensor`。([#36087](https://github.com/PaddlePaddle/Paddle/pull/36087))
-- `paddle.equal`第二个输入新增 `int`、`float` 和 `bool` 类型的支持。([#35695](https://github.com/PaddlePaddle/Paddle/pull/35695))
-- ``paddle.io.DataLoader``新增支持persistent_worker模式。([#34017](https://github.com/PaddlePaddle/Paddle/pull/34017))
-- 优化``l2_normalize``,``p_norm``,``elementwise_max``,``prelu``,``clip_by_norm``,``lars optimizer``算子支持float16计算。 ([#35576](https://github.com/PaddlePaddle/Paddle/pull/35576), [#35888](https://github.com/PaddlePaddle/Paddle/pull/35888), [#35888](https://github.com/PaddlePaddle/Paddle/pull/35888), [35532](https://github.com/PaddlePaddle/Paddle/pull/35532), [#35446](https://github.com/PaddlePaddle/Paddle/pull/35446), [#33280](https://github.com/PaddlePaddle/Paddle/pull/33280))
-- 优化flowers数据集的读取速度，从每批次数分钟优化至1~3秒。([#31408](https://github.com/PaddlePaddle/Paddle/pull/31408))
-- 支持`paddle.distributed.fleet.DistributedStrategy` 中 `without_graph_optimize` 开关打开后的fuse allreduce sum功能。FP32下性能提升3%，AMP下性能提升8%。([#34446](https://github.com/PaddlePaddle/Paddle/pull/34446)) 
-- `paddle.matmul` 将底层Op算子由matmul op 切换到 matmul_v2 op。 ([#36374](https://github.com/PaddlePaddle/Paddle/pull/36374))
-- `paddle.fft` 模块添加了 mkl_cdft 和 hipfft 两个计算后端。 ([#36537](https://github.com/PaddlePaddle/Paddle/pull/36537))
-- `paddle.roll` 的参数 `shifts` 支持 `Tensor` 作为输入。 ([#36537](https://github.com/PaddlePaddle/Paddle/pull/36537))
-- `paddle.shape` 支持复数类型的输入。([#36835](https://github.com/PaddlePaddle/Paddle/pull/36835))
-- matmul_v2 支持量化。([#36469](https://github.com/PaddlePaddle/Paddle/pull/36469))
-- 新增 `clip_op` 对 `float16` 的支持。 ([#36672](https://github.com/PaddlePaddle/Paddle/pull/36672))
-- `paddle.fft` 模块为 cufft 后端添加了缓存 plan 的功能，优化性能。([#36537](https://github.com/PaddlePaddle/Paddle/pull/36537))
+- Support setting the shape of the output Tensor in `paddle.incubate.graph_send_recv`, which helps reduce GPU memory usage during actual computation. ([#40509](https://github.com/PaddlePaddle/Paddle/pull/40509))
+
+- Add int32 and int64 data type support to `paddle.incubate.segment_sum`, `segment_mean`, `segment_max`, and `segment_min`. ([#40577](https://github.com/PaddlePaddle/Paddle/pull/40577))
+
+- Add bool type support to the transpose op. ([#35886](https://github.com/PaddlePaddle/Paddle/pull/35886))
+
+- Switch the underlying operator of `paddle.mm` from matmul to matmul_v2. ([#35770](https://github.com/PaddlePaddle/Paddle/pull/35770))
+
+- Support calling `paddle.einsum` in static graph mode, including inputs of unknown shape. ([#40360](https://github.com/PaddlePaddle/Paddle/pull/40360))
+
+- Support data parallelism for `paddle.nn.functional.margin_cross_entropy` and `paddle.nn.functional.class_center_sample`. ([#39852](https://github.com/PaddlePaddle/Paddle/pull/39852))
+
+- Support inputs of shape [1] for `paddle.nn.functional.grid_sample`. ([#36183](https://github.com/PaddlePaddle/Paddle/pull/36183))
+
+- Support the `NHWC` data format for `paddle.nn.PReLU`. ([#37019](https://github.com/PaddlePaddle/Paddle/pull/37019))
+
+- Support fixing the random state with `paddle.seed` in `paddle.nn.functional.class_center_sample`. ([#38248](https://github.com/PaddlePaddle/Paddle/pull/38248))
+
+- Add ROCM backend support for all APIs under `paddle.fft`, and improve the error messages of the CUFFT backend. ([#36415](https://github.com/PaddlePaddle/Paddle/pull/36415), [#36114](https://github.com/PaddlePaddle/Paddle/pull/36114/files))
+
+- Support slices in which some dimension has size 0 in `Tensor.getitem`, i.e., allow the result of slice indexing to be empty. ([#37313](https://github.com/PaddlePaddle/Paddle/pull/37313))
+
+- Support bool indexing on int and bool Tensors in `Tensor.setitem`. ([#37761](https://github.com/PaddlePaddle/Paddle/pull/37761))
+
+- Support 5D input shapes in nearest mode for `paddle.nn.functional.interpolate`. ([#38868](https://github.com/PaddlePaddle/Paddle/pull/38868))
+
+- Add int16 support to `paddle.nn.Embedding` and `paddle.gather`. ([#40964](https://github.com/PaddlePaddle/Paddle/pull/40964), [#40052](https://github.com/PaddlePaddle/Paddle/pull/40052))
+
+- Add CPU single-machine data parallelism to `paddle.distributed.spawn`. ([#35745](https://github.com/PaddlePaddle/Paddle/pull/35745), [#36758](https://github.com/PaddlePaddle/Paddle/pull/36758), [#36637](https://github.com/PaddlePaddle/Paddle/pull/36637))
+
+- Add the `depthwise_conv2d` MKLDNN operator. ([#38484](https://github.com/PaddlePaddle/Paddle/pull/38484))
+
+- Add complex types to the static graph data type checks of `paddle.abs`, `paddle.transpose`, `paddle.squeeze`, `paddle.unsqueeze`, `paddle.matmul`, and `paddle.full`. ([#40113](https://github.com/PaddlePaddle/Paddle/pull/40113))
+
+- Support tuple/list arguments in `paddle.autograd.PyLayer`. ([#38146](https://github.com/PaddlePaddle/Paddle/pull/38146))
+
+- Add a check, with an error report, on input leaf-node Tensors under the inplace strategy in `paddle.autograd.PyLayer`. ([#37931](https://github.com/PaddlePaddle/Paddle/pull/37931))
+
+- Support the HIP library in `paddle.autograd.PyLayer`. ([#38184](https://github.com/PaddlePaddle/Paddle/pull/38184))
+
+- Support inputs of more sizes in `paddle.take_along_axis` and `paddle.put_along_axis`, allowing the shape size of the index matrix to be larger than that of the arr matrix. ([#39072](https://github.com/PaddlePaddle/Paddle/pull/39072))
+
+- Improve the error message of the `paddle.nn.Pad2D` API when replicate is 0. ([#36510](https://github.com/PaddlePaddle/Paddle/pull/36510/files))
+
+- Support tuple-format pad input in the `paddle.nn.Pad2D` API. ([#35985](https://github.com/PaddlePaddle/Paddle/pull/35985/files))
+
+- Add the tdm_sample API to `paddle.distributed.InMemoryDataset` to support the sampling operation in the TDM algorithm. ([#37044](https://github.com/PaddlePaddle/Paddle/pull/37044))
+
+- Add a Pre-saving Hooks mechanism for `paddle.jit.save`. ([#38186](https://github.com/PaddlePaddle/Paddle/pull/38186))
+
+- Add APIs related to higher-order differentiation:
+  
+  - `elementwise_add`: add a third-order Kernel to support computing third-order derivatives. ([#36508](https://github.com/PaddlePaddle/Paddle/pull/36508), [#36618](https://github.com/PaddlePaddle/Paddle/pull/36618))
+  
+  - `matmul_v2`: add a third-order Kernel to support computing third-order derivatives. ([#36459](https://github.com/PaddlePaddle/Paddle/pull/36459))
+  
+  - `elementwise_mul`: add a third-order Kernel to support computing third-order derivatives. ([#37152](https://github.com/PaddlePaddle/Paddle/pull/37547))
+
+- Refine the logic by which `paddle.amp.GradScaler` calls the check_finite_and_unscale op, eliminating the cudaMemcpy introduced by creating a bool variable there. ([#37770](https://github.com/PaddlePaddle/Paddle/pull/37770))
+
+- Add checks for Tensors with 0 elements in the unstack and unique ops. ([#36021](https://github.com/PaddlePaddle/Paddle/pull/36021))
 
 #### IR(Intermediate Representation)
+
 - Dynamic graph to static graph
-    - 优化动转静报错格式，隐藏框架层不必要的报错栈，添加用户代码报错行定位标识和上下文。([#35365](https://github.com/PaddlePaddle/Paddle/pull/35365), [#35320](https://github.com/PaddlePaddle/Paddle/pull/35320))
-    - 优化控制流中 ``list.append`` 语法的转换逻辑。([#35212](https://github.com/PaddlePaddle/Paddle/pull/35212)) 
-    - 优化了动转静训练代码逻辑，升级内部 ``Program`` 缓存机制，新增输入 ``Tensor`` 的提前 copy 策略，提升训练性能。 ([#34181](https://github.com/PaddlePaddle/Paddle/pull/34181), [#33796](https://github.com/PaddlePaddle/Paddle/pull/33796))
-    - 优化动转静内部执行器显存回收策略，减少训练时显存占用量。 ([#34177](https://github.com/PaddlePaddle/Paddle/pull/34177))
-    - 集成了 ``Gast`` 三方依赖库的源码，解耦了版本依赖。 ([#34556](https://github.com/PaddlePaddle/Paddle/pull/34556)) 
-    - 动转静报错时显示部分框架层报错信息，使得定位问题更加容易。([#36765](https://github.com/PaddlePaddle/Paddle/pull/36765))
-    - 移除动转静报错模块中重复的临时文件删除函数`remove_static_file()`。([#36375](https://github.com/PaddlePaddle/Paddle/pull/36375))
-    - 优化对RegisterPass中`input_specs`参数处理，支持图优化时作为匹配子图条件。([#36453](https://github.com/PaddlePaddle/Paddle/pull/36453))
+  
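+  - Change the behavior of the `ProgramCache.last` interface in dynamic-to-static conversion so that it returns the most recently used Program rather than the last generated Program. ([#39541](https://github.com/PaddlePaddle/Paddle/pull/39541))
+  
+  - Improve the error message of the `paddle.reshape` API in dynamic-to-static conversion, adding a hint on the recommended usage. ([#40599](https://github.com/PaddlePaddle/Paddle/pull/40599))
+  
+  - Refine the exception types caught in the `is_api_in_module` function during dynamic-to-static code transcription. ([#40243](https://github.com/PaddlePaddle/Paddle/pull/40243))
+  
+  - Improve the error reporting of the dynamic-to-static module; warning messages are now hidden by default. ([#39730](https://github.com/PaddlePaddle/Paddle/pull/39730))
+  
+  - Add support for type hint syntax in dynamic-to-static conversion, improving the accuracy of variable type analysis. ([#39572](https://github.com/PaddlePaddle/Paddle/pull/39572))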
+  
+  - Improve `paddle.cond` to accept values of basic types such as bool and int when the values are equal. ([#37888](https://github.com/PaddlePaddle/Paddle/pull/37888))
+  
+  - Allow switching between train and eval modes when `@to_static` decorates a plain function. ([#37383](https://github.com/PaddlePaddle/Paddle/pull/37383))
+  
+  - Improve the dynamic-to-static error stack: highlight user code and trim redundant framework frames. ([#36741](https://github.com/PaddlePaddle/Paddle/pull/36741))
+  
+  - Remove the `no_value` placeholder from the return value of `paddle.cond`. ([#36513](https://github.com/PaddlePaddle/Paddle/pull/36513), [#36826](https://github.com/PaddlePaddle/Paddle/pull/36826))
+  
+  - Adapt the dynamic-to-static run_program op to the new dynamic graph mode. ([#40198](https://github.com/PaddlePaddle/Paddle/pull/40198), [#40355](https://github.com/PaddlePaddle/Paddle/pull/40355))
+  
+  - Add a check for zip syntax. ([#37846](https://github.com/PaddlePaddle/Paddle/pull/37846))
+  
+  - Fix dynamic-to-static conversion failures in `paddle.signal.frame`, `paddle.signal.stft`, and `paddle.signal.istft` caused by incorrect dimension and type checks. ([#40113](https://github.com/PaddlePaddle/Paddle/pull/40113))
+  
+  - Register complex-type Kernels for the mean and pad3d ops. ([#40113](https://github.com/PaddlePaddle/Paddle/pull/40113))
 
+#### Mixed Precision Training
 
-#### 分布式训练
-- 分布式训练基础功能
-    - 增强静态图流水线并行 stage 以及 persist var 的检查。([#34193](https://github.com/PaddlePaddle/Paddle/pull/34193), [#34870](https://github.com/PaddlePaddle/Paddle/pull/34870), [#35453](https://github.com/PaddlePaddle/Paddle/pull/35453))
-    - 优化静态图流水线并行，1F1B 调度使显存不随 global batch size 增大而增大。([#34230](https://github.com/PaddlePaddle/Paddle/pull/34230))
-    - GPU 参数服务器优化构建阶段 hashmap，构建阶段性能在某些任务上提升可达7倍。([#34175](https://github.com/PaddlePaddle/Paddle/pull/34175)) 
-    - GPU 参数服务器 pull/push 阶段新增多流并行。([#34276](https://github.com/PaddlePaddle/Paddle/pull/34276)) 
-    - GPU 参数服务器支持多机训练模式下，机器间远程拉取参数。([#35396](https://github.com/PaddlePaddle/Paddle/pull/35396))
-    - CPU 参数服务器支持 SSD存储。 ([#33031](https://github.com/PaddlePaddle/Paddle/pull/33031))
-    - `paddle.io.Dataset` 支持动态库解析数据。 ([#33969](https://github.com/PaddlePaddle/Paddle/pull/33969))
-    - 新增 `paddle.distributed.fleet.dataset.DatasetBase` 中对`use_var_list`和 `pipe_command` 生成数据的一致性检查函数。 ([#34463](https://github.com/PaddlePaddle/Paddle/pull/34463))
-    - 新增 `paddle.fluid.layers.embedding` 的 `emd` 维度与 `fleet` 中` sparse table` 的 `emb` 维度的一致性检查。 ([#34249](https://github.com/PaddlePaddle/Paddle/pull/34249))
-    - 动态图混合并行支持Pure FP16训练。([#36707](https://github.com/PaddlePaddle/Paddle/pull/36707))
-    - 静态图混合并行支持dropout使用固定随机种子生成器，以确保模型并行中全局变量的一致性与局部变量的随机性。([#36682](https://github.com/PaddlePaddle/Paddle/pull/36682))
-    ‘
-    - 实现了CPU并行，并支持调用 spawn 或 launch 时可以添加自定义的backend参数。可用的backend选择为 "gloo", "nccl", "bkcl", "auto" ，分别表示CPU并行，GPU并行，XPU并行和按照Paddle版本自动选择。([#35745](https://github.com/PaddlePaddle/Paddle/pull/35745))
-    - 优化动态图混合并行 HybridParallelClipGrad 策略，支持4D混合并行+Pure FP16训练。([#36707](https://github.com/PaddlePaddle/Paddle/pull/36707))
-    - 添加 SlotRecordDataset 类支持GPU参数服务器训练。([#36710](https://github.com/PaddlePaddle/Paddle/pull/36710))
-    - GPU参数服务器构建阶段支持使用SlotRecordDataset。([#36723](https://github.com/PaddlePaddle/Paddle/pull/36723))
+- Add a GPU Compute Capability environment check for amp, with a usage warning for GPU environments that cannot deliver training acceleration. ([#38086](https://github.com/PaddlePaddle/Paddle/pull/38086))
 
-- 静态图混合并行
-    - 优化混合并行 loss scale，减少 scale op 插入个数。([#35775](https://github.com/PaddlePaddle/Paddle/pull/35775))
-    - 优化 pipeline 的调度器，cache 重复的 cpu 工作，降低 cpu 开销。([#35680](https://github.com/PaddlePaddle/Paddle/pull/35680))
-    - 优化流水线并行 + recompute 时 checkpoint send/recv 的次数。([#34248](https://github.com/PaddlePaddle/Paddle/pull/34248))
+- Add a check on the calling order when `paddle.amp.decorate` and `paddle.DataParallel` are used together. ([#38785](https://github.com/PaddlePaddle/Paddle/pull/38785))
 
+#### Distributed Training
+
+- Basic functions of distributed training
+  
+  - Improve the Fleet API and DistributedStrategy configuration to use dynamic graph parallel features, making dynamic graphs easier to use. ([#40408](https://github.com/PaddlePaddle/Paddle/pull/40408))
+  
+  - Improve the dynamic graph hybrid-parallel HybridParallelClipGrad strategy to support 4D hybrid parallelism + Pure FP16 training. ([#36237](https://github.com/PaddlePaddle/Paddle/pull/36237), [#36555](https://github.com/PaddlePaddle/Paddle/pull/36555))
+  
+  - Refactor the dynamic graph data parallel strategy to support the new dynamic graph and the new communication library. ([#40389](https://github.com/PaddlePaddle/Paddle/pull/40389), [#40593](https://github.com/PaddlePaddle/Paddle/pull/40593), [#40836](https://github.com/PaddlePaddle/Paddle/pull/40836), [#41119](https://github.com/PaddlePaddle/Paddle/pull/41119), [#41413](https://github.com/PaddlePaddle/Paddle/pull/41413), [#39987](https://github.com/PaddlePaddle/Paddle/pull/39987))
+  
+  - Support distributed tensor model parallelism for the fused_attention op. ([#40101](https://github.com/PaddlePaddle/Paddle/pull/40101))
+  
+  - Support distributed tensor model parallelism for the fused_feedforward op. ([#40160](https://github.com/PaddlePaddle/Paddle/pull/40160))
+
+- Graph retrieval engine
+  
+  - Optimize the data format returned by the graph sampling interface of the graph engine, giving a 3x speedup in sampling. ([#37315](https://github.com/PaddlePaddle/Paddle/pull/37315))
+  
+  - Reduce the number of graph engine threads to improve performance. ([#37098](https://github.com/PaddlePaddle/Paddle/pull/37098))
+  
+  - Optimize graph engine data transfer to improve performance. ([#37341](https://github.com/PaddlePaddle/Paddle/pull/37341))
+  
+  - Use the topological relationship of the embedding ops in the model to optimize their merging logic and improve performance. ([#35942](https://github.com/PaddlePaddle/Paddle/pull/35942))
+
+- Communication library: refactor the communication library to make it easier to extend and to build on, and add support for heterogeneous communication. ([#41398](https://github.com/PaddlePaddle/Paddle/pull/41398), [#39720](https://github.com/PaddlePaddle/Paddle/pull/39720), [#40911](https://github.com/PaddlePaddle/Paddle/pull/40911), [#40579](https://github.com/PaddlePaddle/Paddle/pull/40579), [#40629](https://github.com/PaddlePaddle/Paddle/pull/40629), [#40437](https://github.com/PaddlePaddle/Paddle/pull/40437), [#40430](https://github.com/PaddlePaddle/Paddle/pull/40430), [#40228](https://github.com/PaddlePaddle/Paddle/pull/40228), [#40181](https://github.com/PaddlePaddle/Paddle/pull/40181), [#40100](https://github.com/PaddlePaddle/Paddle/pull/40100), [#40097](https://github.com/PaddlePaddle/Paddle/pull/40097), [#39892](https://github.com/PaddlePaddle/Paddle/pull/39892), [#39384](https://github.com/PaddlePaddle/Paddle/pull/39384), [#39737](https://github.com/PaddlePaddle/Paddle/pull/39737), [#40040](https://github.com/PaddlePaddle/Paddle/pull/40040))
 
 #### Others
+
 - Error reporting and debugging optimization
-    - 统一第三方库报错信息机制，优化 ``CURAND、CUDNN、CUBLAS、CUSOLVER、NCCL`` 五种 CUDA API 的报错信息，使报错内容更加详细与规范。 ([#33003](https://github.com/PaddlePaddle/Paddle/pull/33003), [#33743](https://github.com/PaddlePaddle/Paddle/pull/33743))
-    - 优化 avx 与 no_avx 相关的安装报错信息，简化冗余复杂内容。 ([#33818](https://github.com/PaddlePaddle/Paddle/pull/33818)) 
-    - 优化 ``paddle.nn.functional.gather_tree``，``paddle.nn.Transformer``，``paddle.nn.TransformerDecoderLayer``，``paddle.nn.TransformerEncoderLayer``，``paddle.nn.MultiHeadAttention`` 报错信息。 ([#34322](https://github.com/PaddlePaddle/Paddle/pull/34322), [#33859](https://github.com/PaddlePaddle/Paddle/pull/33859))
-    - 支持在动态图下配置 ``FLAGS_check_nan_inf``环境变量， 用于模型 ``nan`` 和 ``inf`` 的运行时检查与定位。 ([#32635](https://github.com/PaddlePaddle/Paddle/pull/32635))
-    - 移除 Signal 类报错信息中由于捕获 Signal 引入的栈信息，避免误导用户。([#34842 ](https://github.com/PaddlePaddle/Paddle/pull/34842))
-    - 修复 ``elementwise`` 类算子在输入x或y为空 Tensor 时的报错信息。 ([#33928](https://github.com/PaddlePaddle/Paddle/pull/33928))
-
-- 模型保存与载入
-	- 修正 ``paddle.jit.save``  接口和模型裁剪的逻辑，不再为输出变量增加一个关联的 ``scale_op``，可以正确导出含有 ``bool``，``float16`` 类型输出的模型。([#35730](https://github.com/PaddlePaddle/Paddle/pull/35730), [#36132](https://github.com/PaddlePaddle/Paddle/pull/36132))
-- 自定义OP
-	- 移除 ``paddle::Tensor`` 的 ``copy`` 方法中不必要的 ``cudaStreamSynchronize`` 操作，以提升性能。([#35802](https://github.com/PaddlePaddle/Paddle/pull/35802))
-- 新增C++对GeneratePass开发注册的支持，开发方式与Python侧对齐。([#36302](https://github.com/PaddlePaddle/Paddle/pull/36302))
-- 自动稀疏化训练(Automic SParsity)
-	- 新增`paddle.static.sparsity`，支持生成`n:m`稀疏模式的稀疏参数，目前只支持静态图ASP训练。A100上FP32、FP16分别设置`1:2`、`2:4`的稀疏模式，训练保存的稀疏模型，可通过调用TensorRT 8利用Ampere架构的稀疏Tensor Core加速推理任务。当前版本共提供了5个API：([#32995](https://github.com/PaddlePaddle/Paddle/pull/32995)、[#33132](https://github.com/PaddlePaddle/Paddle/pull/33132)、[#33558](https://github.com/PaddlePaddle/Paddle/pull/33558)、[#36525](https://github.com/PaddlePaddle/Paddle/pull/36525))
-	- `paddle.static.sparsity.calculate_density`，计算输入Tensor的密度。
-	- `paddle.static.sparsity.decorate`，将给定的优化器包装为`OptimizerWithSparsityGuarantee`，在调用 `optimizer.minimize()`时自动为ASP工作流插入必要的操作。
-	- `paddle.static.sparsity.prune_model`，依据`mask_algo`指定的掩码生成函数裁剪`main_program`中支持的层的参数。
-	- `paddle.static.sparsity.set_excluded_layers`，设置不会被裁剪的层的参数名称。
-	- `paddle.static.sparsity.reset_excluded_layers`，重置与`main_program`相对应的`excluded_layers`设置。
+  
+  - Improve the error message of the boundary check on `label` in the cross_entropy op. ([#40001](https://github.com/PaddlePaddle/Paddle/pull/40001))
+  
+  - Add profile records for the `infer_shape` and `compute` methods during op execution in dynamic graph mode, so that their cost is shown in the timeline. ([#39023](https://github.com/PaddlePaddle/Paddle/pull/39023))
+  
+  - Replace the `pybind::index_error` error message, which is prone to unknown exceptions on Windows. ([#40538](https://github.com/PaddlePaddle/Paddle/pull/40538))
+  
+  - Add an error message for the out-of-bounds check of the scatter op. ([#37429](https://github.com/PaddlePaddle/Paddle/pull/37429))
 
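+- Download tool: `paddle.utils.download.get_path_from_url` was slow when decompressing directories that contain many files. The original loop, which traversed the directory and extracted files one by one, is replaced with a single extractall call on the directory, which greatly speeds up decompression. ([#37311](https://github.com/PaddlePaddle/Paddle/pull/37311))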
 
+- Speed up quantization training with `fake_quantize_range_abs_max`, `fake_quantize_abs_max`, `fake_quantize_dequantize_abs_max`, `fake_quantize_moving_average_abs_max`, and similar ops. ([#40491](https://github.com/PaddlePaddle/Paddle/pull/40491))
 
 ### (3) Performance optimization
 
-#### 分布式训练-静态图混合并行
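+#### Distributed Training
+
+- The hybrid-parallel sharding optimizer supports the optimize_cast optimization, which moves the forward and backward parameter casts to the optimizer stage and improves performance by 7%. ([#35878](https://github.com/PaddlePaddle/Paddle/pull/35878))
+
+- GPUPS optimization: support gradient fuse allreduce training, improving training performance by 20%. ([#35131](https://github.com/PaddlePaddle/Paddle/pull/35131))
+
+- GPUPS optimization: dump CPU optimization, giving a 3.21x speedup. ([#40068](https://github.com/PaddlePaddle/Paddle/pull/40068))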
 
-- 优化模型并行 + AMP 时 AMP 的灰名单列表，支持模型并行算子，性能提升8%。([#33660](https://github.com/PaddlePaddle/Paddle/pull/33660))
-- 优化流水线并行时反向梯度累加的 `device` 属性设置，性能提升1-3%。([#33946](https://github.com/PaddlePaddle/Paddle/pull/33946))
-- 优化流水线并行执行器中 debug 的部分，性能提升60-140%。 ([#33948](https://gifothub.com/PaddlePaddle/Paddle/pull/33948))
-- 支持流水线并行下 `Program` cache的功能，性能提升10-40%。([#33998](https://github.com/PaddlePaddle/Paddle/pull/33998), [#33954](https://github.com/PaddlePaddle/Paddle/pull/33954))
-- 优化流水线并行 `send` 的通信等待，性能提升0.3-2%。([#34086](https://github.com/PaddlePaddle/Paddle/pull/34086)) 
-- 优化模型并行 + 流水线并行时 `send/recv`发送数据量的大小，8机测试性能提升36%。([#34110](https://github.com/PaddlePaddle/Paddle/pull/34110))
-- 优化混合并行 + AMP时参数的 cast，通过`optimize_cast`控制，性能可提升5-7%。([#34965](https://github.com/PaddlePaddle/Paddle/pull/34965))
-- 优化流水线并行 + recompute + amp 时的性能，性能提升13%。([#34519](https://github.com/PaddlePaddle/Paddle/pull/34519))
-- 支持流水线并行 + 数据并行时使用``float16``通信，通过`distributed_strategy.fp16_allreduce`控制，性能可提升13%。([#34762](https://github.com/PaddlePaddle/Paddle/pull/34762))
+- CPU parameter server streaming training optimization: support automatic collection of sparse parameter statistics, incremental saving of sparse parameters, and related functions, improving training performance by 20%. ([#36465](https://github.com/PaddlePaddle/Paddle/pull/36465), [#36601](https://github.com/PaddlePaddle/Paddle/pull/36601), [#36734](https://github.com/PaddlePaddle/Paddle/pull/36734), [#36909](https://github.com/PaddlePaddle/Paddle/pull/36909), [#36943](https://github.com/PaddlePaddle/Paddle/pull/36943), [#37181](https://github.com/PaddlePaddle/Paddle/pull/37181), [#37194](https://github.com/PaddlePaddle/Paddle/pull/37194), [#37515](https://github.com/PaddlePaddle/Paddle/pull/37515), [#37626](https://github.com/PaddlePaddle/Paddle/pull/37626), [#37995](https://github.com/PaddlePaddle/Paddle/pull/37995), [#38582](https://github.com/PaddlePaddle/Paddle/pull/38582), [#39250](https://github.com/PaddlePaddle/Paddle/pull/39250), [#40762](https://github.com/PaddlePaddle/Paddle/pull/40762), [#41234](https://github.com/PaddlePaddle/Paddle/pull/41234), [#41320](https://github.com/PaddlePaddle/Paddle/pull/41320), [#41400](https://github.com/PaddlePaddle/Paddle/pull/41400))
 
 #### Operator Optimization
 
-- 设计并实现了通用的Reduce CUDA算法，应用于7个Reduce算子，加速1.0x ~ 22.7x。([#32697](https://github.com/PaddlePaddle/Paddle/pull/32697), [#32974](https://github.com/PaddlePaddle/Paddle/pull/32974), [#33267](https://github.com/PaddlePaddle/Paddle/pull/33267), [#32885](https://github.com/PaddlePaddle/Paddle/pull/32885), [#33144](https://github.com/PaddlePaddle/Paddle/pull/33144),  [#33761](https://github.com/PaddlePaddle/Paddle/pull/33761), [#33901](https://github.com/PaddlePaddle/Paddle/pull/33901), [#34143](https://github.com/PaddlePaddle/Paddle/pull/34143),  [#34436](https://github.com/PaddlePaddle/Paddle/pull/34436))
-- 设计并实现了通用的Elementwise和Broadcast CUDA算法。([#32512](https://github.com/PaddlePaddle/Paddle/pull/32512), [#32928](https://github.com/PaddlePaddle/Paddle/pull/32928), [#33976](https://github.com/PaddlePaddle/Paddle/pull/33976), [#32148](https://github.com/PaddlePaddle/Paddle/pull/32148), [#32414](https://github.com/PaddlePaddle/Paddle/pull/32414))：应用于41个一元、激活算子。([#32348](https://github.com/PaddlePaddle/Paddle/pull/32348), [#32622](https://github.com/PaddlePaddle/Paddle/pull/32622), [#32823](https://github.com/PaddlePaddle/Paddle/pull/32823))，性能提升1.1x ~ 1.4x；应用于19个二元（9个基础计算类、6个比较类、4个逻辑类）算子。([#33050](https://github.com/PaddlePaddle/Paddle/pull/33050), [33052](https://github.com/PaddlePaddle/Paddle/pull/33052), [#33053](https://github.com/PaddlePaddle/Paddle/pull/33053), [#33051](https://github.com/PaddlePaddle/Paddle/pull/33051), [#33089](https://github.com/PaddlePaddle/Paddle/pull/33089))，性能提升1.02x ~ 3.21x。
-- 优化``roll``算子CUDA实现 ，单维度、多维度输入时，性能分别提升10%、50%以上。([#32880](https://github.com/PaddlePaddle/Paddle/pull/32880))
-- 优化``roll``算子index计算，单维度、多维度性能分别提升15%和70%。([#33909](https://github.com/PaddlePaddle/Paddle/pull/33909))
-- 优化`update_loss_scaling_op`算子CUDA实现，性能提升2.06x。([#32554](https://github.com/PaddlePaddle/Paddle/pull/32554))
-- 优化 ``softmax_with_cross_entropy (hard label)`` GPU 算子性能，加速比1.0x ~ 10.0x。([#35660](https://github.com/PaddlePaddle/Paddle/pull/35660))
-- 优化``index_select``前、反向算子的CPU实现，加速比达到2.09x~9.34x。([#32863](https://github.com/PaddlePaddle/Paddle/pull/32863),  [#32955](https://github.com/PaddlePaddle/Paddle/pull/32955))
-- 优化``batch_norm``算子二维输入情况下的CPU实现，提升达到22.68x~30.00x。([#34585](https://github.com/PaddlePaddle/Paddle/pull/34585))
-- 优化``batch_norm``算子在初始化方式及二维输入下的GPU性能，提升1.25x~25x。([#33851](https://github.com/PaddlePaddle/Paddle/pull/33851), [#33887](https://github.com/PaddlePaddle/Paddle/pull/33887))
-- ``log_softmax``算子性能优化及该相关bug修复，优化后较优化前kernel性能对比4.22x~32.29x。 ([#31630](https://github.com/PaddlePaddle/Paddle/pull/31630), [#32180](https://github.com/PaddlePaddle/Paddle/pull/32180), [#32396](https://github.com/PaddlePaddle/Paddle/pull/32396), [#32937](https://github.com/PaddlePaddle/Paddle/pull/32937))
-- 优化``concat_and_split``算子，解决动态图在海光DCU芯片上训练BERT时计算和通信无法overlap的问题，在海光DCU芯片上BERT分布式训练性能提升约27%。([#33982](https://github.com/PaddlePaddle/Paddle/pull/33982))
-- 优化``fused_elemwise_act``算子，MB计算规模下有十余倍性能提升。([#33480](https://github.com/PaddlePaddle/Paddle/pull/33480))
-	
-#### 策略优化
-
-- 增加``build_strategy.fix_op_run_order``策略，固定op执行的次序，ResNet模型单机8卡速度提升1.8%。([#34427](https://github.com/PaddlePaddle/Paddle/pull/34427))
-- 动态图反向计算支持并自动开启部分算子inplace策略，动态图gpt模型pure float16训练性能提升4.8%。 ([#35412](https://github.com/PaddlePaddle/Paddle/pull/35412))
-- 优化动态图性能，将只在静态图执行的逻辑从动态图的执行路径中剥离。([#34024](https://github.com/PaddlePaddle/Paddle/pull/34024))
-- IR Pass优化能力作为通用能力露出，同时支持单机和分布式优化。在GPT混合并行场景性能提升3%-5%。([#34955](https://github.com/PaddlePaddle/Paddle/pull/34955), [#35704](https://github.com/PaddlePaddle/Paddle/pull/35704), [#34730](https://github.com/PaddlePaddle/Paddle/pull/34730), [#34524](https://github.com/PaddlePaddle/Paddle/pull/34524))
-- 优化 ctc loss grad 计算速度，提速~3x，但相应增加了GPU显存占用。([#34729](https://github.com/PaddlePadle/Paddle/pull/34729))
-- transformer encoder 性能优化
-	- 优化思路：通过新增 `paddle.incubate.nn.FusedMultiHeadAttention` 和 `paddle.incubate.nn.FusedFeedForward` 的方式，在实现中采用 q, k, v gemm融合及多种kernel融合优化技术，提升transformer encoder的性能。
-		- FusedAttention
-			- 新增 `paddle.incubate.nn.functional.fused_multi_head_attention` ，支持multi-head attention的融合计算。([#35905](https://github.com/PaddlePaddle/Paddle/pull/35905) [35903](https://github.com/PaddlePaddle/Paddle/pull/35903) [#36803](https://github.com/PaddlePaddle/Paddle/pull/36803) [#36793](https://github.com/PaddlePaddle/Paddle/pull/36793) [36185](https://github.com/PaddlePaddle/Paddle/pull/36185))
-			- 新增 `paddle.incubate.nn.FusedMultiHeadAttention` ，用于融合multi-head attention的layer层组网。 ([#36498](https://github.com/PaddlePaddle/Paddle/pull/36498) )
-			- 该模块使用q, k, v gemm融合和bias add + dropout + residual add + layer_norm kernel融合优化技术，可带来1.08x-1.45x加速。
-		
-		- FusedFeedForward
-			- 新增 `paddle.incubate.nn.functional.fused_feedforward` ，支持 feedforward的融合计算。([#36729](https://github.com/PaddlePaddle/Paddle/pull/36729) [#36730](https://github.com/PaddlePaddle/Paddle/pull/36730))
-			- 新增 `paddle.incubate.nn.FusedFeedForward` ，用于融合feedforward的layer层组网。 ([#36776](https://github.com/PaddlePaddle/Paddle/pull/36776))
-			- 性能较优化前有1.04x~1.22x左右的提升。
-		- 新增 `paddle.incubate.nn.FusedTransformerEncoderLayer`，支持使用融合multi-head attention和融合feedforward计算的layer层组网。 ([#36776](https://github.com/PaddlePaddle/Paddle/pull/36776))
+- 优化 `FasterTokenizer` 性能，性能与优化前相比提升10%。 ([#36701](https://github.com/PaddlePaddle/Paddle/pull/36701)) 
+
+- 优化 `index_select` 反向计算，性能较优化前有3.7~25.2倍提升。([#37055](https://github.com/PaddlePaddle/Paddle/pull/37055)) 
+
+- 优化 `paddle.nn.ClipGradByGlobalNorm` 的性能，以10*10的 `paddle.nn.Linear` 为例，性能与优化前相比提升30%左右。 ([#38209](https://github.com/PaddlePaddle/Paddle/pull/38209))
+
+- 优化 `pnorm` 在 `axis` 维度极大或极小情况下的性能，前向速度提升31~96倍，反向速度提升1.1~19倍。([#37685](https://github.com/PaddlePaddle/Paddle/pull/37685), [#38215](https://github.com/PaddlePaddle/Paddle/pull/38215), [#39011](https://github.com/PaddlePaddle/Paddle/pull/39011)) 
+
+- 优化 `softmax` 前、反向性能，对于 `axis!=-1` 的配置加速比为2倍左右。([#38602](https://github.com/PaddlePaddle/Paddle/pull/38602), [#38609](https://github.com/PaddlePaddle/Paddle/pull/38609), [#32387](https://github.com/PaddlePaddle/Paddle/pull/32387), [#37927](https://github.com/PaddlePaddle/Paddle/pull/37927))
+
+- 优化 `log_softmax` 前、反向性能，对于 `axis!=-1`的配置加速比为6~20倍左右。([#38992](https://github.com/PaddlePaddle/Paddle/pull/38992), [#40612](https://github.com/PaddlePaddle/Paddle/pull/40612)) 
+
+- 优化 `softmax_with_cross_entropy` 前、反向性能，对于 `hard_label` 的配置加速比为1.3倍左右。([#39553](https://github.com/PaddlePaddle/Paddle/pull/39553), [#40424](https://github.com/PaddlePaddle/Paddle/pull/40424), [#40643](https://github.com/PaddlePaddle/Paddle/pull/40643)) 
+
+- 优化 `top_k` 性能，对于一维且 `k` 较大时(k=5000)的配置加速比为22倍以上。([#40941](https://github.com/PaddlePaddle/Paddle/pull/40941)) 
+
+- 优化 `elementwise_mul` 反向计算，较优化前有1.85~12.16倍性能提升。([#37728](https://github.com/PaddlePaddle/Paddle/pull/37728)) 
+
+- 优化 `elementwise_min` 反向和 `elementwise_max` 反向，较优化前打平或有1.05~18.75倍性能提升。([#38236](https://github.com/PaddlePaddle/Paddle/pull/38236), [#37906](https://github.com/PaddlePaddle/Paddle/pull/37906)) 
+
+- 优化 `nearest_interp` 前向和反向计算，前向较优化前性能有1.5~2.3倍提升；反向性能较优化前有60%~1.8倍提升。([#38528](https://github.com/PaddlePaddle/Paddle/pull/38528), [#39067](https://github.com/PaddlePaddle/Paddle/pull/39067)) 
+
+- 优化 `bilinear_interp` 前向和反向计算，前向较优化前性能有0.4~2.3倍提升；反向性能较优化前有10%~30%提升。([#39243](https://github.com/PaddlePaddle/Paddle/pull/39243), [#39423](https://github.com/PaddlePaddle/Paddle/pull/39423)) 
 
+- 优化 `dropout` 前向和反向计算，性能提升约20%。([#39795](https://github.com/PaddlePaddle/Paddle/pull/39795), [#38859](https://github.com/PaddlePaddle/Paddle/pull/38859), [#38279](https://github.com/PaddlePaddle/Paddle/pull/38279), [#40053](https://github.com/PaddlePaddle/Paddle/pull/40053))  
+
+- 优化 `grid_sampler`前向和反向计算，前向较优化前性能有10%~30%提升；反向性能较优化前有10%~60%提升。([#39751](https://github.com/PaddlePaddle/Paddle/pull/39751)) 
+
+- 优化 `group_norm` 前向和反向计算，前向性能提升1.04~2.35倍，反向性能提升1.12~1.18倍。([#39944](https://github.com/PaddlePaddle/Paddle/pull/39944), [#40657](https://github.com/PaddlePaddle/Paddle/pull/40657), [#39596](https://github.com/PaddlePaddle/Paddle/pull/39596))
+
+- 优化 `conv1d` 前向和反向计算，前向性能提升1.00~2.01倍，反向性能提升1.01~474.56倍。([#38425](https://github.com/PaddlePaddle/Paddle/pull/38425)) 
+
+- 优化 `elementwise_div` 反向计算，反向性能提升1.02~29.25倍。([#38044](https://github.com/PaddlePaddle/Paddle/pull/38044)) 
+
+- 优化 `gelu` 前向和反向计算，前向性能提升1.13~1.43倍，反向性能提升1.10~1.55倍。([#38188](https://github.com/PaddlePaddle/Paddle/pull/38188), [#38263](https://github.com/PaddlePaddle/Paddle/pull/38263))
+
+- 优化 `elementwise_sub` 反向计算，反向性能提升1.04~15.64倍。([#37754](https://github.com/PaddlePaddle/Paddle/pull/37754)) 
+
+- 优化 `flip` 在输入一维数据时前向性能，性能提升100%。([#37825](https://github.com/PaddlePaddle/Paddle/pull/37825))
+
+- 优化 `layer_norm` 前向和反向计算，前向较优化前提升2-5倍，反向较优化前提升20%~50%。([#39167](https://github.com/PaddlePaddle/Paddle/pull/39167), [#39247](https://github.com/PaddlePaddle/Paddle/pull/39247))
+
+- 优化 `embedding` 前向和反向计算，前向较优化前最大提升1.51倍，反向较优化前提升1.03~7.79倍。([#39856](https://github.com/PaddlePaddle/Paddle/pull/39856), [#39886](https://github.com/PaddlePaddle/Paddle/pull/39886))
+
+- 优化 `gelu` FP16 前向和反向计算，前向较优化前提升9%~12%，反向较优化前提升2%~9%。([#38980](https://github.com/PaddlePaddle/Paddle/pull/38980))
+
+- 移除 `gather_nd`前反向算子中的 CPU -> GPU 显式数据传输操作，移除 `index_select` 前反向算子中的显式同步操作，将 `scatter_nd` 中的 GPU -> GPU 数据传输由同步操作改成异步操作。([#40933](https://github.com/PaddlePaddle/Paddle/pull/40933)) 
+
+- 优化 `Lars optimizer` 计算，优化后 ResNet50 FP16 模型训练性能较优化前提升5.1%。 ([#35652](https://github.com/PaddlePaddle/Paddle/pull/35652), [#35476](https://github.com/PaddlePaddle/Paddle/pull/35476))
+
+- 优化 `AvgPool2dGrad` 计算，优化后性能较优化前提升2.6倍。 ([#35389](https://github.com/PaddlePaddle/Paddle/pull/35389)) 
+
+- 优化 `Elementwise` 类计算对于多元输出的功能支持，优化后计算性能较优化前提升最多可达15%。([#38329](https://github.com/PaddlePaddle/Paddle/pull/38329), [#38410](https://github.com/PaddlePaddle/Paddle/pull/38410))
+
+- 优化 `Categorical`的 `probs`计算，简化计算逻辑，性能提升 4 ~ 5 倍。([#42178](https://github.com/PaddlePaddle/Paddle/pull/42178)) 
 
 ### （4）问题修复
 
 #### API
 
--  优化`depthwise_conv` 数值稳定性。 ([#35161](https://github.com/PaddlePaddle/Paddle/pull/35161))
-- 添加参数创建时的形状检查，以保证参数每个轴的 `size` 都大于 0 。([#33265](https://github.com/PaddlePaddle/Paddle/pull/33265))
-- 优化``paddle.nn.LayerNorm``的计算，并修复数据溢出相关bug。([#34432](https://github.com/PaddlePaddle/Paddle/pull/34432), [#33658](https://github.com/PaddlePaddle/Paddle/pull/33658))
-- 支持Windows应用场景，将PaddlePaddle 框架能力集成到 MFC/QT/C# 等桌面端软件环境中，修复进程嵌套导致系统崩溃问题。([#34312](https://github.com/PaddlePaddle/Paddle/pull/34312))
-- 修复Reduce 数据初始化导致NLP 模型loss有误的问题。([#34941](https://github.com/PaddlePaddle/Paddle/pull/34941))
-- 修复``paddle.nn.LayerNorm``在`batch_size=1`时候的bug问题。([#35480](https://github.com/PaddlePaddle/Paddle/pull/35480))
-- 修复``paddle.static.nn.group_norm``在空输入下不能正确捕获错误的问题。([#35613](https://github.com/PaddlePaddle/Paddle/pull/35613))
-- 修复``paddle.nn.functional.batch_norm``在 `is_test=True` 的情况下mean/variance为空的问题。([#35328](https://github.com/PaddlePaddle/Paddle/pull/35328))
-- 修复``paddle.nn.functional.instance_norm``和``paddle.nn.functional.batch_norm``输入为空时，访问越界的问题。([#35341](https://github.com/PaddlePaddle/Paddle/pull/35341), [#34107](https://github.com/PaddlePaddle/Paddle/pull/34107))
-- 修复量化模型不统计``paddle.nn.LayerNorm`` 的输出的问题。([#33610](https://github.com/PaddlePaddle/Paddle/pull/33610))
-- 修复``paddle.nn.SyncBatchNorm.convert_sync_batchnorm()``不支持1D/3D的问题 。([#32989](https://github.com/PaddlePaddle/Paddle/pull/32989))
-- 修复``paddle.nn.BatchNorm1D, paddle.nn.BatchNorm2D, paddle.nn.BatchNorm3D``在`is_test=True`的情况下无法添加反向的问题。([#32678](https://github.com/PaddlePaddle/Paddle/pull/32678))
-- 修复`Tensor.cuda`不支持`device_id`为`None`的问题。 ([#34416](https://github.com/PaddlePaddle/Paddle/pull/34416))
-- 修复``paddle.to_tensor``不支持 `Tensor.dtype, core.Tensor`等内置类型的问题。 ([#31931](https://github.com/PaddlePaddle/Paddle/pull/31931), [#33430](https://github.com/PaddlePaddle/Paddle/pull/33430))
-- 修复`paddle.nn.functional.log_softmax`不支持输入维度为0的问题。([#34635](https://github.com/PaddlePaddle/Paddle/pull/34635))
-- 修复``paddle.nn.GroupNorm`` 在float32下CPU计算结果和准确值的相对误差大于1e-5的问题。([#33176](https://github.com/PaddlePaddle/Paddle/pull/33176))
-- 修复``paddle.trace`` 在参数 `offset` 超出维度大小时返回结果不为0的问题，在参数 `axis1`和`axis2` 输入不合法值时的栈溢出问题。([#33922](https://github.com/PaddlePaddle/Paddle/pull/33922), [#35419](https://github.com/PaddlePaddle/Paddle/pull/35419))
-- 修复``paddle.sum``输入参数为bool类型时，输出类型不为int的问题。输入参数类型和输出参数类型不一致且 axis 轴对应的reduce元素个数为1时，输出类型错误问题。([#34313](https://github.com/PaddlePaddle/Paddle/pull/34313), [#36123](https://github.com/PaddlePaddle/Paddle/pull/36123))
-- 修复 ``paddle.nn.conv2d/conv3d/conv2d_transpose/conv3d_transpose``非法输入时除0错误和数组越界的问题。([#35337](https://github.com/PaddlePaddle/Paddle/pull/35337))
-- 修复``paddle.nn.conv2d_transpose``非法输入时堆缓冲区溢出的问题。([#35340](https://github.com/PaddlePaddle/Paddle/pull/35340))
-- 修复 ``paddle.bmm`` 写空地址导致程序运行时崩溃的问题。([#35098](https://github.com/PaddlePaddle/Paddle/pull/35098))
-- 修复 ``cast`` 算子无法支持 Tensor 从int16 转换到float32的问题。([#35156](https://github.com/PaddlePaddle/Paddle/pull/35156))
--  修复 `assign` 不支持float16和uint8的问题。([#35153](https://github.com/PaddlePaddle/Paddle/pull/35153))
--  修复 `concat` 在输入大shape tensor时，容易溢出的问题。([#34319](https://github.com/PaddlePaddle/Paddle/pull/34319))
-- 修复动态图 `concat` 不支持空tensor作为输入的问题。([#35845](https://github.com/PaddlePaddle/Paddle/pull/35845))
-- 修复 ``paddle.where``不支持broadcast的问题。([#35092](https://github.com/PaddlePaddle/Paddle/pull/35092))
-- 修复 ``paddle.reshape`` 空tensor 时输入合法性未检查问题。([#35642](https://github.com/PaddlePaddle/Paddle/pull/35642))
-- 修复 ``layernorm`` 算子在大shape下cuda kernel配错错误问题。 ( [#33748](https://github.com/PaddlePaddle/Paddle/pull/33748))
-- 修复 ``random``类算子静态图下stop_gradient属性设置错误问题。( [#33959](https://github.com/PaddlePaddle/Paddle/pull/33959))
-- 修复 ``split`` 算子输入为空tensor的错误行为。([#334356](https://github.com/PaddlePaddle/Paddle/pull/334356))
-- 修复 tensor 的 slice 左值赋值显存泄漏问题。([#35013](https://github.com/PaddlePaddle/Paddle/pull/35013))
-- 修复动态图Layer无法被cloudpickle dump和load的问题。([#35538](https://github.com/PaddlePaddle/Paddle/pull/35538))
-- 修复simple_rnn_cell, gru_cell和lstm_cell API 非法参数设置导致除零错误问题。([#34627](https://github.com/PaddlePaddle/Paddle/pull/34627))
-- 修复``paddle.nn.functional.linear``在非法输入时空指针解引用的问题。([#34696](https://github.com/PaddlePaddle/Paddle/pull/34696))
-- 修复``paddle.strided_slice``,``paddle.transpose``存在内存越界问题。([#35062](https://github.com/PaddlePaddle/Paddle/pull/35062), [#35079](https://github.com/PaddlePaddle/Paddle/pull/35079))
-- 修复``roll``算子非法输入时除0错误的问题。([#34499](https://github.com/PaddlePaddle/Paddle/pull/34499))
-- 修复``gather``算子非法输入时的数组越界问题。([#34096](https://github.com/PaddlePaddle/Paddle/pull/34096), [#34138](https://github.com/PaddlePaddle/Paddle/pull/34138), [#34200](https://github.com/PaddlePaddle/Paddle/pull/34200))
-- 修复``prelu``，``softlax``算子非法输入时除0错误的问题。([#34499](https://github.com/PaddlePaddle/Paddle/pull/34499))
-- 修复``split``算子未对输入参数做合法性检查问题。([#34630](https://github.com/PaddlePaddle/Paddle/pull/34630))
-- 修复``memcpy``算子无法支持海光DCU芯片的问题。([#35394](https://github.com/PaddlePaddle/Paddle/pull/35394))
-- 修复``slice``算子在`batch_size=1`下训练会报错问题。([#34265](https://github.com/PaddlePaddle/Paddle/pull/34265))
-- 修复``reduce_sum``算子在 AMP 下容易溢出问题。([#33960](https://github.com/PaddlePaddle/Paddle/pull/33960))
-- 修复ANSI转义代码在windows下显示错乱问题。([#33689](https://github.com/PaddlePaddle/Paddle/pull/33689))
-- 修复``paddle.hub``解析文件名字和下载保存文件不一致问题。([#33214](https://github.com/PaddlePaddle/Paddle/pull/33214))
-- 修复``matmul``, ``diag_embed``, `` auc ``算子输入空tensor时内存泄露问题。 ([#34978](https://github.com/PaddlePaddle/Paddle/pull/34978))
-- 修复 ``paddle.less_equal, paddle.less_than, paddle.greater_equal, paddle.greater_than`` 计算broadcast计算精度误差大的BUG。([#32941](https://github.com/PaddlePaddle/Paddle/pull/32941))
-- 修复 ``interpolate`` 算子在大输入shape下的崩溃问题。([#35577](https://github.com/PaddlePaddle/Paddle/pull/35577))
-- 修复 ``interpolate``, ``unfold``, ``spectral_norm`` 算子输入为空tensor的合法性检查问题。([#33941](https://github.com/PaddlePaddle/Paddle/pull/33941), [#34943](https://github.com/PaddlePaddle/Paddle/pull/34943), [#35005](https://github.com/PaddlePaddle/Paddle/pull/35005))
-- 修复`paddle.flops`在计算输出的FLOPs可能出现负号（整数溢出）的问题。([#33576](https://github.com/PaddlePaddle/Paddle/pull/33576))
-- 修复``paddle.summary``遇到返回值含非Tensor元素的层时报错的问题。([#34160](https://github.com/PaddlePaddle/Paddle/pull/34160))
-- 修复``pool``算子非法输入时计算输出shape错误的问题。([#35106](https://github.com/PaddlePaddle/Paddle/pull/35106))
-- 修复 ``unfold, dice_loss, reshape``算子输入shape的合法性检查问题。([#34673](https://github.com/PaddlePaddle/Paddle/pull/34673), [#34757](https://github.com/PaddlePaddle/Paddle/pull/34757), [#35016](https://github.com/PaddlePaddle/Paddle/pull/35016))
-- 修复``unique, unstack``算子输入zero tensor的问题。([#36021](https://github.com/PaddlePaddle/Paddle/pull/36021))
-- 修复stack算子的反向输入为空时的问题。([#362877](https://github.com/PaddlePaddle/Paddle/pull/32877))
-- 修复 ``paddle.inverse``在输入Tensor的形状为`[0, 0, 0]`时，CPU执行会出现除0错误的问题。([#34996](https://github.com/PaddlePaddle/Paddle/pull/34996))
-- 修复``paddle.nn.functional.grid_sample``在特殊输入情况下报出的CUDA错误。([#33100](https://github.com/PaddlePaddle/Paddle/pull/33100))
-- 修复``paddle.flatten``在静态图特殊输入情况下编译期计算维度错误的问题。([#35321](https://github.com/PaddlePaddle/Paddle/pull/35321))
-- 修复``paddle.nn.conv2d/conv3d/conv2d\_transpose/conv3d\_transpose``计算输出shape时编译期检查报错的问题。([#35693](https://github.com/PaddlePaddle/Paddle/pull/35693))
-- 修复``paddle.data.flowers``在多卡训练情况下容易出现数据读取错误的问题。([#33738](https://github.com/PaddlePaddle/Paddle/pull/33738))
-- 修复pact量化se模块时loss为nan的问题。([#35392](https://github.com/PaddlePaddle/Paddle/pull/35392))
-- 修复量化`flatten_contiguous_range`报错的问题。([35410](https://github.com/PaddlePaddle/Paddle/pull/35410))
-- 修复动态图模式下pact量化的问题。([#35407](https://github.com/PaddlePaddle/Paddle/pull/35407))
-- 修复channel-wise量化bert报错的问题。([#34948](https://github.com/PaddlePaddle/Paddle/pull/34948))
-- 修复量化在参数全为0时的问题。([#34647](https://github.com/PaddlePaddle/Paddle/pull/34647))
-- 修复channel-wise量化在channel数为1时的bug。([#33753](https://github.com/PaddlePaddle/Paddle/pull/33753))
-- 修复动态图``@no_grad``线程不安全的问题。([#34649](https://github.com/PaddlePaddle/Paddle/pull/34649))
-- 修复``paddle.grad``接口在部分场景下会hang住的bug。([#34023](https://github.com/PaddlePaddle/Paddle/pull/34023))
-- 修复 ``paddle.masked_select``在静态图下形状推导的bug。([#33167](https://github.com/PaddlePaddle/Paddle/pull/33167))
-- 修复``paddle.slice``在部分场景下不支持`numpy.ndarray`类型索引的问题，以及`axes`参数为`tuple`类型时出错的问题。([#35748](https://github.com/PaddlePaddle/Paddle/pull/35748), [#35267](https://github.com/PaddlePaddle/Paddle/pull/35267))
-- 修复`set_value`反向梯度截断的问题。([#34304](https://github.com/PaddlePaddle/Paddle/pull/34304))
-- 修复``paddle.regularizer.L1Decay`` 在非inplace计算下的gradient重复设置问题。 ([32710](https://github.com/PaddlePaddle/Paddle/pull/32710))
-- 修复``adamw``参数分组时，学习率不生效问题。([#34468](https://github.com/PaddlePaddle/Paddle/pull/34468))
-- 优化卷积类API中非法``dilate``输入检查。([#35894](https://github.com/PaddlePaddle/Paddle/pull/35894))
-- 修复`paddle.io.DataLoader`迭代中途break报错问题。([#34501](https://github.com/PaddlePaddle/Paddle/pull/34501)) DataLoader内存泄漏问题。([#34140](https://github.com/PaddlePaddle/Paddle/pull/34140)) DataLoader误报warning信息。 ([#33712](https://github.com/PaddlePaddle/Paddle/pull/33712)) DataLoader子进程random state一致问题。([#33310](https://github.com/PaddlePaddle/Paddle/pull/33310))
-- 修复IterableDataset中drop_last不生效问题。([#34801](https://github.com/PaddlePaddle/Paddle/pull/34801))
-- 修复 ``paddle.optimizer.lr.LRScheduler`` 导致的 optimizer 状态恢复的问题。( [#33984](https://github.com/PaddlePaddle/Paddle/pull/33984))
-- 修复``gather``算子，在使用`axis`进行infershape的bug。([#33413](https://github.com/PaddlePaddle/Paddle/pull/33413))
-- 修复Executor中fetch_list类型为tuple时可能导致执行卡住的问题。([#35726](https://github.com/PaddlePaddle/Paddle/pull/35726))
-- 修复``paddle.nn.GroupNorm``除零错误，并添加channel可以被group整除检查。([#35644](https://github.com/PaddlePaddle/Paddle/pull/35644))
-- 修复tensor formatter中引用已释放内存的问题。([#35399](https://github.com/PaddlePddle/Paddle/pull/35399))
-- 修复Adam优化器在``float64``精度下 `beta` 参数精度损失的问题。([#33381](https://github.com/PaddlePaddle/Paddle/pull/33381))
-- 修复张量并行非切分参数初始化时未广播带来的精度对不齐问题。([#35326](https://github.com/PaddlePaddle/Paddle/pull/35326))
-- 迁移``paddle.static.accuracy`` API中的`topk`算子到`topk_v2`算子。([#35494](https://github.com/PaddlePaddle/Paddle/pull/35494))
-- 迁移``paddle.nn.dynamic_decode``中`expand`算子到`tile`算子，迁移`paddle.nn.BeamSearchDecoder`中`topk`算子到`topk_v2`算子。([#35656](https://github.com/PaddlePaddle/Paddle/pull/35656))
-- 迁移``paddle.nn.functional.dice_loss``API中的`one_hot`算子到`one_hot_v2`算子。([#35734](https://github.com/PaddlePaddle/Paddle/pull/35734))
-- 修复 ``paddle.summary`` 静态图模式下使用 bug。([#35303](https://github.com/PaddlePaddle/Paddle/pull/35303))
-- 修复 ``paddle.Model.prepare`` 静态图模式下多卡启动的 bug。([#34311](https://github.com/PaddlePaddle/Paddle/pull/34311))
-- 修复`paddle.nn.functional.cross_entropy` 给定`weight`，且指定`axis`为除-1外的其他合法维度时会报错的问题。([#36647](https://github.com/PaddlePaddle/Paddle/pull/36647))
-- 修复`paddle.utils.dlpack.to_dlpack`无法编码多维 `Tensor` 的问题，修复其所生成的 DLPack 对象无法进行跨深度学习框架共享的问题。([#36177](https://github.com/PaddlePaddle/Paddle/pull/36177))
-- 修复使用`paddle.distribution.Categorical`的`sample`方法报错的问题，具体原因是multinomial op的cuda kernel中数组访问越界，该bug会导致访问超出数组下标的值，引起报错。 ([#36511](https://github.com/PaddlePaddle/Paddle/pull/36511))
-- 修复动态图`_BatchNormBase`基类中修改了 default_dtype，导致后续组网参数类型错误的问题，受影响的API有`paddle.nn.BatchNorm1D`，`paddle.nn.BatchNorm2D`，`paddle.nn.BatchNorm3D`，`paddle.nn.SyncBatchNorm`。具体原因是当 `get_default_dtype() == 'float16'` 时，通过 `set_default_dtype('float32')`修改默认参数数据类型，动态图组网的参数类型是通过 default_dtype 来创建的，因此当默认参数类型被修改后导致后续的组网参数类型错误。 ([#36376](https://github.com/PaddlePaddle/Paddle/pull/36376))
-- 修复`paddle.nn.functional.grid_sample`因特殊输入导致的异常问题。([#36625](https://github.com/PaddlePaddle/Paddle/pull/36625))
-- 修复 `paddle.fft.fft`, `paddle.fft.ifft`, `paddle.fft.rfft` , `paddle.fft.irfft`, `paddle.fft.hfft`, `paddle.fft.ihfft` 在输入 `axis=0` 情况下的计算错误问题。([#36537](https://github.com/PaddlePaddle/Paddle/pull/36537))
-- 修复 `paddle.fft.fftshift`  和 `paddle.fft.ifftshift` 在静态图下出错的问题。([#36537](https://github.com/PaddlePaddle/Paddle/pull/36537))
-- 修复 `paddle.fft.ifftshift` 计算结果不正确的问题。([#36835](https://github.com/PaddlePaddle/Paddle/pull/36835))
-- 修复`paddle.nn.functional.pad`在`replicate`模式下的报错信息提示。([#36531](https://github.com/PaddlePaddle/Paddle/pull/36531))
+- 修复 `paddle.sum` 输入参数类型和输出参数类型不一致且 `axis` 轴对应的 reduce 元素个数为1时，输出类型错误问题。([#36123](https://github.com/PaddlePaddle/Paddle/pull/36123)) 
+
+- 修复 `paddle.flops` 在 layer 输出类型为 tuple 时的 `AttributeError`。([#38850](https://github.com/PaddlePaddle/Paddle/pull/38850))
+
+- 修复 `paddle.diag` 因为没有反向 Kernel 而无法传播梯度的问题。([#40447](https://github.com/PaddlePaddle/Paddle/pull/40447)) 
+
+- 修复 `paddle.sort` 输入存在 NaN 值排序错误。 ([#41070](https://github.com/PaddlePaddle/Paddle/pull/41070)) 
+
+- 修复 `paddle.full_like` 输入存在 Inf 值构建 Tensor 错误。 ([#40232](https://github.com/PaddlePaddle/Paddle/pull/40232)) 
+
+- 修复 `paddle.strided_slice` 在输入 starts 中数据小于 -rank 时，strided_slice 结果与 slice 不一致的 bug。 ([#39066](https://github.com/PaddlePaddle/Paddle/pull/39066)) 
+
+- 修复 `max_pool` 系列算子在返回 index 时 infer_shape 计算错误的问题，受影响的 API 有 `paddle.nn.functional.max_pool1d/2d/3d`, `paddle.nn.functional.adaptive_max_pool1d/2d/3d`, `paddle.nn.MaxPool1D/2D/3D`, `paddle.nn.AdaptiveMaxPool1D/2D/3D`。([#40139](https://github.com/PaddlePaddle/Paddle/pull/40139))
+
+- 修复 `max_pool` 系列算子返回的 pooling_mask 的 dtype 错误的问题，现在 pooling_mask 的 dtype 为 int32，受影响的 API 有 `paddle.nn.functional.max_pool1d/2d/3d`, `paddle.nn.functional.adaptive_max_pool1d/2d/3d`, `paddle.nn.MaxPool1D/2D/3D`, `paddle.nn.AdaptiveMaxPool1D/2D/3D`。([#39314](https://github.com/PaddlePaddle/Paddle/pull/39314))
+
+- 修复 `paddle.shape` 默认存在反向梯度导致计算错误的问题。([#37340](https://github.com/PaddlePaddle/Paddle/pull/37340)) 
+
+- 修复 `paddle.nn.Layer` 的 `to` 方法同时转换 dtype 和 place 存在的 bug。([#37007](https://github.com/PaddlePaddle/Paddle/pull/38007)) 
+
+- 修复 `paddle.amp.decorate` 无法对非叶子网络层的参数改写为 FP16 的 bug。([#38402](https://github.com/PaddlePaddle/Paddle/pull/38402)) 
+
+- 修复 `paddle.amp.decorate` 将 `paddle.nn.BatchNorm1D`、`paddle.nn.BatchNorm2D`、`paddle.nn.BatchNorm3D` 非输入参数改写为 FP16 的 bug。([#38541](https://github.com/PaddlePaddle/Paddle/pull/38541)) 
+
+- 修复 `paddle.amp.decorate` 将 `paddle.nn.SyncBatchNorm` 非输入参数改写为 FP16 的 bug。([#40943](https://github.com/PaddlePaddle/Paddle/pull/40943)) 
+
+- 修复 `paddle.nn.Layer.to` 当中多余的 warning。([#36700](https://github.com/PaddlePaddle/Paddle/pull/36700))
+
+- 修复 `paddle.nn.RNN` 在控制流下使用报错的问题。([#41162](https://github.com/PaddlePaddle/Paddle/pull/41162)) 
+
+- 修复 `paddle.to_tensor` 无法指定 Tensor 的 CUDA Place 的问题。([#39662](https://github.com/PaddlePaddle/Paddle/pull/39662)) 
+
+- 修复 `paddle.nn.Identity` 没有公开的问题。([#39615](https://github.com/PaddlePaddle/Paddle/pull/39615))
+
+- 修复动态图重构后，`fill_` 和 `zero_` inplace API的输入在 CUDAPinned Place上时，输出值不正确的 bug。([#41229](https://github.com/PaddlePaddle/Paddle/pull/41229)) 
+
+- 动态图重构后，修复使用 append op 的方式调用 assign op 导致输出 Tensor 的 inplace version 值不正确的bug，修改为使用 `_C_ops` 的方式调用 assign op。([#41118](https://github.com/PaddlePaddle/Paddle/pull/41118)) 
+
+- 移除 `elementwise_add` 三阶 Kernel 中不合理的代码，修复组网过程未初始化问题。 ([#36618](https://github.com/PaddlePaddle/Paddle/pull/36618)) 
+
+- 修复 `conv2d` 执行 cuDNN Kernel 时属性缺失的问题。([#38827](https://github.com/PaddlePaddle/Paddle/pull/38827)) 
+
+- 修复 `multiclass_nms3` 输出 shape 不正确的问题。([#40059](https://github.com/PaddlePaddle/Paddle/pull/40059)) 
+
+- 修复 `yolo_box` 输出 shape 不正确的问题。([#40056](https://github.com/PaddlePaddle/Paddle/pull/40056)) 
+
+- 修复高阶微分 `gradients` 接口在指定 target_grad 时未按预期生效的问题。([#40940](https://github.com/PaddlePaddle/Paddle/pull/40940/)) 
+
+- 修复动态图 op`_BatchNormBase` 基类中修改了 default_dtype，导致后续组网参数类型错误的问题，受影响的API有 `paddle.nn.BatchNorm1D`，`paddle.nn.BatchNorm2D`，`paddle.nn.BatchNorm3D`，`paddle.nn.SyncBatchNorm`。具体原因是当 `get_default_dtype() == 'float16'` 时，通过 `set_default_dtype('float32')`修改默认参数数据类型，动态图组网的参数类型是通过 default_dtype 来创建的，因此当默认参数类型被修改后导致后续的组网参数类型错误。 ([#36376](https://github.com/PaddlePaddle/Paddle/pull/36376)) 
+
+- 修复 batchnorm op 中，当数据类型为 FP32 ，且数据维度 `dims = 2，data_layout = NHWC` 时，反向 op 内中间变量未定义问题。 ([#37020](https://github.com/PaddlePaddle/Paddle/pull/37020)) 
+
+- 修复静态图模式下，`paddle.static.nn.prelu` 对于 `NHWC` 输入格式且 `mode==channel` 权重的 shape 错误问题。([#38310](https://github.com/PaddlePaddle/Paddle/pull/38310)) 
+
+- 修复多机情况下，`paddle.nn.functional.class_center_sample` CUDA 种子设置 bug。([#38815](https://github.com/PaddlePaddle/Paddle/pull/38815)) 
+
+- 修复 `paddle.nn.functional.one_hot` 在输入不正确参数时，CUDA 版本无法正确报错的问题。([#41335](https://github.com/PaddlePaddle/Paddle/pull/41335)) 
+
+- 修复 DCU 设备上回收显存的 callback 未及时触发导致显存 OOM 的问题。([#40445](https://github.com/PaddlePaddle/Paddle/pull/40445)) 
+
+- 修复 `setitem` 索引赋值反向梯度传递异常以及动态图部分场景下 inplace 逻辑处理异常的问题。 ([#37023](https://github.com/PaddlePaddle/Paddle/pull/37023), [#38298](https://github.com/PaddlePaddle/Paddle/pull/38298)) 
+
+- 修复动转静下 Tensor array 使用 Slice 索引异常的问题。([#39251](https://github.com/PaddlePaddle/Paddle/pull/39251)) 
+
+- 修复 `paddle.Tensor.register_hook` 接口使用时临时变量未析构，从而导致内存或显存泄漏的问题。([#40716](https://github.com/PaddlePaddle/Paddle/pull/40716))
+
+- 修复 `Tensor.getitem` 当索引是全为 False 的 bool Tensor 时无法取值的问题。([#41297](https://github.com/PaddlePaddle/Paddle/pull/41297)) 
+
+- 修复 `Tensor.getitem` 当索引是 bool scalar Tensor 时无法取值的问题。([#40829](https://github.com/PaddlePaddle/Paddle/pull/40829)) 
+
+- 修复 `paddle.index_select` 在 index 为 0-shape Tensor 时报错的问题。([#41383](https://github.com/PaddlePaddle/Paddle/pull/41383)) 
+
+- 修复 `paddle.index_select`，`paddle.index_sample` 申请的 GPU 线程数超过有限机器资源时报错的问题。([#41127](https://github.com/PaddlePaddle/Paddle/pull/41127), [#37816](https://github.com/PaddlePaddle/Paddle/pull/37816), [#39736](https://github.com/PaddlePaddle/Paddle/pull/39736), [#41563](https://github.com/PaddlePaddle/Paddle/pull/41563)) 
+
+- 修复 ReduceConfig、elemwise_grad、gather、gather_nd、scatter ops 申请 GPU 线程数超过有限机器资源时报错的问题。([#40813](https://github.com/PaddlePaddle/Paddle/pull/40813), [#41127](https://github.com/PaddlePaddle/Paddle/pull/41127)) 
+
+- 修复 Kernel Primitive API 中 ReadData，ReadDataBc，ReadDataReduce 在 NX != 1 时访存越界的问题。([#36373](https://github.com/PaddlePaddle/Paddle/pull/36373))
+
+- 修复 IndexRandom 数据类型错误导致数据溢出计算结果异常的问题。([#39867](https://github.com/PaddlePaddle/Paddle/pull/39867), [#39891](https://github.com/PaddlePaddle/Paddle/pull/39891))
+
+- 修复 reduce op 在 reduce_num = 1 计算结果返回错误的问题。([#38771](https://github.com/PaddlePaddle/Paddle/pull/38771))
+
+- 修复 reduce op 在 HIP 环境下 reduce 中间维度出现访存越界的问题。([#41273](https://github.com/PaddlePaddle/Paddle/pull/41273))
+
+- 修复 matmul op 两个 FP16 一维向量计算时 Kernel 无法正常释放的问题。
+
+- 修复部分算子在 CUDA 上因整型计算溢出导致的问题，包括：bernoulli、gaussian_random、gumbel_softmax、multinomial、truncated_gaussian_random、uniform_random_inplace、uniform_random ops。 ([#37670](https://github.com/PaddlePaddle/Paddle/pull/37670))
+
+- 修复 `paddle.nn.Sequential` 在 for 循环遍历 sublayers 时会报 KeyError 错误的 bug。([#39372](https://github.com/PaddlePaddle/Paddle/pull/39372))
+
+- 修复 `paddle.nn.functional.unfold` 在静态图下编译时检查 shape 错误的 bug。([#38907](https://github.com/PaddlePaddle/Paddle/pull/38907), [#38819](https://github.com/PaddlePaddle/Paddle/pull/38819)) 
+
+- 修复静态图使用 dropout 时如果指定了 `axis` 后会报错的问题。([#37223](https://github.com/PaddlePaddle/Paddle/pull/37223))
+
+- 迁移 `paddle.nn.MultiHeadAttention`中 matmul 算子到 matmul_v2 算子。([#36222](https://github.com/PaddlePaddle/Paddle/pull/36222))
+
+- 修复 `paddle.nn.functional.label_smooth` 在输入为空 Tensor 时抛出 FPE 的问题。([#35861](https://github.com/PaddlePaddle/Paddle/pull/35861))
 
+- 修复 reshape op 空 Tensor 形变问题，支持将空 Tensor reshape 成 [-1]。 ([#36087](https://github.com/PaddlePaddle/Paddle/pull/36087))
+
+- 修复 `fill_diagonal`参数 offset 非零时会造成修改值跨行问题。([#36212](https://github.com/PaddlePaddle/Paddle/pull/36212))
+
+- 修改动态图模式下 range op 返回值的 `stop_gradient` 属性，将其设置为 True。([#37486](https://github.com/PaddlePaddle/Paddle/pull/37486))
+
+- 修复 Lamb 优化器当 Beta1Pow 和 Beta2Pow 在 GPU 上时更新错误的 bug。([#38518](https://github.com/PaddlePaddle/Paddle/pull/38518))
+
+- 修复 conv2d 算子 FLAGS_cudnn_deterministic 设置不生效的问题。([#37173](https://github.com/PaddlePaddle/Paddle/pull/37173)) 
+
+- 修复因早期版本的 cufft 没有定义 CUFFT_VERSION 引发的问题。([#37312](https://github.com/PaddlePaddle/Paddle/pull/37312)) 
+
+- 修复 `paddle.fft.ifftshift`, `paddle.fft.fftshift` 计算错误问题。([#36834](https://github.com/PaddlePaddle/Paddle/pull/36834), [#36748](https://github.com/PaddlePaddle/Paddle/pull/36748))
+
+- 修复 `paddle.fft` 系列 API 中的 `axis` 计算错误。 ([#36321](https://github.com/PaddlePaddle/Paddle/pull/36321)) 
 
 #### IR(Intermediate Representation)
 
 - 动态图转静态图
-    - 修复了动转静后，在 ``paddle.no_grad`` 语义下显存异常增长的问题。([#35725](https://github.com/PaddlePaddle/Paddle/pull/35725))
-    - 修复了对 ``paddle.no_grad`` 接口的错误识别和转换问题。([#34136](https://github.com/PaddlePaddle/Paddle/pull/34136)) 
-    - 修复了部分场景下模型中间设置 stop_gradient=True 时，动转静训练报错的问题。([#36353](https://github.com/PaddlePaddle/Paddle/pull/36353))
-    - 修复了在控制流 if 的部分场景转换时，对返回结果检查会报错的问题。([#36830](https://github.com/PaddlePaddle/Paddle/pull/36830))
-    - 修复了在 ifelse 分支返回不等长结果时，动转静会额外对齐返回长度导致返回类型意外改变的问题。([#36565](https://github.com/PaddlePaddle/Paddle/pull/36565))
-    - 修复使用 jit.save/load 接口加载模型后，在 train 模式和 no_grad 上下文中，显存会一直增长的问题。([#36463](https://github.com/PaddlePaddle/Paddle/pull/36463))
-
+  
+  - 修复 `tensor_array` 搭配控制流使用时，在反向梯度累加时存在的类型推导错误问题。([#39585](https://github.com/PaddlePaddle/Paddle/pull/39585), [#39689](https://github.com/PaddlePaddle/Paddle/pull/39689)) 
+  
+  - 修复动转静 AMP 训练时参数梯度类型未被正确设置的问题。([#40938](https://github.com/PaddlePaddle/Paddle/pull/40938)) 
+  
+  - 修复代码中存在错位注释时，动转静代码解析报错的问题。([#39035](https://github.com/PaddlePaddle/Paddle/pull/39035), [#38003](https://github.com/PaddlePaddle/Paddle/pull/38003)) 
+  
+  - 修复动转静代码中调用非 forward 函数时，Tensor 未被正确转化为 Variable 的问题。([#37296](https://github.com/PaddlePaddle/Paddle/pull/37296), [#38540](https://github.com/PaddlePaddle/Paddle/pull/38540)) 
+  
+  - 修复动转静代码转写时 `paddle` 被错误地作为变量传递的问题。([#37999](https://github.com/PaddlePaddle/Paddle/pull/37999)) 
+  
+  - 修复模型动转静后调用 `paddle.flops` 时模型参数统计错误的问题。([#36852](https://github.com/PaddlePaddle/Paddle/pull/36852)) 
+  
+  - 修复使用 `paddle.jit.save/load` 接口加载模型后，在 train 模式和 no_grad 上下文中，显存会一直增长的问题。([#36434](https://github.com/PaddlePaddle/Paddle/pull/36434)) 
+  
+  - 添加 convert_call 转换 generator function 时的警告。([#35369](https://github.com/PaddlePaddle/Paddle/pull/35369))
+  
+  - 修复 run_program op 依赖分析的问题。 ([#38470](https://github.com/PaddlePaddle/Paddle/pull/38470)) 
+  
+  - 修复控制流 For 中返回单值时代码转换的问题。([#40683](https://github.com/PaddlePaddle/Paddle/pull/40683)) 
+  
+  - 修复控制流 cond 的输入包含 LoDTensorArray 时，生成反向 op 会报错的问题。([#39585](https://github.com/PaddlePaddle/Paddle/pull/39585)) 
 
 #### 分布式训练
 
 - 分布式训练基础功能
-    - 修复图引擎潜在的栈溢出问题。 ([#33055](https://github.com/PaddlePaddle/Paddle/pull/33055)) 
-    - 修复分布式训练可能出现的死锁问题。 ([#34461](https://github.com/PaddlePaddle/Paddle/pull/34461))
-    - 修复张量并行在 transformer 类模型的多头注意力计算中切分不正确的问题，优化张量并行在混合精度计算时的速度。 ([#33015](https://github.com/PaddlePaddle/Paddle/pull/33015)) 
-    - 修复模型并行下使用 `paddle.nn.ClipGradientByGlobalNorm` 时，非 distributed 的 vars 的 norm 被多次计算的问题。([#35713](https://github.com/PaddlePaddle/Paddle/pull/35713))
-    - 修复模型并行`paddle.distributed.split` Parallel Linear 行切分bias加法位置出错的问题。([#35186](https://github.com/PaddlePaddle/Paddle/pull/35186))
-    - 修复流水线并行初始化通信组可能 hang 的问题。 ([#33476](https://github.com/PaddlePaddle/Paddle/pull/33476))
-    - 修复流水线并行中 `Tensor` 显存在实际使用完成前被释放的问题。 ([#33996](https://github.com/PaddlePaddle/Paddle/pull/33996))
-    - 修复流水线并行时反向梯度累加 `op_device`为空的问题。([#33875](https://github.com/PaddlePaddle/Paddle/pull/33875))
-    - 修复流水线并行运行 `sub-block` 报错的问题。([#32727](https://github.com/PaddlePaddle/Paddle/pull/32727))
-    - 修复流水线并行时反向梯度累加 `op_device`为空的问题。([#33875](https://github.com/PaddlePaddle/Paddle/pull/33875))
-    - 修复 Sharding 并行通信初始化时偶尔 hang 住的问题。 ([#33327](https://github.com/PaddlePaddle/Paddle/pull/33327))
-    - 修复 `paddle.distributed.barrier` 同步流错误。 ([#33476](https://github.com/PaddlePaddle/Paddle/pull/33476))
-    - 修复 `paddle.distributed.alltoall` 通信组设置错误的问题。([#32890](https://github.com/PaddlePaddle/Paddle/pull/3492890))
-    - 修复静态图张量并行参数初始换广播错误导致的精度对不齐问题。([35326](https://github.com/PaddlePaddle/Paddle/pull/35326))
-    - 修复动态图数据并行不支持 `recompute` 等继承 `PyLayer` 类实现的自定义算子的问题。([#35401](https://github.com/PaddlePaddle/Paddle/pull/35401))
-    - 修复混合并行下流水线并行 + 数据并行 hang 住的问题。([#34142](https://github.com/PaddlePaddle/Paddle/pull/34142))
-    - 修复开启 AMP 时，`fleet.get_loss_scaling` 失败的问题。([#33935](https://github.com/PaddlePaddle/Paddle/pull/33935))
-    - 修复 `fleet` 多机未 wait server ready 的问题。([#32889](https://github.com/PaddlePaddle/Paddle/pull/32889))
-    - 修复分布式预测 `infer_from_dataset` 仍旧更新参数梯度的问题。([#35698](https://github.com/PaddlePaddle/Paddle/pull/35698))
-    - 修复 `data_feed` 中 dense 特征 LOD 属性设置错误的问题。([#35000](https://github.com/PaddlePaddle/Paddle/pull/35000))
-    - 修复静态图使用 `gradientmerge`时 `gradient_merge_cond` 变量的 save 问题。([#35578](https://github.com/PaddlePaddle/Paddle/pull/35578))
-    - 修复 `paddle.hub`下载文件名字和 `nt_merge_cond` 变量的 save 问题。([#35578](https://github.com/PaddlePaddle/Paddle/pull/35578))
-    - 修复 `fleet` 开启 `dump_slot` 时报错不明显的问题。 ([#34173](https://github.com/PaddlePaddle/Paddle/pull/34173))
-    - 修复混合并行训练在海光 DCU 芯片上的 RCCL 的问题。([#32808](https://github.com/PaddlePaddle/Paddle/pull/32808))
-    - 修复 GPU 参数服务器退出报错问题。([#33724](https://github.com/PaddlePaddle/Paddle/pull/33724))
-    - 修复 hdfs 工具upload/download功能不可用问题。([#33903](https://github.com/PaddlePaddle/Paddle/pull/33903))
-    - 修复 GPU 参数服务器训练过程中由于样本不能整除worker数而卡住的问题。([#32640](https://github.com/PaddlePaddle/Paddle/pull/32640))
-    - 修复 GPU 参数服务器使用非0卡训练报错问题。([#33078](https://github.com/PaddlePaddle/Paddle/pull/33078))
-    - 修复 GPU 参数服务器 delta score，scale show问题。([#33492](https://github.com/PaddlePaddle/Paddle/pull/33078), [#33492](https://github.com/PaddlePaddle/Paddle/pull/33492))
-    - 修复 GPU 参数服务器训练结束后未 merge dense，g2sum 计算有误，data norm 添加了optimize op 等问题。 ([#35029](https://github.com/PaddlePaddle/Paddle/pull/35029))
-    - 修复使用 fuse all reduce ops 开关时，如果梯度出现 empty 时会报错的问题。([#36231](https://github.com/PaddlePaddle/Paddle/pull/36231))
-    - 修复 dist_transformer 文件出现未定义的变量问题。([#36211](https://github.com/PaddlePaddle/Paddle/pull/36211))
-	
-	
-	
+  
+  - 修复分布式多机训练时，端口报错的问题。([#37274](https://github.com/PaddlePaddle/Paddle/pull/37274)) 
+  
+  - 修复 brpc 编译依赖问题。([#37064](https://github.com/PaddlePaddle/Paddle/pull/37064)) 
+  
+  - 修复 Fleet 启动时，由于 tcp 自连接产生的端口被占用的问题。([#38174](https://github.com/PaddlePaddle/Paddle/pull/38174)) 
+  
+  - 修复数据并行下，由于 FP16 参数在多卡下初始化不一致，导致精度下降的问题。([#38838](https://github.com/PaddlePaddle/Paddle/pull/38838), [#38563](https://github.com/PaddlePaddle/Paddle/pull/38563), [#38405](https://github.com/PaddlePaddle/Paddle/pull/38405))
+  
+  - 修复数据并行下，由于 FP16 梯度同步时，没有除以卡数，导致精度下降的问题。([#38378](https://github.com/PaddlePaddle/Paddle/pull/38378))
+
 - 动态图混合并行
-	- 修复流水线并行计算错误的问题。([#35556](https://github.com/PaddlePaddle/Paddle/pull/35556))
-	- 修复张量并行下，c_spilt 的方向计算的问题。([#33207](https://github.com/PaddlePaddle/Paddle/pull/33207))
-	- 修复张量并行下，精度无法对齐的问题。([#32897](https://github.com/PaddlePaddle/Paddle/pull/32897))
-	- 修复new_group() 创建通信组创建时，出现随机hang的情况。([#33141](https://github.com/PaddlePaddle/Paddle/pull/33141))
-	- 修复数据并行下 reducer 遍历反向图的问题。( [#32715](https://github.com/PaddlePaddle/Paddle/pull/32715))
-	- 修复数据并行下参数同步的属性缺失的问题。 ([#33955](https://github.com/PaddlePaddle/Paddle/pull/33955))
+  
+  - 修复在混合并行下，通过使用新 update 接口，FP16 模式不更新参数的问题。([#36017](https://github.com/PaddlePaddle/Paddle/pull/36017))
 
 - 静态图混合并行
-    - 解决 TensorParallel 在 Multi-Head Attention 网络中的切分错误问题，优化 TensorParallel 与混合精度共同使用时的训练速度。([#32897](https://github.com/PaddlePaddle/Paddle/pull/32897))
-	
+  
+  - 修复分布式 dp 模式下 grad merge 与 ClipGradientByGlobalNorm 不兼容的问题。([#36334](https://github.com/PaddlePaddle/Paddle/pull/36334)) 
+  
+  - 修复混合并行下，张量模型并行的非分布式参数在初始化阶段未被广播，导致各卡非分布式参数不一致的问题。([#36186](https://github.com/PaddlePaddle/Paddle/pull/36186)) 
+  
+  - 修复 sharding 开启 offload 时，sharding 的 save_persistables 接口未保存 FP16 参数和 offload 持久化变量的问题。([#40477](https://github.com/PaddlePaddle/Paddle/pull/40477)) 
+  
+  - 修复开启 sharding 训练时，ema 参数在非0号卡上无法保存的问题。([#39860](https://github.com/PaddlePaddle/Paddle/pull/39860))
+  
+  - 修复 FC 按照列切分梯度计算错误的问题。([#38724](https://github.com/PaddlePaddle/Paddle/pull/38724))
+  
+  - 修复 DistributedStrategy 设置为 without_graph_optimizer 时和 rnn 一起使用报错的问题。 ([#36176](https://github.com/PaddlePaddle/Paddle/pull/36176)) 
+
+- GPUPS 参数服务器训练
+  
+  - 修复 GPUPS 宏定义触发 CPU 分支编译问题。([#37248](https://github.com/PaddlePaddle/Paddle/pull/37248))
+  
+  - 修复 GPUPS 流水线训练时在保存 delta 和 pullsparse 并发时引发的偶发报错问题。([#37233](https://github.com/PaddlePaddle/Paddle/pull/37233))
+  
+  - 修复 HDFSClient 查询目录未返回全路径，引发下载报错问题。 ([#36590](https://github.com/PaddlePaddle/Paddle/pull/36590))
+  
+  - 修复 GPUPS 流水线训练时拉取老参数问题。([#36512](https://github.com/PaddlePaddle/Paddle/pull/36512)) 
+  
+  - 修复 GPUPS 多流 allocation 问题。([#37476](https://github.com/PaddlePaddle/Paddle/pull/37476))
+  
+  - 修复 GPUPS pybind 出 core 的问题。([#37287](https://github.com/PaddlePaddle/Paddle/pull/37287))
+
 #### 其他
-- 自定义OP
-    - 修复 ``paddle::Tensor`` 的 ``cast`` 方法在 GPU 下不生效的问题。([#34884](https://github.com/PaddlePaddle/Paddle/pull/34884))
-    - 修复自定义算子不能同时加载多个模块的问题。([#34505](https://github.com/PaddlePaddle/Paddle/pull/34505))
-    - 修复联合编译 .cc 和 .cu 文件时，``PADDLE_WITH_CUDA`` 宏未生效的问题。([#35448](https://github.com/PaddlePaddle/Paddle/pull/35448))
-- 去除对 ``logging`` 库全局设置的修改。 ([#32673](https://github.com/PaddlePaddle/Paddle/pull/32673))
-- 新增 ``GlooParallelContext``，适配 `Reducer` 模块逻辑，为 `DataParallel` 后续支持CPU并行提供底层通信组件支持。 ([#35154](https://github.com/PaddlePaddle/Paddle/pull/35154))
-- 迁移 `paddle.metric.accuracy` 中的 `top_k` op 为 `top_k_v2` op。 ([#35789](https://github.com/PaddlePaddle/Paddle/pull/35789))
-- 修复 `MKLDNN` 下运行找不到默认 `attr` 的问题。([#34567](https://github.com/PaddlePaddle/Paddle/pull/34567))
-- 修复 `optimizer` 中没有给 `clear_float_status` OP添加 `device_key` 的问题。([#34431](https://github.com/PaddlePaddle/Paddle/pull/34431))
 
+- 修复动态图量化训练保存模型时 clip_extra 的问题。([#38323](https://github.com/PaddlePaddle/Paddle/pull/38323))
+
+- 修复动态图量化训练 abs_max scale 初始化的问题。([#39307](https://github.com/PaddlePaddle/Paddle/pull/39307))
+
+- 修复动态图量化训练保存模型节点异常的问题。([#38102](https://github.com/PaddlePaddle/Paddle/pull/38102), [#38012](https://github.com/PaddlePaddle/Paddle/pull/38012))
+
+- 修复离线量化 flatten op 输出错误问题。([#37722](https://github.com/PaddlePaddle/Paddle/pull/37722)) 
+
+- 修复了反量化 matmul op 时，维度对不上的问题。([#36982](https://github.com/PaddlePaddle/Paddle/pull/36982))
+
+- 修复了量化无权重的 matmul_v2 时，错误添加量化 op 的问题。([#36593](https://github.com/PaddlePaddle/Paddle/pull/36593))
+
+- 修复 conv op channel wise 量化在保存模型时 quant_axis 属性保存错误。([#39054](https://github.com/PaddlePaddle/Paddle/pull/39054)) 
+
+- 修复 ChannelWise 量化训练速度慢的问题。([#40772](https://github.com/PaddlePaddle/Paddle/pull/40772))
+
+- 修复量化训练初始化为0的 Tensor 出 NAN 的问题。([#36762](https://github.com/PaddlePaddle/Paddle/pull/36762))
+
+- 修复多线程场景下混合精度 amp_level 设置错误问题。([#39198](https://github.com/PaddlePaddle/Paddle/pull/39198)) 
+
+- 修复混合精度训练与 PyLayer，Recompute 等一起使用时，PyLayer 和 Recompute 中未正确设置混合精度的问题。([#39950](https://github.com/PaddlePaddle/Paddle/pull/39950), [#40042](https://github.com/PaddlePaddle/Paddle/pull/40042)) 
+
+- 修复了 Mac 下编译自定义算子时 `-D_GLIBCXX_USE_CXX11_ABI` 未生效的问题。([#37878](https://github.com/PaddlePaddle/Paddle/pull/37878)) 
+
+- 修复 initializer 相关 API 在 block=None 时动静行为不统一的问题。([#37827](https://github.com/PaddlePaddle/Paddle/pull/37827)) 
+
+- 修复 python3.6 环境下没有 fluid 模块的 bug。([#35862](https://github.com/PaddlePaddle/Paddle/pull/35862)) 
+
+- 修复优化器 `paddle.optimizer.AdamW` 错误调用 adam op 的 bug。([#36028](https://github.com/PaddlePaddle/Paddle/pull/36028)) 
+
+- 修复 multi tensor 策略下 `paddle.optimizer.Momentum` 优化器参数 `regularizer` 属性为 None 时的逻辑错误。([#38344](https://github.com/PaddlePaddle/Paddle/pull/38344)) 
+
+- 修复 multi tensor 策略下 `paddle.optimizer.Momentum`、`paddle.optimizer.Adam` 优化器会对 `multi_precision` 属性进行修改的错误。([#38991](https://github.com/PaddlePaddle/Paddle/pull/38991)) 
+
+- 修复最终态 API amp 与 optional 类型 Tensor 组合使用的代码编译错误。([#40980](https://github.com/PaddlePaddle/Paddle/pull/40980)) 
+
+- 修复 paddle+lite+xpu 预测库调用 lite CPU 预测时会报错的 bug，修复 paddle+lite(without NNAdapter) 编译时会报错的 bug。 ([#37449](https://github.com/PaddlePaddle/Paddle/pull/37449)) 
+
+- 修复 Debug 编译模式下 LoDTensorArray 因 Pybind11 绑定不一致导致 crash 的 bug。([#37954](https://github.com/PaddlePaddle/Paddle/pull/37954)) 
+
+- 修复 shape 参数为 Tensor 和 int 构成列表的极端情况下，无法正确构建 Tensor 的 bug。([#38284](https://github.com/PaddlePaddle/Paddle/pull/38284)) 
+
+- 修复 `paddle.optimizer.AdamW` API 兼容性问题。([#37905](https://github.com/PaddlePaddle/Paddle/pull/37905)) 
+
+- 修复 _InstanceNormBase 中 extra_repr 的返回错误。([#38537](https://github.com/PaddlePaddle/Paddle/pull/38537)) 
+
+- 修复联编开启 -DWITH_DISTRIBUTED 生成 Paddle Inference 缺少符号 `paddle::distributed::TensorTable` 的问题。 ([#41128](https://github.com/PaddlePaddle/Paddle/pull/41128)) 
+
+- matmul_v2 op 新增 shape check，在 shape 中存在 0 值时给出报错信息。 ([#35791](https://github.com/PaddlePaddle/Paddle/pull/35791)) 
+
+- 修复动态图 recompute 对于没有梯度的输入反复打印提示信息的问题，改为用 warning 只打印一次。([#38293](https://github.com/PaddlePaddle/Paddle/pull/38293)) 
+
+- 修复 gelu op 在视觉模型中训练后期在验证集上精度低的问题。([#38450](https://github.com/PaddlePaddle/Paddle/pull/38450)) 
+
+- 修复 adamw op 在数值计算上误差问题。([#37746](https://github.com/PaddlePaddle/Paddle/pull/37746)) 
+
+- 补充 sparse_momentum `_C_ops` 接口 MasterParam 和 MasterParamOut 参数。([#39969](https://github.com/PaddlePaddle/Paddle/pull/39969)) 
+
+- 修复 python3.6 环境下没有 `distributed` 模块的 bug。([#35848](https://github.com/PaddlePaddle/Paddle/pull/35848)) 
+
+- 修复 eigh 单元测试数据初始化问题。([#39568](https://github.com/PaddlePaddle/Paddle/pull/39568)) 
+
+- 修复 eigvalsh 单元测试数据初始化问题。([#39841](https://github.com/PaddlePaddle/Paddle/pull/39841)) 
+
+- 修复 segment op 在 V100上寄存器使用过多导致不能正常运行的问题。([#38113](https://github.com/PaddlePaddle/Paddle/pull/38113)) 
+
+- 修复 conv 相关算子稀疏化维度错误的问题。([#36054](https://github.com/PaddlePaddle/Paddle/pull/36054)) 
 
+- 提供自动稀疏训练（Automatic SParsity）静态图相关功能 Alias 至 `paddle.static.sparsity`。([#36525](https://github.com/PaddlePaddle/Paddle/pull/36525)) 
+
+- 修复 divide op 整数除法结果仍为整数的 bug。([#40890](https://github.com/PaddlePaddle/Paddle/pull/40890)) 
+
+- 修复 `paddle.multiplex` 候选 Tensor 大小为0崩溃问题。([#34972](https://github.com/PaddlePaddle/Paddle/pull/34972)) 
+
+- 修复 `paddle.kl_div` 参数 `reduction` 给定情况下速度异常的问题。([#37283](https://github.com/PaddlePaddle/Paddle/pull/37283)) 
+
+- 修复 Cifar 数据集加载 data source 无序的问题。 ([#37272](https://github.com/PaddlePaddle/Paddle/pull/37272)) 
+
+- 修复 ProgressBar 类中 loss 从 uint16 到 float 的转换。([#39231](https://github.com/PaddlePaddle/Paddle/pull/39231)) 
+
+- 修复 ShareBufferWith 共享数据类型的问题。([#37464](https://github.com/PaddlePaddle/Paddle/pull/37464), [#37247](https://github.com/PaddlePaddle/Paddle/pull/37247)) 
+
+- 修复 `paddle.io.DataLoader` 使用 IterableDataset 并且 num_workers>0 时的性能问题。([#40541](https://github.com/PaddlePaddle/Paddle/pull/40541)) 
+
+- 修复 `paddle.vision.ops.yolo_loss` 动态图返回值不全的问题。([#40185](https://github.com/PaddlePaddle/Paddle/pull/40185)) 
+
+- 移除 `paddle.io.BatchSampler` 对输入参数 dataset 需要是 `paddle.io.Dataset` 类型的限制，扩大对用户自定义数据集的支持。([#40184](https://github.com/PaddlePaddle/Paddle/pull/40184)) 
+
+- 修复 `paddle.summary` 报错op_flops不存在的问题。([#36489](https://github.com/PaddlePaddle/Paddle/pull/36489)) 
+
+- 修复 lars_momentum op 在 lars_weight_decay=0 时公式错误的问题。([#40892](https://github.com/PaddlePaddle/Paddle/pull/40892))
+
+- 修复 optimizer-offload 无法保存 persistable var 的问题。([#36433](https://github.com/PaddlePaddle/Paddle/pull/36433))
+
+- 修复 optimizer-offload 不支持 adamw op type 的问题。 ([#36432](https://github.com/PaddlePaddle/Paddle/pull/36432))
+
+- 修复多线程场景下，Tracer 中 enable_program_desc_tracing_数据不安全的问题。([#39776](https://github.com/PaddlePaddle/Paddle/pull/39776)) 
+
+- 修复模型读取时模型档案大小未初始化的问题。([#40518](https://github.com/PaddlePaddle/Paddle/pull/40518)) 
+
+- 修复 Expand op 逻辑 bug：当输入 Tensor X 的维度小于要拓展的 shape 时，可能导致取得的 Out.Shape 错误。([#38677](https://github.com/PaddlePaddle/Paddle/pull/38677))
+
+- 修复 Expand_As op 只取 y.shape，而没有 Y 变量输入时，导致的动转静报错。([#38677](https://github.com/PaddlePaddle/Paddle/pull/38677))
+
+- 修复 Expand_As op 计算输出 shape 时逻辑的错误。([#38677](https://github.com/PaddlePaddle/Paddle/pull/38677))
+
+- 框架功能修复
+  
+  - 修复 `core.VarDesc.VarType.STRINGS` 类型的变量获取 `lod_level` 属性报错的问题，并且设置其 `lod_level` 为None。([#39077](https://github.com/PaddlePaddle/Paddle/pull/39077))
+  
+  - 修复框架功能 `PyLayer` 不支持不同 dtype 的问题。 ([#37974](https://github.com/PaddlePaddle/Paddle/pull/37974))
+
+- API修复
+  
+  - 修复了学习率衰减 API `paddle.optimizer.lr.PolynomialDecay` 的零除问题。 ([#38782](https://github.com/PaddlePaddle/Paddle/pull/38782)) 
+  
+  - 修复调用 DisableGlogInfo() 接口后依旧残留部分日志的问题。 ([#36356](https://github.com/PaddlePaddle/Paddle/pull/36356)) 
+
+- 修复 SimpleRNN、GRU和LSTM API CPU训练时多层RNN（dropout 设置为0时）反向计算出错的问题。 ([#37080](https://github.com/PaddlePaddle/Paddle/pull/37080)) 
+
+- 为 cufft 和 hipfft 后端的 fft 添加了 cache。 ([#36646](https://github.com/PaddlePaddle/Paddle/pull/36646)) 
+
+- 使 `paddle.roll` 的 shifts 参数支持传入 Tensor。 ([#36727](https://github.com/PaddlePaddle/Paddle/pull/36727)) 
+
+- 为 fft 添加 onemkl 作为可选的计算后端。 ([#36414](https://github.com/PaddlePaddle/Paddle/pull/36414)) 
 
 ## 4. 部署方向（Paddle Inference）
-### （1）新增功能
 
-#### 后端能力增强
-- 新增 TensorRT 子图模式下动态 shape 自动配置功能
-  增加TensorRT离线tune动态shape设置方式，对于模型被切分成多个TensorRT子图的场景，提升易用性[#34806](https://github.com/PaddlePaddle/Paddle/pull/34806) [#35771](https://github.com/PaddlePaddle/Paddle/pull/35771)，使用示例可参考[demo](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/paddle-trt/tuned_dynamic_shape)。
+### （1）新增特性
 
-    - 易用性优化的基本思想是：使用Paddle原生运行的方式针对用户输入的批量数据，统计计算图中所有临时tensor的shape范围，并将统计到的shape范围设置到TensorRT子图的输入，从而避免了用户去手动计算内部子图输入tensor的shape范围，提升易用性。
-    - 离线tuned动态shape使用的基本流程：用户代码完成后，通过配置config，启用shape范围收集能力c++接口`config.CollectShapeRangeInfo("shape_range.pbtxt")`或python接口`config.collect_shape_range_info('shape_range.pbtxt')`，将获得的shape范围以prototxt的格式存储在本地，修改config配置，关闭shape收集，开启tensorrt和动态shape能力，c++接口`config.EnableTunedTensorRtDynamicShape("shape_range.pbtxt", true)`或python接口`config.enable_tuned_tensorrt_dynamic_shape('shape_range.pbtxt', True)`即可直接运行。  
+#### 新增API
 
+- 增加 Java API，Java 开发者可以通过简单灵活的接口实现在服务端和云端的高性能推理。([#37162](https://github.com/PaddlePaddle/Paddle/pull/37162))
 
-- 新增对昇腾(Ascend)系列硬件的原生支持
-    - 子图通过支持Paddle-Lite NNAdapter接入ascend310硬件预测 [#35226](https://github.com/PaddlePaddle/Paddle/pull/35226)， 示例可参考[demo](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/ascend310_lite_subgraph/image_classification_demo)。
-    - 新增晟腾910 推理支持 [#34101](https://github.com/PaddlePaddle/Paddle/pull/34101)
+- 增加 `GetTrtCompileVersion` 和 `GetTrtRuntimeVersion` 接口，用于获取 TensorRT 版本信息。([#36429](https://github.com/PaddlePaddle/Paddle/pull/36429))
 
-- 新增pool3d算子支持TensorRT的功能。([#36545](https://github.com/PaddlePaddle/Paddle/pull/36545))
+- 增加 `ShareExternalData` 接口，避免推理时对输入数据进行内存拷贝。([#39809](https://github.com/PaddlePaddle/Paddle/pull/39809))
 
-### （2）功能优化
+#### 新增功能
+
+- 新增 ONNX Runtime 后端支持，当前集成版本只支持 CPU。([#39988](https://github.com/PaddlePaddle/Paddle/pull/39988), [#40561](https://github.com/PaddlePaddle/Paddle/pull/40561))
+
+- 基于 Paddle Lite 子图方式，新增昇腾310推理支持。([#35226](https://github.com/PaddlePaddle/Paddle/pull/35226))
+
+- 新增原生 GPU FP16 推理功能。([#40531](https://github.com/PaddlePaddle/Paddle/pull/40531))
+
+- switch_ir_debug 接口增加 dump 模型的功能。([#36581](https://github.com/PaddlePaddle/Paddle/pull/36581))
+
+- 新增 TensorRT config 的配置接口：`void UpdateConfigInterleaved(paddle_infer::Config* c, bool with_interleaved)`，用于 int8 量化推理中特殊的数据排布。([#38884](https://github.com/PaddlePaddle/Paddle/pull/38884)) 
+
+- log 中增加 TensorRT inspector 输出信息，仅在 TensorRT 8.2 及以上版本有效。([#38362](https://github.com/PaddlePaddle/Paddle/pull/38362), [#38200](https://github.com/PaddlePaddle/Paddle/pull/38200)) 
+
+- 增加 TensorRT ASP 稀疏推理支持。([#36413](https://github.com/PaddlePaddle/Paddle/pull/36413))
+
+### （2）底层优化
+
+#### CPU性能优化
+
+- 优化 MKLDNN 的缓存机制。([#38336](https://github.com/PaddlePaddle/Paddle/pull/38336), [#36980](https://github.com/PaddlePaddle/Paddle/pull/36980), [#36695](https://github.com/PaddlePaddle/Paddle/pull/36695)) 
+
+- 新增 matmul_scale_fuse pass。([#37962](https://github.com/PaddlePaddle/Paddle/pull/37962)) 
+
+- 新增 MKLDNN reshape_transpose_matmul_v2_mkldnn_fuse_pass。([#37847](https://github.com/PaddlePaddle/Paddle/pull/37847), [#40948](https://github.com/PaddlePaddle/Paddle/pull/40948)) 
+
+- 新增 MKLDNN conv_hard_sigmoid_mkldnn_fuse_pass。([#36869](https://github.com/PaddlePaddle/Paddle/pull/36869))
+
+- 新增 MKLDNN matmul_v2_transpose_reshape_fuse_pass。([#36481](https://github.com/PaddlePaddle/Paddle/pull/36481))
+
+- 新增 MKLDNN softplus_activation_mkldnn_fuse_pass。([#36657](https://github.com/PaddlePaddle/Paddle/pull/36657))
+
+- 新增 MKLDNN elt_act_mkldnn_fuse_pass。([#36541](https://github.com/PaddlePaddle/Paddle/pull/36541))
 
-#### 框架及API更新
+- 新增 MKLDNN mish 算子及 conv_mish_mkldnn_fuse_pass。([#38623](https://github.com/PaddlePaddle/Paddle/pull/38623))
 
-- 量化支持 
-    - 动态图量化推理 pass 的重构，支持非模拟量化的 OP和模拟量化的 OP。([#35907](https://github.com/PaddlePaddle/Paddle/pull/35907))
-  - 增加 int8 的模拟量化OP matmul（权重乘以 tensor的情况）。([#34359](https://github.com/PaddlePaddle/Paddle/pull/34359))
-  - 修复MobileNetV3模型在量化训练过程中因量化参数为0导致的Loss出NAN问题。([#36763](https://github.com/PaddlePaddle/Paddle/pull/36763))
+#### GPU 性能优化
 
+- 将推理默认的显存分配策略由 `naive_best_fit` 变更为 `auto_growth`，解决部分模型占满 GPU 显存问题。([#41491](https://github.com/PaddlePaddle/Paddle/pull/41491))
 
-- API 增强
-    - 基于新版CAPI重构GO API，[#33113](https://github.com/PaddlePaddle/Paddle/pull/33113)，使用示例可参考[demo](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/go/resnet50)。
-    - 预测python api `copy_from_cpu` 和 `copy_to_cpu` 接口支持float16数据类型 。([#34676](https://github.com/PaddlePaddle/Paddle/pull/34676))
-    - 增加 `config.Summary()` 接口，打印config配置信息。([#34122](https://github.com/PaddlePaddle/Paddle/pull/34122))
-    - 预测库中 `version.txt` 记录trt版本信息补全，如v7.2.3.4 而不是v7。( [#33690](https://github.com/PaddlePaddle/Paddle/pull/33690))
+- 支持 gelu、FC+gelu ops 使用 TensorRT 推理。([#38399](https://github.com/PaddlePaddle/Paddle/pull/38399))
 
-- 库体积压缩
-    - linux 下对预测库进行strip裁剪库体积，体积压缩30M。([#34895](https://github.com/PaddlePaddle/Paddle/pull/34895))
+- 支持 `deformable_conv` 在静态 shape下使用 TensorRT 推理。([#36612](https://github.com/PaddlePaddle/Paddle/pull/36612) [#36850](https://github.com/PaddlePaddle/Paddle/pull/36850) [#37345](https://github.com/PaddlePaddle/Paddle/pull/37345))
 
-- 其他更新
-    - 新增捕获运行异常报错并将其转换为相应错误状态的辅助工具。([#35624](https://github.com/PaddlePaddle/Paddle/pull/35624))
-    - 新增相关基础数据结构，增强飞桨算子定义的精确性。([#33098](https://github.com/PaddlePaddle/Paddle/pull/33098))
+- 支持 nearest_interp_v2 op 使用 TensorRT 推理。([#34126](https://github.com/PaddlePaddle/Paddle/pull/34126))
 
-#### 后端能力增强
+- 增加 `yolo_box` TensorRT plugin，支持输入参数 `iou_aware` 和 `iou_aware_factor`，使推理计算得到的 IoU 作为置信度的因子。([#34128](https://github.com/PaddlePaddle/Paddle/pull/34128))
 
-- CPU 相关更新
-    - 升级oneDNN版本为2.3.2。( [#35040](https://github.com/PaddlePaddle/Paddle/pull/35040))
-    - 新增 quant-aware LSTM oneDNN INT8 模型支持。([#35382](https://github.com/PaddlePaddle/Paddle/pull/35382))
-    - 新增 post-training LSTM oneDNN INT8 模型支持。([#35334](https://github.com/PaddlePaddle/Paddle/pull/35334), [#33295](https://github.com/PaddlePaddle/Paddle/pull/33295))
-    - 新增 fusion_gru 和 multi_gru 融合和 post-training INT8的支持。([#33749](https://github.com/PaddlePaddle/Paddle/pull/33749))
-    - 优化oneDNN 的 cache机制。([#35664](https://github.com/PaddlePaddle/Paddle/pull/35664),  [#35331](https://github.com/PaddlePaddle/Paddle/pull/35331), [#35132](https://github.com/PaddlePaddle/Paddle/pull/35132), [#35030](https://github.com/PaddlePaddle/Paddle/pull/35030), [#35002](https://github.com/PaddlePaddle/Paddle/pull/35002), [#34830](https://github.com/PaddlePaddle/Paddle/pull/34830), [#33515](https://github.com/PaddlePaddle/Paddle/pull/33515), [#33048](https://github.com/PaddlePaddle/Paddle/pull/33048), [#32922](https://github.com/PaddlePaddle/Paddle/pull/32922), [#32499](https://github.com/PaddlePaddle/Paddle/pull/32499))
-    - 通过新增多个 op (如clip, scale等) 的oneDNN kernel 实现,  ch_ppocr_mobile_v1.1_det_infer、DPN68, fastscnn, hrnet、HRNet_W18_C、 icnet、Res2Net50_26w_4s、 ssdlite_mobilenet_v3_large 等模型打开oneDNN 比关闭 oneDNN 在 Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz 单核性能提升 47.8%。([#35601](https://github.com/PaddlePaddle/Paddle/pull/35601), [#32975](https://github.com/PaddlePaddle/Paddle/pull/32975))
-    - 优化了oneDNN LSTM INT8 模型，在Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz 单核上，INT8 LSTM 模型为 FP32 LSTM 模型性能的 1.59 倍。([#35382](https://github.com/PaddlePaddle/Paddle/pull/35382), [#35334](https://github.com/PaddlePaddle/Paddle/pull/35334), [#34820](https://github.com/PaddlePaddle/Paddle/pull/34820), [#34137](https://github.com/PaddlePaddle/Paddle/pull/34137))
+- 支持 `elementwise_sub` 和 `elementwise_div` 调用 TensorRT 推理。([#40806](https://github.com/PaddlePaddle/Paddle/pull/40806) [#41253](https://github.com/PaddlePaddle/Paddle/pull/41253))
 
+- 支持 `multiclass_nms3` 使用 TensorRT 推理。([#41181](https://github.com/PaddlePaddle/Paddle/pull/41181) [#41344](https://github.com/PaddlePaddle/Paddle/pull/41344))
 
-- GPU 及 TensorRT 子图引擎相关更新
+- 支持 flatten_contiguous_range op 使用 TensorRT 推理。([#38922](https://github.com/PaddlePaddle/Paddle/pull/38922)) 
 
-    - 增加TensorRT 8.0的支持，在将来的某个版本我们会放弃对TensorRT 6.x的支持。([#34403](https://github.com/PaddlePaddle/Paddle/pull/34403), [#34294](https://github.com/PaddlePaddle/Paddle/pull/34294), [#34157](https://github.com/PaddlePaddle/Paddle/pull/34157), [#33777](https://github.com/PaddlePaddle/Paddle/pull/33777), [#33680](https://github.com/PaddlePaddle/Paddle/pull/33680), [#33662](https://github.com/PaddlePaddle/Paddle/pull/33662), [#33654](https://github.com/PaddlePaddle/Paddle/pull/33654))
-  - 增加TensorRT `layer_norm` plugin对动态shape的支持。([#33448](https://github.com/PaddlePaddle/Paddle/pull/33448))
-  - 增加TensorRT `hard_swish` plugin对动态shape的支持。([#35214](https://github.com/PaddlePaddle/Paddle/pull/35214))
-  - 增加TensoRT `reduce_sum` 和 `gather_nd` 的支持。([#33324](https://github.com/PaddlePaddle/Paddle/pull/33324))
-  - 增加TensorRT `qkv_context` plugin 对int8的支持([#34917](https://github.com/PaddlePaddle/Paddle/pull/34917), [#35504](https://github.com/PaddlePaddle/Paddle/pull/35504))
-  - 增加TensorRT conv3d的支持。([#35507](https://github.com/PaddlePaddle/Paddle/pull/35507))
-  - 增加对 `multihead_matmul` 融合算子的输入进行广播的支持。([#35780](https://github.com/PaddlePaddle/Paddle/pull/35780))
-  - Inference 支持 TensorRT8 稀疏推理，[测试环境](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/sparsity)下，ERNIE 模型变长输入在不同的 batch_size 下性能提升10%-30%，ResNeXt101_32x4d模型在不同的batch_size下性能提升10%。([#36659](https://github.com/PaddlePaddle/Paddle/pull/36659))
+- 支持 `pool2d` 属性 `padding` 的维度为4、`global_pooling` 和 `ceil_mode` 为 True 情况下使用 TensorRT 推理。([#39545](https://github.com/PaddlePaddle/Paddle/pull/39545))
 
-- Nvidia Jetson 原生支持能力增强
-    - 新增 Op 支持，针对Jetson Nano/TX2这两款算力较低的设备，我们做了针对性的优化，目前新增了 `pool2d`, `pool_max`, `conv3d_transpose` 等 17个OP的支持。([#35378](https://github.com/PaddlePaddle/Paddle/pull/35378))
-    - 针对Jetson Nano，新增模型：DPN68, EfficientNetB0, ttfnet, fcn_hrnetw18, hardnet。([#35378](https://github.com/PaddlePaddle/Paddle/pull/35378))
-    - 针对Jetson TX2，新增模型：deeplabv3p_resnet50, deeplabv3_resnet50, fcn_hrnetw18, hardnet, pspnet, ttfnet, unet。([#35378](https://github.com/PaddlePaddle/Paddle/pull/35378))
+- 支持 batch_norm 和 elementwise_add 为5维时使用 TensorRT 推理。([#36446](https://github.com/PaddlePaddle/Paddle/pull/36446))
 
-- 昆仑XPU接口功能扩展 
-  - 新增 `set_xpu_device_id` 接口，支持设置推理时的昆仑芯片的设备号([#35572](https://github.com/PaddlePaddle/Paddle/pull/35572))
+- 新增 pool3d 使用 TensorRT 推理。([#36545](https://github.com/PaddlePaddle/Paddle/pull/36545), [#36783](https://github.com/PaddlePaddle/Paddle/pull/36783)) 
 
-- Inference python `copy_from_cpu`接口加入输入类型检查，错误类型输入下提前报错。([#36552](https://github.com/PaddlePaddle/Paddle/pull/36552))
+- 增加 `reduce` int32 和 float 类型使用 TensorRT 推理，增加 `reduce_mean` GPU 算子 int32、int64 注册。([#39088](https://github.com/PaddlePaddle/Paddle/pull/39088))
+
+- 修改 MatmulV2ToMul pass，修改限定条件（不支持广播）和 op_teller 映射条件。([#36652](https://github.com/PaddlePaddle/Paddle/pull/36652)) 
+
+- 增加 TensorRT plugin 接口 AddPluginV2IOExt 的支持。([#36493](https://github.com/PaddlePaddle/Paddle/pull/36493)) 
+
+- 增加 roi_align op 中 aligned 属性并支持 TensorRT 推理。([#38905](https://github.com/PaddlePaddle/Paddle/pull/38905))
+
+- 增加 concat 属性 `axis = -1` 时支持 TensorRT 推理。([#39096](https://github.com/PaddlePaddle/Paddle/pull/39096)) 
+
+- 新增 TensorRT plugin：preln_emb_eltwise_layernorm、preln_skip_layernorm ops，用于 ERNIE 类模型性能优化。([#39570](https://github.com/PaddlePaddle/Paddle/pull/39570))
+
+- 新增 TensorRT fuse pass：preln_embedding_eltwise_layernorm_fuse_pass, preln_skip_layernorm_fuse_pass，用于 ERNIE 类模型性能优化。([#39508](https://github.com/PaddlePaddle/Paddle/pull/39508))
+
+- 将 matmul 融合相关的 pass 基于不同的后端（GPU、CPU、TensorRT）拆开，支持 FC 权重的转置功能。([#39369](https://github.com/PaddlePaddle/Paddle/pull/39369))
+
+- 量化支持
+  
+  - `PostTrainingQuantization` API新增支持`paddle.io.DataLoader` 对象或者 `Python Generator`的输入。([#38686](https://github.com/PaddlePaddle/Paddle/pull/38686))
+  
+  - ERNIE 全量化模型推理支持 interleaved 数据排布。([#39424](https://github.com/PaddlePaddle/Paddle/pull/39424)) 
+  
+  - 支持 PaddleSlim 新量化模型格式推理。([#41049](https://github.com/PaddlePaddle/Paddle/pull/41049)) 
+  
+  - 新增 matmul int8 量化的推理 op converter 和 plugin。([#37285](https://github.com/PaddlePaddle/Paddle/pull/37285))
+  
+  - 新增判断模型所有 op 能否支持 int8 量化的 pass。([#36042](https://github.com/PaddlePaddle/Paddle/pull/36042))
+  
+  - 支持 multihead attention 非变长分支中 FC 部分的量化推理。([#39660](https://github.com/PaddlePaddle/Paddle/pull/39660)) 
+
+#### 昇腾NPU 相关功能
+
+- 重构 shape 算子前向计算逻辑，支持在 NPU 上执行。([#39613](https://github.com/PaddlePaddle/Paddle/pull/39613))
+
+- 重构 reshape 算子前向计算逻辑，支持 ShapeTensor 输入。([#38748](https://github.com/PaddlePaddle/Paddle/pull/38748))
+
+- 模型权重加载时精度类型统一。([#39160](https://github.com/PaddlePaddle/Paddle/pull/39160))
 
 ### （3）问题修复
 
 #### 框架及API修复
 
-- 算子修复
-    - 修复split op当axis输入小于0时，转换TensorRT时会发生地址访问错误的情况，同时将axis等于0时静动态shape均不支持的情况进行过滤。([#35127](https://github.com/PaddlePaddle/Paddle/pull/35127))
-    - 修复transpose静态shape在axis为`[0, 1]`时错误的情况。([#35138](https://github.com/PaddlePaddle/Paddle/pull/35138))
-    - 修复 gather op与原生 paddle op的功能对齐，并完善 op teller 过滤的条件。([#35784](https://github.com/PaddlePaddle/Paddle/pull/35784))
-    - 修复fc op 的 int8 分支。([#34787](https://github.com/PaddlePaddle/Paddle/pull/34787), [#32671](https://github.com/PaddlePaddle/Paddle/pull/32671))
-    - 修复reshape 的 op teller 过滤条件。([#34787](https://github.com/PaddlePaddle/Paddle/pull/34787), [#34583](https://github.com/PaddlePaddle/Paddle/pull/34583))
-    - 修复recurrent op多线程推理效率差问题。([#36053](https://github.com/PaddlePaddle/Paddle/pull/36053))
-    - 修复gather和scatter op中int值溢出的问题。([#35544](https://github.com/PaddlePaddle/Paddle/pull/35544))
-    - 修复 ctc op 除零错误。 ([#34724](https://github.com/PaddlePaddle/Paddle/pull/34724))
-    - 修复模型输入包含bool类型时，插入scale op导致的崩溃。([#35176](http://github.com/PaddlePaddle/Paddle/pull/35176))
-    - 修复复数scaler 和Tensor 运算失败的问题。([#33699](https://github.com/PaddlePaddle/Paddle/pull/33699))
+- 修复保存静态图时模型剪裁的问题。([#37579](https://github.com/PaddlePaddle/Paddle/pull/37579))
 
-- 框架功能修复
-    - 修复部分ernie模型批处理数据时显存访问越界的问题。([#35077](https://github.com/PaddlePaddle/Paddle/pull/35077))
-    - 修复ernie模型FP16精度运行时可能出现的精度问题。([#34771](https://github.com/PaddlePaddle/Paddle/pull/34711))
-    - 修复ernie变长情况下，输入的顺序不一致导致输出不对的问题。([#33575](https://github.com/PaddlePaddle/Paddle/pull/33575))
-    - 修复多流状态下分配器功能异常的问题。([#32932](https://github.com/PaddlePaddle/Paddle/pull/33575))
+- C API 增加对字符串的封装 PD_Cstr，并提供构造和析构的方式，避免用户直接使用 C 运行时库来析构字符串。 ([#38667](https://github.com/PaddlePaddle/Paddle/pull/38667)) 
 
-- 修复 ERNIE 模型在 TRT8 下可能出现的崩溃问题。([#36769](https://github.com/PaddlePaddle/Paddle/pull/36769))
-- 修复使用 Pool, Slice 时可能出现的崩溃及精度问题。([#36666](https://github.com/PaddlePaddle/Paddle/pull/36666))
-- 修复 yolo_box op因为计算公式错误导致的精度问题。([#36365](https://github.com/PaddlePaddle/Paddle/pull/36365))
-- 修复量化后的 matmul_v2 在TRT下无法正常推理的问题。([#36821](https://github.com/PaddlePaddle/Paddle/pull/36821))
-- 修复了量化 matmul_v2 时错误地添加量化op的问题。([#36820](https://github.com/PaddlePaddle/Paddle/pull/36820))
-- 修复算子 batch_norm 和 elementwise_add 在3D应用场景下开启 TRT 报错的问题。([#36446](https://github.com/PaddlePaddle/Paddle/pull/36446))
-- 修复高层 linear api保存得到的预测模型无法被 Pass 融合优化的问题。([#36500](https://github.com/PaddlePaddle/Paddle/pull/36500))
-- 修改 MatmulV2ToMul 的 Pass，重新限定 (matmul_v2 to mul) 映射的 Pass，增加 MatmulV2ToMatmul 的 Pass，限定 (matmul_v2 to matmul) 映射的 Pass条件(不支持广播)，修改 (matmul, mul) 的 op_teller 映射条件。([#36652](https://github.com/PaddlePaddle/Paddle/pull/36652))
+- 修复预测时内存复用的逻辑问题。([#37324](https://github.com/PaddlePaddle/Paddle/pull/37324))
 
+- 修复多线程下内存复用报错问题。([#37894](https://github.com/PaddlePaddle/Paddle/pull/37894)) 
+
+- 在没有权重文件时，允许传递空字符串进行推理。([#38579](https://github.com/PaddlePaddle/Paddle/pull/38579))
+
+- 修复开启 TensorRT dynamic shape 后不支持 clone 问题。([#38520](https://github.com/PaddlePaddle/Paddle/pull/38520))
+
+- 修复开启 TensorRT dynamic shape 后多线程 clone 报错问题。([#40067](https://github.com/PaddlePaddle/Paddle/pull/40067))
+
+- 修复 TensorRT engine 析构问题。([#35842](https://github.com/PaddlePaddle/Paddle/pull/35842), [#35938](https://github.com/PaddlePaddle/Paddle/pull/35938))
+
+- lite xpu 接口修复无法选择 xpu 卡的问题。([#36610](https://github.com/PaddlePaddle/Paddle/pull/36610)) 
+
+- TensorRT 动态 shape 参数自动生成接口增加文件存在性检查。([#36628](https://github.com/PaddlePaddle/Paddle/pull/36628))
 
 #### 后端能力修复
 
-- TensorRT 子图引擎修复
-    - 修复TensorRT动态shape时slice plugin的ends参数越界报错问题。([#35357](https://github.com/PaddlePaddle/Paddle/pull/35357))
-    - 修复reduce op转换TensorRT的reduce_all = 1时候不支持keepdim=false的情况。([#35145](https://github.com/PaddlePaddle/Paddle/pull/35145))
-    - 修复slice op转换TensorRT的decrease_axis参数问题。([#35100](https://github.com/PaddlePaddle/Paddle/pull/35100))
-    - 修复nearest_interp op转换TensorRT动态shape下scale为负数不支持的情况。修正scale比outh和outw有更高优先级。([#35405](https://github.com/PaddlePaddle/Paddle/pull/35405))
-    - 修复pad op的paddings参数和tensorrt不一样的问题。([#35371](https://github.com/PaddlePaddle/Paddle/pull/35371))
-    - 添加conv2d op转换TensorRT的4维padding支持，过滤conv2d op转换TensorRT时padding_algorithm 为 SAME 和 VALID 的情况。([#35627](https://github.com/PaddlePaddle/Paddle/pull/35627))
-    - 添加pool2d op转换TensorRT时对padding_algorithm 为 SAME 的处理，过滤 exclusive mode下 ksize 小于等于 padings 的情况。([#35923](https://github.com/PaddlePaddle/Paddle/pull/35923))
-    - 修复clip op转换TensorRT时不支持 Min和Max 输入的情况。([#35694](https://github.com/PaddlePaddle/Paddle/pull/35694))
-    - 修复gelu op转换TensorRT时不支持 approximate 属性的情况。([#35529](https://github.com/PaddlePaddle/Paddle/pull/35529))
-    - 修复affine_channel转换TensorRT时不支持2维输入的情况。([#35496](https://github.com/PaddlePaddle/Paddle/pull/35496))
-    - 修复TensorRT子图匹配不稳定的问题。([#35147](https://github.com/PaddlePaddle/Paddle/pull/35147))
-    - 修复预测引擎析构后，TensorRT engine没有释放的问题。([#35842](https://github.com/PaddlePaddle/Paddle/pull/35842), [#35938](https://github.com/PaddlePaddle/Paddle/pull/35938))
-    - paddle-trt static模式下，如果reshape的shape属性 batch维度为-1，修复paddle算子错误转换为trt的问题。([#34007](https://github.com/PaddlePaddle/Paddle/pull/34007))
-    - 修复roi_align 转换TensorRT不支持RoisNum属性的情况，同时修复在动态shape时aligned 为True、Sampling_ratio = -1计算错误的情况。([#35549](https://github.com/PaddlePaddle/Paddle/pull/35549))
-    - 修复concat 转换TensorRT不支持AxisTensor属性的情况。([#35545](https://github.com/PaddlePaddle/Paddle/pull/35545))
-    - 修复scale 转换TensorRT不支持ScaleTensor属性以及静态shape 不支持1维输入的情况。([#35225](https://github.com/PaddlePaddle/Paddle/pull/35225))
-    - 修复batchnorm 转换TensorRT不支持MomentumTensor属性的情况。([#35527](https://github.com/PaddlePaddle/Paddle/pull/35527))
-    - 修复reshape 转换TensorRT不支持ShapeTensor 、Shape属性以及静态shape 不支持1维输入的情况。([#35166](https://github.com/PaddlePaddle/Paddle/pull/35166))
-    - 增加 TensorRT tile 算子支持。([#34388](https://github.com/PaddlePaddle/Paddle/pull/34388))
-    - 增加 TensorRT reduce mean 算子支持。([#34204](https://github.com/PaddlePaddle/Paddle/pull/34204))
-    - 修复使用gather op时可能出现的崩溃问题。([#33999](https://github.com/PaddlePaddle/Paddle/pull/33999))
-    - 修复 TensorRT int8 的一个错误使用 debug 的 flag（会只运行 int8的 kernel，导致性能下降）。([#34704](https://github.com/PaddlePaddle/Paddle/pull/34704))
-    - 修复gather_nd op在2维输入调用TensorRT时计算错误问题。([#35464](https://github.com/PaddlePaddle/Paddle/pull/35464))
-    - 修复hard_sigmoid op在2维输入调用TensorRT时计算错误问题。([#35908](https://github.com/PaddlePaddle/Paddle/pull/35908))
-    - 修复prelu op在2维输入调用TensorRT时计算错误问题。([#35512](https://github.com/PaddlePaddle/Paddle/pull/35512))
-    - 修复windows下 TensorRT 推理时，有用右斜杠作为路径分隔符导致的崩溃问题。([#33853](http://github.com/PaddlePaddle/Paddle/pull/33853))
-
-
-#### 其他修复
-
-- 修复裁剪反向算子脚本遇到中文字符注释出错的问题。([#33937](https://github.com/PaddlePaddle/Paddle/pull/33937), [#33919](https://github.com/PaddlePaddle/Paddle/pull/33919))
-- 修复编译时单测模型下载不全导致单测推理时的错误，增加测试模型下载的 MD5下载验证。([#33264](https://github.com/PaddlePaddle/Paddle/pull/33264), [#33217](https://github.com/PaddlePaddle/Paddle/pull/33217))
-- 修复 blazeface model 中mkldnn elementwise op 不支持 broadcast 问题。([#33549](https://github.com/PaddlePaddle/Paddle/pull/33549))
-- 修复 swin_transformer mkldnn 推理报错问题。([#35740](https://github.com/PaddlePaddle/Paddle/pull/35740))
-- 修复 paddlex.deploy.Predictor oneDNN多线程执行 unet 报错问题。([#35231](https://github.com/PaddlePaddle/Paddle/pull/35231))
-- 修复 oneDNN setCacheCapacity无法限制内存问题。([#33571](https://github.com/PaddlePaddle/Paddle/pull/33571))
-
-
-
-
-## 环境适配
+- 修复预测时 cuDNN 默认算法选择配置，使用非 deterministic 策略。 ([#41491](https://github.com/PaddlePaddle/Paddle/pull/41491))
+
+- 修复 deformable_conv op 在 TensorRT plugin 资源回收处理错误的问题。 ([#38374](https://github.com/PaddlePaddle/Paddle/pull/38374)) 
+
+- 修复 deformable_conv op 在 TensorRT plugin 序列化错误问题。 ([#38057](https://github.com/PaddlePaddle/Paddle/pull/38057)) 
+
+- 适配 TensorRT 8.0 新的构建引擎和序列化 API。 ([#36769](https://github.com/PaddlePaddle/Paddle/pull/36769)) 
+
+- 修复 Flatten2MatmulFusePass、Squeeze2MatmulFusePass、Reshape2MatmulFusePass 没有生效问题。([#37644](https://github.com/PaddlePaddle/Paddle/pull/37644)) 
+
+- Fix the error reported when a CPU tensor is passed directly as TensorRT input without the data being copied to the GPU. ([#37427](https://github.com/PaddlePaddle/Paddle/pull/37427))
+
+- Add an error message for incorrect input dimensions. ([#38962](https://github.com/PaddlePaddle/Paddle/pull/38962))
+
+- Fix the incorrect output type of EmbEltwiseLayernorm. ([#40015](https://github.com/PaddlePaddle/Paddle/pull/40015))
+
+- Remove conv_affine_channel_fuse_pass and its unit tests. ([#39817](https://github.com/PaddlePaddle/Paddle/pull/39817))
+
+- Fix the issue that the adaptive_pool2d pass incorrectly replaces pool attributes. ([#39600](https://github.com/PaddlePaddle/Paddle/pull/39600))
+
+- Fix the issue that shuffle_channel_detect_pass incorrectly generates the shuffle_channel op. ([#39242](https://github.com/PaddlePaddle/Paddle/pull/39242))
+
+- Fix incorrect transpose parameters. ([#39006](https://github.com/PaddlePaddle/Paddle/pull/39006))
+
+- Fix a crash in nearest_interp_v2 when the dimension of the input scale is less than 1. ([#38725](https://github.com/PaddlePaddle/Paddle/pull/38725))
+
+- Fix the issue that prelu does not support one-dimensional input under dynamic shape. ([#39389](https://github.com/PaddlePaddle/Paddle/pull/39389))
+
+- Fix incorrect kernel function computation in the special_slice_plugin of slice. ([#39875](https://github.com/PaddlePaddle/Paddle/pull/39875))
+
+- Temporarily disable the int8 branch of skip_layernorm under variable length to prevent accuracy degradation. ([#39991](https://github.com/PaddlePaddle/Paddle/pull/39991))
+
+- Fix some bugs in the support for the preln_ernie model. ([#39733](https://github.com/PaddlePaddle/Paddle/pull/39733))
+
+- Fix a bug where the threads used by slice may exceed the limit in ERNIE, and fix a bug where special_slice is triggered by mistake. ([#39096](https://github.com/PaddlePaddle/Paddle/pull/39096))
+
+- Fix the issue that elementwise does not support broadcasting when the dimensions are the same. ([#37908](https://github.com/PaddlePaddle/Paddle/pull/37908))
+
+- Fix the diff between the TensorRT layer result and the native op result for the nearest_interp op when align_corners is True, caused by different underlying implementations. ([#37525](https://github.com/PaddlePaddle/Paddle/pull/37525))
+
+- Fix incorrect kernel function computation in qkv_plugin. ([#37096](https://github.com/PaddlePaddle/Paddle/pull/37096))
+
+- Fix issues in the inference pass for dynamic quantization. ([#35879](https://github.com/PaddlePaddle/Paddle/pull/35879))
+
+- Reuse the allocated memory directly when the memory size requested by a Tensor is smaller than the allocated size. ([#37880](https://github.com/PaddlePaddle/Paddle/pull/37880))
+
+- Fix the hang that occurs when TensorRT is enabled for ERNIE fixed-length models. ([#37839](https://github.com/PaddlePaddle/Paddle/pull/37839))
+
+- Fix a crash caused by missing dynamic range information under TensorRT int8. ([#36900](https://github.com/PaddlePaddle/Paddle/pull/36900))
+
+- Fix an issue in the slice deserialization code. ([#36588](https://github.com/PaddlePaddle/Paddle/pull/36588))
+
+- Fix the incorrect yolo box calculation formula. ([#36240](https://github.com/PaddlePaddle/Paddle/pull/36240))
+
+- Fix a crash when an earlier-version model uses the new version of roi_align. ([#38788](https://github.com/PaddlePaddle/Paddle/pull/38788)) (external developer)
+
+- Fix the large performance gap of softmax between Python and C++. ([#37130](https://github.com/PaddlePaddle/Paddle/pull/37130))
+
+- Fix inference failures of matmul with 2-dimensional input under static shape and 3-dimensional input under dynamic shape. ([#36849](https://github.com/PaddlePaddle/Paddle/pull/36849))
+
+- Fix improper shape handling in reshape_transpose_matmul_mkldnn_fuse_pass. ([#36731](https://github.com/PaddlePaddle/Paddle/pull/36731))
+
+- Fix the issue that TensorRT gets 4 dimensions when the input is 2-dimensional. ([#36614](https://github.com/PaddlePaddle/Paddle/pull/36614))
+
+- Fix an error in the interpolate_v2 MKLDNN operator when the scale attribute is empty. ([#36623](https://github.com/PaddlePaddle/Paddle/pull/36623))
+
+- Fix the poor performance of the recurrent operator in multi-threaded scenarios. ([#36052](https://github.com/PaddlePaddle/Paddle/pull/36052))
+
+- Remove the TensorRT 2-dimensional input restrictions on relu, sigmoid, tanh, relu6, batch_norm, clip, concat, gelu, hard_sigmoid, prelu, softmax, split, and swish. ([#37097](https://github.com/PaddlePaddle/Paddle/pull/37097))
+
+- Fix TensorRT inference with the reshape op. ([#41090](https://github.com/PaddlePaddle/Paddle/pull/41090))
+
+- Fix matmul-related passes for compatibility with matmul_v2. ([#36424](https://github.com/PaddlePaddle/Paddle/pull/36424))
+
+- When TensorRT is enabled, the padding mode of the conv2d operator supports the VALID and SAME attributes. ([#38999](https://github.com/PaddlePaddle/Paddle/pull/38999))
+
+- Fix quantization of MKLDNN multi-input operators. ([#39593](https://github.com/PaddlePaddle/Paddle/pull/39593), [#39346](https://github.com/PaddlePaddle/Paddle/pull/39346), [#40717](https://github.com/PaddlePaddle/Paddle/pull/40717))
+
+- Fix the incorrect scale of conv+activation in MKLDNN quantization scenarios. ([#38331](https://github.com/PaddlePaddle/Paddle/pull/38331))
+
+- Fix the issue in quantization of MKLDNN parameter-free operators where different handling is required depending on whether the subsequent operators are quantized. ([#39342](https://github.com/PaddlePaddle/Paddle/pull/39342))
+
+- Fix data type issues in the MKLDNN cpu_bfloat16_placement_pass. ([#38702](https://github.com/PaddlePaddle/Paddle/pull/38702))
+
+- Fix the execution of the split operator in MKLDNN bfloat16 inference. ([#39548](https://github.com/PaddlePaddle/Paddle/pull/39548))
+
+- Fix the issue that the MKLDNN matmul_v2 operator does not support 6 dimensions. ([#36342](https://github.com/PaddlePaddle/Paddle/pull/36342), [#38665](https://github.com/PaddlePaddle/Paddle/pull/38665))
+
+- Fix the MKLDNN DeviceContext error in MKLDNN matmul_v2_transpose_reshape. ([#38554](https://github.com/PaddlePaddle/Paddle/pull/38554))
+
+- Fix incorrect computation results of segmentation models in MKLDNN inference scenarios. ([#37310](https://github.com/PaddlePaddle/Paddle/pull/37310))
+
+- Fix the MKLDNN bfloat16 placement operator list and add missing operators. ([#36291](https://github.com/PaddlePaddle/Paddle/pull/36291))
+
+- Fix format issues of MKLDNN operators, including FC, conv_transpose, errors with 6-dimensional Tensors, and the wrong output format of conv for `NHWC` input. ([#38890](https://github.com/PaddlePaddle/Paddle/pull/38890), [#37344](https://github.com/PaddlePaddle/Paddle/pull/37344), [#37175](https://github.com/PaddlePaddle/Paddle/pull/37175), [#38553](https://github.com/PaddlePaddle/Paddle/pull/38553), [#40049](https://github.com/PaddlePaddle/Paddle/pull/40049), [#39097](https://github.com/PaddlePaddle/Paddle/pull/39097))
+
+- Fix errors caused by the cache mechanism in MKLDNN multi-threaded inference scenarios. ([#36290](https://github.com/PaddlePaddle/Paddle/pull/36290), [#35884](https://github.com/PaddlePaddle/Paddle/pull/35884))
+
+- Fix abnormal accuracy of MKLDNN quantized models caused by matmul and FC. ([#38023](https://github.com/PaddlePaddle/Paddle/pull/38023), [#37618](https://github.com/PaddlePaddle/Paddle/pull/37618))
+
+- Fix abnormal accuracy of quantized models caused by missing passes in the MKLDNN quantization conversion script. ([#37619](https://github.com/PaddlePaddle/Paddle/pull/37619), [#40542](https://github.com/PaddlePaddle/Paddle/pull/40542),
+  [#38912](https://github.com/PaddlePaddle/Paddle/pull/38912))
+
+- Fix a crash caused by a data type mismatch in ops enabled with MKLDNN. ([#38133](https://github.com/PaddlePaddle/Paddle/pull/38133))
+
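+- Fix the issue that some MKLDNN ops need to restore the original layout after modifying it. ([#39422](https://github.com/PaddlePaddle/Paddle/pull/39422))
+
+- Fix an error reported under the Python API in Ascend 910 inference scenarios, caused by a conflict with the Ascend software stack because the GIL was not released. ([#38605](https://github.com/PaddlePaddle/Paddle/pull/38605))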
+
+## 5. Environment Adaptation
 
 ### 编译安装
-- Windows 全新支持 `Ninja编译构建方式`，编译速度、易用性、稳定性都较VS IDE方式有很好提升，Windows用户可`pip install ninja`，进行本地源码编译Paddle。([#31161](https://github.com/PaddlePaddle/Paddle/pull/31161), [#31449](https://github.com/PaddlePaddle/Paddle/pull/31449), [#32987](https://github.com/PaddlePaddle/Paddle/pull/32987), [#33140](https://github.com/PaddlePaddle/Paddle/pull/33140), [#33155](https://github.com/PaddlePaddle/Paddle/pull/33155))
-- 发版镜像中只保留python3.7，删除了python3.5、python3.6、python3.8、python3.9及相应python版本的paddle包，缩小镜像大小。镜像大小缩小30%~50%。([#32688](https://github.com/PaddlePaddle/Paddle/pull/32688))
-- TensorRT库为推理时使用，发版镜像中仅paddle训练基础功能，不需要支持TensorRT。删除了发版镜像中的TensorRT库，避免用户错误使用该镜像。([#34266](https://github.com/PaddlePaddle/Paddle/pull/34266))
+
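+- The CUDA version of the installation packages that PaddlePaddle publishes on the PIP source is changed to 11.0. To install paddle built for another CUDA version, download it from the [PaddlePaddle website](https://www.paddlepaddle.org.cn/install/quick).
+
+- The CUDA 11.0 installation package published on the PIP source for PaddlePaddle 2.3.0-rc0 adds support for the Ampere architecture. Users whose GPU architecture is 8.0 or 8.6 can upgrade directly with `pip install paddlepaddle-gpu`.
+
+- Starting from version 2.3.0-rc0, PaddlePaddle adjusts and upgrades the GPU architectures supported by the framework. (For details, see [GPU architectures supported by PaddlePaddle](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.3rc/install/Tables.html#gpu))
+
+Notes: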
+
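+- PIP source installation means downloading the installation package and its dependencies from the official PIP site with `pip install paddlepaddle` or `pip install paddlepaddle-gpu`. The download source is outside China. Compared with the bos source, it supports fewer architectures, the package is more lightweight, and a package is provided for only one CUDA version.
+
+  - Before version 2.3, the PaddlePaddle PIP source package (CUDA10.2) supports the GPU architectures 3.5, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5.
+
+  - From version 2.3 onward, the PaddlePaddle PIP source package (CUDA11.0) supports the GPU architectures 6.0, 6.1, 7.0, 7.5, 8.0.
+
+- PaddlePaddle website bos source installation means downloading the installation package and its dependencies from the PaddlePaddle website. The download source is in China, so downloads are faster. Compared with the PIP source, it supports more GPU architectures and provides packages for multiple CUDA versions:
+
+  - Before version 2.3, the bos source packages on the PaddlePaddle website support the following GPU architectures:
+
+    - CUDA10: 3.5, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5;
+
+    - CUDA11: 5.2, 6.0, 6.1, 7.0, 7.5, 8.0.
+
+  - From version 2.3 onward, the bos source packages on the PaddlePaddle website support the following GPU architectures:
+
+    - CUDA10: 3.5, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5;
+
+    - CUDA11: 3.5, 5.0, 6.0, 6.1, 7.0, 7.5, 8.0.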
+
+- Support compilation with Visual Studio 2019 on Windows. ([#38719](https://github.com/PaddlePaddle/Paddle/pull/38719))
+
+- Eliminate various warnings that appear when compiling on Windows. ([#38034](https://github.com/PaddlePaddle/Paddle/pull/38034), [#37890](https://github.com/PaddlePaddle/Paddle/pull/37890), [#37442](https://github.com/PaddlePaddle/Paddle/pull/37442), [#37439](https://github.com/PaddlePaddle/Paddle/pull/37439), [#36857](https://github.com/PaddlePaddle/Paddle/pull/36857))
+
+- Fix Jetson compilation issues introduced by the upgrade of the underlying data structures. ([#39669](https://github.com/PaddlePaddle/Paddle/pull/39669), [#39441](https://github.com/PaddlePaddle/Paddle/pull/39441))
 
 ### 新硬件适配
 
-- 海光DCU芯片训练和推理支持，支持模型数量达9个分类70个模型。
-    - 海光DCU新增 PaddleDetection 模型支持5个。
-    - 海光DCU新增 PaddleGAN 模型支持6个。
-    - 海光DCU新增 PaddleSeg 模型支持13个。
-    - 海光DCU新增 PaddleNLP 模型支持3个。
-    - 海光DCU新增 PaddleOCR 模型支持4个。
-    - 海光DCU新增 PaddleVideo 模型支持3个。
-- 昆仑芯第2代芯片(XPU-2)训练支持，支持ResNet50、SSD、Bert、Transformer等多个模型 ，支持静态图+动态图训练，支持混合精度训练。
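+- Custom device support: provide a plug-in way to extend the PaddlePaddle hardware backend. With this feature, developers do not need to modify PaddlePaddle code for specific hardware. They only need to implement the standard interfaces and compile them into a dynamic link library, which PaddlePaddle can then call as a plug-in. This lowers the development difficulty of adding a new hardware backend to PaddlePaddle. Custom Runtime access and custom Kernel access are currently supported.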
 
-## Thanks to our Contributors
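+- Training/inference support for the Huawei NPU chip (Ascend910), supporting multiple models such as ResNet50, YoloV3, BERT, and Transformer, static graph and mixed precision training, and single-card, single-machine, and multi-machine distributed training.
+
+- Training/inference support for Graphcore IPU chips (including IPU Mk2 GC200 and Bow IPU), supporting models such as ResNet50 and BERT, static graph training, and single-chip, single-machine, and multi-machine distributed training.
+
+- Training/inference support for the Cambricon MLU chip (MLU370x4), supporting models such as ResNet50, static graph and dynamic graph training, mixed precision training, and single-card, single-machine, and multi-machine distributed training.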
 
-This release contains contributions from:
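+- Training/inference support for the Kunlunxin 2 chip (Kunlunxin AI accelerator cards R200 and R300), supporting ResNet50, YoloV3, OCR-DB, SSD, MobileNetV3, UNet, BERT, Transformer, GPT-2, Wide&Deep, and DeepFM, static graph and dynamic graph training, mixed precision training, and single-machine single-card and single-machine multi-card training.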
+
+## Thanks to our Contributors
 
-0x45f, 123malin, Adam Osewski, Aganlengzi, Aurelius84, Baibaifan, Bo Liu, CheQiXiao, Chen Long, Chen Weihang, CtfGo, Double\_V, Ethanzjp, Fan Zhang, Feiyu Chan, Feng Xing, From00, GT-Zhang, Guanghua Yu, Guoxia Wang, Haipeng Wang, Hao Lin, Haohongxiang, Hui Zhang, Huihuang Zheng, HydrogenSulfate, IMMORTAL, JYChen, JZ-LIANG, Jacek Czaja, Jack Zhou, Jackwaterveg, Jeng Bai-Cheng, Jiangxinz, Jiaqi Liu, Jiawei Wang, JingZhuangzhuang, June Weng, Kaipeng Deng, Kqnonrime, LJQ❤️, Leo Chen, Li Min, LielinJiang, Lijunhui, Linjie Chen, Liu-xiandong, LiuWei, Ming-Xu Huang, MissPenguin, PaddlePM, Pei Yang, Peihan, Qi Li, QingshuChen, Ren Wei (任卫), Roc, Shang Zhizhou, ShenLiang, Shibo Tao, Siming Dai, Sing\_chan, TCChenLong, TTerror, TeslaZhao, Thomas Young, Thunderbrook, Tongxin Bai, WJJ1995, WangXi, Wangzheee, Wei Shengyu, WeiXin, Weilong Wu, Wenyu, Wilber, XGZhang, XYZ, XYZ916829, XiangGao, Xiaoxu Chen, YUNSHEN XIE, Yanxing Shi, Yiqun Liu, YuanRisheng, Yuang Liu, Yulong Ao, Zeng Jinle, Zhang Ting, Zhang Zheng, Zhanlue Yang, Zhen Wang, Zhong Hui, Zhou Wei, andreazanetti, andyjpaddle, arlesniak, baoachun, cc, ceci3, chajchaj, chenenquan, chenjian, chentianyu03, crystal, cuicheng01, danleifeng, denglin-github, duanboqiang, dyning, feng626, feng_shuai, furnace, gongweibao, heliqi, hlygit66666, hong, hong19860320, houj04, huangjun12, huangxu96, huzhiqiang, iducn, jakpiase, jiangcheng, joanna.wozna.intel, jzhang533, kuizhiqing, levi131, lidanqing, lilong12, limingshu, littletomatodonkey, liu zhengxi, liutiexing, liuyuhui, liym27, lyuwenyu, lzzyzlbb, niuliling123, pangyoki, parap1uie-s, ronnywang, root, seemingwang, shangliang Xu, shiyutang, smallv0221, sunli, sunzhongkai588, taixiurong, tangwei12, tianshuo78520a, veyron95, wangguanqun, wangguanzhong, wanghuancoder, wangna11BD, wangxinxin08, wangzhen38, wangzhuang01, wawltor, wenbin, whs, will-jl944, wuhuachaocoding, wuhuanzhou, xiaoting, xiaoxiaohehe001, xiayanming, xiegegege, xiemoyuan, xiongkun, yaoxuefeng, yeliang2258, 
yingyibiao, zhangbo9674, zhangchunle, zhangkaihuo, zhaoyingli, zhiboniu, zhoujun, zhouzj, zhulei, zhupengyang, zlsh80826, zmx, zyfncg, 李季, 津, 王明冬, 石晓伟
+This release contains contributions from the project core team as well as:
 
+Adam Osewski, Allen Guo, arlesniak, chenenquan, chenyanlann, fengkuangxiaxia, fuqianya, fwenguang, guguguzi, helen88, houj04, Jacek Czaja, jakpiase, jianghaicheng, joanna.wozna.intel, joeqiao12, Leo Chen, Leo Guo, Li-fAngyU, lidanqing, Liyulingyue, Matsumoto GAO, maxhuiy, Ming-Xu Huang, Nyakku Shigure, piotrekobi, piotrekobiIntel, QingshuChen, qipengh, Skr Bang, Sylwester Fraczek, Sławomir Siwek, taixiurong, tanzhipeng, Tomasz Socha, TTerror, Webbley, yaozhixin, ykkk2333, yujun, Zhangjingyu06, zhangxiaoci, zhangyikun02, zhangyk0314, zlsh80826, zn, Zuza

From 8f81716ef2e6d3a5310874dc6565958967c9c5be Mon Sep 17 00:00:00 2001
From: Chen Long <1300851984@qq.com>
Date: Sat, 30 Apr 2022 15:59:29 +0800
Subject: [PATCH 02/11] Update release_note_en.md

---
 docs/release_note_en.md | 3086 +++++++++++++++++++++++++--------------
 1 file changed, 2018 insertions(+), 1068 deletions(-)

diff --git a/docs/release_note_en.md b/docs/release_note_en.md
index 71a9e7512bf..39b34923d5b 100644
--- a/docs/release_note_en.md
+++ b/docs/release_note_en.md
@@ -1,1195 +1,2145 @@
-﻿
-# 2.2.2 Release Note
 
-## 1. Important Updates
+# 2.3.0-rc0 Release Note
 
-This version fixed some function and performance issues of PaddlePaddle 2.2.1 and optimized some functions. 
+## 1. **Important Updates**
 
-## 2. Training Framework (distributed included)
+We are pleased to announce the release of PaddlePaddle Framework V2.3.0-rc0. This version contains the following important updates.
 
-### （1）New functions
+### API
 
-#### API
+- Added more than 100 new APIs, covering automatic differentiation, linear algebra, probability distribution, sparse tensor, framework performance analysis, hardware device management, vision domain, etc.
+  
+- Added 4 new automatic differentiation APIs, 11 new linear algebra APIs, and 21 new probability distribution APIs to better support use cases in scientific computing, reinforcement learning, and other application areas.
+  
+- Added 11 new Sparse Tensor APIs including basic functions of sparse tensor construction and conversion. The COO and CSR formats are supported.
+  
+- Added 9 new framework performance analysis APIs. The new performance profiling APIs, centered around `paddle.profiler.Profiler`, help users collect, export, and analyze performance statistics during training and inference.
+  
+- Added 7 APIs for device management, facilitating hardware information acquisition.
+  
+- Added several vision and text domain APIs that make it easier to reuse backbone networks such as MobileNetV3 and ResNeXt for fast network construction.
+  
 
-- Add the `paddle.nn.Mish` and `paddle.nn.functional.mish` which support the element-by-element calculation of the mish activation function. ([#38803](https://github.com/PaddlePaddle/Paddle/pull/38803))
+### **Paddle HIgh reusability operator library**
 
-#### Others
+- We announce PHI as the new Paddle HIgh reusability operator library. PHI provides Primitive API, enabling kernel reuse for operator development. As a refactored functional operator library, PHI aims to solve legacy problems that harm the framework's performance and reusability, in particular in operator development. Such problems include the high cost of reusing operators, unclear operator interfaces, and the lack of direct calls to the operator library in C++. With PHI, new operators can be easily implemented by composing functions available in the functional library. The library provides over 200 C++ operator class APIs and nearly 500 kernels. Composing new operators through these built-in functions can greatly reduce the user's development effort. PHI supports different types of hardware (e.g., GPU and XPU). In addition, PHI is extensible with plugins for accommodating third-party accelerators (such as NPU) in a low-cost and reusable fashion. In short, PHI supports low-level operator composability, the reuse of kernels through Primitives, and accelerators through plugins.
 
-- The `paddle.nn.PReLU`, `paddle.nn.functional.prelu`, and `paddle.nn.static.prelu` newly support the `data_format` parameter. You can set input data type. ([#38495](https://github.com/PaddlePaddle/Paddle/pull/38495))
-- The `paddle.index_select` supports `float16` data type. ([#38751](https://github.com/PaddlePaddle/Paddle/pull/38751))
-- Optimize error message of `paddle.multiplex` when tensor `size` in `inputs` is 0. ([#38757](https://github.com/PaddlePaddle/Paddle/pull/38757))
-- Add initialization parameter `data_loader` for `paddle.fluid.contrib.slim.quantization.PostTrainingQuantization`, and support input of the `paddle.io.DataLoader` object or Python Generator. ([#38729](https://github.com/PaddlePaddle/Paddle/pull/38729))
+### **Distributed Training**
 
-### （2）Bug Fixes
+- Fully upgrade the adaptive distributed training architecture, including multiple modules such as elastic resource management, asynchronous pipelined executor, heterogeneous communication, and automatic parallelism, and support hardware-aware distributed training and inference on a variety of heterogeneous hardware.
+  
+- Add the MoE parallel strategy, the GroupSharded parallel strategy, and Pure FP16 under dynamic graph hybrid parallelism, which further supports efficient distributed training of large models under the dynamic graph.
+  
+- Comprehensively upgrade and optimize the architecture of general heterogeneous parameter server, and simplify each module, such as communication and storage, to improve the secondary development experience of parameter server. The performance of GPU parameter server is improved by 2.38 times under 100 billion parameters and 10 billion data.
+  
 
-#### API
+### **Compile and Install**
+
+- The CUDA version of the installation package on the PIP source is changed to 11.0. To install a package for another CUDA version, visit the [PaddlePaddle website - Installation](https://www.paddlepaddle.org.cn/install/quick) to download and install it.
+  
+- Starting from version 2.3.0-rc0, PaddlePaddle adjusts and upgrades the set of supported GPU architectures.
+  
 
-- Fix operation error of `paddle.max` in input of `x.ndim > 6 and axis < 0`. ([#38070](https://github.com/PaddlePaddle/Paddle/pull/38070))
-- Fix bug of `paddle.max` and `paddle.min`: Result is incorrect on the CPU device when the parameter axis is the list type and `len(axis) == x.ndim and axis[i] < 0`. ([#38478](https://github.com/PaddlePaddle/Paddle/pull/38478))
-- Fix bug that `paddle.nn.functional.unfold` does not distinguish between compile time and runtime in InferShape calculation. ([#38925](https://github.com/PaddlePaddle/Paddle/pull/38925)) ([#38834](https://github.com/PaddlePaddle/Paddle/pull/38834))
-- Fix bug where GPU unnecessarily synchronizes with the CPU when `paddle.nn.functional.cross_entropy` checks `labels`. （[#38849](https://github.com/PaddlePaddle/Paddle/pull/38849)）
-- Fix bug of input gradient result error in backward computing when `paddle.distributed.split` slices the FC along columns. ([#38724](https://github.com/PaddlePaddle/Paddle/pull/38724))
-- Fix bug where `paddle.nn.Layer.to` does not support `paddle.dtype` type. ([#38108](https://github.com/PaddlePaddle/Paddle/pull/38108))
-- Fix bug that output tensor's shape is different between dynamic and static graphs when `full_matrics=True` in `paddle.linalg.svd` under static graphs. ([#37744](https://github.com/PaddlePaddle/Paddle/pull/37744))
-- Fix bug of the result dimension exception when the `Tensor` slice index uses multiple None type indexes. ([#37400](https://github.com/PaddlePaddle/Paddle/pull/37400))
-- Fix memory leak bug of `Tensor` index assignment in some scenarios. ([#38098](https://github.com/PaddlePaddle/Paddle/pull/38098))
-- Fix bug of `conv2d` reporting an error with missing attributes after model is exported using `save_inference_model` and backward pass is added for training. ([#38832](https://github.com/PaddlePaddle/Paddle/pull/38832))
+### **Inference Deployment**
 
-#### IR(Intermediate Representation)
+- Add the Java API and ONNX Runtime CPU backend.
+  
+- Support TensorRT 8.0 / 8.2 and structured sparsity, with in-depth performance optimization for ERNIE-like models.
+  
 
-- Dynamic Graph to Static Graph
+### **Hardware Backend Extension**
+
+- Add custom device support: provide a plug-in way to extend the PaddlePaddle hardware backend.
   
-	- Fix bug of inconsistency between dynamic and static behaviors of some initialization-related APIs. ([#37827](https://github.com/PaddlePaddle/Paddle/pull/37827))
-	- Fix bug where `paddle` will be used as a variable when dynamic to static code is transcribed. ([#37999](https://github.com/PaddlePaddle/Paddle/pull/37999))
-	- Fix bug that highlighted code comments lead to an error report when dynamic to static code is transcribed. ([#38003](https://github.com/PaddlePaddle/Paddle/pull/38003))
-	- Fix endless loop of `for … zip …`  statement in dynamic to static graph. ([#37846](https://github.com/PaddlePaddle/Paddle/pull/37846))
+- Add training/inference support for multiple heterogeneous chips such as HUAWEI Ascend 910 / Graphcore IPU / Cambricon MLU / Kunlunxin 2.
   
-- Model quantization
 
-	- Fix problem of redundant nodes in model derived from quantitative training of dynamic graph. ([#38122](https://github.com/PaddlePaddle/Paddle/pull/38122)) ([#38025](https://github.com/PaddlePaddle/Paddle/pull/38025))
-	- To solve the problem that the quantitative model cannot be predicted on Paddle Lite, remove `clip_extra` settings of quantitative export models. ([#38343](https://github.com/PaddlePaddle/Paddle/pull/38343))
-	- Fix `flatten_contiguous_range` quantization settings for `flatten_contiguous_range` operator output configuration error in quantization. ([#37741](https://github.com/PaddlePaddle/Paddle/pull/37741))
+### **Framework Architecture**
 
-#### Others
+- In this version, we did a lot of work on the framework executor. For details, please see [New Dynamic Graph Execution Mechanism](https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/7UhIeLfrn3/0rDW-MD4RXSfkx#anchor-088a55e0-b962-11ec-a8b3-f52dfa102ded) and [New Static Graph Executor](https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/7UhIeLfrn3/0rDW-MD4RXSfkx#anchor-e81120c0-c233-11ec-a2f2-c9306d79e3c2).
 
-- Custom OP
-	- Fix bug that user-defined operator may report an error due to incomplete files when loading Python APIs under multiple processes. ([#38128](https://github.com/PaddlePaddle/Paddle/pull/38128))
-	- Fix compilation failure caused by `D_GLIBCXX_USE_CXX11_ABI` not taking effect as expected when compiling on CentOS platforms. ([#37878](https://github.com/PaddlePaddle/Paddle/pull/37878))
+## **2. Incompatibility Upgrade**
 
-- Dynamic graph inplace strategy
+- When `paddle.to_tensor` converts a Python int scalar to a Tensor, the default data type on Windows changes from int32 to int64, aligning with Linux/Mac. ([#39662](https://github.com/PaddlePaddle/Paddle/pull/39662))
   
-  - Fix problem that accumulator reports an error when multiple inplace OPs execute continuously. ([#38406](https://github.com/PaddlePaddle/Paddle/pull/38406))
-  - Fix problem that the `setitem` method of `Tensor` causes the backward graph construction error when performing the inplace operation on leaf nodes. ([#38014](https://github.com/PaddlePaddle/Paddle/pull/38014))
-- NHWC strategy
+- To be consistent with division behavior in Python 3, the division operator `/` has been changed from “rounding divide” to “true divide”, and the data type of the computed output has been switched from int to float. ([#40890](https://github.com/PaddlePaddle/Paddle/pull/40890))
   
-  - Fix bug of undefined intermediate variables in backward Op in batchnorm_op when data type is FP32, with dims = 2 and data_layout = NHWC. ([#37020](https://github.com/PaddlePaddle/Paddle/pull/37020))
 
-## 3. Paddle Inference
+| 2.2 | 2.3.0-rc0 |
+| --- | --- |
+| First example below (integer result) | Second example below (float result) |
 
-### （1）Function Optimization
+    >>> import paddle
 
-#### Framework and API updates
+    >>> a = paddle.to_tensor([327])
 
-- C API supports processing of c++ std::string. ([#38667](https://github.com/PaddlePaddle/Paddle/pull/38667))
+    >>> b = paddle.to_tensor([80])
 
-#### Back-end capability enhancement
+    >>> a / b
 
-- GPU and TensorRT subgraph engine related updates
-  - Support invoke of TensorRT inference for relu, relu6, tanh, sigmoid, pool2d, concat, batch_norm, split, gelu, scale, swish, prelu, clip, reduce_sum, and reduce_mean operators in the static shape and 2-dimensional input. ([#37773](https://github.com/PaddlePaddle/Paddle/pull/37773))
-  - Support invoke of TensorRT inference by mish activation function. ([#38866](https://github.com/PaddlePaddle/Paddle/pull/38866))
+    Tensor(shape=[1], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
 
-### （2）Bug Fixes
+           [4])
 
-#### Framework and API fixing
+    >>> import paddle
 
-- Operator fixing
-  
-  - Fix incompatibility bug of the roi_align operator in use of TRT. ([#38788](https://github.com/PaddlePaddle/Paddle/pull/38788))
-  - Add the function of elementwise broadcasting in the same dimension. ([#37908](https://github.com/PaddlePaddle/Paddle/pull/37908))
-- Framework function fixing
-  
-  - Fix bug of model clipping logic in dynamic-to-static graphs, so operators containing subblock are clipped correctly in dynamic-to-static graphs. ([#37579](https://github.com/PaddlePaddle/Paddle/pull/37579))
-  - Fix error reporting issue of CreatePredictor interface under multiple threads. Current CreatePredictor interface allows calling in multiple threads without causing inference exceptions. ([#37894](https://github.com/PaddlePaddle/Paddle/pull/37894))
-  - Support “params file” to pass empty strings for models without weights in config. ([#38579](https://github.com/PaddlePaddle/Paddle/pull/38579))
-  - Fix problem of not copying GPU data when Paddle-TRT engine directly inputs CPU tensor. ([#37427](https://github.com/PaddlePaddle/Paddle/pull/37427))
+    >>> a = paddle.to_tensor([327])
 
-#### Back-end capability fixing
+    >>> b = paddle.to_tensor([80])
 
-- TensorRT subgraph engine fixing
-  
-  - Fix the bug of an error that occurred in the running of TensorRT by pool2d with some of the parameters. ([#37929](https://github.com/PaddlePaddle/Paddle/pull/37929))
-- MKLDNN engine fixing
-  
-  - Fix the problem that mkldnn kernel of matmul_v2 does not support different lengths of two input shapes. ([#38733](https://github.com/PaddlePaddle/Paddle/pull/38733))
+    >>> a / b
 
-#### Others
+    Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=True,
 
-- Fix the possible hang bug of ERNIE model under TRT8. ([#37839](https://github.com/PaddlePaddle/Paddle/pull/37839))
+           [4.08750010])
 
-# 2.2.1 Release Note
+- Revise the ELU formula. The computation for alpha < 0 now follows the original paper, which fixes a small number of cases where results were calculated incorrectly. In addition, the in-place `elu_` now reports an error when alpha < 0, because the backward gradient cannot be computed from the output alone in that case. ([#37316](https://github.com/PaddlePaddle/Paddle/pull/37316))
 
-## 1. Important Updates
+| 2.2 | 2.3.0-rc0 |
+| --- | --- |
+| First example below (old formula) | Second example below (revised formula) |
 
-This version fixed some function and performance issues of PaddlePaddle 2.2.0, and optimized some functions. The highlights are as follows：
+    # elu(x) = max(0, x) + min(0, α ∗ (e^x − 1))
 
-- Add  ``paddle.linalg.triangular_solve`` to calculate linear equations with triangular coefficient matrices. 
-- Add `paddle.device.cuda.graphs.CUDAGraph` API that supports the [CUDA Graph](https://developer.nvidia.com/blog/cuda-graphs/) function of NVIDIA. Note that this API is still experimental and not yet stable.
-- Fix known issues of basic API and Tensor index.
+    >>> import paddle
 
-## 2. Training Framework（Distributed Included）
+    >>> x = paddle.to_tensor([-1., 6.])
 
-### （1）New Functions
+    >>> m = paddle.nn.ELU(-0.2)
 
-#### API
+    >>> out = m(x)
 
-- Add ``paddle.linalg.triangular_solve`` API to calculate linear equations with triangular coefficient matrices. ([#36714](https://github.com/PaddlePaddle/Paddle/pull/36714))
-- Add `paddle.device.cuda.graphs.CUDAGraph` API that supports the [CUDA Graph](https://developer.nvidia.com/blog/cuda-graphs/) function of NVIDIA by capturing all GPU calculations into a single CUDA Graph and calling them for later use, which not only cuts the extra overhead but also improves the runtime performance. Note that the API is still experimental and not yet stable. ([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))
-- Add``paddle.incubate.graph_send_recv`` API for graph learning to reduce the loss of intermediate variables in memory or video memory during message passing. It contains four update modes, namely, SUM, MEAN, MIN, and MAX. ([#37205](https://github.com/PaddlePaddle/Paddle/pull/37205))
-- Add `paddle.incubate.operators.ResNetUnit` API to integrate the convolution, batch normalization, and shortcut/bottleneck operation in the ResNet network. ([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))
+    >>> out
 
-### （2）Function Optimization
+    Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
 
-#### API
+           [ 0. , -74.48576355])
 
-- `paddle.incubate.FusedTransformerEncoderLayer` adds `src_mask=None` and supports pure fp16.([#37229](https://github.com/PaddlePaddle/Paddle/pull/37229))
+    >>> out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
 
-#### IR(Intermediate Representation)
+    >>> out
 
-- Dynamic Graph to Static Graph
-  - When adopting`@paddle.jit.to_static` to decorate single function, `train()、eval()` functions are provided to support the switch to `train、eval` mode. ([#37383](https://github.com/PaddlePaddle/Paddle/pull/37383))
+    Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
 
-#### Distributed Training
+           [ 0. , -74.48576355])
 
-- Optimize the ability of arbitrary cutting and add pipeline training in the heterogeneous parameter server, which enhance training throughput.([#37446](https://github.com/PaddlePaddle/Paddle/pull/37446))
+    # elu(x) = x, if x > 0
 
-#### Others
+    # elu(x) = α ∗ (e^x − 1), if x <= 0
 
-- Enhance the out-of-bounds check for the ``index`` of ``paddle.scatter` that causes core dump, and improve the corresponding error reporting message. ([#37431](https://github.com/PaddlePaddle/Paddle/pull/37431))
+    >>> import paddle
 
-### （3）Performance Optimization
+    >>> x = paddle.to_tensor([-1., 6.])
 
-- Optimize `paddle.top_k` by enabling it to choose different implementations according to the size of ``k`` and ``input_width``: cub implementation when k>=75% input_width, otherwise the handwritten kernel implementation.([#37325](https://github.com/PaddlePaddle/Paddle/pull/37325)) 
-- Optimize `paddle.fluid.optimizer.LarsMomentumOptimizer` to improve OP performance by integrating optimizer operator and  [CUDA Cooperative Groups](https://developer.nvidia.com/blog/cooperative-groups/). ([#37109](https://github.com/PaddlePaddle/Paddle/pull/37109))
+    >>> m = paddle.nn.ELU(-0.2)
 
-### （4）Bug Fixes
+    >>> out = m(x)
 
-#### API
+    >>> out
 
-- Fix the calculation error of `paddle.nn.ELU` and `paddle.nn.functional.elu` when alpha<0；please note the inplace version:`paddle.nn.functional.elu_` will raise error when alpha<0. ([#37437]
-- (https://github.com/PaddlePaddle/Paddle/pull/37437))
-- Fix the problem of `out_of_range` when the `paddle.slice` is reversely executed. ([#37584](https://github.com/PaddlePaddle/Paddle/pull/37584))
-- `paddle.shape` doesn't support backward, explicitly set ``stop_gradient`` to ``True``. ([#37412](https://github.com/PaddlePaddle/Paddle/pull/37412))
-- `paddle.arange` doesn't support backward, explicitly set ``stop_gradient`` to ``True``.([#37486](https://github.com/PaddlePaddle/Paddle/pull/37486))
-- `paddle.shard_index` reports an error if the last dimension of the input data is not 1. ([#37421](https://github.com/PaddlePaddle/Paddle/pull/37421))
-- Fix the wrong dimension of inverse quantization when ``paddle.matmul`` adopts int8 quantization. ([#36982](https://github.com/PaddlePaddle/Paddle/pull/36982))
-- Fix the issue that `paddle.nn.Dropout`, under  `eval`, does not calculate the gradient. ([#37305](https://github.com/PaddlePaddle/Paddle/pull/37305))
-- Fix the issue that `paddle.nn.functional.dropout`, in static graph mode,  reports an error when  -1 is included in the input shape of  `Tensor` and it is specified to drop this dimension. ([#37223](https://github.com/PaddlePaddle/Paddle/pull/37223))
-- Fix the backward calculation errors of multi-layer RNN (dropout set 0) in CPU training by RNN API `paddle.nn.LSTM`,`paddle.nn.GRU`, `paddle.nn.SimpleRNN`. ([#37086](https://github.com/PaddlePaddle/Paddle/pull/37086))
-- Fix issues such as the gradient error of`paddle.incubate.FusedTransformerEncoderLayer` backward calculation, incorrect processing of pre_layer_norm, incorrect parameter processing, missing parameters, calculation errors of add_bias, etc. ([#37229](https://github.com/PaddlePaddle/Paddle/pull/37229))
-- Fix the issue that `paddle.incubate.fused_multi_head_attention` does not support ``bias`` as `None`.([#37411](https://github.com/PaddlePaddle/Paddle/pull/37411), [#37566](https://github.com/PaddlePaddle/Paddle/pull/37566))
-- Fix the disordered data loaded by `paddle.vision.datasets.Cifar10`, `paddle.vision.datasets.Cifar100`. ([#37528](https://github.com/PaddlePaddle/Paddle/pull/37528))
-- Fix the issue that one-dimensional `Tensor` reports an exception error of dimension detection when using ellipsis(...) indexing. ([#37192](https://github.com/PaddlePaddle/Paddle/pull/37192))
-- Fix the issue that the gradient attribute of`Tensor` cannot be spread during indexing and assignment (`setitem`), see [issue](https://github.com/PaddlePaddle/Paddle/issues/36902) for details. ([#37028](https://github.com/PaddlePaddle/Paddle/pull/37028))
+Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
 
-#### IR(Intermediate Representation)
+       [0.12642412, 6. ])
 
-- Dynamic Graph to Static Graph
-  - The model can call `paddle.flops` to count the model parameters correctly. ([#36852](https://github.com/PaddlePaddle/Paddle/pull/36852))
-  - The model can correctly convert the loop statements `for i in [1, 2, 3]`.([#37259](https://github.com/PaddlePaddle/Paddle/pull/37259))
+>>> out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
 
-#### Distributed Training
+Traceback (most recent call last):
 
-  - `fleet.load_model`: Fix the unavailable API loaded by the model in parameter server mode.([#37461](https://github.com/PaddlePaddle/Paddle/pull/37461))
-  - `fleet.save_inference_model`: Fix the issue that the model does not pull parameters from the server side before saving dense parameters in parameter server mode. ([#37461](https://github.com/PaddlePaddle/Paddle/pull/37461))
+  File "<stdin>", line 1, in <module>
 
-#### Others
+  File "/usr/local/lib/python3.7/dist-packages/decorator.py", line 232, in fun
 
-- Fix the problem of inplace operation of dynamic graph: after performing inplace operation on a non-leaf node, followed by immediate execution of backward, the gradient of this node and the nodes before is calculated incorrectly. ([#37420](https://github.com/PaddlePaddle/Paddle/pull/37420)) 
+    return caller(func, *(extras + args), **kw)
 
-## 4. Paddle Inference
+  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
 
-### （1）Bug Fixes
+    return wrapped_func(*args, **kwargs)
 
-- Further removal of redundant debug logs in the case of clear log disable.([#37212](https://github.com/PaddlePaddle/Paddle/pull/37212))
-- Fix memory/video memory optimization policies to avoid incorrect prediction results or crashes due to improper memory/video memory optimization. ([#37324](https://github.com/PaddlePaddle/Paddle/pull/37324), [#37123](https://github.com/PaddlePaddle/Paddle/pull/37123))
-- Fix the scale calculation error in the MultiHead structure of Transformer model after integrating QkvToContextPluginDynamicscale, which is caused by wrong block and thread settings of cuda function. ([#37096](https://github.com/PaddlePaddle/Paddle/pull/37096))
-- Register all inference OPs in the function of int8 quantization: Solve the issues that some inference OPs are not registered in int8 quantization due to historical reasons. ([#37266](https://github.com/PaddlePaddle/Paddle/pull/37266))
+  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/inplace_utils.py", line 34, in __impl__
 
-# 2.2.0 Release Note
+    return func(*args, **kwargs)
 
-## **1. Highlights**
+  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/functional/activation.py", line 89, in elu_
 
-We are excited to release the PaddlePaddle Framework V2.2.0. This version contains the following highlights.
+    assert alpha >= 0., "elu_ only support alpha >= 0, please use elu instead."
 
-### API
+AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
-- Added 100+ APIs, including 24 Fourier transform APIs, 17 linear algebra APIs, etc., to better facilitate developing of scientific computing and signal processing models.
-- Added the support for multiple indexing syntax, including ellipsis (...), dimension expansion (None), boolean arrays (Bool Mask), and integer arrays (list and tensor), making it easier to operate on tensor.
-- Added the `paddle.einsum` API, to express multi-dimensional tensor computation in a more concise way.  
-- Enhanced the dynamic graph mixed precision. Added a way to use half-precision (float16) training for the whole task. The computational efficiency under the main tasks increased by 20%.
-
-### IR(Intermediate Representation)
-
-- Dynamic graph to static graph conversion: Further expand the syntax and scenarios supported by dynamic-static conversion. Now the dynamic graph models trained with mixed precision can also be converted to static graphs for training or inference deployment via the `to_static` interface. In addition, the training performance after conversion can be optimized, and the training performance after conversion is significantly improved with the comparison to the dynamic graph method by introducing caching and enabling the Pass and other strategies.
-- Pass development: Added the interface for rewriting static graph IR in Python, so that development can be completed quickly in python for OP fusion and other subgraph replacement scenarios.
-- Abstraction and functional encapsulation of the underlying codes in the operator Kernel: Provide high-performance Block-level IO operations and Compute operations (Kernel Primitive API).The Kernel development using the Kernel Primitive API allows you to focus more on the implementation of the computational logic, significantly reducing the amount of codes while ensuring performance, and decoupling operator computation from hardware.
-
-### **Distributed**
-
-- Hybrid parallel: Based on the existing 4D hybrid parallel of static graph, the performance optimization such as pipeline executor is carried out, and the training arithmetic utilization reaches 51% of the theoretical peak performance of GPU under 100 billion models. The dynamic graph supports 4D hybrid parallelism, and the function and performance under 100 billion models are the same as static graphs. The basic functions such as auto-completion and auto-slicing are added, and semi-automatic parallelism based on user mark is available.
-- GPU Parameter Server: Under the 100 billion models, optimize the data reading, GPU-PS construction, SSD performance, and improve the pipeline. The overall performance is doubled and memory usage is halved, and one GPU machine can replace one hundred CPU machines to train 100 billion models.
-
-### **Inference engine**
-
-- Inference acceleration: Support the latest TensorRT 8.x, and adapt Nvidia's new hardware features for acceleration.
-- Ease of Inference: Add automatic derivation of dynamic Shape configurations in TensorRT subgraphs. Optionally, derive the range of Shapes from data without trivial manual configuration. This can simplify the use of dynamic Shape.
-
-
-## **2. Backwards Incompatible changes**
-
-- For the problem of `grad` being exposed in paths (`paddle.autograd,grad`, `paddle.grad`), it is recommended to use `paddle.grad` , with removing `from paddle.autograd import *` and calling the grad directly. ([#35579](https://github.com/PaddlePaddle/Paddle/pull/35579))
-
-<table>
-<tr>
-<th>
-2.1
-</th>
-<th>
-2.2
-</th>
-</tr>
-
-<tr>
-<td>
-<pre>
-
-```python
->>> import paddle
->>> from paddle.autograd import *
->>> x = paddle.ones(shape=[1], dtype='float32')
->>> x.stop_gradient = False
->>> y = x*x
->>> grad(outputs=[y], inputs=[x])
-[Tensor(shape=[1], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
-        [2.])]
-```
-</pre>
-</td>
-
-<td>
-<pre>
-
-```python
->>> import paddle
->>> from paddle.autograd import *
->>> x = paddle.ones(shape=[1], dtype='float32')
->>> x.stop_gradient = False
->>> y = x*x
->>> grad(outputs=[y], inputs=[x])
-NameError: name 'grad' is not defined
->>> paddle.grad(outputs=[y], inputs=[x]) # 改用paddle.grad API
-[Tensor(shape=[1], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
-       [2.])]
-```
-</pre>
-</td>
-</tr>
-</table>
-
-- ``Tensor.__setitem__`` does not support the slice index of non- ``int`` type ( ``x[start:stop:step] = value`` ). Since the ``float`` type does not make mathematical sense when used as an index (For example, how to determine the exact index position when ``start`` is 0.5?) and it is prone to some unknown behaviors, we limit the data type of slice index to ``int`` in this update, and the slice index using ``float`` will report an error. ([#35701](https://github.com/PaddlePaddle/Paddle/pull/35701))
-
-<table>
-<tr>
-<th>
-2.1
-</th>
-<th>
-2.2
-</th>
-</tr>
-
-<tr>
-<td>
-<pre>
-
-```python
->>> import paddle
->>> x = paddle.to_tensor([1, 2, 3, 4])
->>> start = paddle.zeros([1])
->>> stop = paddle.zeros([1]) + 2
->>> step = paddle.ones([1])
->>> x[start:stop:step] = 0 # start,stop,step supports the float type Tensor
->>> x 
-```
-</pre>
-</td>
-
-<td>
-<pre>
-
-```python
->>> import paddle
->>> x = paddle.to_tensor([1, 2, 3, 4])
->>> start = paddle.zeros([1])
->>> stop = paddle.zeros([1]) + 2
->>> step = paddle.ones([1])
->>> x[start:stop:step] = 0
-ValueError: (InvalidArgument) Currently, the type of tensor in slice indices only allows int32 and int64, please check the type of index tensor.
-
->>> # Must be changed to the following codes:
->>> start = paddle.zeros([1], dtype='int32')
->>> stop = paddle.zeros([1], dtype='int32') + 2
->>> step = paddle.ones([1], dtype='int32')
->>> x[start:stop:step] = 0 # start,stop,step must be integer type Tensor
->>> x
-Tensor(shape=[4], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
-       [0, 0, 3, 4])
-```
-</pre>
-</td>
-</tr>
-</table>
-
-
-- Add inplace to call legality check for dynamic graph ``Tensor.__setitem__``. When the detected assignment code is not met, an error will be reported (detection logic: when ``Tensor`` is a leaf node and ``stop_gradient`` is ``False``, the ``Tensor`` assignment operation will be intercepted with reporting an error).Since the execution of ``tensor[index]=value`` will overwrite the original value of the ``Tensor``, it is an inplace operation of the ``Tensor``. If the ``Tensor`` is a leaf node in the computation graph and needs to calculate the gradient, the assignment of the ``Tensor`` will cause problems in the calculation of the inverse gradient of the ``Tensor``, which is an illegal inplace operation. Therefore, we add the detection and interception of such operations in this update. For the current code with the assignment by using ``tensor [index]=value``, check whether the inplace operation requirement is met. If it is not met, an error is reported.  ([#35701](https://github.com/PaddlePaddle/Paddle/pull/35701))
-  - Example: The initialization code is adjusted by using ``weight[index]=value``. The ``self.weight`` belongs to the leaf node and needs to calculate the gradient, so the inplace operation cannot be used (it will affect the inverse gradient value calculation). However, the initialization assignment itself does not need the inverse calculation process. Therefore, use ``no_ grad`` to disable the gradient calculation and then assign the value when it is clear that the inverse calculation is not needed.
-
-<table>
-<tr>
-<th>
-2.1
-</th>
-<th>
-2.2
-</th>
-</tr>
-
-<tr>
-<td>
-<pre>
-
-```python
->>> import paddle
->>> class MyLayer(paddle.nn.Layer):
-...     def __init__(self):
-...         super(MyLayer, self).__init__()
-...         self.weight = self.create_parameter(...)
-...         self.weight[index] = 1.0
-...
-```
-</pre>
-</td>
-
-<td>
-<pre>
-
-```python
->>> import paddle
-class MyLayer(paddle.nn.Layer):
-...     def __init__(self):
-...         super(MyLayer, self).__init__()
-...         self.weight = self.create_parameter(...)
-...         with paddle.no_grad(): # Assignment can be done after gradient calculation is disabled.
-...             self.weight[index] = 1.0
-```
-</pre>
-</td>
-</tr>
-</table>
-
-
-- When the `paddle.sum` input type is ``bool``, the output type is also bool, and the action is not consistent with ``numpy.sum``. To solve the problem, upgrade the incompatibility. After the upgrade, the output type is ``int64``, which is consistent with ``numpy.sum``. ([#34313](https://github.com/PaddlePaddle/Paddle/pull/34313))
-
-
-<table>
-<tr>
-<th>
-2.1
-</th>
-<th>
-2.2
-</th>
-</tr>
-
-<tr>
-<td>
-<pre>
-
-```python
->>> import paddle
->>> import numpy as np
->>> np_arr = np.ones((2, 3), dtype='bool')
->>> pd_arr = paddle.to_tensor(np_arr)
->>> pd_sum = pd_arr.sum(0)
->>> pd_sum.dtype
-paddle.bool
-```
-</pre>
-</td>
-
-<td>
-<pre>
-
-```python
->>> import paddle
->>> import numpy as np
->>> np_arr = np.ones((2, 3), dtype='bool')
->>> pd_arr = paddle.to_tensor(np_arr)
->>> pd_sum = pd_arr.sum(0)
->>> pd_sum.dtype
-paddle.int64
-```
-</pre>
-</td>
-</tr>
-</table>
-
-- Optimize the ``Tensor`` copying act in the case where ``paddle.to_tensor`` does not copy the ``Tensor`` when the input ``data`` is a ``Tensor``, causing the ``stop_gradient`` property to be incorrectly modified. In the original implementation, when ``data`` is a ``Tensor`` and ``dtype`` and ``place`` do not change, ``data`` is returned directly (i.e., no copying occurs) and the ``data.stop_gradient`` property is modified. This action will cause the problem of the back propagation of the original computed graph ``data``. In the new implementation, the ``paddle.to_tensor`` copies a new ``Tensor`` and returns it in the above case, without modifying the ``stop_gradient`` property of the original ``data``.  ([#33335](https://github.com/PaddlePaddle/Paddle/pull/33335)) 
-
-<table>
-<tr>
-<th>
-2.1
-</th>
-<th>
-2.2
-</th>
-</tr>
-
-<tr>
-<td>
-<pre>
-
-```python
->>> import paddle
->>> x = paddle.rand([2,3])
->>> x.stop_gradient = False
->>> y = paddle.to_tensor(x)
->>> print(id(x) == id(y)) # True
->>> print(x.stop_gradient, y.stop_gradient) # True True
-```
-</pre>
-</td>
-
-<td>
-<pre>
-
-```python
->>> import paddle
->>> x = paddle.rand([2,3])
->>> x.stop_gradient = False
->>> y = paddle.to_tensor(x)
->>> print(id(x) == id(y)) # False
->>> print(x.stop_gradient, y.stop_gradient) # False True
-```
-</pre>
-</td>
-</tr>
-</table>
-
-## **3. Training framework (with distributed)**
+## **3. Training Framework (with the distributed function)**
 
-### **(1) New features**
+### **(1) New functions**
 
 #### API
 
-- Add the linear algebra computation API  ``paddle.linalg.*``
- - Add the ``paddle. linalg.svd``, to support the singular value decomposition for multi-dimensional ``Tensor``.  ([#34953](https://github.com/PaddlePaddle/Paddle/pull/34953)) 
-   - Add the ``paddle.linalg.cond``, to support the computing of the condition number of a matrix or a batch of matrixes based on the norm type ``p``.  ([#35140](https://github.com/PaddlePaddle/Paddle/pull/35140)) 
-   - Add the ``paddle.linalg.matrix_rank``, to support the computing of the rank of a multidimensional matrix ``Tensor``.  ([#34823](https://github.com/PaddlePaddle/Paddle/pull/34823)) 
-   - Add the ``paddle.linalg.eigvals``, to support the computing of general squares.  ([#35720](https://github.com/PaddlePaddle/Paddle/pull/35720), [#35909](https://github.com/PaddlePaddle/Paddle/pull/35720))
-   - Add the ``padding.linalg.eigh``, to support the computing of eigenvalues and eigenvectors of complex Hermite matrix or real symmetric matrix. ([#34990](https://github.com/PaddlePaddle/Paddle/pull/34990), [#35916](https://github.com/PaddlePaddle/Paddle/pull/35916), [#35812](https://github.com/PaddlePaddle/Paddle/pull/35812), [#36091](https://github.com/PaddlePaddle/Paddle/pull/36091),[#35919](https://github.com/PaddlePaddle/Paddle/pull/35919)) 
-   - Add the ``paddle.linalg.det``, to support the computing of determinant values of multidimensional matrix.  ([#34992](https://github.com/PaddlePaddle/Paddle/pull/34992)) 
-   - Add the ``paddle.linalg.slogdet``, to support the computing of signed and natural logarithm values of multidimensional matrix determinant values. ([#34992](https://github.com/PaddlePaddle/Paddle/pull/34992))
-   - Add the ``paddle.linalg.pinv``, to support the computing of pseudo-inverse matrix of multidimensional matrix Tensor.  ([#35804](https://github.com/PaddlePaddle/Paddle/pull/35804))
-   - Add the ``paddle.linalg.multi_dot``, to support the computing of concatenated multiplication of multiple matrices. ([#35224](https://github.com/PaddlePaddle/Paddle/pull/35224))
-   - Add the ``paddle.linalg.solve``, to support the computing of the solutions of linear equations.  ([#35715](https://github.com/PaddlePaddle/Paddle/pull/35715))
-   - Add the ``paddle.linalg.matrix_power``, to support the power operations on matrices.  ([#34667](https://github.com/PaddlePaddle/Paddle/pull/34667))
-   - Add `paddle.linalg.eigvalsh` for computing eigenvalues of Hermite Matrix or real symmetric matrices.  ([#36680](https://github.com/PaddlePaddle/Paddle/pull/36680))
-   - Add `paddle.linalg.eig` for computing eigenvalues and eigenvectors of general square matrices.  ([#35674](https://github.com/PaddlePaddle/Paddle/pull/35674))
-   - Add `paddle.linalg.qr` for computing QR decomposition of matrices (inverse is not supported yet).  ([#36627](https://github.com/PaddlePaddle/Paddle/pull/36627))
-
-- Add new Fourier transform related API ([#35665](https://github.com/PaddlePaddle/Paddle/pull/35665)) 
-  - Add fast Fourier transform family functions  
-    - Differentiable 1d to nd complex to complex fast Fourier transforms.  (``paddle.fft.fft``, ``paddle.fft.fft2``, ``paddle.fft.fftn``, ``paddle.fft.ifft``, ``paddle.fft.ifft2``, ``paddle.fft.ifftn``)
-    - Differentiable 1d to nd real to complex fast Fourier transform.  (``paddle.fft.rfft``, ``paddle.fft.rfft2``, ``paddle.fft.rfftn``, ``paddle.fft.ihfft``, ``paddle.fft.ihfft2``, ``paddle.fft.ihfftn``)
-    - Differentiable 1d to nd complex to real fast Fourier transform.  (``paddle.fft.hfft``, ``paddle.fft.hfft2``, ``paddle.fft.hfftn``, ``paddle.fft.irfft``, ``paddle.fft.irfft2``, ``paddle.fft.irfftn``)
-    - fft related helper functions. (``paddle.fft.fftfreq``, ``paddle.fft.rfftfreq``, ``paddle.fft.fftshift``, ``paddle.fft.ifftshift``)
-
-  - Add short-time Fourier transform related functions
-    - Short-time Fourier transform.  (``paddle.signal.stft``)
-    - Short-time Fourier inverse transform.  (``paddle.signal.istft``)
-
-- Add new high-level APIs  
-  - Add the ``paddle.vision.ops.roi_pool`` and ``paddle.vision.ops.RoIPool``, support RoI region pooling operations in detection tasks. ([#36154](https://github.com/PaddlePaddle/Paddle/pull/36154))
-  -  Add the ``paddle.vision.ops.roi_align`` and ``paddle.vision.ops.RoIAlign``, to support RoI region Align operations in detection tasks.  ([#36207](https://github.com/PaddlePaddle/Paddle/pull/36207))
-  -  Add the ``paddle.vision.ops.psroi_pool`` and ``paddle.vision.ops.PSRoIPool``, to support location-sensitive RoI region pooling operations in detection tasks. ([#36111](https://github.com/PaddlePaddle/Paddle/pull/36111))
-  -  Add the ``paddle.vision.models.vgg19`` pre-training weights. ([#35788](https://github.com/PaddlePaddle/Paddle/pull/35788))
-  -  Add the datasets API download progress bar in ``paddle.vision.datasets.*``. ([#33302](https://github.com/PaddlePaddle/Paddle/pull/33302))
-  -  Add the ``paddle.Model.predict`` parameter ``verbose``, to support whether to show logs or not. ([#33405](https://github.com/PaddlePaddle/Paddle/pull/33405))
-  -  Add the ``paddle.hub`` download option ``wget`` method. ([#33379](https://github.com/PaddlePaddle/Paddle/pull/33379))
-  -  Add the ``paddle.Model`` gradient accumulation in dynamic graph mode. ([#32702](https://github.com/PaddlePaddle/Paddle/pull/32702))
-  -  Add the ``paddle.Model.fit`` and ``paddle.Model.evaluate`` ``num_iters`` parameters in dynamic graph mode to control the number of training iterations. ([#33986](https://github.com/PaddlePaddle/Paddle/pull/33986))
-  -  Add the ``paddle.vision.ops.yolo_box`` parameters ``iou_aware`` and ``iou_aware_factor``, to support YoloBox using predicted IOUs as confidence factors. ([#33400](https://github.com/PaddlePaddle/Paddle/pull/33400))
-  -  Add the ``paddle.summary`` parameter input to support the given ``input``. ([#34165](https://github.com/PaddlePaddle/Paddle/pull/34165))
-  - Add `paddle.text.viterbi_decode`, to support Viterbi decoding for CPU and GPU under dynamic graphs.  ([#35778](https://github.com/PaddlePaddle/Paddle/pull/35778))
-
-- Add networking class APIs  
-  - Add `paddle.nn.functional.sparse_attention` for computing sparse Transformer Attention modules.  ([#35757](https://github.com/PaddlePaddle/Paddle/pull/35757))
-  - Add the ``paddle.nn.MaxUnPool2D`` and ``paddle.nn.functional.max_unpool2d``, to support the computing of the inverse of the pooling result based on the input and maximum position.  ([#35056](https://github.com/PaddlePaddle/Paddle/pull/35056))
-  - Add the ``paddle.nn.functional.gumbel_softmax``, to support ``gumbel softmax`` sampling.  ([#35506](https://github.com/PaddlePaddle/Paddle/pull/35506), [#36065](https://github.com/PaddlePaddle/Paddle/pull/36065), [#36094](https://github.com/PaddlePaddle/Paddle/pull/36094))
-  - Add the ``paddle.nn.functional.class_center_sample``, to support PartialFC class center sampling. ([#34106](https://github.com/PaddlePaddle/Paddle/pull/34106))
-  - Add the ``paddle.nn.functional.margin_cross_entropy``, to support ArcFace, CosFace, SphereFace and other MarginLoss functions. ([#34247](https://github.com/PaddlePaddle/Paddle/pull/34247))
-  - Add the ``paddle.nn.AvgPool2D``, to support second-order derivatives. ([#35388](https://github.com/PaddlePaddle/Paddle/pull/35388))
-  - Add the ``paddle.nn.Linear, paddle.matmul, and paddle.mm``, to support second-order derivatives.  [#35428](https://github.com/PaddlePaddle/Paddle/pull/35428)
-  - Add the ``paddle.nn.GroupNorm``, to support the inputs of the form (N, C, *). ([#34773](https://github.com/PaddlePaddle/Paddle/pull/34773))
-  - Add the ``paddle.nn.BatchNorm1D/2D/3D`` to compute the inverse under ``x.stop_gradient=True``. ([#34102](https://github.com/PaddlePaddle/Paddle/pull/34102))
-  - Add the ``paddle.nn.Dropout, paddle,nn.Dropout2D/3D`` to compute the inverse in ``model.eval`` mode.  ([#35122](https://github.com/PaddlePaddle/Paddle/pull/35122))
-
-- Add hardware related APIs  
-  - Add the `paddle.device.cuda.Stream`, `paddle.device.cuda.Event`, `paddle.device.cuda.current_stream`, `paddle.device.cuda.synchronize`, `paddle.device.cuda.synchronize`, to support synchronization operations for event and stream of CUDA on the Python side. ([#32460](https://github.com/PaddlePaddle/Paddle/pull/32460))
-  - Add the ``paddle.device.cuda.device_count``, to support returning the current number of available GPUs. ([#34811](https://github.com/PaddlePaddle/Paddle/pull/34811))
-  - Add the ``paddle.device.cuda.empty_cache``, to support for clearing free GPU memory.  ([#35427](https://github.com/PaddlePaddle/Paddle/pull/35427))
-  - Add the ``paddle.device.cuda.get_device_properties``, to support for returning the given device properties. ([#35875](https://github.com/PaddlePaddle/Paddle/pull/35875))
-  - Add the ``paddle.device.cuda.stream_guard`` for flexible switching of CUDA Streams under dynamic graphs. ([#35623](https://github.com/PaddlePaddle/Paddle/pull/35623))
-  - Add `paddle.device.cuda.get_device_name`, to support returning the name of a given device.  ([#36172](https://github.com/PaddlePaddle/Paddle/pull/36172))
-  - Add `paddle.device.cuda.get_device_capability`, to support returning version number of the computational capability of a given device.  ([#36172](https://github.com/PaddlePaddle/Paddle/pull/36172))
-  - Add `paddle.framework.core.async_read` and `paddle.framework.core.async_write`, to support `Tensor` data asynchronous read and write of `CUDAPinnedPlace` and ` CUDAPlace` under non-default CUDA `Stream`.  ([#36501](https://github.com/PaddlePaddle/Paddle/pull/36501))
-
-
-- Add Tensor operation APIs  
- - Add `paddle.tensordot`, to support Tensor Contraction for high dimension.  ([#36454](https://github.com/PaddlePaddle/Paddle/pull/36454))
- - Add `paddle.bincount`, to support counting elements in a one-dimensional tensor.  ([#36709](https://github.com/PaddlePaddle/Paddle/pull/36709))
- - Add the `paddle.broadcast_tensors`, to support broadcast operations on a set of `Tensors`.  ([#33294](https://github.com/PaddlePaddle/Paddle/pull/33294), [#34874](https://github.com/PaddlePaddle/Paddle/pull/34874))
- - Add the `paddle.einsum`.  ([#33821](https://github.com/PaddlePaddle/Paddle/pull/34874))
- - Enhance the ``paddle.tensor.gradient`` interface to support second-order derivative operators for sigmoid_op. ([#32971](https://github.com/PaddlePaddle/Paddle/pull/32971))
- - Add the ``paddle.searchsorted``, to support the search of the index of a given value in an ordered ``Tensor``.  ([#35159](https://github.com/PaddlePaddle/Paddle/pull/35159))
- - Add the ``paddle.unique_consecutive``, to support removing duplicates of consecutively repeated elements in a ``Tensor`` to return consecutive non-repeated Tensor. ([#34334](https://github.com/PaddlePaddle/Paddle/pull/34334))
- - Add the ``paddle.diagflat``, to support the returning of a diagonal matrix with the elements of the input ``Tensor`` as diagonals.  ([#33334](https://github.com/PaddlePaddle/Paddle/pull/33334))
- - Add the ``paddle.lgamma``, to support element-by-element computing of the ``Tensor``'s ``lgamma`` function value.  ([#33913](https://github.com/PaddlePaddle/Paddle/pull/32913))
- - Add the ``paddle.digamma``, to support element-by-element computing of ``digamma`` function values for ``Tensor``. ([#33278](https://github.com/PaddlePaddle/Paddle/pull/33278))
- - Add the ``paddle.neg``, to support element-by-element computing of the opposite value of a ``Tensor``.  ([#33248](https://github.com/PaddlePaddle/Paddle/pull/33248))
- - Add the ``paddle.cumprod``, to support the computing of ``Tensor`` cumulative multiplication based on a given dimension. ([#35185](https://github.com/PaddlePaddle/Paddle/pull/35185))
- - Add the ``paddle.atan2``, to support element-by-element ``arctangent`` operations to determine quadrants by symbols. ([#33067](https://github.com/PaddlePaddle/Paddle/pull/33067))
- - Add the ``paddle.expm1``, to support element-by-element arithmetic with ``exp(x)-1``.  ([#33066](https://github.com/PaddlePaddle/Paddle/pull/33066))
- - Add the ``paddle.trunc``, to support truncated integer values for the input ``Tensor``. ([#33371](https://github.com/PaddlePaddle/Paddle/pull/33371))
- - Add the ``paddle.diagonal``, to support the extracting of diagonal elements of the input ``Tensor``.  ([#33586](https://github.com/PaddlePaddle/Paddle/pull/33586)) 
- - Add the ``paddle.utils.dlpack``, including: ``paddle.utils.dlpack.to_dlpack`` and ``paddle.utils.dlpack.from_dlpack``, to support the ``Tensor`` transfer between different frameworks with using ``DLPack``.  ([#35067](https://github.com/PaddlePaddle/Paddle/pull/35067))
- - Add the ``Tensor.uniform``_, to support filling a ``Tensor`` in-place with random numbers that obey a uniform distribution. ([#33394](https://github.com/PaddlePaddle/Paddle/pull/33934))
- - Add the ``paddle.Tensor.T``, to transpose an N-D Tensor to return a Tensor with the opposite shape of the original Tensor.  ([#35379](https://github.com/PaddlePaddle/Paddle/pull/35379)) 
- - Add the ``paddle.Tensor`` magic operators: & (bitwise_and), | (bitwise_or), ^ (bitwise_xor), ~ (bitwise_not). ([#33524](https://github.com/PaddlePaddle/Paddle/pull/33524))
- - Add the `paddle.Tensor.fill_ `and `paddle.Tensor.zero_`, to modify the value in Tensor in-place, use the fixed values to fill, use all-zero to fill respectively. ([#33829](https://github.com/PaddlePaddle/Paddle/pull/33829)) 
- - Add the `paddle.Tensor.fill_diagonal`, and `paddle.Tensor.fill_diagonal`, to modify Tensor diagonal element values. ([#34460](https://github.com/PaddlePaddle/Paddle/pull/34460)) 
- - Add the `paddle.Tensor.fill_diagonal_tensor_`, to modify the whole sub-Tensor formed by the diagonal of two specified coordinate axes of the Tensor with other axes. ([#34515](https://github.com/PaddlePaddle/Paddle/pull/34515)) 
- - Dynamic-Static Graph ``Tensor``: Add the support for multiple index types, including: ellipsis (...), dimensional augmentation (None), boolean type arrays (Bool Mask), integer arrays (list), and tensors (Tensor).  
-   - ellipsis (...) Index:  `X[..., 0]` 。([#34267](https://github.com/PaddlePaddle/Paddle/pull/34267), [#32876](https://github.com/PaddlePaddle/Paddle/pull/32876))
-   - Dimensional augmentation (None) index:   `X[None, :]` 。([#34338](https://github.com/PaddlePaddle/Paddle/pull/34338), [#34442](https://github.com/PaddlePaddle/Paddle/pull/34442),  [#34877](https://github.com/PaddlePaddle/Paddle/pull/34877),  [#34911](https://github.com/PaddlePaddle/Paddle/pull/34911),  [#33001](https://github.com/PaddlePaddle/Paddle/pull/33001))
-    - Boolean type array (Bool Mask) index:  `X[X > 0] = 0` 。 ([#35026](https://github.com/PaddlePaddle/Paddle/pull/35026),  [#35133](https://github.com/PaddlePaddle/Paddle/pull/35133),  [#33298](https://github.com/PaddlePaddle/Paddle/pull/33298))
-    - Array of integers (list) index: `X[[1, 0], [0]]` 。([#34824](https://github.com/PaddlePaddle/Paddle/pull/34824), [#33000](https://github.com/PaddlePaddle/Paddle/pull/33000),  [#35404](https://github.com/PaddlePaddle/Paddle/pull/35404))
-    - Tensor index:  `X[panddle.to_tensor([0, 1], [1, 0])]` 。([#34824](https://github.com/PaddlePaddle/Paddle/pull/34824))
-
-- Add the distributed related APIs  
-  - Add the ``paddle.distributed.utils.global_scatter`` and ``paddle.distributed.utils.global_gather``, to support MOE conditional distribution of data. The ``global_scatter`` distributes the data to all cards based on the conditions, and then the ``global_gather`` then collects the data from all GPU cards based on the conditions. ([#35546](https://github.com/PaddlePaddle/Paddle/pull/35546))
-
-- Add additional APIs  
-  -  Add the ``paddle.disable_signal_handler``, to support the disabling of the signal capture mechanism in PaddlePaddle, thus allow users to use Paddle and TVM at the same time. ([#34577](https://github.com/PaddlePaddle/Paddle/pull/34577))
-  -  Add the ``paddle.incubate.softmax_mask_fuse``, to support the acceleration of softmax and mask operations for Transformer architecture. ([#33841](https://github.com/PaddlePaddle/Paddle/pull/33841))
-  -  Add the ``paddle.incubate.softmax_mask_fuse_upper_triangle``, to support the acceleration of the softmax and mask operations of the GPT version of the Transformer architecture. ([#33981](https://github.com/PaddlePaddle/Paddle/pull/33981))
-  -  Add the ``paddle.static.ExponentialMovingAverage``, to support the computing of the sliding average of parameters with exponential decay. ([#35673](https://github.com/PaddlePaddle/Paddle/pull/35673))
-  -  Add the ``paddle::Tensor::slice`` C++ API, to support the slice operation, and allow users to perform slice operations for the external Tensor. ([#34227](https://github.com/PaddlePaddle/Paddle/pull/34227))
-  -  Add the ``paddle.incubate.segment_*`` series APIs, including ``paddle.incubate.segment_sum``, ``paddle.incubate.segment_mean``, ``paddle.incubate.segment_max``, and ``paddle. incubate.segment_min``. Support the summing, averaging, maximizing, and minimizing of ``Tensor`` by segment.  ([#35759](https://github.com/PaddlePaddle/Paddle/pull/35759))
-  - Add `paddle.version.cuda` and `paddle.version.cudnn` to get version numbers of `CUDA` and `cuDNN` used by paddle installer.  ([#36556](https://github.com/PaddlePaddle/Paddle/pull/36556))
-
+- Add 4 new automatic differentiation APIs to support scientific computing, as listed below: ([#40692](https://github.com/PaddlePaddle/Paddle/pull/40692))
+  
+  - `paddle.incubate.autograd.vjp`, computes the vector-Jacobian product.
+    
+  - `paddle.incubate.autograd.jvp`, computes the Jacobian-vector product.
+    
+  - `paddle.incubate.autograd.Jacobian`, computes the Jacobian matrix.
+    
+  - `paddle.incubate.autograd.Hessian`, computes the Hessian matrix.
+    
+- Add linear algebra APIs
+  
+  - Add `paddle.linalg.triangular_solve`, to solve a system of linear equations that has a unique solution and a triangular coefficient matrix. ([#36714](https://github.com/PaddlePaddle/Paddle/pull/36714))
+    
+  - Add `paddle.linalg.eig`, to compute the eigendecomposition of a general square matrix. ([#35764](https://github.com/PaddlePaddle/Paddle/pull/35764))
+    
+  - Add `paddle.linalg.solve`, to compute solutions to systems of linear equations. ([#35715](https://github.com/PaddlePaddle/Paddle/pull/35715))
+    
+  - Add `paddle.linalg.lstsq`, to compute least-squares solutions to systems of linear equations. ([#38585](https://github.com/PaddlePaddle/Paddle/pull/38585), [#38621](https://github.com/PaddlePaddle/Paddle/pull/38621))
+    
+  - Add `paddle.linalg.qr`, to compute the QR decomposition of a matrix. ([#35742](https://github.com/PaddlePaddle/Paddle/pull/35742), [#38824](https://github.com/PaddlePaddle/Paddle/pull/38824))
+    
+  - Add `paddle.inner`, to compute the inner product of two Tensors. ([#37706](https://github.com/PaddlePaddle/Paddle/pull/37706))
+    
+  - Add `paddle.outer`, to compute the outer product of two Tensors. ([#37706](https://github.com/PaddlePaddle/Paddle/pull/37706))
+    
+  - Add `paddle.linalg.cov`, to compute covariance between vectors. ([#38392](https://github.com/PaddlePaddle/Paddle/pull/38392))
+    
+  - Add `paddle.linalg.cholesky_solve`, to solve a system of linear equations using the Cholesky factorization of the coefficient matrix. ([#38167](https://github.com/PaddlePaddle/Paddle/pull/38167))
+    
+  - Add `paddle.linalg.lu` and `paddle.linalg.lu_unpack`, to compute the LU decomposition of a matrix and to unpack the resulting LU matrix. ([#38617](https://github.com/PaddlePaddle/Paddle/pull/38617), [#38559](https://github.com/PaddlePaddle/Paddle/pull/38559), [#38616](https://github.com/PaddlePaddle/Paddle/pull/38616))
+    
+- Add 21 new probability distribution APIs for reinforcement learning, variational inference, scientific computing, and other scenarios, including 6 random variable distributions, 13 random variable transformations, and 2 KL divergence computing APIs, as listed below: ([#40536](https://github.com/PaddlePaddle/Paddle/pull/40536), [#38820](https://github.com/PaddlePaddle/Paddle/pull/38820), [#38558](https://github.com/PaddlePaddle/Paddle/pull/38558/files), [#38445](https://github.com/PaddlePaddle/Paddle/pull/38445), [#38244](https://github.com/PaddlePaddle/Paddle/pull/38244), [#38047](https://github.com/PaddlePaddle/Paddle/pull/38047))
+  
+  - `paddle.distribution.ExponentialFamily`, base class for exponential-family distributions.
+    
+  - `paddle.distribution.Beta`, `Beta` distribution.
+    
+  - `paddle.distribution.Dirichlet`, `Dirichlet` distribution.
+    
+  - `paddle.distribution.Independent`, Independent distribution, used to create higher order distributions.
+    
+  - `paddle.distribution.TransformedDistribution`, transformed distribution, used to generate higher-order distributions from a base distribution and a series of transformations.
+    
+  - `paddle.distribution.Multinomial`, multinomial distribution.
+    
+  - `paddle.distribution.Transform`, base class for transforming random variables.
+    
+  - `paddle.distribution.AbsTransform`, absolute value transform.
+    
+  - `paddle.distribution.AffineTransform`, affine transform.
+    
+  - `paddle.distribution.ChainTransform`, chained composition of transforms.
+    
+  - `paddle.distribution.ExpTransform`, exponential transform.
+    
+  - `paddle.distribution.IndependentTransform`, independent transform, used to extend the `event_dim` of the transform's domain.
+    
+  - `paddle.distribution.PowerTransform`, power transform.
+    
+  - `paddle.distribution.ReshapeTransform`, `reshape` transform.
+    
+  - `paddle.distribution.SigmoidTransform`, `sigmoid` transform.
+    
+  - `paddle.distribution.SoftmaxTransform`, `softmax` transform.
+    
+  - `paddle.distribution.StackTransform`, `stack` transform, used to combine multiple transforms in a `stack` method.
+    
+  - `paddle.distribution.StickBreakingTransform`, stick-breaking transform.
+    
+  - `paddle.distribution.TanhTransform`, `tanh` transform.
+    
+  - `paddle.distribution.kl_divergence`, compute KL divergence.
+    
+  - `paddle.distribution.register_kl`, register a user-defined KL divergence computation function.
+    
+- Add high-level APIs
+  
+  - Add `paddle.vision.models.AlexNet` and `paddle.vision.models.alexnet`, to use AlexNet models directly. ([#36058](https://github.com/PaddlePaddle/Paddle/pull/36058))
+    
+  - Add `paddle.vision.models.DenseNet`, `paddle.vision.models.densenet121`, `paddle.vision.models.densenet161`, `paddle.vision.models.densenet169`, `paddle.vision.models.densenet201`, and `paddle.vision.models.densenet264`, to use DenseNet models directly. ([#36069](https://github.com/PaddlePaddle/Paddle/pull/36069))
+    
+  - Add `paddle.vision.models.GoogLeNet` and `paddle.vision.models.googlenet`, to use GoogLeNet models directly. ([#36034](https://github.com/PaddlePaddle/Paddle/pull/36034))
+    
+  - Add `paddle.vision.models.InceptionV3`, `paddle.vision.models.inception_v3`, to use InceptionV3 models directly. ([#36064](https://github.com/PaddlePaddle/Paddle/pull/36064))
+    
+  - Add `paddle.vision.models.MobileNetV3Small`, `paddle.vision.models.MobileNetV3Large`, `paddle.vision.models.mobilenet_v3_small`, and `paddle.vision.models.mobilenet_v3_large`, to use MobileNetV3 models directly. ([#38653](https://github.com/PaddlePaddle/Paddle/pull/38653))
+    
+  - Add `paddle.vision.models.ResNeXt`, `paddle.vision.models.resnext50_32x4d`, `paddle.vision.models.resnext50_64x4d`, `paddle.vision.models.resnext101_32x4d`, `paddle.vision.models.resnext101_64x4d`, `paddle.vision.models.resnext152_32x4d`, and `paddle.vision.models.resnext152_64x4d`, to use ResNeXt models directly. ([#36070](https://github.com/PaddlePaddle/Paddle/pull/36070))
+    
+  - Add `paddle.vision.models.ShuffleNetV2`, `paddle.vision.models.shufflenet_v2_x0_25`, `paddle.vision.models.shufflenet_v2_x0_33`, `paddle.vision.models.shufflenet_v2_x0_5`, `paddle.vision.models.shufflenet_v2_x1_0`, `paddle.vision.models.shufflenet_v2_x1_5`, `paddle.vision.models.shufflenet_v2_x2_0`, and `paddle.vision.models.shufflenet_v2_swish`, to use ShuffleNetV2 models directly. ([#36067](https://github.com/PaddlePaddle/Paddle/pull/36067))
+    
+  - Add `paddle.vision.models.SqueezeNet`, `paddle.vision.models.squeezenet1_0`, and `paddle.vision.models.squeezenet1_1`, to use SqueezeNet models directly. ([#36066](https://github.com/PaddlePaddle/Paddle/pull/36066))
+    
+  - Add `paddle.vision.models.wide_resnet50_2`, and `paddle.vision.models.wide_resnet101_2`, to use WideResNet models directly. ([#36952](https://github.com/PaddlePaddle/Paddle/pull/36952))
+    
+  - Add the `paddle.vision.ops.nms` API, to support single-category and multi-category non-maximum suppression (NMS) and accelerate prediction in object detection tasks. ([#40962](https://github.com/PaddlePaddle/Paddle/pull/40962))
+    
+  - Add `paddle.vision.ops.roi_pool` and `paddle.vision.ops.RoIPool`, to support RoI region pooling operations in detection tasks. ([#36154](https://github.com/PaddlePaddle/Paddle/pull/36154))
+    
+  - Add `paddle.vision.ops.roi_align` and `paddle.vision.ops.RoIAlign`, to support RoI Align operations in detection tasks. ([#35102](https://github.com/PaddlePaddle/Paddle/pull/36154))
+    
+  - Add the Viterbi decoding APIs `paddle.text.ViterbiDecoder` and `paddle.text.viterbi_decode`, mainly for sequence tagging model prediction. ([#35778](https://github.com/PaddlePaddle/Paddle/pull/35778))
+    
+- Add 11 Sparse APIs, to support basic functions such as creating Sparse Tensors in COO and CSR formats, and add C++ methods for converting to and from Tensor.
+  
+  - `paddle.sparse.sparse_coo_tensor`, create a Sparse Tensor in COO format. ([#40780](https://github.com/PaddlePaddle/Paddle/pull/40780))
+    
+  - `paddle.sparse.sparse_csr_tensor`, create a Sparse Tensor in CSR format. ([#40780](https://github.com/PaddlePaddle/Paddle/pull/40780))
+    
+  - `paddle.sparse.ReLU`, ReLU activation layer for SparseCooTensor. ([#40959](https://github.com/PaddlePaddle/Paddle/pull/40959))
+    
+  - `paddle.sparse.functional.relu`, ReLU function for SparseCooTensor. ([#40959](https://github.com/PaddlePaddle/Paddle/pull/40959))
+    
+  - `Tensor.values()`, C++ method to get the non-zero elements of a SparseCooTensor or SparseCsrTensor. ([#40608](https://github.com/PaddlePaddle/Paddle/pull/40608))
+    
+  - `Tensor.indices()`, C++ method to get the coordinate information of a SparseCooTensor. ([#40608](https://github.com/PaddlePaddle/Paddle/pull/40608))
+    
+  - `Tensor.crows()`, C++ method to get the compressed row information of a SparseCsrTensor. ([#40608](https://github.com/PaddlePaddle/Paddle/pull/40608))
+    
+  - `Tensor.cols()`, C++ method to get the column information of a SparseCsrTensor. ([#40608](https://github.com/PaddlePaddle/Paddle/pull/40608))
+    
+  - `Tensor.to_sparse_coo()`, C++ method to convert a DenseTensor or SparseCsrTensor to a SparseCooTensor. ([#40780](https://github.com/PaddlePaddle/Paddle/pull/40780))
+    
+  - `Tensor.to_sparse_csr()`, C++ method to convert a DenseTensor or SparseCooTensor to a SparseCsrTensor. ([#40780](https://github.com/PaddlePaddle/Paddle/pull/40780))
+    
+  - `Tensor.to_dense()`, C++ method to convert a SparseCooTensor or SparseCsrTensor to a DenseTensor. ([#40780](https://github.com/PaddlePaddle/Paddle/pull/40780))
+    
+- Add hardware related APIs
+  
+  - Add four GPU memory monitoring related APIs: `paddle.device.cuda.max_memory_allocated`, `paddle.device.cuda.max_memory_reserved`, `paddle.device.cuda.memory_allocated`, and `paddle.device.cuda.memory_reserved`, to view and analyze the GPU memory usage in real-time. ([#38657](https://github.com/PaddlePaddle/Paddle/pull/38657))
+    
+  - Add `paddle.device.cuda.get_device_properties`, to return the properties of the GPU device. ([#35661](https://github.com/PaddlePaddle/Paddle/pull/35661))
+    
+  - Add `paddle.device.cuda.get_device_name` and `paddle.device.cuda.get_device_capability`, to return the name and compute capability of the GPU device. ([#35672](https://github.com/PaddlePaddle/Paddle/pull/35672))
+    
+- Add Tensor operation APIs
+  
+  - Add `paddle.nansum`, to sum the input Tensor along `axis` while ignoring `NaN` values. ([#38137](https://github.com/PaddlePaddle/Paddle/pull/38137))
+    
+  - Add `paddle.nanmean`, to average the input Tensor along `axis` while ignoring `NaN` values. ([#40472](https://github.com/PaddlePaddle/Paddle/pull/40472))
+    
+  - Add `paddle.clone`, to return a copy of the input Tensor, with gradient computation supported. ([#38020](https://github.com/PaddlePaddle/Paddle/pull/38020))
+    
+  - Add `paddle.Tensor.element_size`, to return the number of bytes allocated for a single element in a Tensor. ([#38020](https://github.com/PaddlePaddle/Paddle/pull/38020))
+    
+  - Add `paddle.Tensor.to_uva_tensor`, to convert NumPy objects into objects that CUDA can access through virtual addresses while the data physically resides in CPU memory. ([#39146](https://github.com/PaddlePaddle/Paddle/pull/39146), [#38950](https://github.com/PaddlePaddle/Paddle/pull/38950))
+    
+  - Add `paddle.rot90`, to rotate the n-dimensional Tensor by 90 degrees along the plane specified by `axes`. ([#37634](https://github.com/PaddlePaddle/Paddle/pull/37634))
+    
+  - Add `paddle.logit` and `paddle.Tensor.logit`, to compute the logit function values for input Tensor. ([#37844](https://github.com/PaddlePaddle/Paddle/pull/37844))
+    
+  - Add `paddle.repeat_interleave`, to repeat the elements of the input along the specified axis and return a new Tensor. ([#37981](https://github.com/PaddlePaddle/Paddle/pull/37981))
+    
+  - Add `paddle.renorm`, to split the Tensor into multiple pieces along the specified `axis` and then compute the p-norm of each piece separately. ([#38130](https://github.com/PaddlePaddle/Paddle/pull/38130), [#38459](https://github.com/PaddlePaddle/Paddle/pull/38459))
+    
+  - Add `paddle.mode` and `paddle.Tensor.mode`, to find the mode (the most frequent value) of the input Tensor and its index along the specified axis. ([#38446](https://github.com/PaddlePaddle/Paddle/pull/38446))
+    
+  - Add `paddle.quantile` and `paddle.Tensor.quantile`, to compute the q-quantile of a Tensor along the specified axis. ([#38567](https://github.com/PaddlePaddle/Paddle/pull/38567))
+    
+  - Add `paddle.kthvalue` and `paddle.Tensor.kthvalue`, to find the value and index of the k-th smallest element along the specified axis. ([#38386](https://github.com/PaddlePaddle/Paddle/pull/38386))
+    
+  - Add `paddle.is_floating_point` and `paddle.Tensor.is_floating_point`, to determine whether the input Tensor has a floating-point data type. ([#37885](https://github.com/PaddlePaddle/Paddle/pull/37885))
+    
+  - Add `paddle.erfinv` and `paddle.Tensor.erfinv`, to compute the inverse error function of the input Tensor. ([#38295](https://github.com/PaddlePaddle/Paddle/pull/38295))
+    
+  - Add `paddle.lerp` and `paddle.Tensor.lerp`, to compute linear interpolation among the input Tensors based on the given weights. ([#37253](https://github.com/PaddlePaddle/Paddle/pull/37253))
+    
+  - Add `paddle.angle`, to compute the phase angle of a complex Tensor. ([#37689](https://github.com/PaddlePaddle/Paddle/pull/37689))
+    
+  - Add `paddle.rad2deg` and `paddle.Tensor.rad2deg`, to convert each element of the input from radians to degrees. ([#37598](https://github.com/PaddlePaddle/Paddle/pull/37598))
+    
+  - Add `paddle.deg2rad` and `paddle.Tensor.deg2rad`, to convert each element of the input from degrees to radians. ([#37598](https://github.com/PaddlePaddle/Paddle/pull/37598))
+    
+  - Add `paddle.gcd` and `paddle.Tensor.gcd`, to compute the element-wise greatest common divisor of the absolute values of two inputs. ([#37819](https://github.com/PaddlePaddle/Paddle/pull/37819))
+    
+  - Add `paddle.lcm` and `paddle.Tensor.lcm`, to compute the element-wise least common multiple of the absolute values of two inputs. ([#37819](https://github.com/PaddlePaddle/Paddle/pull/37819))
+    
+  - Add `paddle.amax` and `paddle.Tensor.amax`, to get the maximum value of Tensor elements along the specified dimension. ([#38417](https://github.com/PaddlePaddle/Paddle/pull/38417))
+    
+  - Add `paddle.amin` and `paddle.Tensor.amin`, to get the minimum value of Tensor elements along the specified dimension. ([#38417](https://github.com/PaddlePaddle/Paddle/pull/38417))
+    
+  - Add `paddle.isclose`, to determine whether the corresponding elements of two Tensors are close to each other. ([#37135](https://github.com/PaddlePaddle/Paddle/pull/37135))
+    
+  - Add `paddle.put_along_axis` and `paddle.take_along_axis`, for placing or extracting elements at the specified indices along an axis. ([#38608](https://github.com/PaddlePaddle/Paddle/pull/38608))
+    
+  - Add `paddle.bincount` and `paddle.Tensor.bincount`, for counting the number of occurrences of each element in a Tensor. ([#36317](https://github.com/PaddlePaddle/Paddle/pull/36317))
+    
+  - Add `paddle.fmax` and `paddle.fmin`, to extend the max/min functions to handle NaN values in the two Tensors. If exactly one of the two elements at a position is NaN, the non-NaN value is returned; if both are NaN, NaN is returned. ([#37826](https://github.com/PaddlePaddle/Paddle/pull/37826))
+    
+  - Add `paddle.diff`, for computing the nth forward difference along a given dimension. It currently supports n=1. ([#37441](https://github.com/PaddlePaddle/Paddle/pull/37441))
+    
+  - Add inverse hyperbolic functions: `paddle.asinh`, `paddle.acosh`, and `paddle.atanh`. ([#37076](https://github.com/PaddlePaddle/Paddle/pull/37076))
+    
+  - Add `paddle.as_real` and `paddle.as_complex`, for conversion between real Tensors and complex Tensors. ([#37784](https://github.com/PaddlePaddle/Paddle/pull/37784))
+    
+  - Add `paddle.complex`, for constructing a complex Tensor with the given real and imaginary parts. ([#37918](https://github.com/PaddlePaddle/Paddle/pull/37918), [#38272](https://github.com/PaddlePaddle/Paddle/pull/38272))
+    
+  - Add `paddle.det` and `paddle.slogdet`, to compute the determinant of a matrix and the natural logarithm of the determinant. ([#34992](https://github.com/PaddlePaddle/Paddle/pull/34992))
+    
+  - Add `paddle.nn.utils.parameters_to_vector`, to flatten parameters to a 1-D Tensor. ([#38020](https://github.com/PaddlePaddle/Paddle/pull/38020))
+    
+  - Add `paddle.nn.utils.vector_to_parameters`, to convert a 1-D Tensor back into parameters. ([#38020](https://github.com/PaddlePaddle/Paddle/pull/38020))
+    
+- Add network-building APIs
+  
+  - Add `paddle.nn.Fold` and `paddle.nn.functional.fold`, to combine sliding local blocks into a larger Tensor for each sample in a batch (the inverse of the unfold operation). ([#38613](https://github.com/PaddlePaddle/Paddle/pull/38613))
+    
+  - Add `paddle.nn.CELU` and `paddle.nn.functional.celu`, to support the CELU activation layer. ([#36088](https://github.com/PaddlePaddle/Paddle/pull/36088))
+    
+  - Add `paddle.nn.HingeEmbeddingLoss`, to compute the hinge embedding loss, which is typically used for nonlinear embedding or semi-supervised learning. ([#37540](https://github.com/PaddlePaddle/Paddle/pull/37540))
+    
+  - Add the `paddle.nn.ZeroPad2D` API, for zero-padding according to the `padding` argument. ([#37151](https://github.com/PaddlePaddle/Paddle/pull/37151))
+    
+  - Add `paddle.nn.MaxUnPool3D` and `paddle.nn.MaxUnPool1D`, for computing 3D and 1D max unpooling. ([#38716](https://github.com/PaddlePaddle/Paddle/pull/38716))
+    
+  - Add the `paddle.incubate.graph_khop_sampler`, `paddle.incubate.graph_sample_neighbors`, and `paddle.incubate.graph_reindex` APIs, to support multi-hop neighbor sampling and reindexing on graphs. They are mainly used for training graph neural network models. ([#39146](https://github.com/PaddlePaddle/Paddle/pull/39146), [#40809](https://github.com/PaddlePaddle/Paddle/pull/40809))
+    
+- Add random number generation APIs
+  
+  - Add `paddle.poisson`, to generate a Tensor whose elements follow a Poisson distribution with the given lambda parameter. ([#38117](https://github.com/PaddlePaddle/Paddle/pull/38117))
+    
+  - Add the `paddle.randint_like` API, to generate a new Tensor whose elements follow a uniform distribution over the range [low, high), with the output shape matching the shape of the input. ([#36169](https://github.com/PaddlePaddle/Paddle/pull/36169))
+    
+  - Add `paddle.Tensor.exponential_`, an in-place API that fills the input Tensor with exponentially distributed random numbers. ([#38256](https://github.com/PaddlePaddle/Paddle/pull/38256))
+    
+- Add parameter initialization APIs
+  
+  - Add `paddle.nn.initializer.Dirac`, to initialize 3D/4D/5D parameters with Dirac delta functions. It is commonly used for initialization of Conv1D/Conv2D/Conv3D parameters in the convolution layer. ([#37389](https://github.com/PaddlePaddle/Paddle/pull/37389))
+    
+  - Add `paddle.nn.initializer.Orthogonal` for orthogonal matrix initialization. The initialized parameter is a (semi-)orthogonal matrix. ([#37163](https://github.com/PaddlePaddle/Paddle/pull/37163))
+    
+  - Add `paddle.nn.initializer.calculate_gain`, to get the recommended gain value for an activation function. The gain value can be passed to certain initialization APIs to adjust the initialization range. ([#37163](https://github.com/PaddlePaddle/Paddle/pull/37163))
+    
+- Add learning rate APIs
+  
+  - Add `paddle.optimizer.lr.MultiplicativeDecay`, to set the learning rate through a user-provided `lambda` function. ([#38250](https://github.com/PaddlePaddle/Paddle/pull/38250))
+- Add distributed-related APIs
+  
+  - Add `paddle.incubate.optimizer.DistributedFusedLamb`, to allow the Lamb optimizer to update parameters in a distributed manner. ([#40011](https://github.com/PaddlePaddle/Paddle/pull/40011), [#39972](https://github.com/PaddlePaddle/Paddle/pull/39972), [#39900](https://github.com/PaddlePaddle/Paddle/pull/39900), [#39747](https://github.com/PaddlePaddle/Paddle/pull/39747), [#39148](https://github.com/PaddlePaddle/Paddle/pull/39148), [#39416](https://github.com/PaddlePaddle/Paddle/pull/39416))
+- Add new optimizer-related APIs ([#40710](https://github.com/PaddlePaddle/Paddle/pull/40710))
+  
+  - `paddle.incubate.optimizer.functional.minimize_bfgs`, the BFGS second-order optimizer.
+    
+  - `paddle.incubate.optimizer.functional.minimize_lbfgs`, the L-BFGS second-order optimizer.
+    
+- Add the `paddle.incubate.multiprocessing` module, to provide Tensor (CPU/GPU) data transfer between Python processes. ([#37302](https://github.com/PaddlePaddle/Paddle/pull/37302), [#41339](https://github.com/PaddlePaddle/Paddle/pull/41339))
+  
 
 #### IR(Intermediate Representation)
 
-- Dynamic graph to static graph 
-  - Add the dynamic to static transcription error type recognition, and give suggestions for modification. ([#35648](https://github.com/PaddlePaddle/Paddle/pull/35648)) 
-  - Add the support for mixed precision training. ``@to_static`` c supports one-click conversion to mixed precision training mode for static graphs.  ([#34562](https://github.com/PaddlePaddle/Paddle/pull/34562))
-  - Add the ``build_strategy`` parameter in ``@to_static``. Support customizing the ``Pass`` optimization strategy to accelerate model training after dynamic to static, such as operator fusion, etc.   ([#34347](https://github.com/PaddlePaddle/Paddle/pull/34347))
-  - Add the support for `a, b = static_variable`.  ([#33499](https://github.com/PaddlePaddle/Paddle/pull/33499))
-  - Add the support for second-order derivatives.  ([#33110](https://github.com/PaddlePaddle/Paddle/pull/33110))
-
-- Program and Graph conversion: ``Program`` and ``Graph`` are the intermediate representations used to express computations in the underlying framework of the PaddlePaddle, or developers of the PaddlePaddle, it is sometimes necessary to convert ``Program`` and ``Graph`` to each other for computational processing. This feature adds the ability to convert ``Program`` and ``Graph`` to each other.  
-  - Develop and refine the ``Program`` and ``Graph`` interconversion feature.  ([#33949](https://github.com/PaddlePaddle/Paddle/pull/33949))
-  - In order to support control flow nodes such as `while`, the `Program` of the PaddlePaddle Framework may contain multiple sub-`blocks` in addition to the main `block`. Previously, in the conversion of `Program` to `Graph`, only convert the main `block` to `Graph`. In this update, modify the `Graph`, to support the expression of sub-`blocks` to achieve a complete conversion of `Program` to `Graph`.  ([#33320](https://github.com/PaddlePaddle/Paddle/pull/33320))
-  - Provide dependent helper functions needed to analyze the control flow in `Program`.  ([#33439](https://github.com/PaddlePaddle/Paddle/pull/33439))
-  - `Program` and `Graph` retain the values of the `stop_gradient` and `persistable` attributes needed for training after converting each other.  ([#33771](https://github.com/PaddlePaddle/Paddle/pull/33771)) 
-  - `Pass` now supports processing the main `Graph` and all its sub-graphs, while the original `Pass` only processed the main `Graph` and ignored the sub-graphs.  ([#34158](https://github.com/PaddlePaddle/Paddle/pull/34158)) 
-  - Handle some topological ordering problems for `Program` and `Graph` inter-conversion in the prediction cases. ([#34121](https://github.com/PaddlePaddle/Paddle/pull/34121), [#34521](https://github.com/PaddlePaddle/Paddle/pull/34521)).
-
-- Pass development  
-  - Add the Pass development for subgraph replacement scenarios such as fusion on the Python side.  ([#35708](https://github.com/PaddlePaddle/Paddle/pull/35708), [#35602](https://github.com/PaddlePaddle/Paddle/pull/35602))
-
-- Kernel Primitive API	
-  - Abstract and encapsulate the underlying codes in the operator Kernel implementation, to provide high-performance Block-level IO and Compute operations. The Kernel development using the Kernel Primitive API allows you to focus more on the implementation of the computational logic, significantly reducing the amount of codes while ensuring performance, and decoupling operator computation from hardware.  ([#34672](https://github.com/PaddlePaddle/Paddle/pull/34672),  [#35075](https://github.com/PaddlePaddle/Paddle/pull/35075),  [#34456](https://github.com/PaddlePaddle/Paddle/pull/34456),  [#35282](https://github.com/PaddlePaddle/Paddle/pull/35282),  [#35743](https://github.com/PaddlePaddle/Paddle/pull/35743),  [#34208](https://github.com/PaddlePaddle/Paddle/pull/34208))
-  - Add a total of 13 monadic and binary computation Functors to the Kernel Primitive API.  ([#36418](https://github.com/PaddlePaddle/Paddle/pull/36418))
-  - Modify the ReadData implementation in the Kernel Primitive API to fix the NX ! =1 access memory out-of-bound bug.  ([#36373](https://github.com/PaddlePaddle/Paddle/pull/36373))
+- Dynamic graph to static graph
+  
+  - In the StaticAnalysis module for variable types, add support for type annotation of statements such as `a, b = paddle.shape(x)`. ([#39245](https://github.com/PaddlePaddle/Paddle/pull/39245))
+    
+  - Add `InputSpec.name` as a field used to compute the Program cache hash key. ([#38273](https://github.com/PaddlePaddle/Paddle/pull/38273))
+    
+  - Add support for the `dict['key'] = x.shape` syntax. ([#40611](https://github.com/PaddlePaddle/Paddle/pull/40611))
+    
+  - Add support for pure FP16 training. ([#36944](https://github.com/PaddlePaddle/Paddle/pull/36944))
+    
+  - Add support for the `for i in [x,y,z]` syntax. ([#37259](https://github.com/PaddlePaddle/Paddle/pull/37259))
+    
+  - Add support for the Python 3 type hint syntax. ([#36544](https://github.com/PaddlePaddle/Paddle/pull/36544))
+    
+- Pass development
+  
+  - Add forward and backward fusion for FC + [relu|gelu] based on the NVIDIA cuBLASLt Epilogue. ([#39437](https://github.com/PaddlePaddle/Paddle/pull/39437))
+- Kernel Primitive API
+  
+  - Add KP operators on the GPU platform, including cast, scale, clip, bce_loss, abs_grad, reduce_sum_grad, reduce_mean_grad, full, full_like, distribution, random, masked_select_kernel, where_index, masked_select_grad, dropout, sigmoid, and where. ([#36203](https://github.com/PaddlePaddle/Paddle/pull/36203), [#36423](https://github.com/PaddlePaddle/Paddle/pull/36423), [#39390](https://github.com/PaddlePaddle/Paddle/pull/39390), [#39734](https://github.com/PaddlePaddle/Paddle/pull/39734), [#38500](https://github.com/PaddlePaddle/Paddle/pull/38500), [#38959](https://github.com/PaddlePaddle/Paddle/pull/38959), [#39197](https://github.com/PaddlePaddle/Paddle/pull/39197/), [#39563](https://github.com/PaddlePaddle/Paddle/pull/39563), [#39666](https://github.com/PaddlePaddle/Paddle/pull/39666), [#40517](https://github.com/PaddlePaddle/Paddle/pull/40517), [#40617](https://github.com/PaddlePaddle/Paddle/pull/40617), [#40766](https://github.com/PaddlePaddle/Paddle/pull/40766), [#39898](https://github.com/PaddlePaddle/Paddle/pull/39898), [#39609](https://github.com/PaddlePaddle/Paddle/pull/39609))
+    
+  - Add support for the XPU2 source code compilation mode. ([#37254](https://github.com/PaddlePaddle/Paddle/pull/37254), [#40397](https://github.com/PaddlePaddle/Paddle/pull/40397), [#38455](https://github.com/PaddlePaddle/Paddle/pull/38455))
+    
+  - Add support for KP operator reuse on XPU2 and GPU, including reduce, broadcast, elementwise_add, exp, log, relu, sigmoid, leaky_relu, softplus, hard_swish, and reciprocal. ([#36904](https://github.com/PaddlePaddle/Paddle/pull/36904), [#37226](https://github.com/PaddlePaddle/Paddle/pull/37226), [#38918](https://github.com/PaddlePaddle/Paddle/pull/38918), [#40560](https://github.com/PaddlePaddle/Paddle/pull/40560/), [#39787](https://github.com/PaddlePaddle/Paddle/pull/39787), [#39917](https://github.com/PaddlePaddle/Paddle/pull/39917), [#40002](https://github.com/PaddlePaddle/Paddle/pull/40002), [#40364](https://github.com/PaddlePaddle/Paddle/pull/40364))
+    
+  - Add unit tests for KP operators on the XPU2 platform, including brelu, ceil, celu, elu, floor, hard_shrink, hard_sigmoid, log1p, logsigmoid, relu6, silu, soft_relu, softsign, sqrt, square, swish, thresholded_relu, and softshrink. ([#40448](https://github.com/PaddlePaddle/Paddle/pull/40448), [#40524](https://github.com/PaddlePaddle/Paddle/pull/40524))
+    
+  - Add support for XPU2 KP models, including resnet50, deepfm, wide_deep, yolov3-darknet53, det_mv3_db, bert, transformer, mobilenet_v3, and GPT2.
+    
 
 #### **Mixed Precision Training**
 
-- Enhance the dynamic graph mixed precision. Add a way to use half-precision (float16) training for the whole task. The computational efficiency under the main task increases by 20%. ([#35521](https://github.com/PaddlePaddle/Paddle/pull/35521))
-- In the dynamic graph mixed precision ``paddle.amp.GradScaler``, add the ``get`` and ``set`` methods for user-friendly settings. ([#33835](https://github.com/PaddlePaddle/Paddle/pull/33835))
-- In the dynamic graph mixed precision ``paddle.amp.GradScaler``, add the ``state_dict`` and ``load_state_dict`` methods.  ([#34300](https://github.com/PaddlePaddle/Paddle/pull/34300))
-- In the dynamic graph mixed precision, split ``minimize`` to ``step + update``. In addition, add the ``unscale`` method.  ([#35927](https://github.com/PaddlePaddle/Paddle/pull/35927))
-- In the dynamic graph mixed precision training, support param group. ([#34899](https://github.com/PaddlePaddle/Paddle/pull/34899))
-- In the static graph mixed precision training, support the gradient pruning.   ([#33565](https://github.com/PaddlePaddle/Paddle/pull/33565))
-
-
-#### **Distributed training**
-
-- Basic functions of distributed training
-  - Add `paddle.DataParallel.no_sync`, to pause multi-card communication and gradient synchronization under dynamic graph data parallelism.  ([#34740](https://github.com/PaddlePaddle/Paddle/pull/34740)) 
-  - Add the `paddle.distributed.launch`, to start the mode support for fault tolerance, and implement fault tolerance for nodes in `collective` mode.  ([#33369](https://github.com/PaddlePaddle/Paddle/pull/33369),  [#34572](https://github.com/PaddlePaddle/Paddle/pull/34572))
-  - In the distributed training API `paddle.static.Executor.train_from_dataset`, `paddle.static.Executor.infer_from_dataset`, add the dump function for parameters and intermediate variables of the model during training. [#34457](https://github.com/PaddlePaddle/Paddle/pull/34457) 
-  - In the hybrid parallel, support the combination of model parallel and data parallel. ([#34377](https://github.com/PaddlePaddle/Paddle/pull/34377))
-  - Add the distributed policy `gradient scale` option. Users can specify the way of `gradient scale`: `avg`, `sum` or custom. ([#33862](https://github.com/PaddlePaddle/Paddle/pull/33862))
-  - Add `paddle.distributed.parallel_with_gloo`, support CPU barrier operation.  ([#34671](https://github.com/PaddlePaddle/Paddle/pull/34671))
-  - For the GPU parameter servers add the training profiler function. ([#32640](https://github.com/PaddlePaddle/Paddle/pull/32640))
-  - For the GPU parameter server, add the pipeline function. The training performance can increase by 40%.  [#33159](https://github.com/PaddlePaddle/Paddle/pull/33159)  
-  - For the static graph hybrid parallel, add the `dp_as_optimizer_sharding` experimental feature that can parallelize data as optimizer parameter sharding. This can save the optimizer state GPU memory usage. ([#35593](https://github.com/PaddlePaddle/Paddle/pull/35593))
-  - For the static graph pipeline parallel executor, support the `LRScheduler`.  ([#34402](https://github.com/PaddlePaddle/Paddle/pull/34402))
-  - Add the `paddle.fluid.core.GraphPyClient.set_node_feat`, to support for setting graph node features in the graph engine client, support the storage of multiple types of features.  ([#34994](https://github.com/PaddlePaddle/Paddle/pull/34994))
-  - Improve the performance of the graph engine graph node neighbor sampling algorithm, and optimize the execution of the graph wandering algorithm. ([#34088](https://github.com/PaddlePaddle/Paddle/pull/34088))
-  - Implement the unified dynamic-static mode for the model parallel interfaces `paddle.distributed.fleet.meta_parallel.ColumnParallelLinear`, `paddle.distributed.fleet.meta_parallel.RowParallelLinear`, `paddle.distributed.fleet.meta_parallel.VocabParallelEmbedding`, and `paddle.distributed.fleet.meta_parallel.ParallelCrossEntropy`.  ([#33700](https://github.com/PaddlePaddle/Paddle/pull/33700),  [#33411](https://github.com/PaddlePaddle/Paddle/pull/33411))
-  - Add the distributed model parallel `cpu c_embedding` op.  ([#35467](https://github.com/PaddlePaddle/Paddle/pull/35467))
-  - Change to the retry mechanism for getting gethostbyname when gen_comm_id is added to the initialization phase of the distributed communication. ([#34855](https://github.com/PaddlePaddle/Paddle/pull/34855))
-  - Add the switch configuration for `scale_sparse_gradient_with_batch_size` during `fleet` gradient update, to determine whether the gradient is multiplied by `batch_size`.   ([#34893](https://github.com/PaddlePaddle/Paddle/pull/34893))
-
-- Dynamic graph hybrid parallel 
-  - In dynamic graph distributed data parallel scenarios, add the `paddle.distributed.fleet.dygraph_optimizer.DygraphShardingOptimizer` interface. Optimize the GPU memory occupation through the sharding optimizer between cards. Support the larger model or batch size.  ([#33633](https://github.com/PaddlePaddle/Paddle/pull/33633))
-  - For the dynamic graph Sharding, support the MP-PP-DP for dynamic graph 4D hybrid parallelism. ([#35580](https://github.com/PaddlePaddle/Paddle/pull/35580))
-  - For the dynamic graph Recompute, support mixed precision computation. ([#33251](https://github.com/PaddlePaddle/Paddle/pull/33251))
-  - For the pipeline parallel, support 1f1b scheduling policy for runtime memory savings.  ([#34483](https://github.com/PaddlePaddle/Paddle/pull/34483))
-  - For the dynamic graph 3D hybrid parallel, support the recompute policy. Support the offload function.  ([#34607](https://github.com/PaddlePaddle/Paddle/pull/34607) [#35588](https://github.com/PaddlePaddle/Paddle/pull/35588))
-  - For the dynamic graph 3D Hybrid Parallel, support model saving and loading.  ([#34768](https://github.com/PaddlePaddle/Paddle/pull/34768))
-  - Add the scatter-gather scheme for model parallel + pipeline parallel scenarios. Optimize the cross-machine communication performance.  ([#34130](https://github.com/PaddlePaddle/Paddle/pull/34130))
-  - For the pipeline parallel, support the slice based on the number of layers to ensure more equal sharding.  ([#34207](https://github.com/PaddlePaddle/Paddle/pull/34207))
-  - For the pipeline parallel, support the automatic mixing precision. ([#33951](https://github.com/PaddlePaddle/Paddle/pull/33951))
-  - For the pipeline parallel, add the `paddle.distributed.fleet.meta_parallel.SharedLayerDesc` the networking description, to support the parameter sharing networking mode. ([#33578](https://github.com/PaddlePaddle/Paddle/pull/33578))
-  - For the tensor parallel, add `paddle.distributed.fleet.meta_parallel.ParallelCrossEntropy`, for a tensor parallel computation method that supports cross-entropy Loss.  ([#33401](https://github.com/PaddlePaddle/Paddle/pull/33401))
-  - For the `paddle.DataParallel`, add the `find_unused_parameters` interface, to support the use of control flow in the model in the data parallel mode. ([#32826](https://github.com/PaddlePaddle/Paddle/pull/32826))
-  - In the data parallel mode, add the port waiting feature to solve port conflict problem.  ([#34207](https://github.com/PaddlePaddle/Paddle/pull/34207))
-
-- Static graph hybrid parallel
-  - Support the fuse grad merge function under pipeline parallel. Through the `distributed_strategy.fuse_grad_merge` switch control, the performance increases by about 5%.   ([#35004](https://github.com/PaddlePaddle/Paddle/pull/35004))
-  - Support the fuse allreduce sum function with enabling dp in the mixed parallel, the performance increases by 3%. ([#34480](https://github.com/PaddlePaddle/Paddle/pull/34480))
-
-- Automatic parallel  
-  - Add the auto-parallel `shard_tensor`, `shard_op` interfaces.(#33804, #35765). Support semi-automatic parallel based on user tags.
-  - Add the auto-completion distributed attribute feature. Support completing all untagged distributed attributes based on user-tagged distributed attributes. ([#34813](https://github.com/PaddlePaddle/Paddle/pull/34813))
-  - Add the auto-slice serial `Program` function. ([#35117](https://github.com/PaddlePaddle/Paddle/pull/35117))
-  - Enable the automatic parallel adaptation of the Fleet API. ([#35483](https://github.com/PaddlePaddle/Paddle/pull/35483))
-
-
-#### **Others**  
-
-- Model quantization  
-  - Add the offline quantization of dynamic graphs. ([#33445](https://github.com/PaddlePaddle/Paddle/pull/33445),  [#33898](https://github.com/PaddlePaddle/Paddle/pull/33898), [#33962](https://github.com/PaddlePaddle/Paddle/pull/33962),  [#35015](https://github.com/PaddlePaddle/Paddle/pull/35015))
-  - Refactor the statistical output quantization information module in the dynamic graph quantization training function, to allow the availability on the prediction side to improve the robustness. ([#31680](https://github.com/PaddlePaddle/Paddle/pull/31680), [#31710](https://github.com/PaddlePaddle/Paddle/pull/31710), [#31861](https://github.com/PaddlePaddle/Paddle/pull/31861))
-  - For the dynamic graph quantization training, support the use in combination with mixed precision training. ([#33484](https://github.com/PaddlePaddle/Paddle/pull/33484))
-  - For the dynamic graph quantization training function, support the quantization of Function class API. ([#33162](https://github.com/PaddlePaddle/Paddle/pull/33162), [#33871](https://github.com/PaddlePaddle/Paddle/pull/33871))
-  - Support the distributed quantization training in static graph mode. ([#33781](https://github.com/PaddlePaddle/Paddle/pull/33781))
-  - Support the quantization of conv2d_transpose in dynamic graph mode. ([#34547](https://github.com/PaddlePaddle/Paddle/pull/34547))
-
-- Custom OP
-  - Add the custom operator DCU back-end support. ([#34050](https://github.com/PaddlePaddle/Paddle/pull/34050))
-
-- Cost Model
-  - Add the Paddle CostModel, to implement the method to get op time cost via Profiler.  ([#35774](https://github.com/PaddlePaddle/Paddle/pull/35774)) 
-
-- Model saving and loading
-  - Add the function of saving Layer's non-forward member methods and related parameters as inference models directly via the ``paddle.jit.save`` interface.  ([#34070](https://github.com/PaddlePaddle/Paddle/pull/34070))
-
-
-- ONNX Exporter 
-  - Add 8 operator adaptations: `softplus`, `elementwise_mod`, `elementwise_floordiv`, `p_norm`, `depthwise_transpose`, `group_norm`, `pixel_shuffle, top_k`. ([Paddle2ONNX#252](https://github.com/PaddlePaddle/Paddle2ONNX/pull/252),  [Paddle2ONNX#261](https://github.com/PaddlePaddle/Paddle2ONNX/pull/261),  [Paddle2ONNX#293](https://github.com/PaddlePaddle/Paddle2ONNX/pull/293))
-  - Add 8 detection model exports: PPYOLO, PPYOLOv2, PPYOLO-Tiny, TTFNet, PAFNet, FCOS, SSD.   ([Paddle2ONNX#252](https://github.com/PaddlePaddle/Paddle2ONNX/pull/252))
-
-### **(2) Function optimization**  
+- Split the `paddle.amp.GradScaler.unscale_` method out of `minimize` in the mixed precision training class `paddle.amp.GradScaler`, providing a separate interface for unscaling the gradients, that is, undoing the loss scaling. ([#35825](https://github.com/PaddlePaddle/Paddle/pull/35825))
+  
+- Add FP16 support for `paddle.nn.ClipGradByGlobalNorm` in dynamic graph mode, and add an FP16 kernel for the clip op so that clip-related operations support FP16 computation. ([#36198](https://github.com/PaddlePaddle/Paddle/pull/36198), [#36577](https://github.com/PaddlePaddle/Paddle/pull/36577))
+  
+- Support passing `None` as the `optimizer` parameter of `paddle.amp.decorate`. ([#37541](https://github.com/PaddlePaddle/Paddle/pull/37541))
+  
+- For the merged_momentum op, add support for multiple input learning rates, the use_nesterov policy, and regularization computation. ([#37527](https://github.com/PaddlePaddle/Paddle/pull/37527))
+  
+- Add the multi_tensor policy to the `paddle.optimizer.Momentum` optimizer. Add a `set_to_zero` branch to `clear_grad` of the `Optimizer` class. ([#37564](https://github.com/PaddlePaddle/Paddle/pull/37564))
+  
+- Add the multi_tensor policy to the `paddle.optimizer.Adam` optimizer. ([#38010](https://github.com/PaddlePaddle/Paddle/pull/38010))
+  
+- Add multi_precision policy to `paddle.optimizer.SGD` optimizer. ([#38231](https://github.com/PaddlePaddle/Paddle/pull/38231))
+  
+- Store the `master weight` parameters in the dictionary returned by the optimizer `state_dict` method. ([#39121](https://github.com/PaddlePaddle/Paddle/pull/39121))
+  
+- Add CUDA bfloat16 mixed precision training support for ops, in both O1 and O2 modes. These training modes are enabled via `paddle.amp.auto_cast`. ([#39029](https://github.com/PaddlePaddle/Paddle/pull/39029), [#39815](https://github.com/PaddlePaddle/Paddle/pull/39815))
+  
+- Add bfloat16 CUDA kernels for the following ops: matmul, concat, split, dropout, reshape, slice, squeeze, stack, transpose, unbind, elementwise_max, elementwise_add, elementwise_mul, elementwise_sub, scale, sum, layer_norm, p_norm, reduce_sum, softmax, log_softmax, sigmoid, sqrt, softplus, square, gaussian_random, fill_constant, and fill_any_like. ([#39485](https://github.com/PaddlePaddle/Paddle/pull/39485), [#39380](https://github.com/PaddlePaddle/Paddle/pull/39380), [#39395](https://github.com/PaddlePaddle/Paddle/pull/39395), [#39402](https://github.com/PaddlePaddle/Paddle/pull/39402), [#39457](https://github.com/PaddlePaddle/Paddle/pull/39457), [#39461](https://github.com/PaddlePaddle/Paddle/pull/39461), [#39602](https://github.com/PaddlePaddle/Paddle/pull/39602), [#39716](https://github.com/PaddlePaddle/Paddle/pull/39716), [#39683](https://github.com/PaddlePaddle/Paddle/pull/39683), [#39843](https://github.com/PaddlePaddle/Paddle/pull/39843), [#39999](https://github.com/PaddlePaddle/Paddle/pull/39999), [#40004](https://github.com/PaddlePaddle/Paddle/pull/40004), [#40027](https://github.com/PaddlePaddle/Paddle/pull/40027))
+  
+- Add bfloat16 CPU kernels for the following ops: dropout, reshape, slice, squeeze, unsqueeze, stack, transpose, unbind, elementwise_max, elementwise_mul, elementwise_sub, and gather. ([#39380](https://github.com/PaddlePaddle/Paddle/pull/39380), [#39395](https://github.com/PaddlePaddle/Paddle/pull/39395), [#39402](https://github.com/PaddlePaddle/Paddle/pull/39402), [#39457](https://github.com/PaddlePaddle/Paddle/pull/39457), [#39461](https://github.com/PaddlePaddle/Paddle/pull/39461), [#39602](https://github.com/PaddlePaddle/Paddle/pull/39602), [#39716](https://github.com/PaddlePaddle/Paddle/pull/39716), [#39683](https://github.com/PaddlePaddle/Paddle/pull/39683))
+  
+- Support printing Tensors whose data type is bfloat16. ([#39375](https://github.com/PaddlePaddle/Paddle/pull/39375), [#39370](https://github.com/PaddlePaddle/Paddle/pull/39370))
+  
+- Add FP16 computation support for `p_norm`, `elementwise_max`, `fill_constant_batch_size_like`, and `scatter`. ([#35888](https://github.com/PaddlePaddle/Paddle/pull/35888), [#39907](https://github.com/PaddlePaddle/Paddle/pull/39907), [#38136](https://github.com/PaddlePaddle/Paddle/pull/38136), [#38499](https://github.com/PaddlePaddle/Paddle/pull/38499))
+  
+- Add int16_t support for the following ops: cumsum, less_than, less_equal, greater_than, greater_equal, equal, not_equal, fill_any_like, gather_nd, reduce_sum, where_index, reshape, and unsqueeze. ([#39636](https://github.com/PaddlePaddle/Paddle/pull/39636))
+  
+- Add support for int16_t label type for cross_entropy op. ([#39409](https://github.com/PaddlePaddle/Paddle/pull/39409))
+  
+- Add support for int16_t id type for embedding op. ([#39381](https://github.com/PaddlePaddle/Paddle/pull/39381))
+  
+- Add support for FP16 type for reduce_mean op. ([#38289](https://github.com/PaddlePaddle/Paddle/pull/38289))
+  
+- Add support for FP16 type for elementwise_min op. ([#38123](https://github.com/PaddlePaddle/Paddle/pull/38123))
+  
+- Update bfloat16 AMP oneDNN default support list. ([#39304](https://github.com/PaddlePaddle/Paddle/pull/39304))
+  
 
-#### API
+#### **Paddle HIgh reusability operator library**
 
--   `paddle.slice`: Add the support for `bool` type Tensor and optimize error messages. ([#35586](https://github.com/PaddlePaddle/Paddle/pull/35586), [#35179](https://github.com/PaddlePaddle/Paddle/pull/35179))
--   `paddle.strided_slice`: Add the support for `TensorArray` type input, and adjust the output when `step< 0`. The adjusted result is consistent with `numpy`.  ([#34205](https://github.com/PaddlePaddle/Paddle/pull/34205), [#34172](https://github.com/PaddlePaddle/Paddle/pull/34172))
--   ``paddle.multiply``: Support ``bool`` data type operations.  ([#35551](https://github.com/PaddlePaddle/Paddle/pull/35551))
--   Logical operations (``paddle.logical_not``, ``paddle.logical_and``, ``paddle.logical_or``, ``paddle.logical_xor``): Support non-``bool`` data types (``int8, int16, int32, int64, float, double``).   ([#34141](https://github.com/PaddlePaddle/Paddle/pull/34141))
--   ``paddle.transpose``: Support ``bool`` type operations. ([#35886](https://github.com/PaddlePaddle/Paddle/pull/35886))
--   ``paddle.strided_slice``: Support ``bool`` type operations.  ([#33373](https://github.com/PaddlePaddle/Paddle/pull/33373))
--   ``paddle.set_printoptions``: Support the setting of ``linewidth`` to print ``Tensor``.  ([#35175](https://github.com/PaddlePaddle/Paddle/pull/35175))
--   ``paddle.to_tensor``: Support ``LoDTensor``.  ([#33027](https://github.com/PaddlePaddle/Paddle/pull/33027))
--   ``paddle.linalg.det`` and ``paddle.linalg.slogdet``: Support inverse operations. ([#36013](https://github.com/PaddlePaddle/Paddle/pull/36013))
--   ``paddle.nn.functional.pad``: Support the input of tuple type pad parameter in case of full dimensional pads.  ([35985](https://github.com/PaddlePaddle/Paddle/pull/35985))
--   Optimize error report messages when ``paddle.nn.functional.pad`` input is abnormal.  ([34979](https://github.com/PaddlePaddle/Paddle/pull/34979))
--   For the static graph, support partial ``program``, and generate the corresponding reverse ``program``.  ([#34395](https://github.com/PaddlePaddle/Paddle/pull/34395))
--   oneDNN function optimization
-    - Add the support for oneDNN kernels with multiple operators, including ``clip``, ``slice``, ``split``, ``cast``, ``scale``, ``expand_v2``, ``sigmoid, matmul_v2``, ``PRelu`` forward and reverse oneDNN FP32, and oneNheN BF16. ([#35601](https://github.com/PaddlePaddle/Paddle/pull/35601), [#34332](https://github.com/PaddlePaddle/Paddle/pull/34332), [#34284](https://github.com/PaddlePaddle/Paddle/pull/34284), [#34216](https://github.com/PaddlePaddle/Paddle/pull/34216), [#34192](https://github.com/PaddlePaddle/Paddle/pull/34192),  [#33878](https://github.com/PaddlePaddle/Paddle/pull/33878), [#33584](https://github.com/PaddlePaddle/Paddle/pull/33584), [#33056](https://github.com/PaddlePaddle/Paddle/pull/33056), [#32975](https://github.com/PaddlePaddle/Paddle/pull/32975))
-    - Add the implementation of Selected rows in SGD operator by using oneDNN AXPY. ([33632](https://github.com/PaddlePaddle/Paddle/pull/33632))
--   Support for ``bfloat16`` data type on the GPU with the Ampere architecture. ([#31232](https://github.com/PaddlePaddle/Paddle/pull/32132), [#32221](https://github.com/PaddlePaddle/Paddle/pull/32221), [#32542](https://github.com/PaddlePaddle/Paddle/pull/32542))
--   On the ``Conv`` operator, set the using of Tensor Core on the GPU with Ampere architecture. ([#34409](https://github.com/PaddlePaddle/Paddle/pull/34409))
--   Support ``paddle.device.cuda.current_stream().cuda_stream`` to get bare pointers.  ([#35813](https://github.com/PaddlePaddle/Paddle/pull/35813))
--   Add the ``paddle.optimizer.AdamW`` GPU fuse kernel, to support the layerwise learning rate function.  ([#35020](https://github.com/PaddlePaddle/Paddle/pull/35020), [#35569](https://github.com/PaddlePaddle/Paddle/pull/35569))
--   Support for using the Nvidia's cusparse library function in paddle. ([#35675](https://github.com/PaddlePaddle/Paddle/pull/35675))
--   Add ``paddle.full`` to support the ``int16`` type. ([#35619](https://github.com/PaddlePaddle/Paddle/pull/35619))
--   Optimize the GPU memory usage of ``paddle.nn.ClipGradByGlobalNorm``. ([#34586](https://github.com/PaddlePaddle/Paddle/pull/34586))
--   `reduce_sum` operator supports float16 type ([#32966](https://github.com/PaddlePaddle/Paddle/pull/32966))
--   `paddle.nn.CTCLoss`: Add two grad norm methods: `norm_by_total_logits_len` and `norm_by_batchsize`.  ([#34729](https://github.com/PaddlePaddle/Paddle/pull/34729/)) 
--   Add the public API recommended usages under each path. ([#33313](https://github.com/PaddlePaddle/Paddle/pull/33313), [#33308](https://github.com/PaddlePaddle/Paddle/pull/33308), [#32759](https://github.com/PaddlePaddle/Paddle/pull/32759), [#32695](https://github.com/PaddlePaddle/Paddle/pull/32695), [#32643](https://github.com/PaddlePaddle/Paddle/pull/32643), [#31912](https://github.com/PaddlePaddle/Paddle/pull/31912), [#32650](https://github.com/PaddlePaddle/Paddle/pull/32650), [#32034](https://github.com/PaddlePaddle/Paddle/pull/32034), [#33897](https://github.com/PaddlePaddle/Paddle/pull/33897)) 
--   Restore the original API accessibility under the `paddle.vision` path. ([#34432](https://github.com/PaddlePaddle/Paddle/pull/34432))
--   `paddle.vision.ops.deform_conv2d, paddle.vision.ops.DeformConv2D` : Add the support for the double input type.  ([#35330](https://github.com/PaddlePaddle/Paddle/pull/35330))
--   `paddle.fluid.contrib.layers.shuffle_batch` : Add the GPU Kernel implementation.  [#33938](https://github.com/PaddlePaddle/Paddle/pull/33938) 
--   For the existing APIs, add the public call paths `paddle.linalg.cholesky`, `paddle.linalg.norm`, and `paddle.linalg.inv`. ([#33420](https://github.com/PaddlePaddle/Paddle/pull/33420)) 
--   `paddle.reshape`: Support turning an empty `Tensor` shape into an empty `Tensor` of another shape. ([#36087](https://github.com/PaddlePaddle/Paddle/pull/36087))
--   `paddle.equal`: Add the support for `int`, `float`, and `bool` types for the second input. ([#35695](https://github.com/PaddlePaddle/Paddle/pull/35695))
--   ``paddle.io.DataLoader``: Add the support for persistent_worker mode. ([#34017](https://github.com/PaddlePaddle/Paddle/pull/34017))
--   Optimize ``l2_normalize``, ``p_norm``, ``elementwise_max``, ``prelu,clip_by_norm``, ``lars optimizer`` operators support the float16 computation.  ([#35576](https://github.com/PaddlePaddle/Paddle/pull/35576), [#35888](https://github.com/PaddlePaddle/Paddle/pull/35888), [#35888](https://github.com/PaddlePaddle/Paddle/pull/35888), [35532](https://github.com/PaddlePaddle/Paddle/pull/35532), [#35446](https://github.com/PaddlePaddle/Paddle/pull/35446), [#33280](https://github.com/PaddlePaddle/Paddle/pull/33280))
-- Optimize the reading speed of flowers dataset from several minutes per batch to 1~3 seconds per batch.  ([#31408](https://github.com/PaddlePaddle/Paddle/pull/31408))
-- Support the fuse allreduce sum function in `paddle.distributed.fleet.DistributedStrategy` when the `without_graph_optimize` switch is on.In the FP32, the performance increases by 3%. In the AMP, the performance increases by 8%. ([#34446](https://github.com/PaddlePaddle/Paddle/pull/34446)) 
-- In `paddle.matmul`, switch underlying Op from matmul op to matmul_v2 op.  ([#36374](https://github.com/PaddlePaddle/Paddle/pull/36374))
-- In `paddle.fft` module, add mkl_cdft and hipfft two computational backends.  ([#36537](https://github.com/PaddlePaddle/Paddle/pull/36537))
-- Parameter `shifts` of `paddle.roll` supports `Tensor` as input.  ([#36537](https://github.com/PaddlePaddle/Paddle/pull/36537))
-- `paddle.shape` supports plural type inputs. ([#36835](https://github.com/PaddlePaddle/Paddle/pull/36835))
-- matmul_v2 supports quantization. ([#36469](https://github.com/PaddlePaddle/Paddle/pull/36469))
-- Add `clip_op` support for `float16`. ([#36672](https://github.com/PaddlePaddle/Paddle/pull/36672))
-- In `paddle.fft` module, add cache plan functionality to the cufft backend, optimizing performance. ([#36537](https://github.com/PaddlePaddle/Paddle/pull/36537))
+We announce PHI, the new Paddle HIgh reusability operator library. PHI provides a Primitive API that enables kernel reuse in operator development. As a refactored functional operator library, PHI aims to solve legacy problems that hurt the framework's performance and reusability, particularly in operator development. These problems include inefficient reuse across operators, unclear operator interfaces, and the lack of direct C++ calls into the operator library. With PHI, new operators can be implemented by composing functions available in the functional library. The library provides over 200 C++ operator class APIs and nearly 500 kernels. Composing new operators from these built-in functions can greatly reduce development effort. PHI supports kernel reuse across different types of hardware (e.g., GPU and XPU). In addition, PHI is extensible with plugins that accommodate third-party accelerators (such as NPU) in a low-cost and reusable fashion. In short, PHI supports low-level operator composability, kernel reuse through Primitives, and accelerator integration through plugins. The main work covers the six parts below:
 
+- **The implementation of the operator library infrastructure, core components and mechanisms**: Plan the directory structure of the new operator library; design and implement its common base data structures, the new functional InferMeta and Kernel development paradigm, and the corresponding registration and management components. Support automatic generation of compilation targets and compilation dependencies for Kernel files, so that developers only need to focus on the functional Kernel implementation and the development paradigm stays clear and concise. ([#34425](https://github.com/PaddlePaddle/Paddle/pull/34425), [#37107](https://github.com/PaddlePaddle/Paddle/pull/37107), [#36946](https://github.com/PaddlePaddle/Paddle/pull/36946), [#36948](https://github.com/PaddlePaddle/Paddle/pull/36948), [#37876](https://github.com/PaddlePaddle/Paddle/pull/37876), [#37916](https://github.com/PaddlePaddle/Paddle/pull/37916), [#37977](https://github.com/PaddlePaddle/Paddle/pull/37977), [#38078](https://github.com/PaddlePaddle/Paddle/pull/38078), [#38861](https://github.com/PaddlePaddle/Paddle/pull/38861), [#39123](https://github.com/PaddlePaddle/Paddle/pull/39123), [#39131](https://github.com/PaddlePaddle/Paddle/pull/39131), [#39748](https://github.com/PaddlePaddle/Paddle/pull/39748), [#39790](https://github.com/PaddlePaddle/Paddle/pull/39790), [#39941](https://github.com/PaddlePaddle/Paddle/pull/39941), [#40239](https://github.com/PaddlePaddle/Paddle/pull/40239), [#40635](https://github.com/PaddlePaddle/Paddle/pull/40635), [#41091](https://github.com/PaddlePaddle/Paddle/pull/41091), [#37409](https://github.com/PaddlePaddle/Paddle/pull/37409), [#37942](https://github.com/PaddlePaddle/Paddle/pull/37942), [#39002](https://github.com/PaddlePaddle/Paddle/pull/39002), [#38109](https://github.com/PaddlePaddle/Paddle/pull/38109), [#37881](https://github.com/PaddlePaddle/Paddle/pull/37881), [#37517](https://github.com/PaddlePaddle/Paddle/pull/37517), [#39870](https://github.com/PaddlePaddle/Paddle/pull/39870), [#40975](https://github.com/PaddlePaddle/Paddle/pull/40975), [#39475](https://github.com/PaddlePaddle/Paddle/pull/39475), [#37304](https://github.com/PaddlePaddle/Paddle/pull/37304), #36910, #37120, #37146, #37215, #37255, #37369, #38258, #38257, #38355, #38853, #38937, #38977, #38946, #39085, #39153, #39228, #38301, #38275, #38506, #38607, #38473, #38632, #38811, #38880, #38996, #38914, #39101)
+  
+- **Operator library C++ API system construction**: Design and implement a YAML-configuration-based operator definition paradigm that automatically generates more than 200 C++ operator class APIs for internal and external developers to reuse, reducing the cost of repeatedly developing basic operators. ([#37668](https://github.com/PaddlePaddle/Paddle/pull/37668), [#36938](https://github.com/PaddlePaddle/Paddle/pull/36938), [#38172](https://github.com/PaddlePaddle/Paddle/pull/38172), [#38182](https://github.com/PaddlePaddle/Paddle/pull/38182), [#38311](https://github.com/PaddlePaddle/Paddle/pull/38311), [#38438](https://github.com/PaddlePaddle/Paddle/pull/38438), [#39057](https://github.com/PaddlePaddle/Paddle/pull/39057), [#39229](https://github.com/PaddlePaddle/Paddle/pull/39229), [#39281](https://github.com/PaddlePaddle/Paddle/pull/39281), [#39263](https://github.com/PaddlePaddle/Paddle/pull/39263), [#39408](https://github.com/PaddlePaddle/Paddle/pull/39408), [#39436](https://github.com/PaddlePaddle/Paddle/pull/39436), [#39482](https://github.com/PaddlePaddle/Paddle/pull/39482), [#39497](https://github.com/PaddlePaddle/Paddle/pull/39497), [#39651](https://github.com/PaddlePaddle/Paddle/pull/39651), [#39521](https://github.com/PaddlePaddle/Paddle/pull/39521), [#39760](https://github.com/PaddlePaddle/Paddle/pull/39760), [#40060](https://github.com/PaddlePaddle/Paddle/pull/40060), [#40196](https://github.com/PaddlePaddle/Paddle/pull/40196), [#40218](https://github.com/PaddlePaddle/Paddle/pull/40218), [#40640](https://github.com/PaddlePaddle/Paddle/pull/40640), [#40732](https://github.com/PaddlePaddle/Paddle/pull/40732), [#40729](https://github.com/PaddlePaddle/Paddle/pull/40729), [#40840](https://github.com/PaddlePaddle/Paddle/pull/40840), [#40867](https://github.com/PaddlePaddle/Paddle/pull/40867), [#41025](https://github.com/PaddlePaddle/Paddle/pull/41025), [#41368](https://github.com/PaddlePaddle/Paddle/pull/41368))
+  
+- **Operator library compatible with various execution systems**: Connect the new InferMeta and Kernel to the original dynamic and static graph execution systems. Support safely removing the original OpKernel registration and migrating to the new Kernel form. ([#34425](https://github.com/PaddlePaddle/Paddle/pull/34425), [#38825](https://github.com/PaddlePaddle/Paddle/pull/38825), [#38837](https://github.com/PaddlePaddle/Paddle/pull/38837), [#38842](https://github.com/PaddlePaddle/Paddle/pull/38842), [#38976](https://github.com/PaddlePaddle/Paddle/pull/38976), [#39134](https://github.com/PaddlePaddle/Paddle/pull/39134), [#39140](https://github.com/PaddlePaddle/Paddle/pull/39140), [#39135](https://github.com/PaddlePaddle/Paddle/pull/39135), [#39252](https://github.com/PaddlePaddle/Paddle/pull/39252), [#39222](https://github.com/PaddlePaddle/Paddle/pull/39222), [#39351](https://github.com/PaddlePaddle/Paddle/pull/39351))
+  
+- **Decouple the underlying data structures and tool functions of the operator library from the framework**: Remove PHI's dependence on the framework's core data structures, laying the foundation for compiling PHI independently later and supporting infrt, custom Kernel, and a series of other PHI-based work. ([#38583](https://github.com/PaddlePaddle/Paddle/pull/38583), [#39188](https://github.com/PaddlePaddle/Paddle/pull/39188), [#39560](https://github.com/PaddlePaddle/Paddle/pull/39560), [#39931](https://github.com/PaddlePaddle/Paddle/pull/39931), [#39169](https://github.com/PaddlePaddle/Paddle/pull/39169), [#38951](https://github.com/PaddlePaddle/Paddle/pull/38951), [#38898](https://github.com/PaddlePaddle/Paddle/pull/38898), [#38873](https://github.com/PaddlePaddle/Paddle/pull/38873), [#38696](https://github.com/PaddlePaddle/Paddle/pull/38696), [#38651](https://github.com/PaddlePaddle/Paddle/pull/38651), [#39359](https://github.com/PaddlePaddle/Paddle/pull/39359), [#39305](https://github.com/PaddlePaddle/Paddle/pull/39305), [#39234](https://github.com/PaddlePaddle/Paddle/pull/39234), [#39098](https://github.com/PaddlePaddle/Paddle/pull/39098), [#39120](https://github.com/PaddlePaddle/Paddle/pull/39120), [#38979](https://github.com/PaddlePaddle/Paddle/pull/38979), [#38899](https://github.com/PaddlePaddle/Paddle/pull/38899), [#38844](https://github.com/PaddlePaddle/Paddle/pull/38844), [#39714](https://github.com/PaddlePaddle/Paddle/pull/39714), [#39729](https://github.com/PaddlePaddle/Paddle/pull/39729), [#39889](https://github.com/PaddlePaddle/Paddle/pull/39889), [#39587](https://github.com/PaddlePaddle/Paddle/pull/39587), [#39558](https://github.com/PaddlePaddle/Paddle/pull/39558), [#39514](https://github.com/PaddlePaddle/Paddle/pull/39514), [#39502](https://github.com/PaddlePaddle/Paddle/pull/39502), [#39300](https://github.com/PaddlePaddle/Paddle/pull/39300), [#39246](https://github.com/PaddlePaddle/Paddle/pull/39246), [#39124](https://github.com/PaddlePaddle/Paddle/pull/39124))
+  
+- **Integration and improvement of the custom operator mechanism with PHI**: Support calling the over 200 C++ operator class APIs automatically generated by PHI when writing custom operators, which reduces custom operator development costs. A series of bugs are also fixed. ([#37122](https://github.com/PaddlePaddle/Paddle/pull/37122), [#37276](https://github.com/PaddlePaddle/Paddle/pull/37276), [#37281](https://github.com/PaddlePaddle/Paddle/pull/37281), [#37262](https://github.com/PaddlePaddle/Paddle/pull/37262), [#37415](https://github.com/PaddlePaddle/Paddle/pull/37415), [#37423](https://github.com/PaddlePaddle/Paddle/pull/37423), [#37583](https://github.com/PaddlePaddle/Paddle/pull/37683), [#38776](https://github.com/PaddlePaddle/Paddle/pull/38776), [#39353](https://github.com/PaddlePaddle/Paddle/pull/39353), [#41072](https://github.com/PaddlePaddle/Paddle/pull/41072))
+  
+- **Large-scale operator migration and refactoring**: Migrate about 250 high-frequency forward and backward operator kernels to the new operator library and refactor them into functional form, so that high-performance operators can be built quickly on the C++ side by composing multiple base Kernel functions. Meanwhile, add the corresponding YAML operator definitions and connect them to the new dynamic graph execution system to improve Python API scheduling performance. The migrated and refactored operators include:
+  
+  - sqrt ([#40727](https://github.com/PaddlePaddle/Paddle/pull/40727))
+    
+  - square ([#40727](https://github.com/PaddlePaddle/Paddle/pull/40727))
+    
+  - sin ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+    
+  - sinh ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+    
+  - elementwise_fmax ([#40140](https://github.com/PaddlePaddle/Paddle/pull/40140))
+    
+  - elementwise_fmin ([#40140](https://github.com/PaddlePaddle/Paddle/pull/40140))
+    
+  - pool2d ([#40208](https://github.com/PaddlePaddle/Paddle/pull/40208), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - max_pool2d_with_index ([#40208](https://github.com/PaddlePaddle/Paddle/pull/40208), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - pool3d ([#40208](https://github.com/PaddlePaddle/Paddle/pull/40208), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - max_pool3d_with_index ([#40208](https://github.com/PaddlePaddle/Paddle/pull/40208), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - fill_constant ([#36930](https://github.com/PaddlePaddle/Paddle/pull/36930), [#39465](https://github.com/PaddlePaddle/Paddle/pull/39465))
+    
+  - p_norm ([#40819](https://github.com/PaddlePaddle/Paddle/pull/40819))
+    
+  - fill_constant_batch_size_like ([#40784](https://github.com/PaddlePaddle/Paddle/pull/40784))
+    
+  - conv2d ([#39354](https://github.com/PaddlePaddle/Paddle/pull/39354))
+    
+  - conv2d_transpose ([#40675](https://github.com/PaddlePaddle/Paddle/pull/40675), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - conv3d ([#39354](https://github.com/PaddlePaddle/Paddle/pull/39354))
+    
+  - conv3d_transpose ([#40675](https://github.com/PaddlePaddle/Paddle/pull/40675), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - mish ([#40727](https://github.com/PaddlePaddle/Paddle/pull/40727))
+    
+  - gather_nd ([#40090](https://github.com/PaddlePaddle/Paddle/pull/40090), [#40043](https://github.com/PaddlePaddle/Paddle/pull/40043))
+    
+  - gather ([#40500](https://github.com/PaddlePaddle/Paddle/pull/40500))
+    
+  - scatter ([#40090](https://github.com/PaddlePaddle/Paddle/pull/40090), [#40043](https://github.com/PaddlePaddle/Paddle/pull/40043))
+    
+  - scatter_nd_add ([#40090](https://github.com/PaddlePaddle/Paddle/pull/40090), [#40043](https://github.com/PaddlePaddle/Paddle/pull/40043))
+    
+  - sgd ([#40045](https://github.com/PaddlePaddle/Paddle/pull/40045))
+    
+  - momentum ([#41319](https://github.com/PaddlePaddle/Paddle/pull/41319))
+    
+  - rmsprop ([#40994](https://github.com/PaddlePaddle/Paddle/pull/40994))
+    
+  - index_sample ([#38130](https://github.com/PaddlePaddle/Paddle/pull/38130), [#38459](https://github.com/PaddlePaddle/Paddle/pull/38459), [#39905](https://github.com/PaddlePaddle/Paddle/pull/39905))
+    
+  - adam ([#40351](https://github.com/PaddlePaddle/Paddle/pull/40351))
+    
+  - layer_norm ([#40193](https://github.com/PaddlePaddle/Paddle/pull/40193))
+    
+  - adagrad ([#40994](https://github.com/PaddlePaddle/Paddle/pull/40994))
+    
+  - adamax ([#40173](https://github.com/PaddlePaddle/Paddle/pull/40173))
+    
+  - adadelta ([#40173](https://github.com/PaddlePaddle/Paddle/pull/40173))
+    
+  - clip ([#40602](https://github.com/PaddlePaddle/Paddle/pull/40602), [#41661](https://github.com/PaddlePaddle/Paddle/pull/41661), [#41675](https://github.com/PaddlePaddle/Paddle/pull/41675))
+    
+  - ceil ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+    
+  - cos ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+    
+  - atan ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+    
+  - cosh ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+    
+  - erf ([#40388](https://github.com/PaddlePaddle/Paddle/pull/40388))
+    
+  - asin ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+    
+  - acos ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+    
+  - scale ([#39278](https://github.com/PaddlePaddle/Paddle/pull/39278))
+    
+  - elementwise_pow ([#40993](https://github.com/PaddlePaddle/Paddle/pull/40993))
+    
+  - elementwise_sub ([#39225](https://github.com/PaddlePaddle/Paddle/pull/39225), [#37260](https://github.com/PaddlePaddle/Paddle/pull/37260))
+    
+  - round ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+    
+  - floor ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+    
+  - pow ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+    
+  - elementwise_floordiv ([#40993](https://github.com/PaddlePaddle/Paddle/pull/40993))
+    
+  - reciprocal ([#40727](https://github.com/PaddlePaddle/Paddle/pull/40727))
+    
+  - log1p ([#40785](https://github.com/PaddlePaddle/Paddle/pull/40785))
+    
+  - allclose ([#40469](https://github.com/PaddlePaddle/Paddle/pull/40469))
+    
+  - mul ([#40833](https://github.com/PaddlePaddle/Paddle/pull/40833))
+    
+  - elementwise_max ([#40590](https://github.com/PaddlePaddle/Paddle/pull/40590))
+    
+  - elementwise_min ([#40590](https://github.com/PaddlePaddle/Paddle/pull/40590))
+    
+  - elementwise_mod ([#40590](https://github.com/PaddlePaddle/Paddle/pull/40590))
+    
+  - elementwise_add ([#39048](https://github.com/PaddlePaddle/Paddle/pull/39048), [#37043](https://github.com/PaddlePaddle/Paddle/pull/37043))
+    
+  - matmul_v2 ([#36844](https://github.com/PaddlePaddle/Paddle/pull/36844), [#38713](https://github.com/PaddlePaddle/Paddle/pull/38713))
+    
+  - elementwise_mul ([#41042](https://github.com/PaddlePaddle/Paddle/pull/41042), [#40252](https://github.com/PaddlePaddle/Paddle/pull/40252), [#37471](https://github.com/PaddlePaddle/Paddle/pull/37471))
+    
+  - elementwise_div ([#40172](https://github.com/PaddlePaddle/Paddle/pull/40172), [#40039](https://github.com/PaddlePaddle/Paddle/pull/40039), [#37418](https://github.com/PaddlePaddle/Paddle/pull/37418))
+    
+  - SelectedRows ([#39037](https://github.com/PaddlePaddle/Paddle/pull/39037), [#39087](https://github.com/PaddlePaddle/Paddle/pull/39087), [#39128](https://github.com/PaddlePaddle/Paddle/pull/39128), [#39162](https://github.com/PaddlePaddle/Paddle/pull/39162), [#39236](https://github.com/PaddlePaddle/Paddle/pull/39236))
+    
+  - fill_any_like ([#39807](https://github.com/PaddlePaddle/Paddle/pull/39807))
+    
+  - dot ([#38359](https://github.com/PaddlePaddle/Paddle/pull/38359))
+    
+  - sum ([#40873](https://github.com/PaddlePaddle/Paddle/pull/40873))
+    
+  - cumsum ([#39976](https://github.com/PaddlePaddle/Paddle/pull/39976), [#40200](https://github.com/PaddlePaddle/Paddle/pull/40200))
+    
+  - diag_v2 ([#39914](https://github.com/PaddlePaddle/Paddle/pull/39914))
+    
+  - auc ([#39976](https://github.com/PaddlePaddle/Paddle/pull/39976), [#40200](https://github.com/PaddlePaddle/Paddle/pull/40200))
+    
+  - log_loss ([#39976](https://github.com/PaddlePaddle/Paddle/pull/39976), [#40200](https://github.com/PaddlePaddle/Paddle/pull/40200))
+    
+  - one_hot_v2 ([#39876](https://github.com/PaddlePaddle/Paddle/pull/39876))
+    
+  - sigmoid_cross_entropy_with_logits ([#39976](https://github.com/PaddlePaddle/Paddle/pull/39976), [#40200](https://github.com/PaddlePaddle/Paddle/pull/40200))
+    
+  - bce_loss ([#39868](https://github.com/PaddlePaddle/Paddle/pull/39868))
+    
+  - argsort ([#40151](https://github.com/PaddlePaddle/Paddle/pull/40151))
+    
+  - arg_max ([#40222](https://github.com/PaddlePaddle/Paddle/pull/40222))
+    
+  - arg_min ([#40222](https://github.com/PaddlePaddle/Paddle/pull/40222))
+    
+  - segment_pool ([#40099](https://github.com/PaddlePaddle/Paddle/pull/40099))
+    
+  - frobenius_norm ([#40707](https://github.com/PaddlePaddle/Paddle/pull/40707), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - dist ([#40178](https://github.com/PaddlePaddle/Paddle/pull/40178))
+    
+  - isnan_v2 ([#40076](https://github.com/PaddlePaddle/Paddle/pull/40076))
+    
+  - logical_and ([#39942](https://github.com/PaddlePaddle/Paddle/pull/39942))
+    
+  - logical_not ([#39942](https://github.com/PaddlePaddle/Paddle/pull/39942))
+    
+  - isfinite_v2 ([#40076](https://github.com/PaddlePaddle/Paddle/pull/40076))
+    
+  - logical_or ([#39942](https://github.com/PaddlePaddle/Paddle/pull/39942))
+    
+  - isinf_v2 ([#40076](https://github.com/PaddlePaddle/Paddle/pull/40076))
+    
+  - is_empty ([#39919](https://github.com/PaddlePaddle/Paddle/pull/39919))
+    
+  - logical_xor ([#39942](https://github.com/PaddlePaddle/Paddle/pull/39942))
+    
+  - less_than ([#39970](https://github.com/PaddlePaddle/Paddle/pull/39970))
+    
+  - not_equal ([#39970](https://github.com/PaddlePaddle/Paddle/pull/39970))
+    
+  - equal ([#39970](https://github.com/PaddlePaddle/Paddle/pull/39970))
+    
+  - less_equal ([#39970](https://github.com/PaddlePaddle/Paddle/pull/39970))
+    
+  - equal_all ([#39970](https://github.com/PaddlePaddle/Paddle/pull/39970))
+    
+  - uniform_random ([#39937](https://github.com/PaddlePaddle/Paddle/pull/39937))
+    
+  - randint ([#39876](https://github.com/PaddlePaddle/Paddle/pull/39876), [#41375](https://github.com/PaddlePaddle/Paddle/pull/41375))
+    
+  - randperm ([#41265](https://github.com/PaddlePaddle/Paddle/pull/41265))
+    
+  - unbind ([#39789](https://github.com/PaddlePaddle/Paddle/pull/39789))
+    
+  - bernoulli ([#39590](https://github.com/PaddlePaddle/Paddle/pull/39590))
+    
+  - increment ([#39858](https://github.com/PaddlePaddle/Paddle/pull/39858), [#39913](https://github.com/PaddlePaddle/Paddle/pull/39913))
+    
+  - multinomial ([#39858](https://github.com/PaddlePaddle/Paddle/pull/39858), [#39913](https://github.com/PaddlePaddle/Paddle/pull/39913))
+    
+  - addmm ([#39858](https://github.com/PaddlePaddle/Paddle/pull/39858), [#39913](https://github.com/PaddlePaddle/Paddle/pull/39913))
+    
+  - cholesky ([#39858](https://github.com/PaddlePaddle/Paddle/pull/39858), [#39913](https://github.com/PaddlePaddle/Paddle/pull/39913))
+    
+  - where ([#39811](https://github.com/PaddlePaddle/Paddle/pull/39811))
+    
+  - log10 ([#40785](https://github.com/PaddlePaddle/Paddle/pull/40785))
+    
+  - log2 ([#40785](https://github.com/PaddlePaddle/Paddle/pull/40785))
+    
+  - expm1 ([#40727](https://github.com/PaddlePaddle/Paddle/pull/40727))
+    
+  - atan2 ([#39806](https://github.com/PaddlePaddle/Paddle/pull/39806))
+    
+  - gaussian_random ([#39932](https://github.com/PaddlePaddle/Paddle/pull/39932), [#40122](https://github.com/PaddlePaddle/Paddle/pull/40122), [#40191](https://github.com/PaddlePaddle/Paddle/pull/40191))
+    
+  - empty ([#38334](https://github.com/PaddlePaddle/Paddle/pull/38334))
+    
+  - truncated_gaussian_random ([#39971](https://github.com/PaddlePaddle/Paddle/pull/39971), [#40191](https://github.com/PaddlePaddle/Paddle/pull/40191))
+    
+  - mv ([#39861](https://github.com/PaddlePaddle/Paddle/pull/39861), [#39954](https://github.com/PaddlePaddle/Paddle/pull/39954))
+    
+  - tan ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+    
+  - set_value ([#40195](https://github.com/PaddlePaddle/Paddle/pull/40195), [#40478](https://github.com/PaddlePaddle/Paddle/pull/40478), [#40636](https://github.com/PaddlePaddle/Paddle/pull/40636))
+    
+  - bitwise_and ([#40031](https://github.com/PaddlePaddle/Paddle/pull/40031))
+    
+  - bitwise_not ([#40031](https://github.com/PaddlePaddle/Paddle/pull/40031))
+    
+  - bitwise_or ([#40031](https://github.com/PaddlePaddle/Paddle/pull/40031))
+    
+  - poisson ([#39814](https://github.com/PaddlePaddle/Paddle/pull/39814))
+    
+  - cholesky_solve ([#40387](https://github.com/PaddlePaddle/Paddle/pull/40387))
+    
+  - bitwise_xor ([#40031](https://github.com/PaddlePaddle/Paddle/pull/40031))
+    
+  - triangular_solve ([#40417](https://github.com/PaddlePaddle/Paddle/pull/40417))
+    
+  - sigmoid ([#40626](https://github.com/PaddlePaddle/Paddle/pull/40626))
+    
+  - atanh ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+    
+  - softsign ([#40727](https://github.com/PaddlePaddle/Paddle/pull/40727))
+    
+  - thresholded_relu ([#40385](https://github.com/PaddlePaddle/Paddle/pull/40385))
+    
+  - tanh_shrink ([#40565](https://github.com/PaddlePaddle/Paddle/pull/40565))
+    
+  - stanh ([#40727](https://github.com/PaddlePaddle/Paddle/pull/40727))
+    
+  - reduce_mean ([#37559](https://github.com/PaddlePaddle/Paddle/pull/37559))
+    
+  - reduce_max ([#40225](https://github.com/PaddlePaddle/Paddle/pull/40225))
+    
+  - reduce_min ([#40374](https://github.com/PaddlePaddle/Paddle/pull/40374))
+    
+  - mean ([#40872](https://github.com/PaddlePaddle/Paddle/pull/40872), [#41319](https://github.com/PaddlePaddle/Paddle/pull/41319))
+    
+  - reduce_all ([#40374](https://github.com/PaddlePaddle/Paddle/pull/40374))
+    
+  - reduce_any ([#40374](https://github.com/PaddlePaddle/Paddle/pull/40374))
+    
+  - logsumexp ([#40790](https://github.com/PaddlePaddle/Paddle/pull/40790))
+    
+  - softshrink ([#40565](https://github.com/PaddlePaddle/Paddle/pull/40565))
+    
+  - range ([#41265](https://github.com/PaddlePaddle/Paddle/pull/41265), [#40581](https://github.com/PaddlePaddle/Paddle/pull/40851))
+    
+  - stack ([#40581](https://github.com/PaddlePaddle/Paddle/pull/40851))
+    
+  - tile ([#40371](https://github.com/PaddlePaddle/Paddle/pull/40371))
+    
+  - unique ([#40581](https://github.com/PaddlePaddle/Paddle/pull/40851))
+    
+  - unstack ([#40581](https://github.com/PaddlePaddle/Paddle/pull/40851))
+    
+  - slice ([#40736](https://github.com/PaddlePaddle/Paddle/pull/40736))
+    
+  - transpose2 ([#39327](https://github.com/PaddlePaddle/Paddle/pull/39327))
+    
+  - unsqueeze2 ([#40596](https://github.com/PaddlePaddle/Paddle/pull/40596))
+    
+  - squeeze2 ([#40596](https://github.com/PaddlePaddle/Paddle/pull/40596))
+    
+  - strided_slice ([#40708](https://github.com/PaddlePaddle/Paddle/pull/40708))
+    
+  - softmax ([#39547](https://github.com/PaddlePaddle/Paddle/pull/39547))
+    
+  - leaky_relu ([#40385](https://github.com/PaddlePaddle/Paddle/pull/40385))
+    
+  - gelu ([#40393](https://github.com/PaddlePaddle/Paddle/pull/40393))
+    
+  - prelu ([#40393](https://github.com/PaddlePaddle/Paddle/pull/40393))
+    
+  - log_softmax ([#40393](https://github.com/PaddlePaddle/Paddle/pull/40393))
+    
+  - elu ([#40565](https://github.com/PaddlePaddle/Paddle/pull/40565))
+    
+  - logsigmoid ([#40626](https://github.com/PaddlePaddle/Paddle/pull/40626))
+    
+  - psroi_pool ([#40353](https://github.com/PaddlePaddle/Paddle/pull/40353), [#41173](https://github.com/PaddlePaddle/Paddle/pull/41173))
+    
+  - kthvalue ([#40575](https://github.com/PaddlePaddle/Paddle/pull/40575))
+    
+  - mode ([#40571](https://github.com/PaddlePaddle/Paddle/pull/40571))
+    
+  - yolo_box ([#40112](https://github.com/PaddlePaddle/Paddle/pull/40112))
+    
+  - yolov3_loss ([#40944](https://github.com/PaddlePaddle/Paddle/pull/40944))
+    
+  - temporal_shift ([#40727](https://github.com/PaddlePaddle/Paddle/pull/40727))
+    
+  - depthwise_conv2d ([#39354](https://github.com/PaddlePaddle/Paddle/pull/39354))
+    
+  - pad3d ([#40701](https://github.com/PaddlePaddle/Paddle/pull/40701))
+    
+  - pad ([#40012](https://github.com/PaddlePaddle/Paddle/pull/40012))
+    
+  - greater_equal ([#39970](https://github.com/PaddlePaddle/Paddle/pull/39970))
+    
+  - kldiv_loss ([#39770](https://github.com/PaddlePaddle/Paddle/pull/39770))
+    
+  - isclose ([#39770](https://github.com/PaddlePaddle/Paddle/pull/39770))
+    
+  - silu ([#40565](https://github.com/PaddlePaddle/Paddle/pull/40565))
+    
+  - unfold ([#39778](https://github.com/PaddlePaddle/Paddle/pull/39778))
+    
+  - batch_norm ([#39347](https://github.com/PaddlePaddle/Paddle/pull/39347))
+    
+  - norm ([#39324](https://github.com/PaddlePaddle/Paddle/pull/39324))
+    
+  - roi_pool ([#40574](https://github.com/PaddlePaddle/Paddle/pull/40574), [#40682](https://github.com/PaddlePaddle/Paddle/pull/40682), [#41173](https://github.com/PaddlePaddle/Paddle/pull/41173))
+    
+  - roi_align ([#40382](https://github.com/PaddlePaddle/Paddle/pull/40382), [#40556](https://github.com/PaddlePaddle/Paddle/pull/40556), [#41402](https://github.com/PaddlePaddle/Paddle/pull/41402))
+    
+  - deformable_conv ([#40700](https://github.com/PaddlePaddle/Paddle/pull/40700), [#40794](https://github.com/PaddlePaddle/Paddle/pull/40794), [#41644](https://github.com/PaddlePaddle/Paddle/pull/41644))
+    
+  - deformable_conv_v1 ([#40794](https://github.com/PaddlePaddle/Paddle/pull/40794), [#41644](https://github.com/PaddlePaddle/Paddle/pull/41644))
+    
+  - label_smooth ([#39796](https://github.com/PaddlePaddle/Paddle/pull/39796))
+    
+  - grid_sampler ([#40585](https://github.com/PaddlePaddle/Paddle/pull/40585))
+    
+  - greater_than ([#39970](https://github.com/PaddlePaddle/Paddle/pull/39970))
+    
+  - pixel_shuffle ([#39949](https://github.com/PaddlePaddle/Paddle/pull/39949), [#39712](https://github.com/PaddlePaddle/Paddle/pull/39712))
+    
+  - nearest_interp_v2 ([#40855](https://github.com/PaddlePaddle/Paddle/pull/40855))
+    
+  - bilinear_interp_v2 ([#40855](https://github.com/PaddlePaddle/Paddle/pull/40855))
+    
+  - softmax_with_cross_entropy ([#40832](https://github.com/PaddlePaddle/Paddle/pull/40832))
+    
+  - rnn ([#41007](https://github.com/PaddlePaddle/Paddle/pull/41007))
+    
+  - reverse ([#40791](https://github.com/PaddlePaddle/Paddle/pull/40791))
+    
+  - trace ([#39510](https://github.com/PaddlePaddle/Paddle/pull/39510))
+    
+  - kron ([#40427](https://github.com/PaddlePaddle/Paddle/pull/40427))
+    
+  - accuracy ([#39982](https://github.com/PaddlePaddle/Paddle/pull/39982))
+    
+  - gather_tree ([#40082](https://github.com/PaddlePaddle/Paddle/pull/40082), [#39844](https://github.com/PaddlePaddle/Paddle/pull/39844))
+    
+  - dropout ([#40148](https://github.com/PaddlePaddle/Paddle/pull/40148))
+    
+  - bincount ([#39947](https://github.com/PaddlePaddle/Paddle/pull/39947))
+    
+  - warpctc ([#41389](https://github.com/PaddlePaddle/Paddle/pull/41389), [#40023](https://github.com/PaddlePaddle/Paddle/pull/40023))
+    
+  - multiplex ([#40007](https://github.com/PaddlePaddle/Paddle/pull/40007), [#40102](https://github.com/PaddlePaddle/Paddle/pull/40102))
+    
+  - qr ([#40007](https://github.com/PaddlePaddle/Paddle/pull/40007))
+    
+  - assign_value ([#40967](https://github.com/PaddlePaddle/Paddle/pull/40967))
+    
+  - assign ([#40022](https://github.com/PaddlePaddle/Paddle/pull/40022))
+    
+  - cast ([#37610](https://github.com/PaddlePaddle/Paddle/pull/37610))
+    
+  - tril_triu ([#40007](https://github.com/PaddlePaddle/Paddle/pull/40007), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - where_index ([#40255](https://github.com/PaddlePaddle/Paddle/pull/40255))
+    
+  - index_select ([#40260](https://github.com/PaddlePaddle/Paddle/pull/40260), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - roll ([#40257](https://github.com/PaddlePaddle/Paddle/pull/40257), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - cumprod ([#39770](https://github.com/PaddlePaddle/Paddle/pull/39770))
+    
+  - shard_index ([#40254](https://github.com/PaddlePaddle/Paddle/pull/40254))
+    
+  - reshape2 ([#40914](https://github.com/PaddlePaddle/Paddle/pull/40914), [#39631](https://github.com/PaddlePaddle/Paddle/pull/39631), [#38833](https://github.com/PaddlePaddle/Paddle/pull/38833), [#37164](https://github.com/PaddlePaddle/Paddle/pull/37164))
+    
+  - flip ([#39822](https://github.com/PaddlePaddle/Paddle/pull/39822), [#40974](https://github.com/PaddlePaddle/Paddle/pull/40974))
+    
+  - eye ([#39712](https://github.com/PaddlePaddle/Paddle/pull/39712), [#40105](https://github.com/PaddlePaddle/Paddle/pull/40105), [#41476](https://github.com/PaddlePaddle/Paddle/pull/41476))
+    
+  - lookup_table_v2 ([#39901](https://github.com/PaddlePaddle/Paddle/pull/39901))
+    
+  - searchsorted ([#40520](https://github.com/PaddlePaddle/Paddle/pull/40520), [#41053](https://github.com/PaddlePaddle/Paddle/pull/41053))
+    
+  - adamw ([#40351](https://github.com/PaddlePaddle/Paddle/pull/40351))
+    
+  - tanh ([#40385](https://github.com/PaddlePaddle/Paddle/pull/40385))
+    
+  - cross ([#39829](https://github.com/PaddlePaddle/Paddle/pull/39829))
+    
+  - concat ([#38955](https://github.com/PaddlePaddle/Paddle/pull/38955), [#41112](https://github.com/PaddlePaddle/Paddle/pull/41112))
+    
+  - split ([#39060](https://github.com/PaddlePaddle/Paddle/pull/39060))
+    
+  - linspace ([#40124](https://github.com/PaddlePaddle/Paddle/pull/40124))
+    
+  - huber_loss ([#39761](https://github.com/PaddlePaddle/Paddle/pull/39761))
+    
+  - hierarchical_sigmoid ([#40553](https://github.com/PaddlePaddle/Paddle/pull/40553))
+    
+  - nll_loss ([#39936](https://github.com/PaddlePaddle/Paddle/pull/39936))
+    
+  - graph_send_recv ([#40092](https://github.com/PaddlePaddle/Paddle/pull/40092), [#40320](https://github.com/PaddlePaddle/Paddle/pull/40320))
+    
+  - abs ([#39492](https://github.com/PaddlePaddle/Paddle/pull/39492), [#39762](https://github.com/PaddlePaddle/Paddle/pull/39762))
+    
+  - exp ([#40727](https://github.com/PaddlePaddle/Paddle/pull/40727))
+    
+  - rsqrt ([#40727](https://github.com/PaddlePaddle/Paddle/pull/40727))
+    
+  - viterbi_decode ([#40186](https://github.com/PaddlePaddle/Paddle/pull/40186))
+    
+  - conj ([#38247](https://github.com/PaddlePaddle/Paddle/pull/38247))
+    
+  - real ([#39777](https://github.com/PaddlePaddle/Paddle/pull/39777), [#41173](https://github.com/PaddlePaddle/Paddle/pull/41173))
+    
+  - imag ([#39777](https://github.com/PaddlePaddle/Paddle/pull/39777), [#41173](https://github.com/PaddlePaddle/Paddle/pull/41173))
+    
+  - take_along_axis ([#39959](https://github.com/PaddlePaddle/Paddle/pull/39959), [#40270](https://github.com/PaddlePaddle/Paddle/pull/40270), [#40974](https://github.com/PaddlePaddle/Paddle/pull/40974))
+    
+  - put_along_axis ([#39959](https://github.com/PaddlePaddle/Paddle/pull/39959), [#40974](https://github.com/PaddlePaddle/Paddle/pull/40974))
+    
+  - lgamma ([#39770](https://github.com/PaddlePaddle/Paddle/pull/39770))
+    
+  - relu ([#40175](https://github.com/PaddlePaddle/Paddle/pull/40175))
+    
+  - maxout ([#39959](https://github.com/PaddlePaddle/Paddle/pull/39959), [#40974](https://github.com/PaddlePaddle/Paddle/pull/40974))
+    
+  - log ([#40785](https://github.com/PaddlePaddle/Paddle/pull/40785))
+    
+  - bilinear_tensor_product ([#39903](https://github.com/PaddlePaddle/Paddle/pull/39903))
+    
+  - flatten_contiguous_range ([#38712](https://github.com/PaddlePaddle/Paddle/pull/38712), [#36957](https://github.com/PaddlePaddle/Paddle/pull/36957), [#41345](https://github.com/PaddlePaddle/Paddle/pull/41345))
+    
+  - matrix_rank ([#40074](https://github.com/PaddlePaddle/Paddle/pull/40074), [#40519](https://github.com/PaddlePaddle/Paddle/pull/40519), [#41466](https://github.com/PaddlePaddle/Paddle/pull/41466))
+    
+  - logit ([#37844](https://github.com/PaddlePaddle/Paddle/pull/37844))
+    
+  - lerp ([#40105](https://github.com/PaddlePaddle/Paddle/pull/40105), [#39524](https://github.com/PaddlePaddle/Paddle/pull/39524))
+    
+  - erfinv ([#39949](https://github.com/PaddlePaddle/Paddle/pull/39949), [#39712](https://github.com/PaddlePaddle/Paddle/pull/39712))
+    
+  - broadcast_tensors ([#40047](https://github.com/PaddlePaddle/Paddle/pull/40047))
+    
+  - gumbel_softmax ([#39873](https://github.com/PaddlePaddle/Paddle/pull/39873))
+    
+  - diagonal ([#39575](https://github.com/PaddlePaddle/Paddle/pull/39575))
+    
+  - trunc ([#39543](https://github.com/PaddlePaddle/Paddle/pull/39543), [#39772](https://github.com/PaddlePaddle/Paddle/pull/39772))
+    
+  - multi_dot ([#40038](https://github.com/PaddlePaddle/Paddle/pull/40038))
+    
+  - matrix_power ([#40231](https://github.com/PaddlePaddle/Paddle/pull/40231))
+    
+  - digamma ([#39240](https://github.com/PaddlePaddle/Paddle/pull/39240))
+    
+  - masked_select ([#39193](https://github.com/PaddlePaddle/Paddle/pull/39193))
+    
+  - determinant ([#40539](https://github.com/PaddlePaddle/Paddle/pull/40539))
+    
+  - eigh ([#40213](https://github.com/PaddlePaddle/Paddle/pull/40213))
+    
+  - size ([#39949](https://github.com/PaddlePaddle/Paddle/pull/39949), [#39712](https://github.com/PaddlePaddle/Paddle/pull/39712))
+    
+  - shape ([#40248](https://github.com/PaddlePaddle/Paddle/pull/40248))
+    
+  - reduce_sum ([#37559](https://github.com/PaddlePaddle/Paddle/pull/37559), [#41295](https://github.com/PaddlePaddle/Paddle/pull/41295))
+    
+  - reduce_prod ([#39844](https://github.com/PaddlePaddle/Paddle/pull/39844))
+    
+  - histogram ([#39496](https://github.com/PaddlePaddle/Paddle/pull/39496))
+    
+  - meshgrid ([#41411](https://github.com/PaddlePaddle/Paddle/pull/41411))
+    
+  - brelu ([#40385](https://github.com/PaddlePaddle/Paddle/pull/40385))
+    
+  - hard_swish ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+    
+  - hard_shrink ([#40565](https://github.com/PaddlePaddle/Paddle/pull/40565))
+    
+  - selu ([#39819](https://github.com/PaddlePaddle/Paddle/pull/39819))
+    
+  - expand_v2 ([#39471](https://github.com/PaddlePaddle/Paddle/pull/39471))
+    
+  - top_k_v2 ([#40064](https://github.com/PaddlePaddle/Paddle/pull/40064))
+    
+  - expand_as_v2 ([#40373](https://github.com/PaddlePaddle/Paddle/pull/40373))
+    
+  - swish ([#40913](https://github.com/PaddlePaddle/Paddle/pull/40913))
+    
+  - hard_sigmoid ([#40626](https://github.com/PaddlePaddle/Paddle/pull/40626))
+    
+
+#### **New Dynamic Graph Execution Mechanism**
+
+To improve the scheduling performance and custom development capability of PaddlePaddle's dynamic graph execution mechanism, we have rebuilt its underlying implementation. The new execution mechanism runs on the PHI operator library, and for operators that PHI supports, switching to the new dynamic graph mode significantly improves scheduling performance. However, upgrading the execution mechanism of the whole framework is a large effort that is tightly coupled with the PHI operator library, so the new mechanism is still not enabled by default in this version. To try it, set the environment variable `FLAGS_enable_eager_mode=1`. The details are as follows:
+
+- **Implementation of dynamic graph execution infrastructure, core components and mechanism**: By making the dynamic-graph execution code static, the original homogeneous operator construction is converted into specific calls to the corresponding PHI APIs, which greatly reduces scheduling overhead. ([#36059](https://github.com/PaddlePaddle/Paddle/pull/36059), [#37323](https://github.com/PaddlePaddle/Paddle/pull/37323), [#37556](https://github.com/PaddlePaddle/Paddle/pull/37556), [#37555](https://github.com/PaddlePaddle/Paddle/pull/37555), [#37478](https://github.com/PaddlePaddle/Paddle/pull/37478), [#37458](https://github.com/PaddlePaddle/Paddle/pull/37458), [#37479](https://github.com/PaddlePaddle/Paddle/pull/37479), [#37599](https://github.com/PaddlePaddle/Paddle/pull/37599), [#37659](https://github.com/PaddlePaddle/Paddle/pull/37659), [#37654](https://github.com/PaddlePaddle/Paddle/pull/37654), [#39200](https://github.com/PaddlePaddle/Paddle/pull/39200), [#39309](https://github.com/PaddlePaddle/Paddle/pull/39309), [#39319](https://github.com/PaddlePaddle/Paddle/pull/39319), [#39414](https://github.com/PaddlePaddle/Paddle/pull/39414), [#39504](https://github.com/PaddlePaddle/Paddle/pull/39504), [#39526](https://github.com/PaddlePaddle/Paddle/pull/39526), [#39878](https://github.com/PaddlePaddle/Paddle/pull/39878), [#39963](https://github.com/PaddlePaddle/Paddle/pull/39963))
+  
+- **Sub-feature development and adaptation for the new dynamic graph execution mechanism**: support more flexible and complete dynamic graph sub-features such as hook, pylayer, double_grad, inplace, and amp. ([#41396](https://github.com/PaddlePaddle/Paddle/pull/41396), [#40400](https://github.com/PaddlePaddle/Paddle/pull/40400), [#40695](https://github.com/PaddlePaddle/Paddle/pull/40695), [#41043](https://github.com/PaddlePaddle/Paddle/pull/41043), [#40915](https://github.com/PaddlePaddle/Paddle/pull/40915), [#41104](https://github.com/PaddlePaddle/Paddle/pull/41104), [#41350](https://github.com/PaddlePaddle/Paddle/pull/41350), [#41209](https://github.com/PaddlePaddle/Paddle/pull/41209), [#40830](https://github.com/PaddlePaddle/Paddle/pull/40830), [#40891](https://github.com/PaddlePaddle/Paddle/pull/40891), [#36814](https://github.com/PaddlePaddle/Paddle/pull/36814), [#37377](https://github.com/PaddlePaddle/Paddle/pull/37377), [#37193](https://github.com/PaddlePaddle/Paddle/pull/37193), [#36965](https://github.com/PaddlePaddle/Paddle/pull/36965), [#37810](https://github.com/PaddlePaddle/Paddle/pull/37810), [#36837](https://github.com/PaddlePaddle/Paddle/pull/36837), [#38488](https://github.com/PaddlePaddle/Paddle/pull/38488), [#39282](https://github.com/PaddlePaddle/Paddle/pull/39282), [#39449](https://github.com/PaddlePaddle/Paddle/pull/39449), [#39531](https://github.com/PaddlePaddle/Paddle/pull/39531), [#39638](https://github.com/PaddlePaddle/Paddle/pull/39638), [#39674](https://github.com/PaddlePaddle/Paddle/pull/39674), [#39893](https://github.com/PaddlePaddle/Paddle/pull/39893), [#40170](https://github.com/PaddlePaddle/Paddle/pull/40170), [#40693](https://github.com/PaddlePaddle/Paddle/pull/40693), [#40937](https://github.com/PaddlePaddle/Paddle/pull/40937), [#41016](https://github.com/PaddlePaddle/Paddle/pull/41016), [#41051](https://github.com/PaddlePaddle/Paddle/pull/41051), [#41121](https://github.com/PaddlePaddle/Paddle/pull/41121), [#41198](https://github.com/PaddlePaddle/Paddle/pull/41198), [#41287](https://github.com/PaddlePaddle/Paddle/pull/41287), [#41380](https://github.com/PaddlePaddle/Paddle/pull/41380), [#41306](https://github.com/PaddlePaddle/Paddle/pull/41306), [#41387](https://github.com/PaddlePaddle/Paddle/pull/41387), [#40623](https://github.com/PaddlePaddle/Paddle/pull/40623), [#40945](https://github.com/PaddlePaddle/Paddle/pull/40945))
+  
+- **Automatic code generation for the new dynamic graph execution mechanism**: Splitting the computation and scheduling logic of a large number of homogeneous operators into operator-specific scheduling logic by hand proved to be a huge workload, so we introduced an automatic code generation mechanism that generates this code and simplifies the runtime logic of dynamic graphs. To accommodate the various kinds of runtime logic in the previous framework, we also use some complex compilation techniques to obtain information at runtime and generate more accurate scheduling code. ([#37574](https://github.com/PaddlePaddle/Paddle/pull/37574), [#37575](https://github.com/PaddlePaddle/Paddle/pull/37575), [#37639](https://github.com/PaddlePaddle/Paddle/pull/37639), [#37723](https://github.com/PaddlePaddle/Paddle/pull/37723), [#37753](https://github.com/PaddlePaddle/Paddle/pull/37753), [#37812](https://github.com/PaddlePaddle/Paddle/pull/37812), [#37837](https://github.com/PaddlePaddle/Paddle/pull/37837), [#37910](https://github.com/PaddlePaddle/Paddle/pull/37910), [#37943](https://github.com/PaddlePaddle/Paddle/pull/37943), [#37992](https://github.com/PaddlePaddle/Paddle/pull/37992), [#37959](https://github.com/PaddlePaddle/Paddle/pull/37959), [#38017](https://github.com/PaddlePaddle/Paddle/pull/38017), [#37969](https://github.com/PaddlePaddle/Paddle/pull/37969), [#38160](https://github.com/PaddlePaddle/Paddle/pull/38160), [#38085](https://github.com/PaddlePaddle/Paddle/pull/38085), [#38562](https://github.com/PaddlePaddle/Paddle/pull/38562), [#38573](https://github.com/PaddlePaddle/Paddle/pull/38573), [#39192](https://github.com/PaddlePaddle/Paddle/pull/39192), [#39215](https://github.com/PaddlePaddle/Paddle/pull/39215), [#39355](https://github.com/PaddlePaddle/Paddle/pull/39355), [#39358](https://github.com/PaddlePaddle/Paddle/pull/39358), [#39328](https://github.com/PaddlePaddle/Paddle/pull/39328), [#39233](https://github.com/PaddlePaddle/Paddle/pull/39233), [#39628](https://github.com/PaddlePaddle/Paddle/pull/39628), [#39767](https://github.com/PaddlePaddle/Paddle/pull/39767), [#39743](https://github.com/PaddlePaddle/Paddle/pull/39743), [#39897](https://github.com/PaddlePaddle/Paddle/pull/39897), [#39797](https://github.com/PaddlePaddle/Paddle/pull/39797), [#39997](https://github.com/PaddlePaddle/Paddle/pull/39997), [#40058](https://github.com/PaddlePaddle/Paddle/pull/40058), [#40080](https://github.com/PaddlePaddle/Paddle/pull/40080), [#40107](https://github.com/PaddlePaddle/Paddle/pull/40107), [#39962](https://github.com/PaddlePaddle/Paddle/pull/39962), [#40132](https://github.com/PaddlePaddle/Paddle/pull/40132), [#40276](https://github.com/PaddlePaddle/Paddle/pull/40276), [#40266](https://github.com/PaddlePaddle/Paddle/pull/40266), [#40480](https://github.com/PaddlePaddle/Paddle/pull/40480), [#40482](https://github.com/PaddlePaddle/Paddle/pull/40482), [#40368](https://github.com/PaddlePaddle/Paddle/pull/40368), [#40650](https://github.com/PaddlePaddle/Paddle/pull/40650), [#40815](https://github.com/PaddlePaddle/Paddle/pull/40815), [#40907](https://github.com/PaddlePaddle/Paddle/pull/40907), [#40935](https://github.com/PaddlePaddle/Paddle/pull/40935), [#41089](https://github.com/PaddlePaddle/Paddle/pull/41089))
+  
+- **Integration of the new dynamic graph execution mechanism into the main framework, and integration testing**: We currently use environment variables to distinguish between static graph mode and dynamic graph mode (including the new and old dynamic graph modes). Most of the dynamic graph logic has been adapted to these modes, but many problems are still being fixed. ([#37638](https://github.com/PaddlePaddle/Paddle/pull/37638), [#37643](https://github.com/PaddlePaddle/Paddle/pull/37643), [#37653](https://github.com/PaddlePaddle/Paddle/pull/37653), [#38314](https://github.com/PaddlePaddle/Paddle/pull/38314), [#38337](https://github.com/PaddlePaddle/Paddle/pull/38337), [#38338](https://github.com/PaddlePaddle/Paddle/pull/38338), [#39164](https://github.com/PaddlePaddle/Paddle/pull/39164), [#39326](https://github.com/PaddlePaddle/Paddle/pull/39326), [#40391](https://github.com/PaddlePaddle/Paddle/pull/40391), [#40201](https://github.com/PaddlePaddle/Paddle/pull/40201), [#40854](https://github.com/PaddlePaddle/Paddle/pull/40854), [#40887](https://github.com/PaddlePaddle/Paddle/pull/40887))
+  
+- **Update the mode-checking logic under dynamic graphs to support fast execution paths for dynamic graphs in a compatible form**: ([#40786](https://github.com/PaddlePaddle/Paddle/pull/40786))
+  
+  - Non-static graph mode (current transition scheme): `_non_static_mode()`.
+    
+  - Check for the new dynamic graph in dynamic graph mode (recommended): `_in_dygraph_mode()`.
+    
+  - Check for the old dynamic graph in dynamic graph mode (not recommended; it will be deprecated in future versions): `_in_legacy_dygraph()`.
+    
+  - Enable the old dynamic graph and disable the new dynamic graph in dynamic graph mode: call `_enable_legacy_dygraph()` or exit `_test_eager_guard()`.
+    
+  - Enable the new dynamic graph and disable the old dynamic graph in dynamic graph mode: call `_disable_legacy_dygraph()` or use `with _test_eager_guard()`.
+    
+  - Check for the new dynamic graph in either static or dynamic graph mode: `_in_eager_without_dygraph_check()`.
+    
+- **Support inplace after dynamic graph reconstruction**: input and output are the same Tensor.
+  
+    - Adapt the inplace strategy for dynamic graph reconstruction intermediate states. ([#40400](https://github.com/PaddlePaddle/Paddle/pull/40400))
+      
+    - Adapt the inplace strategy to the final state of the dynamic graph reconstruction. ([#40695](https://github.com/PaddlePaddle/Paddle/pull/40695))
+      
+    - Add inplace strategy to PyLayer function after dynamic graph reconstruction. ([#41043](https://github.com/PaddlePaddle/Paddle/pull/41043))
+      
+    - Add inplace strategy for Tensor's setitem function after dynamic graph reconstruction. ([#40915](https://github.com/PaddlePaddle/Paddle/pull/40915))
+      
+    - Add `_reset_grad_inplace_version` interface after dynamic graph reconstruction, to set the inplace version of the Tensor's gradient to 0. ([#41101](https://github.com/PaddlePaddle/Paddle/pull/41101))
+      
+    - If the value of a forward Tensor is not needed during the backward computation (the no_need_buffer property), skip the inplace version check for that Tensor. ([#41350](https://github.com/PaddlePaddle/Paddle/pull/41350))
+      
+    - Unify error messages for inplace version checks after and before reconstruction of dynamic graphs. ([#41209](https://github.com/PaddlePaddle/Paddle/pull/41209))
+      
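The inplace version check described in the list above can be pictured with a small toy model. This is only a sketch of the idea under stated assumptions: `ToyTensor`, `SavedForBackward`, and their attributes are hypothetical names invented for illustration, and nothing here is Paddle's actual implementation.

```python
class ToyTensor:
    """Hypothetical stand-in for a framework Tensor (not Paddle's class)."""

    def __init__(self, value):
        self.value = value
        self.inplace_version = 0  # bumped by every inplace write

    def add_(self, other):
        # Inplace op: the input and the output are the same object.
        self.value += other
        self.inplace_version += 1
        return self


class SavedForBackward:
    """Snapshot taken when a forward op records a tensor for backward."""

    def __init__(self, tensor, no_need_buffer=False):
        self.tensor = tensor
        self.no_need_buffer = no_need_buffer
        self.saved_version = tensor.inplace_version

    def check(self):
        # Backward never reads the value of a no_need_buffer tensor,
        # so a later inplace write cannot corrupt the gradient.
        if self.no_need_buffer:
            return True
        return self.saved_version == self.tensor.inplace_version


x = ToyTensor(1.0)
strict = SavedForBackward(x)                        # value needed in backward
relaxed = SavedForBackward(x, no_need_buffer=True)  # value not needed
y = x.add_(2.0)                                     # inplace write after saving
```

In this toy model `strict.check()` fails after the inplace write because the saved version no longer matches, while `relaxed.check()` still passes, which mirrors the skip described for no_need_buffer Tensors.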
+- **Support view strategy after dynamic graph reconstruction**: the input and output Tensors share the underlying data.
+  
+    - Adapt the view strategy for dynamic graph reconstruction intermediate states, including the `reshape`, `squeeze`, `unsqueeze`, and `flatten` APIs. ([#40830](https://github.com/PaddlePaddle/Paddle/pull/40830))
+      
+    - Adapt the view strategy for the dynamic graph reconstruction final state, including the `reshape` API. ([#40891](https://github.com/PaddlePaddle/Paddle/pull/40891))
+      
 
+#### **New Static Graph Executor**
 
-#### IR(Intermediate Representation)
+To address the problems that the original static graph executor of PaddlePaddle schedules poorly in some scenarios and makes it hard to use multiple streams, we have implemented a new static graph executor with better performance that can readily exploit the asynchronous scheduling capabilities of multiple streams and multiple threads. The new executor is a compatible upgrade of the original executor. It is currently used by default in single-card scenarios, so users do not need to change their training code. Users can switch back to the original executor by setting the environment variable `FLAGS_USE_STANDALONE_EXECUTOR=false`. ([#41179](https://github.com/PaddlePaddle/Paddle/pull/41179)) The main contents are as follows.
 
-- Dynamic graph to static graph  
-  - Optimize dynamic to static error reporting format, hide unnecessary error reporting stack at the framework level, add user code error line location identifier and context.  ([#35365](https://github.com/PaddlePaddle/Paddle/pull/35365), [#35320](https://github.com/PaddlePaddle/Paddle/pull/35320))
-  - Optimize the conversion logic of the ``list.append`` syntax in the control flow. ([#35212](https://github.com/PaddlePaddle/Paddle/pull/35212)) 
-  - Optimize the logic of dynamic to static training codes, upgrade the internal ``Program`` cache mechanism, and add an advance copy policy for input ``Tensor`` to improve training performance.   ([#34181](https://github.com/PaddlePaddle/Paddle/pull/34181), [#33796](https://github.com/PaddlePaddle/Paddle/pull/33796))
-  - Optimize the internal actuator memory recycling strategy for dynamic to static graphs, reducing the GPU memory usage during training.  ([#34177](https://github.com/PaddlePaddle/Paddle/pull/34177))
-  - Integrate the source codes of ``Gast`` triple dependency library, decoupling version dependencies.  ([#34556](https://github.com/PaddlePaddle/Paddle/pull/34556)) 
-  - Display partial frame level error reporting information in case of dynamic-to-static error reporting. It is easier to locate the problem. ([#36765](https://github.com/PaddlePaddle/Paddle/pull/36765))
-  - Remove duplicate temporary file removal function `remove_static_file()` in the dynamic to static error reporting module. ([#36375](https://github.com/PaddlePaddle/Paddle/pull/36375))
-  - Optimize processing of `input_specs` parameter in RegisterPass, to support graph optimization as a matching subgraph condition.  ([#36453](https://github.com/PaddlePaddle/Paddle/pull/36453))
-
-#### **Distributed training**
-
-- Basic functions of distributed training
-  - Enhance the check of the static graph pipeline parallel stage and persist var. ([#34193](https://github.com/PaddlePaddle/Paddle/pull/34193), [#34870](https://github.com/PaddlePaddle/Paddle/pull/34870), [#35453](https://github.com/PaddlePaddle/Paddle/pull/35453))
-  - Optimize static graph pipeline parallel. In the 1F1B scheduling, the GPU memory does not increase as global batch size increases. ([#34230](https://github.com/PaddlePaddle/Paddle/pull/34230))
-  - For the GPU Parameter Server, optimize the build phase hashmap. In the build phase, the performance increases by up to 7x on some tasks. ([#34175](https://github.com/PaddlePaddle/Paddle/pull/34175)) 
-  - For the GPU Parameter Server, add the multi-stream parallel in the pull/push phase. ([#34276](https://github.com/PaddlePaddle/Paddle/pull/34276)) 
-  - For the GPU Parameter Server, support the remote pull of parameters between machines in multi-machine training mode.  ([#35396](https://github.com/PaddlePaddle/Paddle/pull/35396))
-  - For the CPU Parameter Server, support SSD storage. ([#33031](https://github.com/PaddlePaddle/Paddle/pull/33031))
-  - `paddle.io.Dataset`: Support the dynamic library parsing data. ([#33969](https://github.com/PaddlePaddle/Paddle/pull/33969))
-  - In the `paddle.distributed.fleet.dataset.DatasetBase`, add the consistency check function for generated data of the `use_var_list` and `pipe_command`.  ([#34463](https://github.com/PaddlePaddle/Paddle/pull/34463))
-  - Add the consistency check between the `emd` dimension of `paddle.fluid.layers.embedding` and `emb` dimension of `sparse table` in `fleet`.  ([#34249](https://github.com/PaddlePaddle/Paddle/pull/34249))
-  - Dynamic graph hybrid parallel supports for Pure FP16 training. ([#36707](https://github.com/PaddlePaddle/Paddle/pull/36707))
-  - Static graph hybrid parallel supports dropout using a fixed random seed generator to ensure consistency of global variables and randomness of local variables in model parallel.  ([#36682](https://github.com/PaddlePaddle/Paddle/pull/36682))
-  - Implement CPU parallelism and support for adding custom backend parameters when calling spawn or launch.  Available backend options are "gloo", "nccl", "bkcl", and "auto", for CPU parallel, GPU parallel, XPU parallel, and automatic selection by Paddle version, respectively.  ([#35745](https://github.com/PaddlePaddle/Paddle/pull/35745))
-  - Optimize dynamic graph hybrid parallel HybridParallelClipGrad policy, to support 4D hybrid parallel + Pure FP16 training.  ([#36707](https://github.com/PaddlePaddle/Paddle/pull/36707))
-  - Add SlotRecordDataset class to support GPU parameter server training.  ([#36710](https://github.com/PaddlePaddle/Paddle/pull/36710))
-  - In the GPU parameter server building phase, support use of SlotRecordDataset. ([#36723](https://github.com/PaddlePaddle/Paddle/pull/36723))
-
-
-- Static graph hybrid parallel
-  - Optimize hybrid parallel loss scale and reduce the number of scale op insertions. ([#35775](https://github.com/PaddlePaddle/Paddle/pull/35775))
-  - Optimize the pipeline scheduler, cache duplicate CPU jobs, and reduce CPU overhead.  ([#35680](https://github.com/PaddlePaddle/Paddle/pull/35680))
-  - Optimize the number of times of checkpoint send/recv in pipeline parallel + recompute.  ([#34248](https://github.com/PaddlePaddle/Paddle/pull/34248))
-
-
-#### **Others**
-
-- Error debugging optimization
-  - Unify the error reporting mechanism for third-party libraries, and optimize the error reporting messages for ``CURAND, CUDNN, CUBLAS, CUSOLVER, and NCCL``. This makes the error reporting more detailed and standardized. ([#33003](https://github.com/PaddlePaddle/Paddle/pull/33003), [#33743](https://github.com/PaddlePaddle/Paddle/pull/33743))
-  - Optimize avx and no_avx related installation error messages to simplify redundant and complex contents. ([#33818](https://github.com/PaddlePaddle/Paddle/pull/33818)) 
-  - Optimize the error report of the ``paddle.nn.functional.gather_tree``, ``paddle.nn.Transformer``, ``paddle.nn.TransformerDecoderLayer``, ``paddle.nn.TransformerEncoderLayer``, and ``paddle.nn.MultiHeadAttention``.  ([#34322](https://github.com/PaddlePaddle/Paddle/pull/34322), [#33859](https://github.com/PaddlePaddle/Paddle/pull/33859))
-  - Support the configuration of ``FLAGS_check_nan_inf`` environment variable under dynamic graphs for runtime checking and localization of model ``nan`` and ``inf``.  ([#32635](https://github.com/PaddlePaddle/Paddle/pull/32635))
-  - Remove the stack information introduced by Signal class error messages due to the capture of Signal, to avoid misleading users. ([#34842 ](https://github.com/PaddlePaddle/Paddle/pull/34842))
-  - Fix error message for ``elementwise`` class operator when input x or y is an empty Tensor.  ([#33928](https://github.com/PaddlePaddle/Paddle/pull/33928))
-
-- Model saving and loading
-  - Fix the ``paddle.jit.save`` interface and model pruning logic. It is unnecessary to add an associated ``scale_op`` for output variables, and to properly export models containing outputs of type ``bool`` and ``float16``.  ([#35730](https://github.com/PaddlePaddle/Paddle/pull/35730), [#36132](https://github.com/PaddlePaddle/Paddle/pull/36132))
-- Custom OP
-  - Remove unnecessary ``cudaStreamSynchronize`` operations from ``paddle::Tensor's`` ``copy`` method, to improve performance.  ([#35802](https://github.com/PaddlePaddle/Paddle/pull/35802))
-- Add C++ to support for GeneratePass development registration. The development mode is aligned with Python side. ([#36302](https://github.com/PaddlePaddle/Paddle/pull/36302))
-- Automic SParsity
-
-- Add `paddle.static.sparsity`, to support generating sparse parameters for `n:m` sparse mode. Currently, it only supports static graph ASP training. FP32 and FP16 on A100 are set with `1:2` and `2:4` sparse modes, respectively, to train saved sparse models, which can be used to accelerate inference tasks by calling TensorRT 8 based on the sparse Tensor Core of Ampere architecture. The current version provides a total of 5 APIs:  ([#32995](https://github.com/PaddlePaddle/Paddle/pull/32995)、[#33132](https://github.com/PaddlePaddle/Paddle/pull/33132)、[#33558](https://github.com/PaddlePaddle/Paddle/pull/33558)、[#36525](https://github.com/PaddlePaddle/Paddle/pull/36525))
-  - `paddle.static.sparsity.calculate_density`: calculates the density of the input Tensor.  
-  - `paddle.static.sparsity.decorate`: wraps the given optimizer as `OptimizerWithSparsityGuarantee`, automatically inserting necessary operations for the ASP workflow when calling `optimizer.minimize()`.    
-  - `paddle.static.sparsity.prune_model`: prunes the parameters of the supported layers in `main_program` based on the mask generator function specified by `mask_algo`. 
-  - `paddle.static.sparsity.set_excluded_layers`: sets the names of the parameters of layers that will not be trimmed.   
-  - `paddle.static.sparsity.reset_excluded_layers`: resets the `excluded_layers` setting corresponding to `main_program`. 
+- Basic components: a high-performance thread pool for multi-threaded scheduling in the executor ([#35470](https://github.com/PaddlePaddle/Paddle/pull/35470), [#35930](https://github.com/PaddlePaddle/Paddle/pull/35930), [#36030](https://github.com/PaddlePaddle/Paddle/pull/36030), [#36480](https://github.com/PaddlePaddle/Paddle/pull/36480), [#36688](https://github.com/PaddlePaddle/Paddle/pull/36688), [#36740](https://github.com/PaddlePaddle/Paddle/pull/36740), [#38335](https://github.com/PaddlePaddle/Paddle/pull/38335), [#40770](https://github.com/PaddlePaddle/Paddle/pull/40770)) and a thread co-op component ([#38779](https://github.com/PaddlePaddle/Paddle/pull/38779), [#40876](https://github.com/PaddlePaddle/Paddle/pull/40876), [#40912](https://github.com/PaddlePaddle/Paddle/pull/40912)); timely memory recovery after operator execution ([#37642](https://github.com/PaddlePaddle/Paddle/pull/37642), [#39617](https://github.com/PaddlePaddle/Paddle/pull/39617), [#40859](https://github.com/PaddlePaddle/Paddle/pull/40859)); a new dependency analysis algorithm for the parallel executor ([#37231](https://github.com/PaddlePaddle/Paddle/pull/37231)), etc.
+  
+- Scheduling logic: Optimize how operators are scheduled in the executor. Support a multi-stream, multi-threaded asynchronous scheduling mechanism. Schedule transforms such as data type, device, and layout conversions as operators to improve performance. Support caching the selected operator kernel. Support selecting the new PHI operators. ([#35024](https://github.com/PaddlePaddle/Paddle/pull/35024), [#34922](https://github.com/PaddlePaddle/Paddle/pull/34922), [#35711](https://github.com/PaddlePaddle/Paddle/pull/35711), [#35928](https://github.com/PaddlePaddle/Paddle/pull/35928), [#39458](https://github.com/PaddlePaddle/Paddle/pull/39458), [#36899](https://github.com/PaddlePaddle/Paddle/pull/36899))
+  
+- Interface compatibility: Stay compatible with the user interface and functionality of the original executor, such as alignment with the Python interface `Executor.run()` and support for managing Tensors in Scope, so that users can switch to the new executor transparently. ([#37278](https://github.com/PaddlePaddle/Paddle/pull/37278), [#37379](https://github.com/PaddlePaddle/Paddle/pull/37379), [#37445](https://github.com/PaddlePaddle/Paddle/pull/37445), [#37510](https://github.com/PaddlePaddle/Paddle/pull/37510), [#40955](https://github.com/PaddlePaddle/Paddle/pull/40955), [#41778](https://github.com/PaddlePaddle/Paddle/pull/41178), [#41058](https://github.com/PaddlePaddle/Paddle/pull/41058), [#38584](https://github.com/PaddlePaddle/Paddle/pull/38584), [#37957](https://github.com/PaddlePaddle/Paddle/pull/37957), [#37672](https://github.com/PaddlePaddle/Paddle/pull/37672), [#37474](https://github.com/PaddlePaddle/Paddle/pull/37474), [#37085](https://github.com/PaddlePaddle/Paddle/pull/37085), [#37061](https://github.com/PaddlePaddle/Paddle/pull/37061), [#36945](https://github.com/PaddlePaddle/Paddle/pull/36945))
+  
+- Enhance debugging and error reporting in multi-threaded scenarios by capturing error reports from sub-threads and throwing them uniformly in the main thread. This can improve user experience. ([#36692](https://github.com/PaddlePaddle/Paddle/pull/36692), [#36802](https://github.com/PaddlePaddle/Paddle/pull/36802))
+  
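As a minimal sketch of switching back to the original executor with the environment variable named above: the helper `use_original_executor` is a hypothetical name introduced only for this illustration, and setting the flag before importing `paddle` is a conservative assumption about when the flag is read rather than a documented requirement.

```python
import os


def use_original_executor(env=os.environ):
    """Hypothetical helper: request the original static graph executor."""
    # Flag name and value are taken from the release note text above.
    env["FLAGS_USE_STANDALONE_EXECUTOR"] = "false"
    return env


use_original_executor()
# Import paddle and build/run the static graph program only after this point,
# so the flag is already present in the process environment.
```

The same effect can be obtained by exporting the variable in the shell that launches the training script.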
 
+#### **Distributed Training**
 
-### **(3) Performance optimization**
+- Basic functions of multi-machine multi-card parallel training based on collective communication
+  
+  - Add support for elastic training, which enables scaling the number of workers up and down and resuming the training process after a node failure, to improve the fault tolerance of distributed training. ([#36684](https://github.com/PaddlePaddle/Paddle/pull/36684), [#37177](https://github.com/PaddlePaddle/Paddle/pull/37177), [#37781](https://github.com/PaddlePaddle/Paddle/pull/37781))
+    
+  - Refactor the launch startup module and add `master` collaboration and the node number definition `nnodes`, to make distributed startup easier to use. ([#40086](https://github.com/PaddlePaddle/Paddle/pull/40086), [#40568](https://github.com/PaddlePaddle/Paddle/pull/40568), [#40782](https://github.com/PaddlePaddle/Paddle/pull/40782), [#40844](https://github.com/PaddlePaddle/Paddle/pull/40844), [#40936](https://github.com/PaddlePaddle/Paddle/pull/40936), [#41190](https://github.com/PaddlePaddle/Paddle/pull/41190), [#41314](https://github.com/PaddlePaddle/Paddle/pull/41314))
+    
+  - Add support for GPU/NPU/XPU multi-hardware heterogeneous training. ([#37613](https://github.com/PaddlePaddle/Paddle/pull/37613), [#37998](https://github.com/PaddlePaddle/Paddle/pull/37998))
+    
+  - Add fleet_executor asynchronous pipeline executor. ([#36966](https://github.com/PaddlePaddle/Paddle/pull/36966), [#37049](https://github.com/PaddlePaddle/Paddle/pull/37049), [#37087](https://github.com/PaddlePaddle/Paddle/pull/37087), [#37126](https://github.com/PaddlePaddle/Paddle/pull/37126), [#37150](https://github.com/PaddlePaddle/Paddle/pull/37150), [#37203](https://github.com/PaddlePaddle/Paddle/pull/37203), [#37167](https://github.com/PaddlePaddle/Paddle/pull/37167), [#37282](https://github.com/PaddlePaddle/Paddle/pull/37282), [#37319](https://github.com/PaddlePaddle/Paddle/pull/37319), [#37462](https://github.com/PaddlePaddle/Paddle/pull/37462), [#37507](https://github.com/PaddlePaddle/Paddle/pull/37507), [#37533](https://github.com/PaddlePaddle/Paddle/pull/37533), [#37576](https://github.com/PaddlePaddle/Paddle/pull/37576), [#37605](https://github.com/PaddlePaddle/Paddle/pull/37605), [#37691](https://github.com/PaddlePaddle/Paddle/pull/37691), [#37742](https://github.com/PaddlePaddle/Paddle/pull/37742), [#37783](https://github.com/PaddlePaddle/Paddle/pull/37783), [#37809](https://github.com/PaddlePaddle/Paddle/pull/37809), [#37862](https://github.com/PaddlePaddle/Paddle/pull/37862), [#37882](https://github.com/PaddlePaddle/Paddle/pull/37882), [#37934](https://github.com/PaddlePaddle/Paddle/pull/37934), [#38024](https://github.com/PaddlePaddle/Paddle/pull/38024), [#38083](https://github.com/PaddlePaddle/Paddle/pull/38083), [#38164](https://github.com/PaddlePaddle/Paddle/pull/38164), [#38261](https://github.com/PaddlePaddle/Paddle/pull/38261), [#38290](https://github.com/PaddlePaddle/Paddle/pull/38290), [#40607](https://github.com/PaddlePaddle/Paddle/pull/40607), [#37093](https://github.com/PaddlePaddle/Paddle/pull/37093), [#37106](https://github.com/PaddlePaddle/Paddle/pull/37106), [#37143](https://github.com/PaddlePaddle/Paddle/pull/37143), [#37338](https://github.com/PaddlePaddle/Paddle/pull/37338), 
[#37376](https://github.com/PaddlePaddle/Paddle/pull/37376), [#37485](https://github.com/PaddlePaddle/Paddle/pull/37485), [#37531](https://github.com/PaddlePaddle/Paddle/pull/37531), [#37623](https://github.com/PaddlePaddle/Paddle/pull/37623), [#37693](https://github.com/PaddlePaddle/Paddle/pull/37693), [#37755](https://github.com/PaddlePaddle/Paddle/pull/37755), [#37807](https://github.com/PaddlePaddle/Paddle/pull/37807), [#37889](https://github.com/PaddlePaddle/Paddle/pull/37889), [#38420](https://github.com/PaddlePaddle/Paddle/pull/38420), [#38539](https://github.com/PaddlePaddle/Paddle/pull/38539), [#36892](https://github.com/PaddlePaddle/Paddle/pull/36892), [#37084](https://github.com/PaddlePaddle/Paddle/pull/37084), [#37158](https://github.com/PaddlePaddle/Paddle/pull/37158), [#37361](https://github.com/PaddlePaddle/Paddle/pull/37361), [#37509](https://github.com/PaddlePaddle/Paddle/pull/37509), [#37603](https://github.com/PaddlePaddle/Paddle/pull/37603), [#37703](https://github.com/PaddlePaddle/Paddle/pull/37703), [#37824](https://github.com/PaddlePaddle/Paddle/pull/37824), [#38114](https://github.com/PaddlePaddle/Paddle/pull/38114), [#38322](https://github.com/PaddlePaddle/Paddle/pull/38322), [#38535](https://github.com/PaddlePaddle/Paddle/pull/38535), [#38650](https://github.com/PaddlePaddle/Paddle/pull/38650), [#38709](https://github.com/PaddlePaddle/Paddle/pull/38709), [#38799](https://github.com/PaddlePaddle/Paddle/pull/38799), [#38839](https://github.com/PaddlePaddle/Paddle/pull/38839), [#38904](https://github.com/PaddlePaddle/Paddle/pull/38904))
+    
+  - Add distributed inference function for large-scale models. ([#38795](https://github.com/PaddlePaddle/Paddle/pull/38795), [#39012](https://github.com/PaddlePaddle/Paddle/pull/39012), [#39032](https://github.com/PaddlePaddle/Paddle/pull/39032), [#39076](https://github.com/PaddlePaddle/Paddle/pull/39076), [#39194](https://github.com/PaddlePaddle/Paddle/pull/39194), [#39207](https://github.com/PaddlePaddle/Paddle/pull/39207), [#39241](https://github.com/PaddlePaddle/Paddle/pull/39241), [#39603](https://github.com/PaddlePaddle/Paddle/pull/39603), [#39758](https://github.com/PaddlePaddle/Paddle/pull/39758), [#39992](https://github.com/PaddlePaddle/Paddle/pull/39992))
+    
+- Dynamic graph hybrid parallelism
+  
+  - Refactor `paddle.distributed.fleet.utils.recompute` to support the new dynamic graph. ([#41396](https://github.com/PaddlePaddle/Paddle/pull/41396))
+    
+  - Add pure FP16 training support for data parallelism. ([#36420](https://github.com/PaddlePaddle/Paddle/pull/36420))
+    
+  - Add MoE (Mixture of Experts) parallel strategy, to support large-scale MoE model training. ([#41092](https://github.com/PaddlePaddle/Paddle/pull/41092), [#40895](https://github.com/PaddlePaddle/Paddle/pull/40895), [#40850](https://github.com/PaddlePaddle/Paddle/pull/40580), [#39224](https://github.com/PaddlePaddle/Paddle/pull/39224))
+    
+  - Add GroupSharded parallel strategy, which supports stage1, stage2, and stage3 with both synchronous and asynchronous communication. It can be combined with basic functions such as Recompute, AMP O1/O2, Offload, GroupShardedClipGrad, and GroupShardedScaler. ([#37489](https://github.com/PaddlePaddle/Paddle/pull/37489), [#37568](https://github.com/PaddlePaddle/Paddle/pull/37568), [#37707](https://github.com/PaddlePaddle/Paddle/pull/37707), [#37836](https://github.com/PaddlePaddle/Paddle/pull/37836), [#37947](https://github.com/PaddlePaddle/Paddle/pull/37947), [#38151](https://github.com/PaddlePaddle/Paddle/pull/38151), [#38407](https://github.com/PaddlePaddle/Paddle/pull/38407), [#38052](https://github.com/PaddlePaddle/Paddle/pull/38052), [#39112](https://github.com/PaddlePaddle/Paddle/pull/39112), [#38989](https://github.com/PaddlePaddle/Paddle/pull/38989), [#39171](https://github.com/PaddlePaddle/Paddle/pull/39171), [#39285](https://github.com/PaddlePaddle/Paddle/pull/39285), [#39334](https://github.com/PaddlePaddle/Paddle/pull/39334), [#39397](https://github.com/PaddlePaddle/Paddle/pull/39397), [#39581](https://github.com/PaddlePaddle/Paddle/pull/39581), [#39668](https://github.com/PaddlePaddle/Paddle/pull/39668), [#40129](https://github.com/PaddlePaddle/Paddle/pull/40129), [#40396](https://github.com/PaddlePaddle/Paddle/pull/40396), [#40488](https://github.com/PaddlePaddle/Paddle/pull/40488), [#40601](https://github.com/PaddlePaddle/Paddle/pull/40601), [#37725](https://github.com/PaddlePaddle/Paddle/pull/37725), [#37904](https://github.com/PaddlePaddle/Paddle/pull/37904), [#38064](https://github.com/PaddlePaddle/Paddle/pull/38064))
+    
+- Static graph hybrid parallelism
+  
+  - Add `scale_gradient` flag bit to `gradient_scale_configs` to control the position where the gradient aggregation operation averages the gradients under pipeline parallelism. ([#36384](https://github.com/PaddlePaddle/Paddle/pull/36384))
+    
+  - Under tensor parallelism, the dropout op supports the settings of deterministic random seed generators, to ensure random consistency for non-distributed variables and randomness of distributed variables. ([#36228](https://github.com/PaddlePaddle/Paddle/pull/36228))
+    
+  - NPU hybrid parallelism supports Offload, saving 40% of NPU memory. ([#37224](https://github.com/PaddlePaddle/Paddle/pull/37224))
+    
+  - Add `force_cpu` optional parameter to the seed op, to allow dropout to read seed values directly from CPU. ([#35820](https://github.com/PaddlePaddle/Paddle/pull/35820))
+    
+  - Improve the Automatic Sparsity (ASP) sharding strategy and support the selection of sharding strategy according to the program. ([#40028](https://github.com/PaddlePaddle/Paddle/pull/40028))
+    
+- Automatic parallel
+  
+  - Add the process restart (relaunch) after automatic mapping between logical processes and physical devices. ([#37523](https://github.com/PaddlePaddle/Paddle/pull/37523), [#37326](https://github.com/PaddlePaddle/Paddle/pull/37326))
+    
+  - Improve the underlying mechanism and interface for automatic parallel to facilitate the unification of modules and add the optimized pass. ([#36617](https://github.com/PaddlePaddle/Paddle/pull/36617), [#38132](https://github.com/PaddlePaddle/Paddle/pull/38132))
+    
+  - Add unified resource representation, to support automatic mapping between logical processes and physical devices. ([#37091](https://github.com/PaddlePaddle/Paddle/pull/37091), [#37482](https://github.com/PaddlePaddle/Paddle/pull/37482), [#37094](https://github.com/PaddlePaddle/Paddle/pull/37094))
+    
+  - Improve the distributed attribute completion for the backward and update parts of the computation graph. ([#36744](https://github.com/PaddlePaddle/Paddle/pull/36744))
+    
+  - Add data slicing function. ([#36055](https://github.com/PaddlePaddle/Paddle/pull/36055))
+    
+  - Add tensor resharding function to reshard the tensor according to the distributed properties of the tensor and operator. ([#40865](https://github.com/PaddlePaddle/Paddle/pull/40865), [#41106](https://github.com/PaddlePaddle/Paddle/pull/41106))
+    
+  - Add the automatic conversion pass of distributed parameters when the number of resources or parallel policy changes. ([#40434](https://github.com/PaddlePaddle/Paddle/pull/40434))
+    
+  - Add GradientMerge pass to reduce the number of communications and improve training efficiency. ([#38259](https://github.com/PaddlePaddle/Paddle/pull/38259), [#40737](https://github.com/PaddlePaddle/Paddle/pull/40737))
+    
+  - Add Recompute pass to reduce the activation memory storage. ([#38920](https://github.com/PaddlePaddle/Paddle/pull/38920))
+    
+  - Add Sharding optimization pass, to support p-g-os 3 stage optimization. ([#38502](https://github.com/PaddlePaddle/Paddle/pull/38502))
+    
+  - Add AMP + FP16 optimization pass. ([#38764](https://github.com/PaddlePaddle/Paddle/pull/38764), [#40615](https://github.com/PaddlePaddle/Paddle/pull/40615))
+    
+  - Add fused QKV parallelization for Transformer class model. ([#39080](https://github.com/PaddlePaddle/Paddle/pull/39080))
+    
+  - Improve the sharding propagation for while op to ensure convergence of the fix-point algorithm. ([#39939](https://github.com/PaddlePaddle/Paddle/pull/39939), [#39086](https://github.com/PaddlePaddle/Paddle/pull/39086), [#39014](https://github.com/PaddlePaddle/Paddle/pull/39014))
+    
+  - Support training and inference for sub-block and while op control flow. ([#39612](https://github.com/PaddlePaddle/Paddle/pull/39612), [#39895](https://github.com/PaddlePaddle/Paddle/pull/39895), [#40077](https://github.com/PaddlePaddle/Paddle/pull/40077))
+    
+- Parameter Server
+  
+  - Add NaN/Inf value checking tool under GPUPS. ([#38131](https://github.com/PaddlePaddle/Paddle/pull/38131))
+    
+  - Under GPUPS, add set_date interface to adapt incremental training. ([#36194](https://github.com/PaddlePaddle/Paddle/pull/36194))
+    
+  - Under GPUPS, add asynchronous release dataset function. ([#37790](https://github.com/PaddlePaddle/Paddle/pull/37790))
+    
+  - Under GPUPS, support dumping parameters and intermediate layers. ([#36157](https://github.com/PaddlePaddle/Paddle/pull/36157))
+    
+  - Under GPUPS, support the optimizer parameter configuration. ([#39783](https://github.com/PaddlePaddle/Paddle/pull/39783), [#39849](https://github.com/PaddlePaddle/Paddle/pull/39849))
+    
+  - Under the Unified Parameter Server, refactor the base classes of each module such as communication and storage, to improve the ease of secondary development of each module. ([#41207](https://github.com/PaddlePaddle/Paddle/pull/41207), [#41022](https://github.com/PaddlePaddle/Paddle/pull/41022), [#40702](https://github.com/PaddlePaddle/Paddle/pull/40702), [#39341](https://github.com/PaddlePaddle/Paddle/pull/39341), [#39377](https://github.com/PaddlePaddle/Paddle/pull/39377), [#39191](https://github.com/PaddlePaddle/Paddle/pull/39191), [#39064](https://github.com/PaddlePaddle/Paddle/pull/39064))
+    
+  - Add evaluation metrics module under the Unified Parameter Server, to support AUC/WuAUC/MaskAUC and other evaluation metrics calculation and customizable extensions. ([#38789](https://github.com/PaddlePaddle/Paddle/pull/38789))
+    
+
+#### Profiler
+
+- Add the performance analysis module `paddle.profiler` in the Python layer: provide the ability to collect, export, and summarize performance data during training and inference. ([#40065](https://github.com/PaddlePaddle/Paddle/pull/40065), [#40357](https://github.com/PaddlePaddle/Paddle/pull/40357), [#40888](https://github.com/PaddlePaddle/Paddle/pull/40888))
+  
+  - `paddle.profiler.Profiler`: the performance analyzer, which is the interface users interact with. ([#41029](https://github.com/PaddlePaddle/Paddle/pull/41029), [#41524](https://github.com/PaddlePaddle/Paddle/pull/41524), [#41157](https://github.com/PaddlePaddle/Paddle/pull/41157), [#40249](https://github.com/PaddlePaddle/Paddle/pull/40249), [#40111](https://github.com/PaddlePaddle/Paddle/pull/40111), [#39964](https://github.com/PaddlePaddle/Paddle/pull/39964), [#40133](https://github.com/PaddlePaddle/Paddle/pull/40133))
+    
+  - `paddle.profiler.RecordEvent`: provide custom instrumentation points to record time. ([#39693](https://github.com/PaddlePaddle/Paddle/pull/39693), [#39694](https://github.com/PaddlePaddle/Paddle/pull/39694), [#39695](https://github.com/PaddlePaddle/Paddle/pull/39695), [#39675](https://github.com/PaddlePaddle/Paddle/pull/39675), [#41445](https://github.com/PaddlePaddle/Paddle/pull/41445), [#41132](https://github.com/PaddlePaddle/Paddle/pull/41132))
+    
+  - `paddle.profiler.ProfilerTarget`: specify the target device for performance analysis.
+    
+  - `paddle.profiler.ProfilerState`: indicate the state of the performance analyzer.
+    
+  - `paddle.profiler.SortedKeys`: specify how the data in the statistics table is sorted.
+    
+  - `paddle.profiler.make_scheduler`: generate a scheduler for the performance analyzer state, implementing periodic control of the collection scope.
+    
+  - `paddle.profiler.export_chrome_tracing`: save performance data to a Google Chrome tracing file that can be viewed in chrome://tracing. ([#39316](https://github.com/PaddlePaddle/Paddle/pull/39316), [#39984](https://github.com/PaddlePaddle/Paddle/pull/39984), [#41029](https://github.com/PaddlePaddle/Paddle/pull/41029))
+    
+  - `paddle.profiler.export_protobuf`: save performance data to a protobuf file represented by internal structure. ([#39519](https://github.com/PaddlePaddle/Paddle/pull/39519), [#39109](https://github.com/PaddlePaddle/Paddle/pull/39109), [#39474](https://github.com/PaddlePaddle/Paddle/pull/39474))
+    
+  - `paddle.profiler.load_profiler_result`: load the performance data saved to a protobuf file.
+    
+  - `paddle.profiler.Profiler` generates statistics on data reading, step overhead, and throughput for model training when the `timer_only` parameter is specified. ([#40386](https://github.com/PaddlePaddle/Paddle/pull/40386))
+    
+- Refactor Profiler underlying infrastructure in C++ layer
+  
+  - Refactor the Profiler's controller architecture. ([#38826](https://github.com/PaddlePaddle/Paddle/pull/38826), [#39230](https://github.com/PaddlePaddle/Paddle/pull/39230), [#39779](https://github.com/PaddlePaddle/Paddle/pull/39779))
+    
+  - Add Host Tracer to collect host-side performance metrics. ([#37629](https://github.com/PaddlePaddle/Paddle/pull/39629), [#37766](https://github.com/PaddlePaddle/Paddle/pull/37766), [#37944](https://github.com/PaddlePaddle/Paddle/pull/37944), [#38280](https://github.com/PaddlePaddle/Paddle/pull/38280), [#39975](https://github.com/PaddlePaddle/Paddle/pull/39975), [#40460](https://github.com/PaddlePaddle/Paddle/pull/40460))
+    
+  - Add CUDA Tracer to collect device-side performance metrics. ([#39488](https://github.com/PaddlePaddle/Paddle/pull/39488))
+    
+  - Add support for profiling levels (tiered collection) in the Profiler. ([#39926](https://github.com/PaddlePaddle/Paddle/pull/39926))
+    
 
-#### **Distributed training-static graph hybrid parallel**
-
-- Optimize the AMP grey list when model parallel + AMP. Support the model parallel operator. The performance improves by 8%. ([#33660](https://github.com/PaddlePaddle/Paddle/pull/33660))
-- Optimize the `device` property setting for reverse gradient accumulation in case of pipeline parallel. The performance improves by 1-3%. ([#33946](https://github.com/PaddlePaddle/Paddle/pull/33946))
-- Optimize the debug part of the pipeline parallel executor. The performance improves by 60-140%.   ([#33948](https://gifothub.com/PaddlePaddle/Paddle/pull/33948))
-- Support the `Program` cache in the pipeline parallel. The performance improves by 10-40%.  ([#33998](https://github.com/PaddlePaddle/Paddle/pull/33998), [#33954](https://github.com/PaddlePaddle/Paddle/pull/33954))
-- Optimize the communication waiting for the pipeline parallel `send`. The performance improves by 0.3-2%.  ([#34086](https://github.com/PaddlePaddle/Paddle/pull/34086)) 
-- Optimize the size of `send/recv` data volume in case of model parallel + pipeline parallel. The performance improves by 36% in the 8-machine test.  ([#34110](https://github.com/PaddlePaddle/Paddle/pull/34110))
-- Optimize the cast of parameters in the hybrid parallel + AMP. It is controlled by `optimize_cast`. The performance improves by 5-7%.  ([#34965](https://github.com/PaddlePaddle/Paddle/pull/34965))
-- Optimize the performance when pipeline parallel + recompute + amp. The performance improves by 13%.  ([#34519](https://github.com/PaddlePaddle/Paddle/pull/34519))
-- Support the ``float16`` communication when pipeline parallel + data parallel. It is controlled by ``distributed_strategy.fp16_allreduce``. The performance improves by 13% performance improvement.  ([#34762](https://github.com/PaddlePaddle/Paddle/pull/34762))
-
-#### **Operator optimization**
-
-- Design and implement the generic Reduce CUDA algorithm. It is applied to 7 Reduce operators, increasing by 1.0x ~ 22.7x. ([#32697](https://github.com/PaddlePaddle/Paddle/pull/32697), [#32974](https://github.com/PaddlePaddle/Paddle/pull/32974), [#33267](https://github.com/PaddlePaddle/Paddle/pull/33267), [#32885](https://github.com/PaddlePaddle/Paddle/pull/32885), [#33144](https://github.com/PaddlePaddle/Paddle/pull/33144),  [#33761](https://github.com/PaddlePaddle/Paddle/pull/33761), [#33901](https://github.com/PaddlePaddle/Paddle/pull/33901), [#34143](https://github.com/PaddlePaddle/Paddle/pull/34143),  [#34436](https://github.com/PaddlePaddle/Paddle/pull/34436))
-- Design and implement the generic Elementwise and Broadcast CUDA algorithms.  ([#32512](https://github.com/PaddlePaddle/Paddle/pull/32512), [#32928](https://github.com/PaddlePaddle/Paddle/pull/32928), [#33976](https://github.com/PaddlePaddle/Paddle/pull/33976), [#32148](https://github.com/PaddlePaddle/Paddle/pull/32148), [#32414](https://github.com/PaddlePaddle/Paddle/pull/32414)): Applied to 41 monadic and activation operators. ([#32348](https://github.com/PaddlePaddle/Paddle/pull/32348), [#32622](https://github.com/PaddlePaddle/Paddle/pull/32622), [#32823](https://github.com/PaddlePaddle/Paddle/pull/32823)). The performance improves by 1.1x ~ 1.4x. It is applied to 19 dualistic (9 basic computation class, 6 comparison class, and 4 logic class) operators. ([#33050](https://github.com/PaddlePaddle/Paddle/pull/33050), [33052](https://github.com/PaddlePaddle/Paddle/pull/33052), [#33053](https://github.com/PaddlePaddle/Paddle/pull/33053), [#33051](https://github.com/PaddlePaddle/Paddle/pull/33051), [#33089](https://github.com/PaddlePaddle/Paddle/pull/33089)) . The performance improves by 1.02x ~ 3.21x.  
-- Optimize the ``roll`` operator CUDA implementation. The performance improves by more than 10% and 50% in case of single-dimensional and multi-dimensional inputs, respectively.  ([#32880](https://github.com/PaddlePaddle/Paddle/pull/32880))
-- Optimize the ``roll`` operator index computation. The performance improves by 15% and 70% for single-dimensional and multi-dimensional input, respectively. ([#33909](https://github.com/PaddlePaddle/Paddle/pull/33909))
-- Optimize the CUDA implementation of the `update_loss_scaling_op` operator. The performance improves by 2.06x.  ([#32554](https://github.com/PaddlePaddle/Paddle/pull/32554))
-- Optimize the ``softmax_with_cross_entropy (hard label)`` GPU operator performance. The acceleration ratio is 1.0x ~ 10.0x.  ([#35660](https://github.com/PaddlePaddle/Paddle/pull/35660))
-- Optimize the CPU implementation of ``index_select`` forward and inverse operators. The acceleration ratio is 2.09x ~ 9.34x. ([#32863](https://github.com/PaddlePaddle/Paddle/pull/32863),  [#32955](https://github.com/PaddlePaddle/Paddle/pull/32955))
-- Optimize the CPU implementation of the ``batch_norm`` operator for 2-dimensional inputs. The acceleration ratio is 22.68x~30.00x.  ([#34585](https://github.com/PaddlePaddle/Paddle/pull/34585))
-- Optimize the GPU performance of the ``batch_norm`` operator in the initialization method and 2-dimensional input. The acceleration ratio is 1.25x~25x.  ([#33851](https://github.com/PaddlePaddle/Paddle/pull/33851), [#33887](https://github.com/PaddlePaddle/Paddle/pull/33887))
-- Optimize the ``log_softmax`` operator performance, and fix the related bug. The kernel performance improves by 4.22x~32.29x after optimization. ([#31630](https://github.com/PaddlePaddle/Paddle/pull/31630), [#32180](https://github.com/PaddlePaddle/Paddle/pull/32180), [#32396](https://github.com/PaddlePaddle/Paddle/pull/32396), [#32937](https://github.com/PaddlePaddle/Paddle/pull/32937))
-- Optimize the ``concat_and_split`` operator, to solve the problem that computation and communication cannot overlap when training BERT on Hygon DCU chips in dynamic graphs. The performance of BERT distributed training on Hygon DCU chip increases by 27%. ([#33982](https://github.com/PaddlePaddle/Paddle/pull/33982))
-- Optimize the ``fused_elemwise_act`` operator, with more than ten times performance improvement in the MB computing scale. ([#33480](https://github.com/PaddlePaddle/Paddle/pull/33480))
-
-#### **Strategy optimization**
-
-- Add the ``build_strategy.fix_op_run_order`` strategy, to immobilize the order of op execution. The speed of the ResNet model with single machine 8 cards increases by 1.8%. ([#34427](https://github.com/PaddlePaddle/Paddle/pull/34427))
-- For the dynamic graph inverse computation, support and automatically enable partial operator inplace strategy. The performance of the dynamic graph gpt model pure float16 training increases by 4.8%. ([#35412](https://github.com/PaddlePaddle/Paddle/pull/35412))
-- Optimize the dynamic graph performance by stripping logic executed only on static graphs from the execution path of dynamic graphs. ([#34024](https://github.com/PaddlePaddle/Paddle/pull/34024))
-- For the IR Pass, optimize the capability exposed as a general-purpose capability. Support both single machine and distributed optimization.The performance improves by 3%-5% in GPT mixed parallel scenarios. ([#34955](https://github.com/PaddlePaddle/Paddle/pull/34955), [#35704](https://github.com/PaddlePaddle/Paddle/pull/35704), [#34730](https://github.com/PaddlePaddle/Paddle/pull/34730), [#34524](https://github.com/PaddlePaddle/Paddle/pull/34524))
-- Optimize the ctc loss grad computation, increase the speed by ~3x. Correspondingly, the GPU memory usage increases.  ([#34729](https://github.com/PaddlePadle/Paddle/pull/34729))
-- transformer encoder Performance Optimization
-  - Optimization method: add `paddle.incubate.nn.FusedMultiHeadAttention` and `paddle.incubate.nn.FusedFeedForward`. In the implementation, q, k, v gemm fusion and multiple kernel fusion optimization techniques are used to improve performance of the transformer encoder.    
-    - FusedAttention
-      - Add `paddle.incubate.nn.functional.fused_multi_head_attention`, to support fusion computation of multi-head attention.  ([#35905](https://github.com/PaddlePaddle/Paddle/pull/35905) [35903](https://github.com/PaddlePaddle/Paddle/pull/35903) [#36803](https://github.com/PaddlePaddle/Paddle/pull/36803) [#36793](https://github.com/PaddlePaddle/Paddle/pull/36793) [36185](https://github.com/PaddlePaddle/Paddle/pull/36185))
-      - Add `paddle.incubate.nn.FusedMultiHeadAttention` for layer networking of the fused multi-head attention.  ([#36498](https://github.com/PaddlePaddle/Paddle/pull/36498) )
-      - This module uses q, k, v gemm fusion and bias add + dropout + residual add + layer_norm kernel fusion optimization techniques, resulting in 1.08x-1.45x acceleration. 
-
-    - FusedFeedForward
-      - Add `paddle.incubate.nn.functional.fused_feedforward`, to support feedforward fusion computation.  ([#36729](https://github.com/PaddlePaddle/Paddle/pull/36729) [#36730](https://github.com/PaddlePaddle/Paddle/pull/36730))
-      - Add `paddle.incubate.nn.FusedFeedForward` for layer networking of fused feedforward.  ([#36776](https://github.com/PaddlePaddle/Paddle/pull/36776))
-      - Performance is improved by about 1.04x~1.22x over pre-optimization.
-      - Add `paddle.incubate.nn.FusedTransformerEncoderLayer`, to support layer networking by using fused multi-head attention and fused feedforward computation.  ([#36776](https://github.com/PaddlePaddle/Paddle/pull/36776))
-
-
-
-
-### **(4) Troubleshooting**
+#### **Other**
 
-#### API
+- Model quantization
+  
+  - Upgrade quantization storage format to unify quantization formats for dynamic and static graphs. ([#41041](https://github.com/PaddlePaddle/Paddle/pull/41041))
+    
+  - Add new post-training quantization (PTQ) methods: EMD and Adaround. ([#40421](https://github.com/PaddlePaddle/Paddle/pull/40421), [#38460](https://github.com/PaddlePaddle/Paddle/pull/38460))
+    
+  - Support quantizing more operations in PTQ and QAT, such as crop, split, ab, unsqueeze, etc. ([#40083](https://github.com/PaddlePaddle/Paddle/pull/40083))
+    
+  - Support quantizing operators in control flow. ([#37498](https://github.com/PaddlePaddle/Paddle/pull/37498))
+    
+  - Support quantization of matmul_v2 operator. ([#36469](https://github.com/PaddlePaddle/Paddle/pull/36469))
+    
+  - Add support for quantized matmul_v2 inference on TensorRT. ([#36594](https://github.com/PaddlePaddle/Paddle/pull/36594))
+    
+- CUDA memory optimization
+  
+  - Implement multi-stream safe Allocator to support safe and efficient use of CUDA memory in asynchronous computing scenarios. ([#37290](https://github.com/PaddlePaddle/Paddle/pull/37290))
+    
+  - Add new APIs (`paddle.device.cuda.max_memory_allocated`, `paddle.device.cuda.max_memory_reserved`, `paddle.device.cuda.memory_allocated`, and `paddle.device.cuda.memory_reserved`) for monitoring GPU memory at runtime. ([#38657](https://github.com/PaddlePaddle/Paddle/pull/38657))
+    
+  - Support allocating CUDA Managed Memory to train super large models in memory-constrained scenarios. ([#39075](https://github.com/PaddlePaddle/Paddle/pull/39075))
+    
+  - Add GetBasePtr interface in C++ to get device address created with *cudaMalloc*. ([#37978](https://github.com/PaddlePaddle/Paddle/pull/37978))
+    
+  - Reduce the number of free blocks in AutoGrowth Allocator to improve memory allocation performance. ([#35732](https://github.com/PaddlePaddle/Paddle/pull/35732))
+    
+  - Remove redundant Float32 temporary tensor and cast operation for tensor with data type FP16 in `initializer.Normal` and `initializer.Constant` to save 2x memory. ([#38818](https://github.com/PaddlePaddle/Paddle/pull/38818))
+    
+- High-order derivative testing for models in dynamic graphs.
+  
+  - Add third-order derivative testing for networks in dynamic graphs. ([#36814](https://github.com/PaddlePaddle/Paddle/pull/36814), [#37377](https://github.com/PaddlePaddle/Paddle/pull/37377))
+- Custom op: Support custom ops on the ROCm (HIP) platform. ([#36771](https://github.com/PaddlePaddle/Paddle/pull/36771))
+  
+- Cost Model: Add a basic Cost Model based on profiling information. ([#35774](https://github.com/PaddlePaddle/Paddle/pull/35774))
+  
+- Add a function that allows users to add their own layers and the corresponding pruning methods to ASP support. ([#40253](https://github.com/PaddlePaddle/Paddle/pull/40253))
+  
+- Add a string tensor data structure, giving the framework the ability to represent and process strings. ([#39830](https://github.com/PaddlePaddle/Paddle/pull/39830), [#40992](https://github.com/PaddlePaddle/Paddle/pull/40992))
+  
+- Add or upgrade oneDNN FP32/int8/bfloat16 Kernel, including:
+  
+  - ELU ([#37149](https://github.com/PaddlePaddle/Paddle/pull/37149))
+    
+  - exp ([#38624](https://github.com/PaddlePaddle/Paddle/pull/38624))
+    
+  - stack ([#37002](https://github.com/PaddlePaddle/Paddle/pull/37002))
+    
+  - softplus ([#36382](https://github.com/PaddlePaddle/Paddle/pull/36382))
+    
+  - round ([#39653](https://github.com/PaddlePaddle/Paddle/pull/39653))
+    
+  - shape ([#36033](https://github.com/PaddlePaddle/Paddle/pull/36033))
+    
+  - flatten and flatten2 ([#35892](https://github.com/PaddlePaddle/Paddle/pull/35892))
+    
+  - slice ([#37630](https://github.com/PaddlePaddle/Paddle/pull/37630))
+    
+  - elementwise_mul ([#40546](https://github.com/PaddlePaddle/Paddle/pull/40546))
+    
+  - elementwise_add ([#38176](https://github.com/PaddlePaddle/Paddle/pull/38176))
+    
+  - elementwise_div ([#36158](https://github.com/PaddlePaddle/Paddle/pull/36158))
+    
+  - elementwise_sub ([#35662](https://github.com/PaddlePaddle/Paddle/pull/35662))
+    
+  - roi_align ([#37848](https://github.com/PaddlePaddle/Paddle/pull/37848))
+    
+  - nearest_interp and nearest_interp_v2 ([#37985](https://github.com/PaddlePaddle/Paddle/pull/37985), [#38622](https://github.com/PaddlePaddle/Paddle/pull/38622), [#39490](https://github.com/PaddlePaddle/Paddle/pull/39490))
+    
+  - assembly optimized Adam ([#39158](https://github.com/PaddlePaddle/Paddle/pull/39158))
+    
+  - logsoftmax ([#39793](https://github.com/PaddlePaddle/Paddle/pull/39793))
+    
+  - activation ([#40721](https://github.com/PaddlePaddle/Paddle/pull/40721))
+    
+  - mul ([#38552](https://github.com/PaddlePaddle/Paddle/pull/38552))
+    
+  - mean ([#37104](https://github.com/PaddlePaddle/Paddle/pull/37104))
+    
+  - relu ([#36265](https://github.com/PaddlePaddle/Paddle/pull/36265))
+    
+  - pool2d ([#37081](https://github.com/PaddlePaddle/Paddle/pull/37081))
+    
+  - concat ([#35889](https://github.com/PaddlePaddle/Paddle/pull/35889))
+    
+  - conv2d ([#38507](https://github.com/PaddlePaddle/Paddle/pull/38507), [#38938](https://github.com/PaddlePaddle/Paddle/pull/38938), [#36284](https://github.com/PaddlePaddle/Paddle/pull/36284))
+    
+  - LayerNorm ([#40418](https://github.com/PaddlePaddle/Paddle/pull/40418))
+    
 
--  Optimize the `depthwise_conv` numerical stability.  ([#35161](https://github.com/PaddlePaddle/Paddle/pull/35161))
--  Add the shape check at parameter creation, to ensure that the `size` of each axis of the parameter is greater than 0.  ([#33265](https://github.com/PaddlePaddle/Paddle/pull/33265))
--  Optimize the ``paddle.nn.LayerNorm`` computation, and fix the related data overflow bugs.  ([#34432](https://github.com/PaddlePaddle/Paddle/pull/34432), [#33658](https://github.com/PaddlePaddle/Paddle/pull/33658))
--  Support Windows application scenarios, integrate PaddlePaddle framework capabilities into MFC/QT/C# desktop software environments, and fix the bug in the process nesting that causes system crashes. ([#34312](https://github.com/PaddlePaddle/Paddle/pull/34312))
--  Fix the bug of the NLP model loss in the Reduce data initialization.  ([#34941](https://github.com/PaddlePaddle/Paddle/pull/34941))
--  Fix the bug when ``batch_size=1`` in ``paddle.nn.LayerNorm``. ([#35480](https://github.com/PaddlePaddle/Paddle/pull/35480))
--  Fix the bug of incorrectly catching an error when the input of ``paddle.static.nn.group_norm`` is empty. ([#35613](https://github.com/PaddlePaddle/Paddle/pull/35613))
--  Fix the bug of empty mean/variance when ``is_test=True`` in ``paddle.nn.functional.batch_norm``.  ([#35328](https://github.com/PaddlePaddle/Paddle/pull/35328))
--  Fix the out-of-bounds access bug when ``paddle.nn.functional.instance_norm`` and ``paddle.nn.functional.batch_norm`` are entered as null. ([#35341](https://github.com/PaddlePaddle/Paddle/pull/35341), [#34107](https://github.com/PaddlePaddle/Paddle/pull/34107))
--  Fix the bug where quantitative models do not count the output of ``paddle.nn.LayerNorm``.  ([#33610](https://github.com/PaddlePaddle/Paddle/pull/33610))
--  Fix the bug where ``paddle.nn.SyncBatchNorm.convert_sync_batchnorm()`` does not support 1D/3D.  ([#32989](https://github.com/PaddlePaddle/Paddle/pull/32989))
--  Fix the bug of failure to add the inverse in case of ``is_test=True`` in ``paddle.nn.BatchNorm1D, paddle.nn.BatchNorm2D, paddle.nn.BatchNorm3D``.  ([#32678](https://github.com/PaddlePaddle/Paddle/pull/32678))
--  Fix the bug where the `Tensor.cuda` does not support `device_id` configured to `None`.   ([#34416](https://github.com/PaddlePaddle/Paddle/pull/34416))
--  Fix the bug where the ``paddle.to_tensor`` does not support built-in types such as ``Tensor.dtype, core.Tensor``.  ([#31931](https://github.com/PaddlePaddle/Paddle/pull/31931), [#33430](https://github.com/PaddlePaddle/Paddle/pull/33430))
--  Fix the bug where the `paddle.nn.functional.log_softmax` does not support input dimension of 0.   ([#34635](https://github.com/PaddlePaddle/Paddle/pull/34635))
--  Fix the bug that the relative error between the CPU calculation result and accurate value of ``paddle.nn.GroupNorm`` under float32 is greater than that of 1e-5. ([#33176](https://github.com/PaddlePaddle/Paddle/pull/33176))
--  Fix the bug where the returned result is not 0 when the parameter ``offset`` exceeds the dimension size in the ``paddle.trace``, and fix the bug of the stack overflow when the parameters ``axis1`` and ``axis2`` entered are illegal values. ([#33922](https://github.com/PaddlePaddle/Paddle/pull/33922), [#35419](https://github.com/PaddlePaddle/Paddle/pull/35419))
--  Fix the bug where the output type is not int when the ``paddle.sum`` input parameter is the bool type.The output type is wrong when the input parameter type and output parameter type are inconsistent and the number of reduce elements corresponding to the axis is 1. ([#34313](https://github.com/PaddlePaddle/Paddle/pull/34313), [#36123](https://github.com/PaddlePaddle/Paddle/pull/36123))
--  Fix the bug of the division by 0 error and array out-of-bound when ``paddle.nn.conv2d/conv3d/conv2d_transpose/conv3d_transpose`` is the illegal input. ([#35337](https://github.com/PaddlePaddle/Paddle/pull/35337))
--  Fix the heap buffer overflow bug on illegal input of ``paddle.nn.conv2d_transpose``.  ([#35340](https://github.com/PaddlePaddle/Paddle/pull/35340))
--  Fix the bug where writing a null address to ``paddle.bmm`` causes the program to crash at runtime.  ([#35098](https://github.com/PaddlePaddle/Paddle/pull/35098))
--  Fix the bug when the ``cast`` operator cannot support Tensor conversion from int16 to float32.  ([#35156](https://github.com/PaddlePaddle/Paddle/pull/35156))
--  Fix the bug where the` assign` does not support float16 or uint8. ([#35153](https://github.com/PaddlePaddle/Paddle/pull/35153))
--  Fix the bug of `concat`'s tendency to overflow when the input is greater than shape tensor.  ([#34319](https://github.com/PaddlePaddle/Paddle/pull/34319))
--  Fix the bug where the `concat` in dynamic graphs does not support empty tensor as an input.  ([#35845](https://github.com/PaddlePaddle/Paddle/pull/35845))
--  Fix the bug where the ``paddle.where`` does not support broadcast.  ([#35092](https://github.com/PaddlePaddle/Paddle/pull/35092))
--  Fix the bug of ``paddle.reshape`` not checking input legality in the empty tensor. ([#35642](https://github.com/PaddlePaddle/Paddle/pull/35642))
--  Fix the bug of ``layernorm`` operator mis-matching with cuda kernel in the large shape.  ( [#33748](https://github.com/PaddlePaddle/Paddle/pull/33748))
--  Fix the bug of wrong setting of stop_gradient property in the static graph of ``random`` class operator. ( [#33959](https://github.com/PaddlePaddle/Paddle/pull/33959))
--  Fix the bug of wrong behavior of ``split`` operator with empty tensor input. ([#334356](https://github.com/PaddlePaddle/Paddle/pull/334356))
--  Fix the GPU memory leak bug in tensor's slice left-value assignment. ([#35013](https://github.com/PaddlePaddle/Paddle/pull/35013))
--  Fix the bug of the dynamic graph Layers not being used bycloudpickle dump and load. ([#35538](https://github.com/PaddlePaddle/Paddle/pull/35538))
--  Fix the bug of division by zero error in the illegal parameter settings for simple_rnn_cell, gru_cell, and lstm_cell APIs. ([#34627](https://github.com/PaddlePaddle/Paddle/pull/34627))
--  Fix the bug of the null pointer dereference in case of illegal input of ``paddle.nn.functional.linear``.  ([#34696](https://github.com/PaddlePaddle/Paddle/pull/34696))
--  Fix the memory out-of-bounds bug of the ``paddle.strided_slice``, ``paddle.transpose``. ([#35062](https://github.com/PaddlePaddle/Paddle/pull/35062), [#35079](https://github.com/PaddlePaddle/Paddle/pull/35079))
--  Fix the bug of the division by 0 error when the ``roll`` operator has an illegal input. ([#34499](https://github.com/PaddlePaddle/Paddle/pull/34499))
--  Fix an array out-of-bounds bug in the illegal input of the ``gather`` operator. ([#34096](https://github.com/PaddlePaddle/Paddle/pull/34096), [#34138](https://github.com/PaddlePaddle/Paddle/pull/34138), [#34200](https://github.com/PaddlePaddle/Paddle/pull/34200))
--  Fix the bug of division by 0 error in the illegal input of the ``prelu``, ``softlax`` operators. ([#34499](https://github.com/PaddlePaddle/Paddle/pull/34499))
--  Fix the bug where the ``split`` operator does not perform a legality check on input parameters.  ([#34630](https://github.com/PaddlePaddle/Paddle/pull/34630))
--  Fix the bug where the ``memcpy`` operator does not support Hygon DCU chips.  ([#35394](https://github.com/PaddlePaddle/Paddle/pull/35394))
--  Fix the bug of training error reporting of the ``slice`` operator when ``batch_size=1``. ([#34265](https://github.com/PaddlePaddle/Paddle/pull/34265))
--  Fix the overflow bug of the ``reduce_sum`` operator in the AMP.  ([#33960](https://github.com/PaddlePaddle/Paddle/pull/33960))
--  Fix the ANSI escape code error on windows.  ([#33689](https://github.com/PaddlePaddle/Paddle/pull/33689))
--  Fix the inconsistency bug between ``paddle.hub`` parsed file names and downloaded and saved files.  ([#33214](https://github.com/PaddlePaddle/Paddle/pull/33214))
--  Fix the memory leak bug when inputting empty tensor for ``matmul``, ``diag_embed``, and ``auc`` operators.  ([#34978](https://github.com/PaddlePaddle/Paddle/pull/34978))
--  Fix the bug of large computational accuracy error of broadcast for ``paddle.less_equal, paddle.less_than, paddle.greater_equal, and paddle.greater_than``. ([#32941](https://github.com/PaddlePaddle/Paddle/pull/32941))
--  Fix the crash bug of ``interpolate`` operator in case of a large input shape. ([#35577](https://github.com/PaddlePaddle/Paddle/pull/35577))
--  Fix legality check for ``interpolate``, ``unfold``, and ``spectral_norm`` operators in case of empty tensor input.  ([#33941](https://github.com/PaddlePaddle/Paddle/pull/33941), [#34943](https://github.com/PaddlePaddle/Paddle/pull/34943), [#35005](https://github.com/PaddlePaddle/Paddle/pull/35005))
--  Fix a possible negative sign (integer overflow) in `paddle.flops` when computing the output FLOPs. ([#33576](https://github.com/PaddlePaddle/Paddle/pull/33576))
--  Fix the bug of reporting an error when ``paddle.summary`` encounters a layer whose return value contains a non-Tensor element. ([#34160](https://github.com/PaddlePaddle/Paddle/pull/34160))
--  Fix the bug where the output shape is calculated incorrectly when the ``pool`` operator is entered illegally.  ([#35106](https://github.com/PaddlePaddle/Paddle/pull/35106))
--  Fix the legality check bug of the input shape for ``unfold, dice_loss, and reshape`` operators.  ([#34673](https://github.com/PaddlePaddle/Paddle/pull/34673), [#34757](https://github.com/PaddlePaddle/Paddle/pull/34757), [#35016](https://github.com/PaddlePaddle/Paddle/pull/35016))
--  Fix the input zero tensor bug of the ``unique, and unstack`` operators. ([#36021](https://github.com/PaddlePaddle/Paddle/pull/36021))
--  Fix the bug when the reverse input of stack operator is null. ([#362877](https://github.com/PaddlePaddle/Paddle/pull/32877))
--  Fix the bug of the division by 0 error in the CPU execution when the shape of the input Tensor of ``paddle.inverse`` is ``[0, 0, 0]``.  ([#34996](https://github.com/PaddlePaddle/Paddle/pull/34996))
--  Fix the bug of the CUDA error reported by ``paddle.nn.functional.grid_sample`` for special input cases. ([#33100](https://github.com/PaddlePaddle/Paddle/pull/33100))
--  Fix a compile-time dimension calculation error in ``paddle.flatten`` for special input cases of static graphs. ([#35321](https://github.com/PaddlePaddle/Paddle/pull/35321))
--  Fix a compile-time check error in ``paddle.nn.conv2d/conv3d/conv2d\_transpose/conv3d\_transpose`` when calculating output shape.  ([#35693](https://github.com/PaddlePaddle/Paddle/pull/35693))
--  Fix the bug where ``paddle.data.flowers`` is prone to data reading errors in multi-card training situations.  ([#33738](https://github.com/PaddlePaddle/Paddle/pull/33738))
--  Fix the bug that the loss is nan when the pact quantizes the se module. ([#35392](https://github.com/PaddlePaddle/Paddle/pull/35392))
--  Fix the bug of error reporting in the quantization `flatten_contiguous_range`. ([35410](https://github.com/PaddlePaddle/Paddle/pull/35410))
--  Fix the bug of pact quantization in dynamic graph mode.  ([#35407](https://github.com/PaddlePaddle/Paddle/pull/35407))
--  Fix the bug of the error report by channel-wise quantization bert. ([#34948](https://github.com/PaddlePaddle/Paddle/pull/34948))
--  Fix the bug with quantization when all parameters are 0. ([#34647](https://github.com/PaddlePaddle/Paddle/pull/34647))
--  Fix a bug in channel-wise quantization when the number of channels is 1. ([#33753](https://github.com/PaddlePaddle/Paddle/pull/33753))
--  Fix the bug of thread insecurity of the dynamic graph ``@no_grad``.  ([#34649](https://github.com/PaddlePaddle/Paddle/pull/34649))
--  Fix the bug where the ``paddle.grad`` interface will hang in some scenarios. ([#34023](https://github.com/PaddlePaddle/Paddle/pull/34023))
--  Fix the bug of shape derivation in ``paddle.masked_select`` in static graphs. ([#33167](https://github.com/PaddlePaddle/Paddle/pull/33167))
--  Fix the bug of ``paddle.slice`` not supporting ``numpy.ndarray`` type index in some scenarios, and error when ``axes`` is the ``tuple`` type. ([#35748](https://github.com/PaddlePaddle/Paddle/pull/35748), [#35267](https://github.com/PaddlePaddle/Paddle/pull/35267))
--  Fix the `set_value` reverse gradient truncation bug. ([#34304](https://github.com/PaddlePaddle/Paddle/pull/34304))
--  Fix the ``paddle.regularizer.L1Decay`` duplicate gradient setting bug in the non-inplace computation.  ([32710](https://github.com/PaddlePaddle/Paddle/pull/32710))
--  Fix the bug with learning rate not taking effect when grouping ``adamw`` parameters. ([#34468](https://github.com/PaddlePaddle/Paddle/pull/34468))
--  Optimize illegal ``dilate`` input check in convolution class APIs. ([#35894](https://github.com/PaddlePaddle/Paddle/pull/35894))
--  Fix the bug of the `paddle.io.DataLoader` iteration mid-break error reporting. ([#34501](https://github.com/PaddlePaddle/Paddle/pull/34501)) DataLoader memory leak bug. ([#34140](https://github.com/PaddlePaddle/Paddle/pull/34140)) DataLoader wrongly reporting the warning information. ([#33712](https://github.com/PaddlePaddle/Paddle/pull/33712))          DataLoader sub-process random state consistency bug. ([#33310](https://github.com/PaddlePaddle/Paddle/pull/33310))
--  Fix drop_last not taking effect in IterableDataset. ([#34801](https://github.com/PaddlePaddle/Paddle/pull/34801))
--  Fix the bug with optimizer state recovery caused by ``paddle.optimizer.lr.LRScheduler``.   ( [#33984](https://github.com/PaddlePaddle/Paddle/pull/33984))
--  Fix the bug of using ``axis`` for infershape in ``gather`` operator. ([#33413](https://github.com/PaddlePaddle/Paddle/pull/33413))
--  Fix a bug of getting stuck in Executor where fetch_list type is a tuple. ([#35726](https://github.com/PaddlePaddle/Paddle/pull/35726))
--  Fix the divide-by-zero error in ``paddle.nn.GroupNorm``, and add a check that the number of channels is exactly divisible by the number of groups. ([#35644](https://github.com/PaddlePaddle/Paddle/pull/35644))
--  Fix the bug with referencing the freed memory in tensor formatter. ([#35399](https://github.com/PaddlePaddle/Paddle/pull/35399))
--  Fix the bug of the ``beta`` parameter precision loss at ``float64`` precision for the Adam optimizer.  ([#33381](https://github.com/PaddlePaddle/Paddle/pull/33381))
--  Fix the precision misalignment bug caused by unbroadcasted initialization of tensor parallel non-tangent parameters. ([#35326](https://github.com/PaddlePaddle/Paddle/pull/35326))
--  Migrate the ``topk`` operator in the ``paddle.static.accuracy`` API to the ``topk_v2`` operator.  ([#35494](https://github.com/PaddlePaddle/Paddle/pull/35494))
--  Migrate the ``expand`` operator to ``tile`` operator in ``paddle.nn.dynamic_decode``, and ``topk`` operator to ``topk_v2`` operator in the ``paddle.nn.BeamSearchDecoder``. ([#35656](https://github.com/PaddlePaddle/Paddle/pull/35656))
--  Migrate the one_hot operator in ``paddle.nn.functional.dice_loss`` API to the ``one_hot_v2`` operator. ([#35734](https://github.com/PaddlePaddle/Paddle/pull/35734))
--  Fix the bug of usage in the static graph mode in ``paddle.summary``. ([#35303](https://github.com/PaddlePaddle/Paddle/pull/35303))
--  Fix the multi-card startup bug in ``paddle.Model.prepare`` static graph mode.  ([#34311](https://github.com/PaddlePaddle/Paddle/pull/34311))
-- Fix error report of `paddle.nn.functional.cross_entropy` when `weight` is given and `axis` is specified as a legal dimension other than -1. ([#36647](https://github.com/PaddlePaddle/Paddle/pull/36647))
-- Fix a bug with `paddle.utils.dlpack.to_dlpack` that prevents it from encoding multidimensional `Tensor`, and fix a bug with its generated DLPack objects not being shared across deep learning frameworks. ([#36177](https://github.com/PaddlePaddle/Paddle/pull/36177))
-- Fix a bug in the `sample` method of `paddle.distribution.Categorical`: an out-of-bounds array access in the multinomial op's CUDA kernel caused an error to be reported. ([#36511](https://github.com/PaddlePaddle/Paddle/pull/36511))
-- Fix a bug in the dynamic graph `_BatchNormBase` base class where the default_dtype is modified, resulting in the wrong type of subsequent networking parameters. Affected APIs are `paddle.nn.BatchNorm1D`, `paddle.nn.BatchNorm2D`, `paddle.nn.BatchNorm3D`, and `paddle.nn.SyncBatchNorm`. Specifically, when `get_default_dtype() == 'float16'`, the default parameter data type is changed by `set_default_dtype('float32')`. Because dynamic graph networking creates parameters with default_dtype, every parameter created afterwards gets the wrong type. ([#36376](https://github.com/PaddlePaddle/Paddle/pull/36376))
-- Fix an exception in `paddle.nn.functional.grid_sample` caused by special input. ([#36625](https://github.com/PaddlePaddle/Paddle/pull/36625))
-- Fix calculation error of `paddle.fft.fft`, `paddle.fft.ifft`, `paddle.fft.rfft`, `paddle.fft.irfft`, `paddle.fft.hfft`, and `paddle.fft.ihfft` when input `axis=0`. ([#36537](https://github.com/PaddlePaddle/Paddle/pull/36537))
-- Fix a bug of errors of `paddle.fft.fftshift` and `paddle.fft.ifftshift` under static graphs.  ([#36537](https://github.com/PaddlePaddle/Paddle/pull/36537))
-- Fix a bug where `paddle.fft.ifftshift` is not calculated correctly. ([#36835](https://github.com/PaddlePaddle/Paddle/pull/36835))
-- Fix error message prompt for `paddle.nn.functional.pad` in `replicate` mode. ([#36531](https://github.com/PaddlePaddle/Paddle/pull/36531))
+### **(2) Function optimization**
 
+#### API
 
+- Add support for mixed precision training O2 mode for `paddle.Model`, i.e., support for Pure FP16 training mode of the original dynamic/static graphs. ([#36441](https://github.com/PaddlePaddle/Paddle/pull/40962441))
+  
+- Support for self chain calls for `paddle.nn.Layer`. ([#36609](https://github.com/PaddlePaddle/Paddle/pull/36609))
+  
+- Add settings of `is_distributed` property for the `to` method of `paddle.nn.Layer` to ensure that the distributed properties remain consistent before and after network parameter transform. ([#36221](https://github.com/PaddlePaddle/Paddle/pull/36221))
+  
+- Improve the parameter conversion logic of the `to` method of `paddle.nn.Layer`, to reduce the peak memory consumption of the conversion process and improve the conversion success rate. ([#36862](https://github.com/PaddlePaddle/Paddle/pull/36862))
+  
+- Support settings of the shape of the output Tensor for `paddle.incubate.graph_send_recv` to reduce the memory usage during the actual computation. ([#40509](https://github.com/PaddlePaddle/Paddle/pull/40509))
+  
+- Add the support of int32 and int64 data types for `paddle.incubate.segment_sum`, `segment_mean`, `segment_max`, and `segment_min`. ([#40577](https://github.com/PaddlePaddle/Paddle/pull/40577))
+  
+- Add the support of the bool type for transpose op. ([#35886](https://github.com/PaddlePaddle/Paddle/pull/35886))
+  
+- Switch the `paddle.mm` underlying operator from matmul to matmul_v2. ([#35770](https://github.com/PaddlePaddle/Paddle/pull/35770))
+  
+- Support static graph mode and support the unknown shape for `paddle.einsum`. ([#40360](https://github.com/PaddlePaddle/Paddle/pull/40360))
+  
+- Support data parallelism for `paddle.nn.functional.margin_cross_entropy` and `paddle.nn.functional.class_center_sample`. ([#39852](https://github.com/PaddlePaddle/Paddle/pull/39852))
+  
+- Support input of shape [1] for `paddle.nn.functional.grid_sample`. ([#36183](https://github.com/PaddlePaddle/Paddle/pull/36183))
+  
+- Support NHWC data format for `paddle.nn.PReLU`. ([#37019](https://github.com/PaddlePaddle/Paddle/pull/37019))
+  
+- Support the fixed random state using `paddle.seed` for `paddle.nn.functional.class_center_sample` . ([#38248](https://github.com/PaddlePaddle/Paddle/pull/38248))
+  
+- Add ROCM backend support for all APIs under `paddle.fft` , and optimize CUFFT backend error messages. ([#36415](https://github.com/PaddlePaddle/Paddle/pull/36415), [#36114](https://github.com/PaddlePaddle/Paddle/pull/36114/files))
+  
+- Support indexing or slicing where a dimension has size 0, that is, allow the result of an index or slice to be empty. ([#37313](https://github.com/PaddlePaddle/Paddle/pull/37313))
+  
+- Support bool indexing of int and bool type Tensors in `Tensor.setitem`. ([#37761](https://github.com/PaddlePaddle/Paddle/pull/37761))
+  
+- Support nearest mode for `paddle.nn.functional.interpolate` when the input shape is 5D. ([#38868](https://github.com/PaddlePaddle/Paddle/pull/38868))
+  
+- Add the support of int16 for `paddle.nn.Embedding` and `paddle.gather`. ([#40964](https://github.com/PaddlePaddle/Paddle/pull/40964), [#40052](https://github.com/PaddlePaddle/Paddle/pull/40052))
+  
+- Support data parallelism on a single machine on the CPU platform in `paddle.distributed.spawn`. ([#35745](https://github.com/PaddlePaddle/Paddle/pull/35745), [#36758](https://github.com/PaddlePaddle/Paddle/pull/36758), [#36637](https://github.com/PaddlePaddle/Paddle/pull/36637))
+  
+- Add `depthwise_conv2d` MKLDNN operator. ([#38484](https://github.com/PaddlePaddle/Paddle/pull/38484))
+  
+- Add complex type checks in static graph mode for the APIs `paddle.abs`, `paddle.transpose`, `paddle.squeeze`, `paddle.unsqueeze`, `paddle.matmul`, and `paddle.full`. ([#40113](https://github.com/PaddlePaddle/Paddle/pull/40113))
+  
+- Support tuple and list type arguments for `paddle.autograd.PyLayer` . ([#38146](https://github.com/PaddlePaddle/Paddle/pull/38146))
+  
+- Add a check for whether a tensor is both in-place and a leaf when calculating gradients. ([#37931](https://github.com/PaddlePaddle/Paddle/pull/37931))
+  
+- Support HIP library for `paddle.autograd.PyLayer` . ([#38184](https://github.com/PaddlePaddle/Paddle/pull/38184))
+  
+- Support more size inputs for `paddle.take_along_axis` and `paddle.put_along_axis` , and allow index matrix shape size to be larger than array matrix shape size. ([#39072](https://github.com/PaddlePaddle/Paddle/pull/39072))
+  
+- Optimize the error report message of API `paddle.nn.Pad2D` when replicate is 0. ([#36510](https://github.com/PaddlePaddle/Paddle/pull/36510/files))
+  
+- Support pad input in tuple format for API `paddle.nn.Pad2D` . ([#35985](https://github.com/PaddlePaddle/Paddle/pull/35985/files))
+  
+- Add tdm_sample API in `paddle.distributed.InMemoryDataset` to support sampling operations in TDM algorithms. ([#37044](https://github.com/PaddlePaddle/Paddle/pull/37044))
+  
+- Add Pre-saving Hooks mechanism for `paddle.jit.save`. ([#38186](https://github.com/PaddlePaddle/Paddle/pull/38186))
+  
+- Add new higher-order differentiation-related APIs.
+  
+  - `elementwise_add`: add third-order Kernel, to support computation of third-order differentiation. ([#36508](https://github.com/PaddlePaddle/Paddle/pull/36508), [#36618](https://github.com/PaddlePaddle/Paddle/pull/36618))
+    
+  - `matmul_v2`: add third-order Kernel, to support computation of third-order differentiation. ([#36459](https://github.com/PaddlePaddle/Paddle/pull/36459))
+    
+  - `elementwise_mul`: Add third-order Kernel, to support computation of third-order differentiation. ([#37152](https://github.com/PaddlePaddle/Paddle/pull/37547))
+    
+- Improve the logic of the `paddle.amp.GradScaler` to call check_finite_and_unscale op, to eliminate the cudaMemcpy introduced by the creation of the bool variable. ([#37770](https://github.com/PaddlePaddle/Paddle/pull/37770))
+  
+- Add check for unstack and unique op in case of input Tensor with 0 elements. ([#36021](https://github.com/PaddlePaddle/Paddle/pull/36021))
+  
 
 #### IR(Intermediate Representation)
 
-- Dynamic graph to static graph
-  - Fix an abnormal growth of GPU memory under ``paddle.no_grad`` semantics after dynamic to static. ([#35725](https://github.com/PaddlePaddle/Paddle/pull/35725))
-  - Fix a misidentification and conversion bug in the ``paddle.no_grad`` interface.  ([#34136](https://github.com/PaddlePaddle/Paddle/pull/34136)) 
-  - Fix a bug of reporting an error in dynamic to static training when stop_gradient=True is set in the middle of the model in some scenarios. ([#36353](https://github.com/PaddlePaddle/Paddle/pull/36353))
-  - Fix a bug of reporting an error when checking the return result in some scenarios where the control flow “if” is converted. ([#36830](https://github.com/PaddlePaddle/Paddle/pull/36830))
-  - Fix a bug that the return type changes unexpectedly due to additional dynamic to static aligning in the return length when “ifelse” branch returns unequal results. ([#36565](https://github.com/PaddlePaddle/Paddle/pull/36565))
-  - Fix a bug where video memory will keep growing in train mode and no_grad contexts after loading a model via the jit.save/load interface. ([#36463](https://github.com/PaddlePaddle/Paddle/pull/36463))
-
-#### **Distributed training**
-
-- Basic functions of distributed training
-  - Fix a potential stack overflow bug in the graph engine. ([#33055](https://github.com/PaddlePaddle/Paddle/pull/33055)) 
-  - Fix a potential deadlock bug in the distributed training. ([#34461](https://github.com/PaddlePaddle/Paddle/pull/34461))
-  - Fix the bug where tensor parallel is incorrectly sliced in the multi-headed attention computation of transformer class models. Optimize the speed of tensor parallel in mixed precision computations. ([#33015](https://github.com/PaddlePaddle/Paddle/pull/33015)) 
-  - Fix the bug where the norm of non-distributed vars is computed for multiple times when using `paddle.nn.ClipGradientByGlobalNorm` in the model parallel. ([#35713](https://github.com/PaddlePaddle/Paddle/pull/35713))
-  - Fix the bias addition position error in the row slice in the model parallel `paddle.distributed.split` Parallel Linear. ([#35186](https://github.com/PaddlePaddle/Paddle/pull/35186))
-  - Fix the possible hang bug in the pipeline parallel initialization communication group. ([#33476](https://github.com/PaddlePaddle/Paddle/pull/33476))
-  - Fix the bug where the `Tensor` GPU memory in pipeline parallel is released before it is actually used. ([#33996](https://github.com/PaddlePaddle/Paddle/pull/33996))
-  - Fix the bug where the pipeline parallel reverse gradient accumulation `op_device` is empty.  ([#33875](https://github.com/PaddlePaddle/Paddle/pull/33875))
-  - Fix the bug with pipeline parallel running `sub-block` errors.  ([#32727](https://github.com/PaddlePaddle/Paddle/pull/32727))
-  - Fix the bug where the pipeline parallel reverse gradient accumulation `op_device` is empty. ([#33875](https://github.com/PaddlePaddle/Paddle/pull/33875))
-  - Fix an occasional hang bug when initializing Sharding parallel communication. ([#33327](https://github.com/PaddlePaddle/Paddle/pull/33327))
-  - Fix the `paddle.distributed.barrier` synchronization flow error bug.  ([#33476](https://github.com/PaddlePaddle/Paddle/pull/33476))
-  - Fix the `paddle.distributed.alltoall` communication group setting error bug. ([#32890](https://github.com/PaddlePaddle/Paddle/pull/3492890))
-  - Fix a precision misalignment caused by a static graph tensor parallel parameter initial swap broadcast error. ([#35326](https://github.com/PaddlePaddle/Paddle/pull/35326))
-  - Fix the bug where dynamic graph data parallel does not support custom operators such as `recompute` inheriting from `PyLayer` class. ([#35401](https://github.com/PaddlePaddle/Paddle/pull/35401))
-  - Fix the hang bug in case of pipeline parallel + data parallel in the mixed parallel. ([#34142](https://github.com/PaddlePaddle/Paddle/pull/34142))
-  - Fix the `fleet.get_loss_scaling` failure bug in case of enabling AMP.  ([#33935](https://github.com/PaddlePaddle/Paddle/pull/33935))
-  - Fix the Connection Refused problem caused by a `fleet` multi-machine master not waiting for other nodes to be ready. ([#32889](https://github.com/PaddlePaddle/Paddle/pull/32889))
-  - Fix the bug where the distributed prediction `infer_from_dataset` still updates parameter gradients. ([#35698](https://github.com/PaddlePaddle/Paddle/pull/35698))
-  - Fix the bug in `data_feed` where the dense feature LOD attribute is incorrectly set. ([#35000](https://github.com/PaddlePaddle/Paddle/pull/35000))
-  - Fix the save bug with the `gradient_merge_cond` variable when using `gradientmerge` for static graphs. ([#35578](https://github.com/PaddlePaddle/Paddle/pull/35578))
-  - Fix the save bug with the `paddle.hub` download file name and the` nt_merge_cond variable`.  ([#35578](https://github.com/PaddlePaddle/Paddle/pull/35578))
-  - Fix the bug of unclearly reporting an error when `fleet` is enabled with `dump_slot`. ([#34173](https://github.com/PaddlePaddle/Paddle/pull/34173))
-  - Fix the RCCL bug on Hygon DCU chips in the hybrid parallel training. ([#32808](https://github.com/PaddlePaddle/Paddle/pull/32808))
-  - Fix GPU parameter server exit error reporting bug. ([#33724](https://github.com/PaddlePaddle/Paddle/pull/33724))
-  - Fix the bug of unavailability of upload/download function of the hdfs tool. ([#33903](https://github.com/PaddlePaddle/Paddle/pull/33903))
-  - Fix the bug of the GPU parameter server getting stuck during training because the sample cannot exactly divide the worker number. ([#32640](https://github.com/PaddlePaddle/Paddle/pull/32640))
-  - Fix the GPU parameter server error reported by using non-0 card training. ([#33078](https://github.com/PaddlePaddle/Paddle/pull/33078))
-  - Fix the bug of the delta score and scale show in the GPU Parameter Server. ([#33492](https://github.com/PaddlePaddle/Paddle/pull/33078), [#33492](https://github.com/PaddlePaddle/Paddle/pull/33492))
-  - Fix the bug with GPU Parameter Server not merging dense after training, in incorrect g2sum calculation. For data norm, add the optimize op. ([#35029](https://github.com/PaddlePaddle/Paddle/pull/35029))
-  - Fix an error reported if the gradient is empty when using the fuse all reduce ops switch.  ([#36231](https://github.com/PaddlePaddle/Paddle/pull/36231))
-  - Fix a bug with dist_transformer files showing undefined variables. ([#36211](https://github.com/PaddlePaddle/Paddle/pull/36211))
-
-- Dynamic graph hybrid parallel
-  - Fix the precision error in pipeline parallel due to communication asynchronization. [#35556](https://github.com/PaddlePaddle/Paddle/pull/35556)
-  - Fix the precision exception bug in ``paddle.distributed.fleet.meta_parallel.RowParallelLinear`` reverse computation under tensor parallel. [#33207](https://github.com/PaddlePaddle/Paddle/pull/33207)
-  - Fix a bug in tensor parallel causing parameter initialization error and precision exception due to randomness control error. [#32897](https://github.com/PaddlePaddle/Paddle/pull/32897)
-  - Fix the random hang bug when creating a communication group with ``paddle.distributed.new_group()``.  [#33141](https://github.com/PaddlePaddle/Paddle/pull/33141)
-  - Fix the bug of causing an error in traversing the reverse graph to resolve control flow networking under data parallel.  [#32715](https://github.com/PaddlePaddle/Paddle/pull/32715)
-  - Fix the bug of causing an error when synchronizing the parameters of each process under data parallel. [#33955](https://github.com/PaddlePaddle/Paddle/pull/33955)
-
-- Static graph hybrid parallel
-  - Fix a slice error in TensorParallel in Multi-Head Attention networks, and optimize the training speed when TensorParallel is used together with mixed precision. ([#32897](https://github.com/PaddlePaddle/Paddle/pull/32897))
-
-#### **Others**
-
-- Custom OP
-  - Fix the bug where the ``cast`` method of ``paddle::Tensor`` does not take effect in the GPU.  ([#34884](https://github.com/PaddlePaddle/Paddle/pull/34884))
-  - Fix the bug where custom operators cannot load multiple modules at the same time.  ([#34505](https://github.com/PaddlePaddle/Paddle/pull/34505))
-  - Fix the bug where the ``PADDLE_WITH_CUDA`` macro does not take effect in co-compiling of .cc and .cu files. ([#35448](https://github.com/PaddlePaddle/Paddle/pull/35448))
-- Remove changes to ``logging`` library global settings.  ([#32673](https://github.com/PaddlePaddle/Paddle/pull/32673))
-- Add ``GlooParallelContext`` to adapt the ``Reducer`` module logic, providing the underlying communication component that lets ``DataParallel`` support CPU parallelism later. ([#35154](https://github.com/PaddlePaddle/Paddle/pull/35154))
-- Migrate `top_k` op in `paddle.metric.accuracy` to `top_k_v2` op.   ([#35789](https://github.com/PaddlePaddle/Paddle/pull/35789))
-- Fix the bug where the default `attr` is not found running under `MKLDNN`. ([#34567](https://github.com/PaddlePaddle/Paddle/pull/34567))
-- Fix the bug in `optimizer` where `device_key` is not added to the `clear_float_status` OP. ([#34431](https://github.com/PaddlePaddle/Paddle/pull/34431))
+- Dynamic Graphs to Static Graphs
+  
+  - Optimize the behavior of the `ProgramCache.last` interface for dynamic graph to static graph so that it returns the most recently used Program instead of the final generated Program. ([#39541](https://github.com/PaddlePaddle/Paddle/pull/39541))
+    
+  - Optimize the error report message for the `paddle.reshape` API for dynamic graph to static graph, and add a new recommended usage hint. ([#40599](https://github.com/PaddlePaddle/Paddle/pull/40599))
+    
+  - Optimize the type of exception catch in the `is_api_in_module` function when transcribing dynamic code to static code. ([#40243](https://github.com/PaddlePaddle/Paddle/pull/40243))
+    
+  - Optimize the error message hints for dynamic graph to static graph, and hide warning information by default. ([#39730](https://github.com/PaddlePaddle/Paddle/pull/39730))
+    
+  - Add the support of type hint syntax for dynamic graph to static graph to improve the accuracy of variable type analysis. ([#39572](https://github.com/PaddlePaddle/Paddle/pull/39572))
+    
+  - Optimize the `paddle.cond` function to allow equal values for basic types such as bool and int. ([#37888](https://github.com/PaddlePaddle/Paddle/pull/37888))
+    
+  - Optimize the decorator `@to_static` to allow switching between train and eval modes. ([#37383](https://github.com/PaddlePaddle/Paddle/pull/37383))
+    
+  - Optimize the stack of error report for dynamic graph to static graph, to highlight user-related codes and reduce the framework redundant error stack. ([#36741](https://github.com/PaddlePaddle/Paddle/pull/36741))
+    
+  - Remove `no_value` placeholder from the return value of `paddle.cond`. ([#36513](https://github.com/PaddlePaddle/Paddle/pull/36513), [#36826](https://github.com/PaddlePaddle/Paddle/pull/36826))
+    
+  - Adapt the run_program op to the new dynamic graph mode. ([#40198](https://github.com/PaddlePaddle/Paddle/pull/40198), [#40355](https://github.com/PaddlePaddle/Paddle/pull/40355))
+    
+  - Add check for zip syntax. ([#37846](https://github.com/PaddlePaddle/Paddle/pull/37846))
+    
+  - Fix the dynamic graph to static graph failure due to the error of dimension and type judgment in the `paddle.signal.frame`, `paddle.signal.stft` and `paddle.signal.istft`. ([#40113](https://github.com/PaddlePaddle/Paddle/pull/40113))
+    
+  - Add registration of complex type Kernel for mean, pad3d ops. ([#40113](https://github.com/PaddlePaddle/Paddle/pull/40113))
+    
 
+#### **Mixed Precision Training**
 
+- Add GPU Compute Capability environment check for amp. Add a usage warning for GPU environments that cannot accelerate training. ([#38086](https://github.com/PaddlePaddle/Paddle/pull/38086))
+  
+- Add check of calling order when using `paddle.amp.decorate` and `paddle.DataParallel` at the same time. ([#38785](https://github.com/PaddlePaddle/Paddle/pull/38785))
+  
 
-## **4. Deployment Direction (Paddle Inference)**
+#### **Distributed Training**
 
-### **(1) New features**
+- Basic functions of the distributed training
+  
+  - Optimize Fleet API and DistributedStrategy configuration to use dynamic graph parallel function conveniently. ([#40408](https://github.com/PaddlePaddle/Paddle/pull/40408))
+    
+  - Optimize Dynamic Graph mixed parallel HybridParallelClipGrad strategy, support 4D hybrid parallel and Pure FP16 training. ([#36237](https://github.com/PaddlePaddle/Paddle/pull/36237), [#36555](https://github.com/PaddlePaddle/Paddle/pull/36555))
+    
+  - Restructure dynamic graph data parallel strategy, to support new dynamic graph and communication. ([#40389](https://github.com/PaddlePaddle/Paddle/pull/40389), [#40593](https://github.com/PaddlePaddle/Paddle/pull/40593), [#40836](https://github.com/PaddlePaddle/Paddle/pull/40836), [#41119](https://github.com/PaddlePaddle/Paddle/pull/41119), [#41413](https://github.com/PaddlePaddle/Paddle/pull/41413), [#39987](https://github.com/PaddlePaddle/Paddle/pull/39987))
+    
+  - Support distributed tensor model parallel for fused_attention op. ([#40101](https://github.com/PaddlePaddle/Paddle/pull/40101))
+    
+  - Support the distributed tensor model parallel for fused_feedforward op. ([#40160](https://github.com/PaddlePaddle/Paddle/pull/40160))
+    
+- Graph retrieval engine
+  
+  - Optimize the data format returned by the graph sampling interface of the graph engine, with a 3x improvement of the sampling speed. ([#37315](https://github.com/PaddlePaddle/Paddle/pull/37315))
+    
+  - Reduce the number of graph engine threads to improve performance. ([#37098](https://github.com/PaddlePaddle/Paddle/pull/37098))
+    
+  - Optimize graph engine data transfer to improve performance. ([#37341](https://github.com/PaddlePaddle/Paddle/pull/37341))
+    
+  - Optimize the merge logic of embedding op to improve performance by exploiting the topological relationship of embedding op in the model. ([#35942](https://github.com/PaddlePaddle/Paddle/pull/35942))
+    
+- Communication library: restructure the communication library to improve the scalability and development of the communication library, and support heterogeneous communication. ([#41398](https://github.com/PaddlePaddle/Paddle/pull/41398), [#39720](https://github.com/PaddlePaddle/Paddle/pull/39720), [#40911](https://github.com/PaddlePaddle/Paddle/pull/40911), [#40579](https://github.com/PaddlePaddle/Paddle/pull/40579), [#40629](https://github.com/PaddlePaddle/Paddle/pull/40629), [#40437](https://github.com/PaddlePaddle/Paddle/pull/40437), [#40430](https://github.com/PaddlePaddle/Paddle/pull/40430), [#40228](https://github.com/PaddlePaddle/Paddle/pull/40228), [#40181](https://github.com/PaddlePaddle/Paddle/pull/40181), [#40100](https://github.com/PaddlePaddle/Paddle/pull/40100), [#40097](https://github.com/PaddlePaddle/Paddle/pull/40097), [#39892](https://github.com/PaddlePaddle/Paddle/pull/39892), [#39384](https://github.com/PaddlePaddle/Paddle/pull/39384), [#39737](https://github.com/PaddlePaddle/Paddle/pull/39737), [#40040](https://github.com/PaddlePaddle/Paddle/pull/40040))
+  
 
-#### **Back-end capability enhancement**
+#### **Other**
 
-- Add the dynamic shape auto-configuration function in TensorRT sub-graph mode. Add TensorRT offline tune dynamic shape setting method. For scenarios where the model is cut into multiple TensorRT sub-graphs, improve ease of use. [#34806](https://github.com/PaddlePaddle/Paddle/pull/34806) [#35771](https://github.com/PaddlePaddle/Paddle/pull/35771), For examples, see the [demo](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/paddle-trt/tuned_dynamic_shape).
+- Error report and debugging optimization
+  
+  - Optimize the error message of the `label` boundary check for the cross_entropy op. ([#40001](https://github.com/PaddlePaddle/Paddle/pull/40001))
+    
+  - Add profile record for `infer_shape` and `compute` methods of op execution of dynamic graphs, show their cost in timeline. ([#39023](https://github.com/PaddlePaddle/Paddle/pull/39023))
+    
+  - Replace `pybind::index_error` error hint on Windows for unknown exceptions. ([#40538](https://github.com/PaddlePaddle/Paddle/pull/40538))
+    
+  - Add the corresponding error message to the out-of-bounds check of the scatter op. ([#37429](https://github.com/PaddlePaddle/Paddle/pull/37429))
+    
+- Download tool: `paddle.utils.download.get_path_from_url` was slow to decompress directories containing many files. Replace the original approach (looping over the directory and decompressing files one by one) with a single call to extractall on the directory, which greatly improves the decompression speed. ([#37311](https://github.com/PaddlePaddle/Paddle/pull/37311))
+  
+- Speed up the quantization training for `fake_quantize_range_abs_max`, `fake_quantize_abs_max`, `fake_quantize_dequantize_abs_max`, `fake_quantize_moving_average_abs_max`, etc. ([#40491](https://github.com/PaddlePaddle/Paddle/pull/40491))
+  
 
-  - The basic idea of the ease of use optimization: to use Paddle to run natively to statistically calculate the shape ranges of all temporary tensors in the graph for the batch data input by the user, and set the statistically calculated shape ranges to the input of TensorRT sub-graphs, thus avoiding the user to manually calculate the shape ranges of the input tensors of internal sub-graphs and improving ease of use.
-    - Basic process of offline tuned dynamic shape: After the user code is completed, set the config to enable shape range collection with the C++ interface `config.CollectShapeRangeInfo("shape_range.pbtxt")` or the Python interface `config.collect_shape_range_info('shape_range.pbtxt')`, which stores the collected shape ranges locally in prototxt format. Then modify the config to disable shape collection and enable TensorRT and the dynamic shape capability with the C++ interface `config.EnableTunedTensorRtDynamicShape("shape_range.pbtxt", true)` or the Python interface `config.enable_tuned_tensorrt_dynamic_shape('shape_range.pbtxt', True)`, and run directly.
+### **(3) Performance optimization**
 
+#### **Distributed Training**
 
-- Add native support for Ascend series hardware
-  - sub-graphs are accessed to ascend310 hardware [#35226](https://github.com/PaddlePaddle/Paddle/pull/35226) by supporting Paddle-Lite NNAdapter. For the example, see the [demo](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/ascend310_lite_subgraph/image_classification_demo).
-  - New Ascend 910 inference support [#34101](https://github.com/PaddlePaddle/Paddle/pull/34101)
-- Add pool3d OP to support for TensorRT. ([#36545](https://github.com/PaddlePaddle/Paddle/pull/36545))
+- Hybrid parallel optimizer `sharding_optimizer` supports `optimize_cast` optimization, which moves the parameter cast from the forward and backward stages to the optimizer stage. This improves performance by 7%. ([#35878](https://github.com/PaddlePaddle/Paddle/pull/35878))
+  
+- GPUPS optimization: support for gradient fuse allreduce training. This improves training performance by 20%. ([#35131](https://github.com/PaddlePaddle/Paddle/pull/35131))
+  
+- GPUPS optimization: dump CPU optimization speed improves by 3.21x. ([#40068](https://github.com/PaddlePaddle/Paddle/pull/40068))
+  
+- CPU parameter server streaming training optimization: support for automatic collection of sparse parameter statistics, incremental saving of sparse parameters, etc. The training performance improves by 20%. ([#36465](https://github.com/PaddlePaddle/Paddle/pull/36465), [#36601](https://github.com/PaddlePaddle/Paddle/pull/36601), [#36734](https://github.com/PaddlePaddle/Paddle/pull/36734), [#36909](https://github.com/PaddlePaddle/Paddle/pull/36909), [#36943](https://github.com/PaddlePaddle/Paddle/pull/36943), [#37181](https://github.com/PaddlePaddle/Paddle/pull/37181), [#37194](https://github.com/PaddlePaddle/Paddle/pull/37194), [#37515](https://github.com/PaddlePaddle/Paddle/pull/37515), [#37626](https://github.com/PaddlePaddle/Paddle/pull/37626), [#37995](https://github.com/PaddlePaddle/Paddle/pull/37995), [#38582](https://github.com/PaddlePaddle/Paddle/pull/38582), [#39250](https://github.com/PaddlePaddle/Paddle/pull/39250), [#40762](https://github.com/PaddlePaddle/Paddle/pull/40762), [#41234](https://github.com/PaddlePaddle/Paddle/pull/41234), [#41320](https://github.com/PaddlePaddle/Paddle/pull/41320), [#41400](https://github.com/PaddlePaddle/Paddle/pull/41400))
+  
 
-### **(2) Function optimization**
+#### **Operator Optimization**
 
-#### **Framework and API updates**
+- Optimize `FasterTokenizer` performance, with a 10% performance improvement compared to pre-optimization. ([#36701](https://github.com/PaddlePaddle/Paddle/pull/36701))
+  
+- Optimize `index_select` backward computation, with 3.7~25.2x performance improvement over pre-optimization. ([#37055](https://github.com/PaddlePaddle/Paddle/pull/37055))
+  
+- Optimize the performance of `paddle.nn.ClipGradByGlobalNorm`. Taking 10*10 `paddle.nn.Linear` as an example, the performance improves by about 30% over pre-optimization. ([#38209](https://github.com/PaddlePaddle/Paddle/pull/38209))
+  
+- Optimize the performance of `pnorm` with very large or very small `axis` dimensions, with 31-96x improvement in forward speed and 1.1-19x improvement in backward speed. ([#37685](https://github.com/PaddlePaddle/Paddle/pull/37685), [#38215](https://github.com/PaddlePaddle/Paddle/pull/38215), [#39011](https://github.com/PaddlePaddle/Paddle/pull/39011))
+  
+- Optimize `softmax` forward and backward performance, with a speedup ratio of about 2x for the `axis!=-1` configuration. ([#38602](https://github.com/PaddlePaddle/Paddle/pull/38602), [#38609](https://github.com/PaddlePaddle/Paddle/pull/38609), [#32387](https://github.com/PaddlePaddle/Paddle/pull/32387), [#37927](https://github.com/PaddlePaddle/Paddle/pull/37927/files))
+  
+- Optimize `log_softmax` forward and backward performance, with a speedup ratio of about 6x to 20x for `axis!=-1` configurations. ([#38992](https://github.com/PaddlePaddle/Paddle/pull/38992), [#40612](https://github.com/PaddlePaddle/Paddle/pull/40612))
+  
+- Optimize `softmax_with_cross_entropy` forward and backward performance, with a speedup ratio of about 1.3x for the `hard_label` configuration. ([#39553](https://github.com/PaddlePaddle/Paddle/pull/39553), [#40424](https://github.com/PaddlePaddle/Paddle/pull/40424), [#40643](https://github.com/PaddlePaddle/Paddle/pull/40643))
+  
+- Optimize `top_k` performance, with a speedup ratio of more than 22x for one-dimension and larger `k` (k=5000) configuration. ([#40941](https://github.com/PaddlePaddle/Paddle/pull/40941))
+  
+- Optimize `elementwise_mul` backward computation, with 1.85~12.16x performance improvement over pre-optimization. ([#37728](https://github.com/PaddlePaddle/Paddle/pull/37728))
+  
+- Optimize `elementwise_min` and `elementwise_max` backward computation, with performance on par with or improved by 1.05x to 18.75x over pre-optimization. ([#38236](https://github.com/PaddlePaddle/Paddle/pull/38236), [#37906](https://github.com/PaddlePaddle/Paddle/pull/37906))
+  
+- Optimize `nearest_interp` forward and backward computation, with forward performance improvement by 1.5x to 2.3x over pre-optimization, and backward performance improvement by 60% to 1.8x over pre-optimization. ([#38528](https://github.com/PaddlePaddle/Paddle/pull/38528), [#39067](https://github.com/PaddlePaddle/Paddle/pull/39067))
+  
+- Optimize `bilinear_interp` forward and backward computation, with forward performance improvement by 0.4x to 2.3x over pre-optimization, and backward performance improvement by 10%-30% over pre-optimization. ([#39243](https://github.com/PaddlePaddle/Paddle/pull/39243), [#39423](https://github.com/PaddlePaddle/Paddle/pull/39423))
+  
+- Optimize `dropout` forward and backward computation, with performance improvement by about 20%. ([#39795](https://github.com/PaddlePaddle/Paddle/pull/39795), [#38859](https://github.com/PaddlePaddle/Paddle/pull/38859), [#38279](https://github.com/PaddlePaddle/Paddle/pull/38279), [#40053](https://github.com/PaddlePaddle/Paddle/pull/40053))
+  
+- Optimize `grid_sampler` forward and backward computation, with forward performance improvement by 10% to 30% over pre-optimization, and backward performance improvement by 10% to 60% over pre-optimization. ([#39751](https://github.com/PaddlePaddle/Paddle/pull/39751))
+  
+- Optimize `group_norm` forward and backward computation, with the forward performance improvement by 1.04x to 2.35x, and backward performance improvement by 1.12x to 1.18x. ([#39944](https://github.com/PaddlePaddle/Paddle/pull/39944), [#40657](https://github.com/PaddlePaddle/Paddle/pull/40657), [#39596](https://github.com/PaddlePaddle/Paddle/pull/39596))
+  
+- Optimize `conv1d` forward and backward computation, with the forward performance improvement by 1.00x to 2.01x, and backward performance improvement by 1.01x to 474.56x. ([#38425](https://github.com/PaddlePaddle/Paddle/pull/38425))
+  
+- Optimize `elementwise_div` backward computation, with the backward performance improvement by 1.02x to 29.25x. ([#38044](https://github.com/PaddlePaddle/Paddle/pull/38044))
+  
+- Optimize `gelu` forward and backward computation, with the forward performance improvement by 1.13x to 1.43x, and backward performance improvement by 1.10x to 1.55x. ([#38188](https://github.com/PaddlePaddle/Paddle/pull/38188), [#38263](https://github.com/PaddlePaddle/Paddle/pull/38263))
+  
+- Optimize `elementwise_sub` backward computation, with the backward performance improvement by 1.04x to 15.64x. ([#37754](https://github.com/PaddlePaddle/Paddle/pull/37754))
+  
+- Optimize the forward performance of `flip` on one-dimensional data input, with the performance improvement by 100%. ([#37825](https://github.com/PaddlePaddle/Paddle/pull/37825))
+  
+- Optimize `layer_norm` forward and backward computation, with the forward performance improvement by 2x to 5x over pre-optimization, and backward performance improvement by 20% to 50% over pre-optimization. ([#39167](https://github.com/PaddlePaddle/Paddle/pull/39167), [#39247](https://github.com/PaddlePaddle/Paddle/pull/39247))
+  
+- Optimize `embedding` forward and backward computation, with a maximum improvement of 1.51x in forward performance and 1.03x to 7.79x in backward performance. ([#39856](https://github.com/PaddlePaddle/Paddle/pull/39856), [#39886](https://github.com/PaddlePaddle/Paddle/pull/39886))
+  
+- Optimize `gelu` FP16 forward and backward calculations, with forward performance improvement by 9% to 12% over pre-optimization, and backward performance improvement by 2% to 9% over pre-optimization. ([#38980](https://github.com/PaddlePaddle/Paddle/pull/38980))
+  
+- Remove CPU -> GPU explicit data transfer operation in `gather_nd` forward and backward operators, and remove the explicit synchronous operation in `index_select` forward and backward operators. Change GPU -> GPU data transfer in `scatter_nd` from synchronous operation to asynchronous operation. ([#40933](https://github.com/PaddlePaddle/Paddle/pull/40933))
+  
+- Optimize `Lars optimizer` computation, with the training performance improvement of Resnet50 FP16 model by 5.1% over pre-optimization. ([#35652](https://github.com/PaddlePaddle/Paddle/pull/35652), [#35476](https://github.com/PaddlePaddle/Paddle/pull/35476))
+  
+- Optimize `AvgPool2dGrad` computation, with the performance improvement by 2.6x over pre-optimization. ([#35389](https://github.com/PaddlePaddle/Paddle/pull/35389))
+  
+- Optimize `Elementwise` computation for multivariate output, improving performance by up to 15% over pre-optimization. ([#38329](https://github.com/PaddlePaddle/Paddle/pull/38329), [#38410](https://github.com/PaddlePaddle/Paddle/pull/38410))
+  
+- Optimize the `probs` computation of `Categorical`, simplify the computation logic, and improve the performance by 4x to 5x. ([#42178](https://github.com/PaddlePaddle/Paddle/pull/42178))
+  
 
-- Quantification support
-  - Refactor dynamic graph quantization inference pass, to support non-analog quantization OP and analog quantization OP. ([#35907](https://github.com/PaddlePaddle/Paddle/pull/35907))
-  - Add int8 for analog quantized OP matmul (the case where weights are multiplied by tensor).  ([#34359](https://github.com/PaddlePaddle/Paddle/pull/34359))
-  - Fix a bug that MobileNetV3 model "Loss” out of NAN during quantization training due to the quantization parameter being 0.  ([#36763](https://github.com/PaddlePaddle/Paddle/pull/36763))
+### **(4) Bug fixing**
 
-- API enhancements
-  - Refactor GO API based on new version of CAPI, [#33113](https://github.com/PaddlePaddle/Paddle/pull/33113). For the example, see the [demo](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/go/resnet50).
-  - Predict python api `copy_from_cpu` and `copy_to_cpu` interfaces to support float16 data types . ([#34676](https://github.com/PaddlePaddle/Paddle/pull/34676))
-  - Add `config.Summary()` interface to print config configuration information. ([#34122](https://github.com/PaddlePaddle/Paddle/pull/34122))
-  - In the prediction library `version.txt`, record trt version information patch, e.g., v7.2.3.4 instead of v7. ( [#33690](https://github.com/PaddlePaddle/Paddle/pull/33690))
+#### API
 
-- Library volume compression
-  - In the Linux, the volume of the prediction library is pruned by strip, and the volume is compressed by 30m. ([#34895](https://github.com/PaddlePaddle/Paddle/pull/34895))
+- Fix the output type error with `paddle.sum` when the input parameter type and output parameter type do not match and the number of reduce elements on the `axis` is 1. ([#36123](https://github.com/PaddlePaddle/Paddle/pull/36123))
+  
+- Fix an `AttributeError` in `paddle.flops` when the layer output type is tuple. ([#38850](https://github.com/PaddlePaddle/Paddle/pull/38850))
+  
+- Fix the `paddle.diag` failing to propagate gradients because there is no backward kernel. ([#40447](https://github.com/PaddlePaddle/Paddle/pull/40447))
+  
+- Fix an error in `paddle.sort` when the input contains NaN values. ([#41070](https://github.com/PaddlePaddle/Paddle/pull/41070))
+  
+- Fix the error when the input of `paddle.full_like` contains an INF value. ([#40232](https://github.com/PaddlePaddle/Paddle/pull/40232))
+  
+- Fix the bug in `paddle.strided_slice`: the result of strided_slice is inconsistent with that of slice when a value in the input `starts` is less than -rank. ([#39066](https://github.com/PaddlePaddle/Paddle/pull/39066))
+  
+- Fix the bug in the `max_pool` family of operators where infer_shape is calculated incorrectly when index is returned. This affects the APIs: `paddle.nn.functional.max_pool1d/2d/3d`, `paddle.nn.functional.adaptive_max_pool1d/2d/3d`, `paddle.nn.MaxPool1D/2D/3D`, `paddle.nn.AdaptiveMaxPool1D/2D/3D`. ([#40139](https://github.com/PaddlePaddle/Paddle/pull/40139))
+  
+- Fix an issue where the dtype of pooling_mask returned by the `max_pool` family of operators is incorrect. Now the dtype of pooling_mask is int32. The affected APIs are `paddle.nn.functional.max_pool1d/2d/3d`, `paddle.nn.functional.adaptive_max_pool1d/2d/3d`, `paddle.nn.MaxPool1D/2D/3D`, `paddle.nn.AdaptiveMaxPool1D/2D/3D`. ([#39314](https://github.com/PaddlePaddle/Paddle/pull/39314))
+  
+- Fix the bug with `paddle.shape` where the backward gradient by default causes a computation error. ([#37340](https://github.com/PaddlePaddle/Paddle/pull/37340))
+  
+- Fix the bug in the `to` method of `paddle.nn.Layer` when converting both dtype and place at the same time. ([#37007](https://github.com/PaddlePaddle/Paddle/pull/38007))
+  
+- Fix the bug that `paddle.amp.decorate` fails to rewrite the parameters of non-leaf network layers to FP16. ([#38402](https://github.com/PaddlePaddle/Paddle/pull/38402))
+  
+- Fix the bug that `paddle.amp.decorate` rewrites the non-input parameters in `paddle.nn.BatchNorm1D`, `paddle.nn.BatchNorm2D`, and `paddle.nn.BatchNorm3D` to FP16. ([#38541](https://github.com/PaddlePaddle/Paddle/pull/38541))
+  
+- Fix the bug that `paddle.amp.decorate` rewrites the non-input parameters in `paddle.nn.SyncBatchNorm` to FP16. ([#40943](https://github.com/PaddlePaddle/Paddle/pull/40943))
+  
+- Fix redundant warnings in `paddle.nn.Layer.to`. ([#36700](https://github.com/PaddlePaddle/Paddle/pull/36700))
+  
+- Fix the bug in `paddle.nn.RNN` when being used inside control flow. ([#41162](https://github.com/PaddlePaddle/Paddle/pull/41162))
+  
+- Fix the bug that `paddle.to_tensor` fails to specify the CUDAPlace of the Tensor. ([#39662](https://github.com/PaddlePaddle/Paddle/pull/39662))
+  
+- Fix the issue that `paddle.nn.Identity` is not exposed. ([#39615](https://github.com/PaddlePaddle/Paddle/pull/39615))
+  
+- Fix the bug where the output values of the `fill_` and `zero_` inplace APIs are incorrect when the input is on a CUDAPinned Place after the dynamic graph refactoring. ([#41229](https://github.com/PaddlePaddle/Paddle/pull/41229))
+  
+- Fix the bug where, after the dynamic graph refactoring, the inplace version value of the output Tensor is incorrect when the assign op is called using the append op. The assign op is now called using `_C_ops`. ([#41118](https://github.com/PaddlePaddle/Paddle/pull/41118))
+  
+- Remove unreasonable code in the third-order kernel of `elementwise_add`, and fix an uninitialized issue in the network creation process. ([#36618](https://github.com/PaddlePaddle/Paddle/pull/36618))
+  
+- Fix the missing attribute bug in `conv2d` execution of cuDNN Kernel. ([#38827](https://github.com/PaddlePaddle/Paddle/pull/38827))
+  
+- Fix an issue where `multiclass_nms3` output shape is incorrect. ([#40059](https://github.com/PaddlePaddle/Paddle/pull/40059))
+  
+- Fix an issue with `yolo_box` outputting incorrect shape. ([#40056](https://github.com/PaddlePaddle/Paddle/pull/40056))
+  
+- Fix an issue where the higher-order differentiation `gradients` interface does not take effect as expected when target_grad is specified. ([#40940](https://github.com/PaddlePaddle/Paddle/pull/40940/))
+  
+- Fix an issue where the network parameter type is incorrect when the default_dtype is modified in the `_BatchNormBase` base class in the dynamic graph mode. The affected APIs are `paddle.nn.BatchNorm1D`, `paddle.nn.BatchNorm2D`, `paddle.nn.BatchNorm3D`, and `paddle.nn.SyncBatchNorm`. Specific reason: when `get_default_dtype() == 'float16'`, the default parameter data type is modified by `set_default_dtype('float32')`. In dynamic graph mode, parameters are created with default_dtype; therefore, changing the default parameter type causes parameter type errors in subsequent networking. ([#36376](https://github.com/PaddlePaddle/Paddle/pull/36376))
+  
+- Fix the bug of an undefined intermediate variable in the backward op of the batchnorm op when the data type is FP32, `dims = 2`, and `data_layout = NHWC`. ([#37020](https://github.com/PaddlePaddle/Paddle/pull/37020))
+  
+- Fix the bug that the shape of weights is incorrect when using `paddle.static.nn.prelu` in static graph mode with input format `NHWC` and `mode==channel`. ([#38310](https://github.com/PaddlePaddle/Paddle/pull/38310))
+  
+- Fix the bug of `paddle.nn.functional.class_center_sample`: CUDA seed setting issue in multi-machine case. ([#38815](https://github.com/PaddlePaddle/Paddle/pull/38815))
+  
+- Fix the bug of failing to report an error when the input of `paddle.nn.functional.one_hot` is incorrect. ([#41335](https://github.com/PaddlePaddle/Paddle/pull/41335))
+  
+- Fix an issue where a callback to reclaim device memory on a DCU device is not triggered in time, resulting in an OOM of the device memory. ([#40445](https://github.com/PaddlePaddle/Paddle/pull/40445))
+  
+- Fix abnormal backward gradients and abnormal inplace logic handling of `setitem` in some dynamic graph scenarios. ([#37023](https://github.com/PaddlePaddle/Paddle/pull/37023), [#38298](https://github.com/PaddlePaddle/Paddle/pull/38298))
+  
+- Fix abnormal indexing when a Tensor array is indexed with a Slice in dynamic-to-static scenarios. ([#39251](https://github.com/PaddlePaddle/Paddle/pull/39251))
+  
+- Fix the bug of memory or device memory leaks caused by some temporary variables not being correctly destructed when `paddle.Tensor.register_hook` interface is used. ([#40716](https://github.com/PaddlePaddle/Paddle/pull/40716))
+  
+- Fix the bug that `Tensor.getitem` cannot get the value when the index is a bool Tensor with all False. ([#41297](https://github.com/PaddlePaddle/Paddle/pull/41297))
+  
+- Fix the bug that `Tensor.getitem` cannot get the value when the index is a bool scalar Tensor. ([#40829](https://github.com/PaddlePaddle/Paddle/pull/40829))
+  
+- Fix the bug in `paddle.index_select` when index is a 0-shape Tensor. ([#41383](https://github.com/PaddlePaddle/Paddle/pull/41383))
+  
+- Fix the bug when the number of GPU threads requested by `paddle.index_select` and `paddle.index_sample` exceeds the limited machine resources. ([#41127](https://github.com/PaddlePaddle/Paddle/pull/41127), [#37816](https://github.com/PaddlePaddle/Paddle/pull/37816), [#39736](https://github.com/PaddlePaddle/Paddle/pull/39736), [#41563](https://github.com/PaddlePaddle/Paddle/pull/41563))
+  
+- Fix the bug when ReduceConfig, elemwise_grad, gather, gather_nd, and scatter ops request more GPU threads than the limited machine resources. ([#40813](https://github.com/PaddlePaddle/Paddle/pull/40813), [#41127](https://github.com/PaddlePaddle/Paddle/pull/41127))
+  
+- Fix the bug that the memory access is out of bounds when NX != 1 in ReadData, ReadDataBc, and ReadDataReduce in Kernel Primitive API. ([#36373](https://github.com/PaddlePaddle/Paddle/pull/36373))
+  
+- Fix abnormal computation results due to data overflow caused by an incorrect IndexRandom data type. ([#39867](https://github.com/PaddlePaddle/Paddle/pull/39867), [#39891](https://github.com/PaddlePaddle/Paddle/pull/39891))
+  
+- Fix incorrect results returned by the reduce op when reduce_num = 1. ([#38771](https://github.com/PaddlePaddle/Paddle/pull/38771))
+  
+- Fix out-of-bounds memory access of the reduce op when reducing over a middle dimension in HIP environments. ([#41273](https://github.com/PaddlePaddle/Paddle/pull/41273))
+  
+- Fix the bug where the Kernel fails to be released properly when the matmul op computes two FP16 one-dimensional vectors.
+  
+- Fix the bug caused by CUDA integer computation overflow for some operators, including: bernoulli, gaussian_random, gumbel_softmax, multinomial, truncated_gaussian_random, uniform_random_inplace, and uniform_random ops. ([#37670](https://github.com/PaddlePaddle/Paddle/pull/37670))
+  
+- Fix the bug where `paddle.nn.Sequential` reports a KeyError when traversing sublayers in a for loop. ([#39372](https://github.com/PaddlePaddle/Paddle/pull/39372))
+  
+- Fix the shape check error in `paddle.nn.functional.unfold` when compiling in static graphs. ([#38907](https://github.com/PaddlePaddle/Paddle/pull/38907), [#38819](https://github.com/PaddlePaddle/Paddle/pull/38819))
+  
+- Fix the bug of reporting an error if `axis` is specified when using dropout for static graphs. ([#37223](https://github.com/PaddlePaddle/Paddle/pull/37223))
+  
+- Migrate the matmul operator in the `paddle.nn.MultiHeadAttention` to the matmul_v2 operator. ([#36222](https://github.com/PaddlePaddle/Paddle/pull/36222))
+  
+- Fix the FPE thrown when an empty Tensor is used in `paddle.nn.functional.label_smooth`. ([#35861](https://github.com/PaddlePaddle/Paddle/pull/35861))
+  
+- Fix the bug of the reshape op when the input is an empty Tensor. Support reshaping an empty Tensor to [-1]. ([#36087](https://github.com/PaddlePaddle/Paddle/pull/36087))
+  
+- Fix the bug where modified values incorrectly overwrite other rows when the input parameter `offset` of `fill_diagonal` is non-zero. ([#36212](https://github.com/PaddlePaddle/Paddle/pull/36212))
+  
+- Set stop_gradient of the output returned by the range op to True in dynamic graph mode. ([#37486](https://github.com/PaddlePaddle/Paddle/pull/37486))
+  
+- Fix the bug where Lamb optimizer is updated incorrectly when Beta1Pow and Beta2Pow are on the GPU. ([#38518](https://github.com/PaddlePaddle/Paddle/pull/38518))
+  
+- Fix the bug where the conv2d operator does not respect FLAGS_cudnn_deterministic. ([#37173](https://github.com/PaddlePaddle/Paddle/pull/37173))
+  
+- Fix the bug caused by an earlier version of cufft that does not define CUFFT_VERSION. ([#37312](https://github.com/PaddlePaddle/Paddle/pull/37312))
+  
+- Fix the computing error of `paddle.ifftshift` and `paddle.fftshift`. ([#36834](https://github.com/PaddlePaddle/Paddle/pull/36834), [#36748](https://github.com/PaddlePaddle/Paddle/pull/36748))
+  
+- Fix the `axis` computation error in `paddle.fft` series of APIs. ([#36321](https://github.com/PaddlePaddle/Paddle/pull/36321))
+  
 
-- Other updates
-  - Add the helper tool to catch runtime exceptions and convert them to the appropriate error state.  ([#35624](https://github.com/PaddlePaddle/Paddle/pull/35624))
-  - Add the related base data structure to enhance the accuracy of the PaddlePaddle operator definition. ([#33098](https://github.com/PaddlePaddle/Paddle/pull/33098))
+#### IR (Intermediate Representation)
 
-#### **Back-end capability enhancement**
+- Dynamic to static graphs
+  
+  - Fix a type derivation error in backward gradient accumulation when the `tensor_array` is used with the control flow. ([#39585](https://github.com/PaddlePaddle/Paddle/pull/39585), [#39689](https://github.com/PaddlePaddle/Paddle/pull/39689))
+    
+  - Fix an issue where the parameter gradient type is not set correctly during dynamic to static AMP training. ([#40938](https://github.com/PaddlePaddle/Paddle/pull/40938))
+    
+  - Fix an error reported during dynamic-to-static transcription when the code contains misplaced comments. ([#39035](https://github.com/PaddlePaddle/Paddle/pull/39035), [#38003](https://github.com/PaddlePaddle/Paddle/pull/38003))
+    
+  - Fix an issue where Tensor is not properly converted to Variable when calling a non-forward function in dynamic to static codes. ([#37296](https://github.com/PaddlePaddle/Paddle/pull/37296), [#38540](https://github.com/PaddlePaddle/Paddle/pull/38540))
+    
+  - Fix an issue where `paddle` is incorrectly passed as a variable during dynamic-to-static transcription. ([#37999](https://github.com/PaddlePaddle/Paddle/pull/37999))
+    
+  - Fix an issue where model parameters are incorrectly counted when calling `paddle.flops` after model dynamic to static conversion. ([#36852](https://github.com/PaddlePaddle/Paddle/pull/36852))
+    
+  - Fix an issue where GPU memory will keep growing in train mode and no_grad contexts after loading models using the `paddle.jit.save/load` interface. ([#36434](https://github.com/PaddlePaddle/Paddle/pull/36434))
+    
+  - Add a warning in the convert_call function when converting a generator function. ([#35369](https://github.com/PaddlePaddle/Paddle/pull/35369))
+    
+  - Fix the run_program op dependency analysis bug. ([#38470](https://github.com/PaddlePaddle/Paddle/pull/38470))
+    
+  - Fix the code conversion bug when returning a single value in a For control flow. ([#40683](https://github.com/PaddlePaddle/Paddle/pull/40683))
+    
+  - Fix the bug in generating a backward op when the input to conditional_block op contains LoDTensorArray. ([#39585](https://github.com/PaddlePaddle/Paddle/pull/39585))
+    
+
+#### **Distributed Training**
+
+- Distributed training basic functions
+  
+  - Fix a port error reported in distributed multi-machine training. ([#37274](https://github.com/PaddlePaddle/Paddle/pull/37274))
+    
+  - Fix the brpc compilation dependency bug. ([#37064](https://github.com/PaddlePaddle/Paddle/pull/37064))
+    
+  - Fix an occupied port issue due to tcp self-connections when Fleet starts. ([#38174](https://github.com/PaddlePaddle/Paddle/pull/38174))
+    
+  - Fix the precision degradation bug under data parallel due to inconsistent initialization of FP16 parameters under multiple cards. ([#38838](https://github.com/PaddlePaddle/Paddle/pull/38838), [#38563](https://github.com/PaddlePaddle/Paddle/pull/38563), [#38405](https://github.com/PaddlePaddle/Paddle/pull/38405))
+    
+  - Fix the precision degradation under data parallel due to FP16 gradient synchronization without dividing by the number of cards. ([#38378](https://github.com/PaddlePaddle/Paddle/pull/38378))
+    
+- Dynamic graph hybrid parallel
+  
+  - Fix the bug where parameters are not updated in FP16 mode under hybrid parallelism by using the new update interface. ([#36017](https://github.com/PaddlePaddle/Paddle/pull/36017))
+- Static graph hybrid parallel
+  
+  - Fix an issue where grad merge is not compatible with ClipGradientByGlobalNorm in distributed dp mode. ([#36334](https://github.com/PaddlePaddle/Paddle/pull/36334))
+    
+  - Fix an issue under hybrid parallelism where the non-distributed parameters of tensor model parallelism are not broadcast during the initialization phase, resulting in inconsistent non-distributed parameters across cards. ([#36186](https://github.com/PaddlePaddle/Paddle/pull/36186))
+    
+  - Fix the issue that sharding's save_persistables interface does not save FP16 parameters and offload persistent variables when sharding is enabled with offload. ([#40477](https://github.com/PaddlePaddle/Paddle/pull/40477))
+    
+  - Fix the bug where ema parameters are not saved on non-0 cards when sharding is enabled for training. ([#39860](https://github.com/PaddlePaddle/Paddle/pull/39860))
+    
+  - Fix an issue where FC incorrectly calculates gradients according to column cuts. ([#38724](https://github.com/PaddlePaddle/Paddle/pull/38724))
+    
+  - Fix the error reported when DistributedStrategy is set to without_graph_optimizer and used with rnn. ([#36176](https://github.com/PaddlePaddle/Paddle/pull/36176))
+    
+- GPUPS Parameter Server Training
+  
+  - Fix the CPU branch compilation bug triggered by the GPUPS macro definition. ([#37248](https://github.com/PaddlePaddle/Paddle/pull/37248))
+    
+  - Fix an occasional error raised when saving delta and pullsparse run concurrently during GPUPS streaming training. ([#37233](https://github.com/PaddlePaddle/Paddle/pull/37233))
+    
+  - Fix a download error issue caused by HDFSClient querying a directory without returning the full path. ([#36590](https://github.com/PaddlePaddle/Paddle/pull/36590))
+    
+  - Fix the bug with pulling old parameters in GPUPS streaming training. ([#36512](https://github.com/PaddlePaddle/Paddle/pull/36512))
+    
+  - Fix a GPUPS multi-stream allocation issue. ([#37476](https://github.com/PaddlePaddle/Paddle/pull/37476))
+    
+  - Fix the GPUPS pybind core dump bug. ([#37287](https://github.com/PaddlePaddle/Paddle/pull/37287))
+    
+
+#### **Other**
+
+- Fix the clip_extra issue when saving models for dynamic graph quantization training. ([#38323](https://github.com/PaddlePaddle/Paddle/pull/38323))
+  
+- Fix an issue with abs_max scale initialization for dynamic graph quantization training. ([#39307](https://github.com/PaddlePaddle/Paddle/pull/39307))
+  
+- Fix an issue of exceptions in saving model in dynamic graph quantization training. ([#38102](https://github.com/PaddlePaddle/Paddle/pull/38102), [#38012](https://github.com/PaddlePaddle/Paddle/pull/38012))
+  
+- Fix the offline quantization flatten op output error. ([#37722](https://github.com/PaddlePaddle/Paddle/pull/37722))
+  
+- Fix the dimension mismatch bug when dequantizing the matmul op. ([#36982](https://github.com/PaddlePaddle/Paddle/pull/36982))
+  
+- Fix the bug of adding quantization op when quantizing matmul_v2 without weights. ([#36593](https://github.com/PaddlePaddle/Paddle/pull/36593))
+  
+- Fix the error of saving the quant_axis attribute in the conv op channel-wise quantization when saving the models. ([#39054](https://github.com/PaddlePaddle/Paddle/pull/39054))
+  
+- Fix the slow training of channel-wise quantization. ([#40772](https://github.com/PaddlePaddle/Paddle/pull/40772))
+  
+- Fix the bug in quantization training where dividing by a tensor (initialized as 0) leads to NaN. ([#36762](https://github.com/PaddlePaddle/Paddle/pull/36762))
+  
+- Fix incorrect settings of amp_level for mixed precision in multi-threaded scenarios. ([#39198](https://github.com/PaddlePaddle/Paddle/pull/39198))
+  
+- Fix an issue where mixed precision is not set correctly for PyLayer and Recompute when mixed precision training is used with them. ([#39950](https://github.com/PaddlePaddle/Paddle/pull/39950), [#40042](https://github.com/PaddlePaddle/Paddle/pull/40042))
+  
+- Fix an issue where `D_GLIBCXX_USE_CXX11_ABI` does not take effect when compiling custom operators under Mac. ([#37878](https://github.com/PaddlePaddle/Paddle/pull/37878))
+  
+- Fix the bug of inconsistent dynamic and static graph behaviors of initializer-related APIs when block=None. ([#37827](https://github.com/PaddlePaddle/Paddle/pull/37827))
+  
+- Fix the bug in python 3.6 where there is no fluid module. ([#35862](https://github.com/PaddlePaddle/Paddle/pull/35862))
+  
+- Fix the bug where optimizer `paddle.optimizer.AdamW` incorrectly calls adam op. ([#36028](https://github.com/PaddlePaddle/Paddle/pull/36028))
+  
+- Fix a logic error when the `paddle.optimizer.Momentum` optimizer parameter `regularizer` property is None under the multi tensor policy. ([#38344](https://github.com/PaddlePaddle/Paddle/pull/38344))
+  
+- Fix the bug that the `paddle.optimizer.Momentum` and `paddle.optimizer.Adam` optimizers modify the `multi_precision` property under the multi tensor policy. ([#38991](https://github.com/PaddlePaddle/Paddle/pull/38991))
+  
+- Fix the code compilation error when using final-state API amp in combination with optional Tensor. ([#40980](https://github.com/PaddlePaddle/Paddle/pull/40980))
+  
+- Fix the bug where paddle+lite+xpu prediction library would report an error when calling lite CPU prediction, and fix the bug where paddle+lite(without NNAdapter) would report an error when compiling. ([#37449](https://github.com/PaddlePaddle/Paddle/pull/37449))
+  
+- Fix the bug in Debug compile mode where LoDTensorArray crashes due to inconsistent Pybind11 bindings. ([#37954](https://github.com/PaddlePaddle/Paddle/pull/37954))
+  
+- Fix the bug that prevents correct construction of a Tensor in the extreme case where the shape parameter is a list mixing Tensors with ints. ([#38284](https://github.com/PaddlePaddle/Paddle/pull/38284))
+  
+- Fix a compatibility issue with the `paddle.optimizer.AdamW` API. ([#37905](https://github.com/PaddlePaddle/Paddle/pull/37905))
+  
+- Fix the bug in _InstanceNormBase where the return value of extra_repr is incorrect. ([#38537](https://github.com/PaddlePaddle/Paddle/pull/38537))
+  
+- Fix the bug that Paddle Inference lacks the symbol `paddle::distributed::TensorTable` when -DWITH_DISTRIBUTED is used. ([#41128](https://github.com/PaddlePaddle/Paddle/pull/41128))
+  
+- Fix the error reported by the matmul_v2 op when there is a 0 value in the shape. ([#35791](https://github.com/PaddlePaddle/Paddle/pull/35791))
+  
+- Fix the repeated printing of the hint message about inputs without gradients in dynamic graph recompute. The message is now printed only once as a warning. ([#38293](https://github.com/PaddlePaddle/Paddle/pull/38293))
+  
+- Fix the gelu op bug that causes low accuracy on the validation set in later training epochs of vision models. ([#38450](https://github.com/PaddlePaddle/Paddle/pull/38450))
+  
+- Fix adamw op error in numerical computation. ([#37746](https://github.com/PaddlePaddle/Paddle/pull/37746))
+  
+- Add the parameters in the sparse_momentum `_C_ops` interface. ([#39969](https://github.com/PaddlePaddle/Paddle/pull/39969))
+  
+- Fix the bug where there is no `distributed` module in python 3.6. ([#35848](https://github.com/PaddlePaddle/Paddle/pull/35848))
+  
+- Fix the eigh unit test data initialization problem. ([#39568](https://github.com/PaddlePaddle/Paddle/pull/39568))
+  
+- Fix the eigvalsh unit test data initialization problem. ([#39841](https://github.com/PaddlePaddle/Paddle/pull/39841))
+  
+- Fix the bug that the segment op does not work properly on V100 due to excessive register usage. ([#38113](https://github.com/PaddlePaddle/Paddle/pull/38113))
+  
+- Fix the bug where the dimension is set incorrectly in the sparsification of conv-related ops. ([#36054](https://github.com/PaddlePaddle/Paddle/pull/36054))
+  
+- Provide an alias for the static graph-related functions of Automatic SParsity training under `paddle.static.sparsity`. ([#36525](https://github.com/PaddlePaddle/Paddle/pull/36525))
+  
+- Fix the bug where the result of integer division in the divide op is still an integer. ([#40890](https://github.com/PaddlePaddle/Paddle/pull/40890))
+  
+- Fix the crash bug of `paddle.multiplex` when input Tensor value is 0. ([#34972](https://github.com/PaddlePaddle/Paddle/pull/34972))
+  
+- Fix the abnormal speed when the `reduction` parameter is set in `paddle.nn.functional.kl_div`. ([#37283](https://github.com/PaddlePaddle/Paddle/pull/37283))
+  
+- Fix the bug where the data source is unsorted when loading the Cifar dataset. ([#37272](https://github.com/PaddlePaddle/Paddle/pull/37272))
+  
+- Fix the conversion of loss from uint16 to float in the ProgressBar class. ([#39231](https://github.com/PaddlePaddle/Paddle/pull/39231))
+  
+- Fix the ShareBufferWith shared data type problem. ([#37464](https://github.com/PaddlePaddle/Paddle/pull/37464), [#37247](https://github.com/PaddlePaddle/Paddle/pull/37247))
+  
+- Fix the performance issue when `paddle.io.DataLoader` uses IterableDataset and num_workers>0. ([#40541](https://github.com/PaddlePaddle/Paddle/pull/40541))
+  
+- Fix the bug where `paddle.vision.ops.yolo_loss` returns incomplete values in dynamic graph mode. ([#40185](https://github.com/PaddlePaddle/Paddle/pull/40185))
+  
+- Remove the restriction that the input parameter dataset of `paddle.io.BatchSampler` needs to be the `paddle.io.Dataset` type, to expand the support for user-defined datasets. ([#40184](https://github.com/PaddlePaddle/Paddle/pull/40184))
+  
+- Fix the bug of `paddle.summary` reporting that op_flops does not exist. ([#36489](https://github.com/PaddlePaddle/Paddle/pull/36489))
+  
+- Fix the formula error of lars_momentum op when lars_weight_decay=0. ([#40892](https://github.com/PaddlePaddle/Paddle/pull/40892))
+  
+- Fix the bug that optimizer-offload cannot save persistable variables. ([#36433](https://github.com/PaddlePaddle/Paddle/pull/36433))
+  
+- Fix an issue where optimizer-offload does not support adamw op type. ([#36432](https://github.com/PaddlePaddle/Paddle/pull/36432))
+  
+- Fix an issue where enable_program_desc_tracing_data in Tracer is not safe in multi-threaded scenarios. ([#39776](https://github.com/PaddlePaddle/Paddle/pull/39776))
+  
+- Fix an issue where the model file size is not initialized when the model is read. ([#40518](https://github.com/PaddlePaddle/Paddle/pull/40518))
+  
+- Fix the logic bug of the Expand op. When the dimension of the input Tensor X is smaller than the shape to be expanded, it may result in the incorrect Out.Shape. ([#38677](https://github.com/PaddlePaddle/Paddle/pull/38677))
+  
+- Fix the dynamic-to-static conversion error when the Expand_As op takes only y.shape and the Y variable is not passed in. ([#38677](https://github.com/PaddlePaddle/Paddle/pull/38677))
+  
+- Fix the logic error when Expand_As op computes the output shape. ([#38677](https://github.com/PaddlePaddle/Paddle/pull/38677))
+  
+- Framework function fixing
+  
+  - Fix the bug where variables of the `core.VarDesc.VarType.STRINGS` type report an error when getting the `lod_level` property, and set their `lod_level` to None. ([#39077](https://github.com/PaddlePaddle/Paddle/pull/39077))
+    
+  - Fix an issue where the framework function `PyLayer` does not support different dtypes. ([#37974](https://github.com/PaddlePaddle/Paddle/pull/37974))
+    
+- API fixing
+  
+  - Fix the division-by-zero bug in the learning rate decay API `paddle.optimizer.lr.PolynomialDecay`. ([#38782](https://github.com/PaddlePaddle/Paddle/pull/38782))
+    
+  - Fix the issue where some logs remained after calling the DisableGlogInfo() interface. ([#36356](https://github.com/PaddlePaddle/Paddle/pull/36356))
+    
+- Fix a backward error in multi-layer RNN (when dropout is set to 0) during CPU training with the SimpleRNN, GRU, and LSTM APIs. ([#37080](https://github.com/PaddlePaddle/Paddle/pull/37080))
+  
+- Add a cache for fft on the cufft and hipfft backends. ([#36646](https://github.com/PaddlePaddle/Paddle/pull/36646))
+  
+- Allow the `shifts` parameter of `paddle.roll` to be passed as a Tensor. ([#36727](https://github.com/PaddlePaddle/Paddle/pull/36727))
+  
+- Add onemkl to fft as an optional computation backend. ([#36414](https://github.com/PaddlePaddle/Paddle/pull/36414))
+  
 
-- CPU related updates
-  - Upgrade oneDNN version to 2.3.2. ( [#35040](https://github.com/PaddlePaddle/Paddle/pull/35040))
-  - Add the support of quant-aware LSTM oneDNN INT8 models. ([#35382](https://github.com/PaddlePaddle/Paddle/pull/35382))
-  - Add the support of post-training LSTM oneDNN INT8 models. ([#35334](https://github.com/PaddlePaddle/Paddle/pull/35334), [#33295](https://github.com/PaddlePaddle/Paddle/pull/33295))
-  - Add the support of fusion_gru and multi_gru fusion and post-training INT8. ([#33749](https://github.com/PaddlePaddle/Paddle/pull/33749))
-  - Optimize the cache mechanism of oneDNN. ([#35664](https://github.com/PaddlePaddle/Paddle/pull/35664),  [#35331](https://github.com/PaddlePaddle/Paddle/pull/35331), [#35132](https://github.com/PaddlePaddle/Paddle/pull/35132), [#35030](https://github.com/PaddlePaddle/Paddle/pull/35030), [#35002](https://github.com/PaddlePaddle/Paddle/pull/35002), [#34830](https://github.com/PaddlePaddle/Paddle/pull/34830), [#33515](https://github.com/PaddlePaddle/Paddle/pull/33515), [#33048](https://github.com/PaddlePaddle/Paddle/pull/33048), [#32922](https://github.com/PaddlePaddle/Paddle/pull/32922), [#32499](https://github.com/PaddlePaddle/Paddle/pull/32499))
-  - This is implemented by adding multiple op (e.g., clip, scale, etc.) of oneDNN kernel. In the ch_ppocr_mobile_v1.1_det_infer, DPN68, fastscnn, hrnet, HRNet_W18_C, icnet, Res2Net50_26w_4s, and ssdlite_mobilenet_v3_large models, the single core performance of Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz increases by 47.8% in the oneDNN enabling against disabling. ([#35601](https://github.com/PaddlePaddle/Paddle/pull/35601), [#32975](https://github.com/PaddlePaddle/Paddle/pull/32975))
-  - Optimized oneDNN LSTM INT8 model with 1.59x performance improvement on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz single core than that of the FP32 LSTM model. ([#35382](https://github.com/PaddlePaddle/Paddle/pull/35382), [#35334](https://github.com/PaddlePaddle/Paddle/pull/35334), [#34820](https://github.com/PaddlePaddle/Paddle/pull/34820), [#34137](https://github.com/PaddlePaddle/Paddle/pull/34137))
+## **4. Deployment Direction (Paddle Inference)**
 
+### **(1) New features**
 
-- GPU and TensorRT sub-graph engine related updates
+#### **New APIs**
 
-  - Added support for TensorRT 8.0. We will drop support for TensorRT 6.x in a future release.  ([#34403](https://github.com/PaddlePaddle/Paddle/pull/34403), [#34294](https://github.com/PaddlePaddle/Paddle/pull/34294), [#34157](https://github.com/PaddlePaddle/Paddle/pull/34157), [#33777](https://github.com/PaddlePaddle/Paddle/pull/33777), [#33680](https://github.com/PaddlePaddle/Paddle/pull/33680), [#33662](https://github.com/PaddlePaddle/Paddle/pull/33662), [#33654](https://github.com/PaddlePaddle/Paddle/pull/33654))
-  - Add support for dynamic shape in the TensorRT `layer_norm` plugin. ([#33448](https://github.com/PaddlePaddle/Paddle/pull/33448))
-  - Add support for dynamic shape in TensorRT `hard_swish` plugin. ([#35214](https://github.com/PaddlePaddle/Paddle/pull/35214))
-  - Add support for TensoRT `reduce_sum` and `gather_nd`. ([#33324](https://github.com/PaddlePaddle/Paddle/pull/33324))
-  - Add support for int8 in TensorRT `qkv_context` plugin ([#34917](https://github.com/PaddlePaddle/Paddle/pull/34917), [#35504](https://github.com/PaddlePaddle/Paddle/pull/35504))
-  - Add support for TensorRT conv3d. ([#35507](https://github.com/PaddlePaddle/Paddle/pull/35507))
-  - Add support for broadcasting the input of the `multihead_matmul` fusion operator. ([#35780](https://github.com/PaddlePaddle/Paddle/pull/35780))
-  - Inference supports for TensorRT8 sparse inference, with performance improved by 10%-30% for ERNIE model with variable-length input at different batch_sizes, and performance improved by 10% for ResNeXt101_32x4d model at different batch_sizes under test environment.  ([#36659](https://github.com/PaddlePaddle/Paddle/pull/36659))
+- Add the Java API so that Java developers can implement high-performance inference on the server and in the cloud through a simple and flexible interface. ([#37162](https://github.com/PaddlePaddle/Paddle/pull/37162))
+  
+- Add `GetTrtCompileVersion` and `GetTrtRuntimeVersion` interfaces for getting TensorRT version information. ([#36429](https://github.com/PaddlePaddle/Paddle/pull/36429))
+  
+- Add the `ShareExternalData` interface to avoid memory copy of input data during inference. ([#39809](https://github.com/PaddlePaddle/Paddle/pull/39809))
+  
 
-- Nvidia Jetson native support enhancements
-  - Add the Op support, for the Jetson Nano/TX2, two devices with lower arithmetic power. We made targeted optimizations. Now add the support for 17 OPs such as `pool2d`, `pool_max`, `conv3d_transpose`, etc. ([#35378](https://github.com/PaddlePaddle/Paddle/pull/35378))
-  - For the Jetson Nano, we add new models: DPN68, EfficientNetB0, ttfnet, fcn_hrnetw18, hardnet. ([#35378](https://github.com/PaddlePaddle/Paddle/pull/35378))
-  - For Jetson TX2, add new models: deeplabv3p_resnet50, deeplabv3_resnet50, fcn_hrnetw18, hardnet, pspnet, ttfnet, unet. ([#35378](https://github.com/PaddlePaddle/Paddle/pull/35378))
+#### **New functions**
 
+- Add ONNX Runtime backend support. Currently it supports only CPU in the integrated version. ([#39988](https://github.com/PaddlePaddle/Paddle/pull/39988), [#40561](https://github.com/PaddlePaddle/Paddle/pull/40561))
+  
+- Add support for Ascend 310 inference based on the Paddle Lite subgraph approach. ([#35226](https://github.com/PaddlePaddle/Paddle/pull/35226))
+  
+- Add the native GPU FP16 inference. ([#40531](https://github.com/PaddlePaddle/Paddle/pull/40531))
+  
+- For the switch_ir_debug interface, add the dump model function. ([#36581](https://github.com/PaddlePaddle/Paddle/pull/36581))
+  
+- Add the configuration interface for TensorRT config: `void UpdateConfigInterleaved(paddle_infer::Config* c, bool with_interleaved)` for special data layout in int8 quantization inference. ([#38884](https://github.com/PaddlePaddle/Paddle/pull/38884))
+  
+- Add TensorRT inspector output information to the log. It is valid only for TensorRT 8.2 or later. ([#38362](https://github.com/PaddlePaddle/Paddle/pull/38362), [#38200](https://github.com/PaddlePaddle/Paddle/pull/38200))
+  
+- Add the support of the TensorRT ASP sparse inference. ([#36413](https://github.com/PaddlePaddle/Paddle/pull/36413))
+  
 
-- Kunlun XPU interface feature extensions
-  - Add the `set_xpu_device_id` interface to support setting the device number of the Kunlun chip in the inference ([#35572](https://github.com/PaddlePaddle/Paddle/pull/35572))
-- In Inference python `copy_from_cpu` interface, add input type check. Report errors in advance for wrong type inputs.  ([#36552](https://github.com/PaddlePaddle/Paddle/pull/36552))
+### **(2) Underlying optimization**
 
-### **(3) Troubleshooting**
+#### **CPU performance optimization**
+
+- Optimize the caching mechanism of MKLDNN. ([#38336](https://github.com/PaddlePaddle/Paddle/pull/38336), [#36980](https://github.com/PaddlePaddle/Paddle/pull/36980), [#36695](https://github.com/PaddlePaddle/Paddle/pull/36695))
+  
+- Add matmul_scale_fuse pass. ([#37962](https://github.com/PaddlePaddle/Paddle/pull/37962))
+  
+- Add MKLDNN reshape_transpose_matmul_v2_mkldnn_fuse_pass. ([#37847](https://github.com/PaddlePaddle/Paddle/pull/37847), [#40948](https://github.com/PaddlePaddle/Paddle/pull/40948))
+  
+- Add MKLDNN conv_hard_sigmoid_mkldnn_fuse_pass. ([#36869](https://github.com/PaddlePaddle/Paddle/pull/36869))
+  
+- Add MKLDNN matmul_v2_transpose_reshape_fuse_pass. ([#36481](https://github.com/PaddlePaddle/Paddle/pull/36481))
+  
+- Add MKLDNN softplus_activation_mkldnn_fuse_pass. ([#36657](https://github.com/PaddlePaddle/Paddle/pull/36657))
+  
+- Add MKLDNN elt_act_mkldnn_fuse_pass. ([#36541](https://github.com/PaddlePaddle/Paddle/pull/36541))
+  
+- Add MKLDNN mish operator and conv_mish_mkldnn_fuse_pass. ([#38623](https://github.com/PaddlePaddle/Paddle/pull/38623))
+  
+
+#### **GPU performance optimization**
+
+- Change the default GPU memory allocation policy for inference from `naive_best_fit` to `auto_growth`, to solve the problem of some models filling up the GPU memory. ([#41491](https://github.com/PaddlePaddle/Paddle/pull/41491))
+  
+- Support gelu and FC+gelu ops using TensorRT inference. ([#38399](https://github.com/PaddlePaddle/Paddle/pull/38399))
+  
+- Support `deformable_conv` inference using TensorRT under static shape. ([#36612](https://github.com/PaddlePaddle/Paddle/pull/36612), [#36850](https://github.com/PaddlePaddle/Paddle/pull/36850), [#37345](https://github.com/PaddlePaddle/Paddle/pull/37345))
+  
+- Support nearest_interp_v2 op using TensorRT inference. ([#34126](https://github.com/PaddlePaddle/Paddle/pull/34126))
+  
+- Add `yolo_box` TensorRT plugin to support input parameters `iou_aware` and `iou_aware_factor` so that the IoU computed by inference is used as a factor for confidence. ([#34128](https://github.com/PaddlePaddle/Paddle/pull/34128))
+  
+- Support `elementwise_sub` and `elementwise_div` using TensorRT inference. ([#40806](https://github.com/PaddlePaddle/Paddle/pull/40806), [#41253](https://github.com/PaddlePaddle/Paddle/pull/41253))
+  
+- Support `multiclass_nms3` using TensorRT inference. ([#41181](https://github.com/PaddlePaddle/Paddle/pull/41181), [#41344](https://github.com/PaddlePaddle/Paddle/pull/41344))
+  
+- Support flatten_contiguous_range op using TensorRT inference. ([#38922](https://github.com/PaddlePaddle/Paddle/pull/38922))
+  
+- Support `pool2d` using TensorRT inference when the `padding` attribute has 4 dimensions, and when `global_pooling` and `ceil_mode` are True. ([#39545](https://github.com/PaddlePaddle/Paddle/pull/39545))
+  
+- Support batch_norm and elementwise_add using TensorRT inference when dimension is 5. ([#36446](https://github.com/PaddlePaddle/Paddle/pull/36446))
+  
+- Support pool3d using TensorRT inference. ([#36545](https://github.com/PaddlePaddle/Paddle/pull/36545), [#36783](https://github.com/PaddlePaddle/Paddle/pull/36783))
+  
+- Support int32 and float types for `reduce` using TensorRT inference. Add int32 and int64 registration for the `reduce_mean` GPU operator. ([#39088](https://github.com/PaddlePaddle/Paddle/pull/39088))
+  
+- Modify the MatmulV2ToMul pass. Modify the qualifier (broadcast is not supported) and the op_teller mapping condition. ([#36652](https://github.com/PaddlePaddle/Paddle/pull/36652))
+  
+- Add support for the TensorRT plugin interface AddPluginV2IOExt. ([#36493](https://github.com/PaddlePaddle/Paddle/pull/36493))
+  
+- Add the aligned attribute in roi_align op and support for TensorRT inference. ([#38905](https://github.com/PaddlePaddle/Paddle/pull/38905))
+  
+- Add the support for TensorRT inference with concat attribute `axis = -1` . ([#39096](https://github.com/PaddlePaddle/Paddle/pull/39096))
+  
+- Add TensorRT plugins: preln_emb_eltwise_layernorm and preln_skip_layernorm ops, for ERNIE-like model performance optimization. ([#39570](https://github.com/PaddlePaddle/Paddle/pull/39570))
+  
+- Add TensorRT fuse pass: preln_embedding_eltwise_layernorm_fuse_pass, preln_skip_layernorm_fuse_pass, for ERNIE-like model performance optimization. ([#39508](https://github.com/PaddlePaddle/Paddle/pull/39508))
+  
+- Split matmul fusion-related passes based on different backends (GPU, CPU, TensorRT), to support transpose function for FC weights. ([#39369](https://github.com/PaddlePaddle/Paddle/pull/39369))
+  
+- Quantization support
+  
+  - For the `PostTrainingQuantization` API, add the support for `paddle.io.DataLoader` object or `Python Generator` input. ([#38686](https://github.com/PaddlePaddle/Paddle/pull/38686))
+    
+  - ERNIE full-quantization model inference supports the interleaved data layout. ([#39424](https://github.com/PaddlePaddle/Paddle/pull/39424))
+    
+  - Support inference with the new PaddleSlim quantized model format. ([#41049](https://github.com/PaddlePaddle/Paddle/pull/41049))
+    
+  - Add matmul int8 quantization inference op converter and plugin. ([#37285](https://github.com/PaddlePaddle/Paddle/pull/37285))
+    
+  - Add pass to determine if all ops in the model can support int8 quantization. ([#36042](https://github.com/PaddlePaddle/Paddle/pull/36042))
+    
+  - Support quantization inference for the FC part of the multihead attention of the non-variable-length branch. ([#39660](https://github.com/PaddlePaddle/Paddle/pull/39660))
+    
+
+#### **Ascend NPU Related Features**
+
+- Refactor the shape operator forward computation logic to support execution on NPU. ([#39613](https://github.com/PaddlePaddle/Paddle/pull/39613))
+  
+- Refactor the reshape operator forward computation logic to support ShapeTensor input. ([#38748](https://github.com/PaddlePaddle/Paddle/pull/38748))
+  
+- Unify the precision type when loading model weights. ([#39160](https://github.com/PaddlePaddle/Paddle/pull/39160))
+  
+
+### **(3) Bug fixing**
 
 #### **Framework and API fixing**
 
-- Operator fixing
-  - Fix split op: when axis input is less than 0, address access error occurs when converting TensorRT. Filter out the cases that neither static nor dynamic shape is supported when axis is equal to 0. ([#35127](https://github.com/PaddlePaddle/Paddle/pull/35127))
-  - Fix the bug where transpose static shape is wrong when axis is `[0, 1]`. ([#35138](https://github.com/PaddlePaddle/Paddle/pull/35138))
-  - Fix the functional alignment of gather op with native paddle op, and improve op teller filtering conditions. ([#35784](https://github.com/PaddlePaddle/Paddle/pull/35784))
-  - Fix int8 branching of fc op. ([#34787](https://github.com/PaddlePaddle/Paddle/pull/34787), [#32671](https://github.com/PaddlePaddle/Paddle/pull/32671))
-  - Fix op teller filtering condition for reshape. ([#34787](https://github.com/PaddlePaddle/Paddle/pull/34787), [#34583](https://github.com/PaddlePaddle/Paddle/pull/34583))
-  - Fix poor multi-threaded inference efficiency for recurrent op. ([#36053](https://github.com/PaddlePaddle/Paddle/pull/36053))
-  - Fix the overflow bug of int values in gather and scatter op. ([#35544](https://github.com/PaddlePaddle/Paddle/pull/35544))
-  - Fix the ctc op divide-by-zero error.  ([#34724](https://github.com/PaddlePaddle/Paddle/pull/34724))
-  - Fix a crash caused by inserting a scale op when the model input contains a bool type. ([#35176](http://github.com/PaddlePaddle/Paddle/pull/35176))
-  - Fix complex scaler and Tensor operations failure bug. ([#33699](https://github.com/PaddlePaddle/Paddle/pull/33699))
-
-- Frame feature fixing
-  - Fix an out-of-bounds GPU memory access bug when batching data for some ernie models. ([#35077](https://github.com/PaddlePaddle/Paddle/pull/35077))
-  - Fix a possible accuracy bug in the running of the ernie model FP16 with precision. ([#34771](https://github.com/PaddlePaddle/Paddle/pull/34711))
-  - Fix the incorrect output bug due to an inconsistent order of inputs when the ernie becomes longer. ([#33575](https://github.com/PaddlePaddle/Paddle/pull/33575))
-  - Fix a bug where the allocator function is abnormal in multi-stream state. ([#32932](https://github.com/PaddlePaddle/Paddle/pull/33575))
-- Fix a possible crash bug of ERNIE model under TRT8. ([#36769](https://github.com/PaddlePaddle/Paddle/pull/36769))
-- Fix a bug of crash and accuracy when Pool and Slice are used. ([#36666](https://github.com/PaddlePaddle/Paddle/pull/36666))
-- Fix an accuracy bug of yolo_box op caused by a wrong formula. ([#36365](https://github.com/PaddlePaddle/Paddle/pull/36365))
-- Fix a bug where quantized matmul_v2 does not infer properly under TRT. ([#36821](https://github.com/PaddlePaddle/Paddle/pull/36821))
-- Fix a bug where quantized op is incorrectly added when quantizing matmul_v2. ([#36820](https://github.com/PaddlePaddle/Paddle/pull/36820))
-- Fix a bug with the operators batch_norm and elementwise_add reporting an error when TRT is enabled in 3D application scenarios. ([#36446](https://github.com/PaddlePaddle/Paddle/pull/36446))
-- Fix a bug where the prediction model saved by the high-level linear api cannot not be optimized by Pass fusion. ([#36500](https://github.com/PaddlePaddle/Paddle/pull/36500))
-- Fix the Pass of MatmulV2ToMul, re-qualify (matmul_v2 to mul) mapping pass, add Pass of MatmulV2ToMatmul, qualify (matmul_v2 to matmul) mapping pass condition (not supporting broadcast), and modify (matmul, mul) op_teller mapping condition.  ([#36652](https://github.com/PaddlePaddle/Paddle/pull/36652)
-
-#### **Back-end capability fixing**
-
-- TensorRT sub-graph engine fixing
-  - Fix an out-of-bounds error reporting bug with slice plugin's ends parameter in the TensorRT dynamic shape. ([#35357](https://github.com/PaddlePaddle/Paddle/pull/35357))
-  - Fix the bug of keepdim=false that is not supported when reduce op is converted to reduce_all = 1 of TensorRT. ([#35145](https://github.com/PaddlePaddle/Paddle/pull/35145))
-  - Fix the decrease_axis parameter bug when slice op is converted to TensorRT. ([#35100](https://github.com/PaddlePaddle/Paddle/pull/35100))
-  - Fix the bug that negative scale is not supported when nearest_interp op is converted to TensorRT dynamic shape.Fix a bug that scale has higher priority than outh and outw. ([#35405](https://github.com/PaddlePaddle/Paddle/pull/35405))
-  - Fix the bug that padd op's paddings parameter is not the same as tensorrt. ([#35371](https://github.com/PaddlePaddle/Paddle/pull/35371))
-  - Add the 4-dimensional padding support for conv2d op to converting to TensorRT. Filter the cases that the padding_algorithm is SAME and VALID when conv2d op is converted to TensorRT. ([#35627](https://github.com/PaddlePaddle/Paddle/pull/35627))
-  - Add the handling of padding_algorithm as SAME for pool2d op converting to TensorRT. Filter the cases that ksize in exclusive mode less than or equal to padings. ([#35923](https://github.com/PaddlePaddle/Paddle/pull/35923))
-  - Fix the bug of not supporting the Min and Max inputs when clip op is converted to TensorRT. ([#35694](https://github.com/PaddlePaddle/Paddle/pull/35694))
-  - Fix the bug of not supporting the approximate attribute when gelu op is converted to TensorRT. ([#35529](https://github.com/PaddlePaddle/Paddle/pull/35529))
-  - Fix the bug of not supporting the 2-dimensional inputs when affine_channel is converted to TensorRT. ([#35496](https://github.com/PaddlePaddle/Paddle/pull/35496))
-  - Fix an unstable TensorRT sub-graph matching bug. ([#35147](https://github.com/PaddlePaddle/Paddle/pull/35147))
-  - Fix the bug of the TensorRT engine not released after prediction engine destruction. ([#35842](https://github.com/PaddlePaddle/Paddle/pull/35842), [#35938](https://github.com/PaddlePaddle/Paddle/pull/35938))
-  - Fix the bug of incorrect conversion to trt of the paddle operator in opaddle-trt static mode if the shape attribute batch dimension of reshape is -1. ([#34007](https://github.com/PaddlePaddle/Paddle/pull/34007))
-  - Fix the bug of not supporting the RoisNum attribute when roi_align is converted to TensorRT. Fix the incorrect computation when aligned is True and Sampling_ratio = -1 in dynamic shape. ([#35549](https://github.com/PaddlePaddle/Paddle/pull/35549))
-  - Fix the bug of not supporting the AxisTensor property when concat is converted to TensorRT. ([#35545](https://github.com/PaddlePaddle/Paddle/pull/35545))
-  - Fix the bug of not supporting ScaleTensor property when scale is converted to TensorRT or not supporting 1-dimensional input in the static shape. ([#35225](https://github.com/PaddlePaddle/Paddle/pull/35225))
-  - Fix the bug of not supporting the MomentumTensor property when batchnorm is converted to TensorRT. ([#35527](https://github.com/PaddlePaddle/Paddle/pull/35527))
-  - Fix the bug of not supporting ShapeTensor when reshape is converted to TensorRT, fix the bug of not supporting the 1-Dimensional input in the Shape property and static shape.  ([#35166](https://github.com/PaddlePaddle/Paddle/pull/35166))
-  - Add support for TensorRT tile operator. ([#34388](https://github.com/PaddlePaddle/Paddle/pull/34388))
-  - Add support for TensorRT reduce mean operator. ([#34204](https://github.com/PaddlePaddle/Paddle/pull/34204))
-  - Fix a possible crash when using gather op. ([#33999](https://github.com/PaddlePaddle/Paddle/pull/33999))
-  - Fix a flag in TensorRT int8 incorrectly using debug (run only the int8 kernel, resulting in performance degradation). ([#34704](https://github.com/PaddlePaddle/Paddle/pull/34704))
-  - Fix a computation error with gather_nd op when calling TensorRT on 2-dimensional inputs. ([#35464](https://github.com/PaddlePaddle/Paddle/pull/35464))
-  - Fix hard_sigmoid op computation error when calling TensorRT with 2-dimensional input. ([#35908](https://github.com/PaddlePaddle/Paddle/pull/35908))
-  - Fix computation error in prelu op when calling TensorRT on 2-dimensional inputs. ([#35512](https://github.com/PaddlePaddle/Paddle/pull/35512))
-  - Fix a crash caused by using right slash as path separator in TensorRT inference on windows. ([#33853](http://github.com/PaddlePaddle/Paddle/pull/33853))
-
-
-#### **Others**
-
-- Fix the bug when pruning inverse operator script encounters an error with Chinese character comments. ([#33937](https://github.com/PaddlePaddle/Paddle/pull/33937), [#33919](https://github.com/PaddlePaddle/Paddle/pull/33919))
-- Fix the bug of an error in compile-time single-test model download caused by incomplete single-test inference, add MD5 download validation for test model download. ([#33264](https://github.com/PaddlePaddle/Paddle/pull/33264), [#33217](https://github.com/PaddlePaddle/Paddle/pull/33217))
-- Fix broadcast bug in blazeface model where mkldnn elementwise op is not supported.  ([#33549](https://github.com/PaddlePaddle/Paddle/pull/33549))
-- Fix swin_transformer mkldnn inference error reporting bug. ([#35740](https://github.com/PaddlePaddle/Paddle/pull/35740))
-- Fix the paddlex.deploy.Predictor oneDNN multi-threaded execution unet error reporting bug. ([#35231](https://github.com/PaddlePaddle/Paddle/pull/35231))
-- Fix the bug with oneDNN setCacheCapacity not limiting memory. ([#33571](https://github.com/PaddlePaddle/Paddle/pull/33571))
-
-
-
-
-## **Environment Adaptation**
-
-### **Compiler Installation**
-
-- For Windows, support `Ninja compilation build method`, compilation speed, ease of use, and stability. These features are improved greatly. Windows users can perform local source code compilation for Paddle via `pip install ninja`. ([#31161](https://github.com/PaddlePaddle/Paddle/pull/31161), [#31449](https://github.com/PaddlePaddle/Paddle/pull/31449), [#32987](https://github.com/PaddlePaddle/Paddle/pull/32987), [#33140](https://github.com/PaddlePaddle/Paddle/pull/33140), [#33155](https://github.com/PaddlePaddle/Paddle/pull/33155))
-- Only python3.7 is kept in the release mirror. Remove python3.5, python3.6, python3.8, python3.9 and paddle packages of the corresponding python versions. The mirror size is reduced.The mirror size is reduced by 30%~50%. ([#32688](https://github.com/PaddlePaddle/Paddle/pull/32688))
-- TensorRT library is used for inference. Only paddle training base functions in the release mirror are supported, without needing to support TensorRT.Delete the TensorRT library from the distribution image to prevent users from using the mirror by mistake. ([#34266](https://github.com/PaddlePaddle/Paddle/pull/34266))
-
-### **New Hardware Adaptation**
-
-- Support Hygon DCU chip training and inference. Support up to 9 classifications and 70 models. 
-  - For Hygon DCU, add the support of 5 PaddleDetection models. 
-  - For Hygon DCU, add the support for 6 PaddleGAN models.
-  - For Hygon DCU, add the support for 13 PaddleSeg models.
-  - For Hygon DCU, add the support for 3 PaddleNLP models.
-  - For Hygon DCU, add the support for 4 PaddleOCR models.
-  - For Hygon DCU, add the support for 3 PaddleVideo models.
-- Support Kunlun 2nd generation chip (XPU-2) training. Support ResNet50, SSD, Bert, Transformer and many other models. Support static map + dynamic map training. Support mixed precision training.
+- Fix the bug of model clipping when saving static graphs. ([#37579](https://github.com/PaddlePaddle/Paddle/pull/37579))
+  
+- For the C API, add the wrapper PD_Cstr for strings, and provide construction and destruction methods so that users do not need to use the C runtime library to destruct strings directly. ([#38667](https://github.com/PaddlePaddle/Paddle/pull/38667))
+  
+- Fix the logic bug with memory reuse at prediction time. ([#37324](https://github.com/PaddlePaddle/Paddle/pull/37324))
+  
+- Fix the error reported by memory reuse in multi-threaded scenarios. ([#37894](https://github.com/PaddlePaddle/Paddle/pull/37894))
+  
+- Allow passing empty strings for inference when no weight file is available. ([#38579](https://github.com/PaddlePaddle/Paddle/pull/38579))
+  
+- Fix an issue of clone not being supported when TensorRT dynamic shape is enabled. ([#38520](https://github.com/PaddlePaddle/Paddle/pull/38520))
+  
+- Fix multi-threaded clone error after TensorRT dynamic shape is enabled. ([#40067](https://github.com/PaddlePaddle/Paddle/pull/40067))
+  
+- Fix a TensorRT engine destructing issue. ([#35842](https://github.com/PaddlePaddle/Paddle/pull/35842), [#35938](https://github.com/PaddlePaddle/Paddle/pull/35938))
+  
+- For the lite xpu interface, fix an issue where the xpu card cannot be selected. ([#36610](https://github.com/PaddlePaddle/Paddle/pull/36610))
+  
+- Add a file existence check to the interface that automatically generates TensorRT dynamic shape parameters. ([#36628](https://github.com/PaddlePaddle/Paddle/pull/36628))
+  
+
+#### **Backend Capability Fixing**
+
+- Fix the default cuDNN algorithm selection configuration for inference, so that a non-deterministic policy is used. ([#41491](https://github.com/PaddlePaddle/Paddle/pull/41491))
+  
+- Fix incorrect resource reclamation handling in the TensorRT plugin for the deformable_conv op. ([#38374](https://github.com/PaddlePaddle/Paddle/pull/38374))
+  
+- Fix a serialization error in the TensorRT plugin for deformable_conv op. ([#38057](https://github.com/PaddlePaddle/Paddle/pull/38057))
+  
+- Adapt to the new engine-building and serialization APIs of TensorRT 8.0. ([#36769](https://github.com/PaddlePaddle/Paddle/pull/36769))
+  
+- Fix the bug that the Flatten2MatmulFusePass, Squeeze2MatmulFusePass, and Reshape2MatmulFusePass do not take effect. ([#37644](https://github.com/PaddlePaddle/Paddle/pull/37644))
+  
+- Fix the bug with TensorRT input data reporting errors. ([#37427](https://github.com/PaddlePaddle/Paddle/pull/37427))
+  
+- Add error message when input dimension is wrong. ([#38962](https://github.com/PaddlePaddle/Paddle/pull/38962))
+  
+- Fix the incorrect output type of EmbEltwiseLayernorm. ([#40015](https://github.com/PaddlePaddle/Paddle/pull/40015))
+  
+- Remove conv_affine_channel_fuse_pass and the corresponding unit test. ([#39817](https://github.com/PaddlePaddle/Paddle/pull/39817))
+  
+- Fix an issue where the adaptive_pool2d pass incorrectly replaces the pool attribute. ([#39600](https://github.com/PaddlePaddle/Paddle/pull/39600))
+  
+- Fix the bug that shuffle_channel_detect_pass incorrectly generates shuffle_channel op. ([#39242](https://github.com/PaddlePaddle/Paddle/pull/39242))
+  
+- Fix transpose parameter error. ([#39006](https://github.com/PaddlePaddle/Paddle/pull/39006))
+  
+- Fix the crash bug when nearest_interp_v2 input scale dimension is less than 1. ([#38725](https://github.com/PaddlePaddle/Paddle/pull/38725))
+  
+- Fix the bug that the prelu does not support one-dimensional input in dynamic shape. ([#39389](https://github.com/PaddlePaddle/Paddle/pull/39389))
+  
+- Fix the bug in the kernel function of slice's special_slice_plugin. ([#39875](https://github.com/PaddlePaddle/Paddle/pull/39875))
+  
+- Temporarily disable int8 branch under skip_layernorm variable length to prevent accuracy degradation. ([#39991](https://github.com/PaddlePaddle/Paddle/pull/39991))
+  
+- Fix some bugs regarding support for preln_ernie models. ([#39733](https://github.com/PaddlePaddle/Paddle/pull/39733))
+  
+- Fix the bug that slice may exceed the thread limit in ERNIE. Fix the bug that special_slice is incorrectly triggered. ([#39096](https://github.com/PaddlePaddle/Paddle/pull/39096))
+  
+- Fix the bug that the elementwise does not support broadcast when the dimension is the same. ([#37908](https://github.com/PaddlePaddle/Paddle/pull/37908))
+  
+- Fix the result mismatch between the TensorRT layer and the native op for nearest_interp when align_corners is True, caused by differences in the underlying implementation. ([#37525](https://github.com/PaddlePaddle/Paddle/pull/37525))
+  
+- Fix the kernel function computation error in qkv_plugin. ([#37096](https://github.com/PaddlePaddle/Paddle/pull/37096))
+  
+- Fix the bug with inference pass for dynamic quantization. ([#35879](https://github.com/PaddlePaddle/Paddle/pull/35879))
+  
+- Reuse directly when Tensor requests less memory than the allocated size. ([#37880](https://github.com/PaddlePaddle/Paddle/pull/37880))
+  
+- Fix the hang bug when ERNIE fixed-length model is enabled with TensorRT. ([#37839](https://github.com/PaddlePaddle/Paddle/pull/37839))
+  
+- Fix the crash bug when TensorRT int8 lacks dynamic range information. ([#36900](https://github.com/PaddlePaddle/Paddle/pull/36900))
+  
+- Fix the bug with slice deserialization code. ([#36588](https://github.com/PaddlePaddle/Paddle/pull/36588))
+  
+- Fix yolo box calculation formula error. ([#36240](https://github.com/PaddlePaddle/Paddle/pull/36240))
+  
+- Fix the crash bug when the earlier version model uses a later version of roi_align. ([#38788](https://github.com/PaddlePaddle/Paddle/pull/38788)) External Developers
+  
+- Fix the large performance difference of softmax between Python and C++. ([#37130](https://github.com/PaddlePaddle/Paddle/pull/37130))
+  
+- Fix matmul inference failure on static shape 2-dimensional input and dynamic shape 3-dimensional input. ([#36849](https://github.com/PaddlePaddle/Paddle/pull/36849))
+  
+- Fix reshape_transpose_matmul_mkldnn_fuse_pass mishandling of shapes. ([#36731](https://github.com/PaddlePaddle/Paddle/pull/36731))
+  
+- Fix an issue where TensorRT gets 4 dimensions when the input is 2 dimensions. ([#36614](https://github.com/PaddlePaddle/Paddle/pull/36614))
+  
+- Fix the error reported when the scale attribute of the interpolate_v2 MKLDNN operator is null. ([#36623](https://github.com/PaddlePaddle/Paddle/pull/36623))
+  
+- Fix poor performance of the recurrent operator in multi-threaded scenarios. ([#36052](https://github.com/PaddlePaddle/Paddle/pull/36052))
+  
+- Remove restrictions of relu, sigmoid, tanh, relu6, batch_norm, clip, concat, gelu, hard_sigmoid, prelu, softmax, split, and swish on TensorRT 2-dimensional inputs. ([#37097](https://github.com/PaddlePaddle/Paddle/pull/37097))
+  
+- Fix reshape op to use TensorRT inference. ([#41090](https://github.com/PaddlePaddle/Paddle/pull/41090))
+  
+- Fix matmul-related passes to make them compatible with matmul_v2. ([#36424](https://github.com/PaddlePaddle/Paddle/pull/36424))
+  
+- Support VALID and SAME attributes in the padding method of the conv2d operator when TensorRT is enabled. ([#38999](https://github.com/PaddlePaddle/Paddle/pull/38999))
+  
+- Fix MKLDNN multi-input operator quantization problem. ([#39593](https://github.com/PaddlePaddle/Paddle/pull/39593), [#39346](https://github.com/PaddlePaddle/Paddle/pull/39346), [#40717](https://github.com/PaddlePaddle/Paddle/pull/40717))
+  
+- Fix scale error of conv+activation in MKLDNN quantization scenarios. ([#38331](https://github.com/PaddlePaddle/Paddle/pull/38331))
+  
+- Fix the bug in MKLDNN quantization without parameters where the quantization of subsequent operators is handled differently. ([#39342](https://github.com/PaddlePaddle/Paddle/pull/39342))
+  
+- Fix a data type related issue in MKLDNN cpu_bfloat16_placement_pass. ([#38702](https://github.com/PaddlePaddle/Paddle/pull/38702))
+  
+- Fix a split operator execution issue in MKLDNN bfloat16 inference. ([#39548](https://github.com/PaddlePaddle/Paddle/pull/39548))
+  
+- Fix the bug with MKLDNN matmul_v2 operator not supporting 6 dimensions. ([#36342](https://github.com/PaddlePaddle/Paddle/pull/36342), [#38665](https://github.com/PaddlePaddle/Paddle/pull/38665))
+  
+- Fix MKLDNN DeviceContext error in MKLDNN matmul_v2_transpose_reshape. ([#38554](https://github.com/PaddlePaddle/Paddle/pull/38554))
+  
+- Fix incorrectly calculated results for segmentation models in MKLDNN inference scenarios. ([#37310](https://github.com/PaddlePaddle/Paddle/pull/37310))
+  
+- Fix MKLDNN bfloat16 placement operator list and add the missing operator. ([#36291](https://github.com/PaddlePaddle/Paddle/pull/36291))
+  
+- Fix the format bug of MKLDNN operators, including: FC, conv_transpose, 6-dimensional Tensor error reporting, and wrong output format of conv to NHWC input. ([#38890](https://github.com/PaddlePaddle/Paddle/pull/38890), [#37344](https://github.com/PaddlePaddle/Paddle/pull/37344), [#37175](https://github.com/PaddlePaddle/Paddle/pull/37175), [#38553](https://github.com/PaddlePaddle/Paddle/pull/38553), [#40049](https://github.com/PaddlePaddle/Paddle/pull/40049), [#39097](https://github.com/PaddlePaddle/Paddle/pull/39097))
+  
+- Fix errors in MKLDNN multi-threaded inference scenarios caused by the cache mechanism. ([#36290](https://github.com/PaddlePaddle/Paddle/pull/36290), [#35884](https://github.com/PaddlePaddle/Paddle/pull/35884))
+  
+- Fix MKLDNN quantization model accuracy anomaly caused by matmul and FC. ([#38023](https://github.com/PaddlePaddle/Paddle/pull/38023), [#37618](https://github.com/PaddlePaddle/Paddle/pull/37618))
+  
+- Fix the abnormal quantization model accuracy issue in MKLDNN quantization conversion scripts caused by missing passes. ([#37619](https://github.com/PaddlePaddle/Paddle/pull/37619), [#40542](https://github.com/PaddlePaddle/Paddle/pull/40542), [#38912](https://github.com/PaddlePaddle/Paddle/pull/38912))
+  
+- Fix the crash bug in MKLDNN enabling volume op due to data type mismatch. ([#38133](https://github.com/PaddlePaddle/Paddle/pull/38133))
+  
+- Fix an issue where some MKLDNN ops need to change back to the original layout after modifying the layout. ([#39422](https://github.com/PaddlePaddle/Paddle/pull/39422))
+  
+- Fix the Python API errors in the Ascend 910 inference scenario, which were caused by a conflict with the Ascend software stack because the GIL was not released. ([#38605](https://github.com/PaddlePaddle/Paddle/pull/38605))
+  
+
+## **5. Environment Adaptation**
+
+### **Compile and Install**
+
+- The installation package CUDA released by PaddlePaddle on PIP source is adjusted to V11.0. If you need to install other CUDA versions of PaddlePaddle, please visit [PaddlePaddle official website](https://www.paddlepaddle.org.cn/install/quick) for downloading.
+  
+- PaddlePaddle 2.3.0-rc0 PIP source release of CUDA11.0 installer adds the support for Ampere architecture. Users with GPU architecture 8.0 or 8.6 can upgrade directly through `pip install paddlepaddle-gpu` .
+  
+- From version 2.3.0-rc0, PaddlePaddle has adjusted and upgraded the types of GPU architectures supported by the framework. (For more information, please refer to: [GPU architectures supported by PaddlePaddle](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.3rc/install/Tables.html#gpu))
+  
+
+Notes:
+
+- PIP source installation means downloading the installation package and its dependencies from the official PIP website by running `pip install paddlepaddle` or `pip install paddlepaddle-gpu`. Compared with the BOS source, it supports fewer GPU architectures, has a smaller installation package, and provides an installation package for only one CUDA version.
+  
+  - Prior to version 2.3, the PIP source installer (CUDA10.2) supports the following GPU architectures: 3.5, 5.0, 5.2, 6.0, 6.1, 7.0, and 7.5.
+    
+  - From version 2.3 onward, the PIP source installer (CUDA11.0) supports the following GPU architectures: 6.0, 6.1, 7.0, 7.5, and 8.0.
+    
+- BOS source installation means downloading the installation package and its dependencies from the official PaddlePaddle website. Compared with the PIP source, it supports more GPU architectures and provides installation packages for multiple CUDA versions. The download source is hosted in China, which makes downloading much faster.
+  
+  - Prior to version 2.3, the BOS source installer on the PaddlePaddle website supports the following GPU architectures:
+    
+    - CUDA10: 3.5, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5;
+      
+    - CUDA11: 5.2, 6.0, 6.1, 7.0, 7.5, 8.0.
+      
+  - From version 2.3 onward, the BOS source installer on the PaddlePaddle website supports the following GPU architectures:
+    
+    - CUDA10: 3.5, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5;
+      
+    - CUDA11: 3.5, 5.0, 6.0, 6.1, 7.0, 7.5, 8.0.
+      
+- Support compiling on the Windows platform with Visual Studio 2019. ([#38719](https://github.com/PaddlePaddle/Paddle/pull/38719))
+  
+- Eliminate various warnings when compiling on the Windows platform. ([#38034](https://github.com/PaddlePaddle/Paddle/pull/38034), [#37890](https://github.com/PaddlePaddle/Paddle/pull/37890), [#37442](https://github.com/PaddlePaddle/Paddle/pull/37442), [#37439](https://github.com/PaddlePaddle/Paddle/pull/37439), [#36857](https://github.com/PaddlePaddle/Paddle/pull/36857))
+  
+- Fix Jetson compilation issues introduced by the underlying data structure upgrade. ([#39669](https://github.com/PaddlePaddle/Paddle/pull/39669), [#39441](https://github.com/PaddlePaddle/Paddle/pull/39441))
+  
+
+### **New Hardware Backend Extension**
+
+- Custom device support: provide a plug-in mechanism for extending the PaddlePaddle hardware backend. With this feature, developers do not need to modify PaddlePaddle code for specific hardware; they only need to implement the standard interface and compile it into a dynamic link library that PaddlePaddle can call as a plug-in. This reduces the development effort of adding a new hardware backend to PaddlePaddle. Custom Runtime and custom Kernel are currently supported.
+  
+- Support Huawei NPU chip (Ascend910) training/inference. Support ResNet50, YoloV3, BERT, Transformer and many other models. Support static + dynamic graph and auto-mixed precision training. Support single-card training and distributed training across multiple cards and multiple machines.
+  
+- Support Graphcore IPU chip (including IPU Mk2 GC200 and Bow IPU) training/inference. Support ResNet50, BERT and other models. Support static graph training. Support single-card training and distributed training across multiple cards and multiple machines.
+  
+- Support Cambricon MLU chip (MLU370x4) training/inference. Support models such as ResNet50. Support static graph + dynamic graph training. Support auto-mixed precision training. Support single-card training and distributed training across multiple cards and multiple machines.
+  
+- Support KUNLUNXIN 2 chips (Kunlunxin AI acceleration cards R200, R300) training/inference. Support ResNet50, YoloV3, OCR-DB, SSD, MobileNetV3, UNet, BERT, Transformer, GPT-2, Wide&Deep, and DeepFM. Support static graph + dynamic graph training. Support auto-mixed precision training. Support single-card training and distributed training across multiple cards and multiple machines.
+  
 
 ## Thanks to our Contributors
 
-This release contains contributions from:
+This release contains contributions from the project core team as well as:
 
-0x45f, 123malin, Adam Osewski, Aganlengzi, Aurelius84, Baibaifan, Bo Liu, CheQiXiao, Chen Long, Chen Weihang, CtfGo, Double\_V, Ethanzjp, Fan Zhang, Feiyu Chan, Feng Xing, From00, GT-Zhang, Guanghua Yu, Guoxia Wang, Haipeng Wang, Hao Lin, Haohongxiang, Hui Zhang, Huihuang Zheng, HydrogenSulfate, IMMORTAL, JYChen, JZ-LIANG, Jacek Czaja, Jack Zhou, Jackwaterveg, Jeng Bai-Cheng, Jiangxinz, Jiaqi Liu, Jiawei Wang, JingZhuangzhuang, June Weng, Kaipeng Deng, Kqnonrime, LJQ❤️, Leo Chen, Li Min, LielinJiang, Lijunhui, Linjie Chen, Liu-xiandong, LiuWei, Ming-Xu Huang, MissPenguin, PaddlePM, Pei Yang, Peihan, Qi Li, QingshuChen, Ren Wei (任卫), Roc, Shang Zhizhou, ShenLiang, Shibo Tao, Siming Dai, Sing\_chan, TCChenLong, TTerror, TeslaZhao, Thomas Young, Thunderbrook, Tongxin Bai, WJJ1995, WangXi, Wangzheee, Wei Shengyu, WeiXin, Weilong Wu, Wenyu, Wilber, XGZhang, XYZ, XYZ916829, XiangGao, Xiaoxu Chen, YUNSHEN XIE, Yanxing Shi, Yiqun Liu, YuanRisheng, Yuang Liu, Yulong Ao, Zeng Jinle, Zhang Ting, Zhang Zheng, Zhanlue Yang, Zhen Wang, Zhong Hui, Zhou Wei, andreazanetti, andyjpaddle, arlesniak, baoachun, cc, ceci3, chajchaj, chenenquan, chenjian, chentianyu03, crystal, cuicheng01, danleifeng, denglin-github, duanboqiang, dyning, feng626, feng_shuai, furnace, gongweibao, heliqi, hlygit66666, hong, hong19860320, houj04, huangjun12, huangxu96, huzhiqiang, iducn, jakpiase, jiangcheng, joanna.wozna.intel, jzhang533, kuizhiqing, levi131, lidanqing, lilong12, limingshu, littletomatodonkey, liu zhengxi, liutiexing, liuyuhui, liym27, lyuwenyu, lzzyzlbb, niuliling123, pangyoki, parap1uie-s, ronnywang, root, seemingwang, shangliang Xu, shiyutang, smallv0221, sunli, sunzhongkai588, taixiurong, tangwei12, tianshuo78520a, veyron95, wangguanqun, wangguanzhong, wanghuancoder, wangna11BD, wangxinxin08, wangzhen38, wangzhuang01, wawltor, wenbin, whs, will-jl944, wuhuachaocoding, wuhuanzhou, xiaoting, xiaoxiaohehe001, xiayanming, xiegegege, xiemoyuan, xiongkun, yaoxuefeng, yeliang2258, 
yingyibiao, zhangbo9674, zhangchunle, zhangkaihuo, zhaoyingli, zhiboniu, zhoujun, zhouzj, zhulei, zhupengyang, zlsh80826, zmx, zyfncg, 李季, 津, 王明冬, 石晓伟
+Adam Osewski, Allen Guo, arlesniak, chenenquan, chenyanlann, fengkuangxiaxia, fuqianya, fwenguang, guguguzi, helen88, houj04, Jacek Czaja, jakpiase, jianghaicheng, joanna.wozna.intel, joeqiao12, Leo Chen, Leo Guo, Li-fAngyU, lidanqing, Liyulingyue, Matsumoto GAO, maxhuiy, Ming-Xu Huang, Nyakku Shigure, piotrekobi, piotrekobiIntel, QingshuChen, qipengh, Skr Bang, Sylwester Fraczek, Sławomir Siwek, taixiurong, tanzhipeng, Tomasz Socha, TTerror, Webbley, yaozhixin, ykkk2333, yujun, Zhangjingyu06, zhangxiaoci, zhangyikun02, zhangyk0314, zlsh80826, zn, Zuza

From 552a353c7e691bf427b39a7745b3b293104de9fd Mon Sep 17 00:00:00 2001
From: Chen Long <1300851984@qq.com>
Date: Sat, 30 Apr 2022 16:01:18 +0800
Subject: [PATCH 03/11] Update release_note_en.md

---
 docs/release_note_en.md | 175 ++++++++++++++++++++--------------------
 1 file changed, 89 insertions(+), 86 deletions(-)

diff --git a/docs/release_note_en.md b/docs/release_note_en.md
index 39b34923d5b..261faad72de 100644
--- a/docs/release_note_en.md
+++ b/docs/release_note_en.md
@@ -65,105 +65,108 @@ We are pleased to announce the release the PaddlePaddle Framework V2.3.0-rc0. Th
 - To keep consistency with division behavior under python3, the division symbol `/` has been changed from “rounding divide” to “true divide”, and the data type of the computed output has been switched from int to float. ([#40890](https://github.com/PaddlePaddle/Paddle/pull/40890))
   
 
-|     |     |
-| --- | --- |
-| 2.2 | 2.3.0-rc0 |
-
-> > > import paddle
-
-> > > a = paddle.to_tensor([327])
-
-> > > b = paddle.to_tensor([80])
-
-> > > a / b
-
+<table>
+<tr>
+<th>
+2.2
+</th>
+<th>
+2.3.0-rc0
+</th>
+</tr>
+
+<tr>
+<td>
+<pre>
+
+```python
+>>> import paddle
+>>> a = paddle.to_tensor([327])
+>>> b = paddle.to_tensor([80])
+>>> a / b
 Tensor(shape=[1], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
-
-[4])
-
-> > > import paddle
-
-> > > a = paddle.to_tensor([327])
-
-> > > b = paddle.to_tensor([80])
-
-> > > a / b
-
+      [4])
+```
+</pre>
+</td>
+<td>
+<pre>
+
+```python
+>>> import paddle
+>>> a = paddle.to_tensor([327])
+>>> b = paddle.to_tensor([80])
+>>> a / b
 Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=True,
-
-[4.08750010])
+      [4.08750010])
+```
+</pre>
+</td>
+</tr>
+</table>
 
 - Revise the ELU's formula. The computing method in case of alpha <0 aligns with the original paper, thus fixing a small number of cases where the results are incorrectly calculated. Meanwhile, elu_ will report an error in case of alpha <0, because it is not mathematically possible to compute the inverse gradient from the output only at alpha <0. ([#37316](https://github.com/PaddlePaddle/Paddle/pull/37316))
 
-|     |     |
-| --- | --- |
-| 2.2 | 2.3.0-rc0 |
-
+<table>
+<tr>
+<th>
+2.2
+</th>
+<th>
+2.3.0-rc0
+</th>
+</tr>
+
+<tr>
+<td>
+<pre>
+
+```python
 # elu(x) = max(0, x) + min(0, α ∗ (e^x − 1))
-
-> > > import paddle
-
-> > > x = paddle.to_tensor([-1. ,6.])
-
-> > > m = paddle.nn.ELU(-0.2)
-
-> > > out = m(x)
-
-> > > out
-
+>>> import paddle
+>>> x = paddle.to_tensor([-1. ,6.])
+>>> m = paddle.nn.ELU(-0.2)
+>>> out = m(x)
+>>> out
 Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
-
-[ 0. , -74.48576355])
-
-> > > out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
-
-> > > out
-
+       [ 0.         , -74.48576355])
+>>> out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
+>>> out
 Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
-
-[ 0. , -74.48576355])
-
+       [ 0.         , -74.48576355])
+```
+</pre>
+</td>
+<td>
+<pre>
+
+```python
 # elu(x) = x, if x > 0
-
 # elu(x) = α ∗ (e^x − 1), if x <= 0
-
-> > > import paddle
-
-> > > x = paddle.to_tensor([-1. ,6.])
-
-> > > m = paddle.nn.ELU(-0.2)
-
-> > > out = m(x)
-
-> > > out
-
+>>> import paddle
+>>> x = paddle.to_tensor([-1. ,6.])
+>>> m = paddle.nn.ELU(-0.2)
+>>> out = m(x)
+>>> out
 Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
-
-[0.12642412, 6. ])
-
-> > > out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
-
+       [0.12642412,  6.        ])
+>>> out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
 Traceback (most recent call last):
-
-File "<stdin>", line 1, in <module>
-
-File "/usr/local/lib/python3.7/dist-packages/decorator.py", line 232, in fun
-
-return caller(func, *(extras + args), **kw)
-
-File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
-
-return wrapped_func(*args, **kwargs)
-
-File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/inplace_utils.py", line 34, in __impl__
-
-return func(*args, **kwargs)
-
-File "/usr/local/lib/python3.7/dist-packages/paddle/nn/functional/activation.py", line 89, in elu_
-
-assert alpha >= 0., "elu_ only support alpha >= 0, please use elu instead."
-
+  File "<stdin>", line 1, in <module>
+  File "/usr/local/lib/python3.7/dist-packages/decorator.py", line 232, in fun
+    return caller(func, *(extras + args), **kw)
+  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
+    return wrapped_func(*args, **kwargs)
+  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/inplace_utils.py", line 34, in __impl__
+    return func(*args, **kwargs)
+  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/functional/activation.py", line 89, in elu_
+    assert alpha >= 0., "elu_ only support alpha >= 0, please use elu instead."
 AssertionError: elu_ only support alpha >= 0, please use elu instead.
+```
+</pre>
+</td>
+</tr>
+</table>
 
 ## **3. Training Framework (with the distributed function)**
 

From 9eb73bcb74bf60e0c87f2da7053c2ac844e2eeeb Mon Sep 17 00:00:00 2001
From: Chen Long <1300851984@qq.com>
Date: Sat, 30 Apr 2022 16:03:15 +0800
Subject: [PATCH 04/11] Update release_note_cn.md

---
 docs/release_note_cn.md | 197 ++++++++++++++++++----------------------
 1 file changed, 89 insertions(+), 108 deletions(-)

diff --git a/docs/release_note_cn.md b/docs/release_note_cn.md
index 0509180df06..f3023a52264 100644
--- a/docs/release_note_cn.md
+++ b/docs/release_note_cn.md
@@ -53,133 +53,114 @@
 
 - 这个版本中，我们在框架的执行器也做了大量工作，详情请见：[新动态图执行机制](https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/7UhIeLfrn3/0rDW-MD4RXSfkx#anchor-088a55e0-b962-11ec-a8b3-f52dfa102ded) 与 [全新静态图执行器](https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/7UhIeLfrn3/0rDW-MD4RXSfkx#anchor-e81120c0-c233-11ec-a2f2-c9306d79e3c2)。
 
-### 其他备注（发布时要删除）
-
-> 我这面好像rc没有特别重要的了，性能自动优化得正式版才能发，编译器的部分更是不跟版本走，只到develop了。 by 蓝翔
-> 
-> 高阶自动微分的功能是否统一由一位高T来写？
-> 
-> 里面的术语需要统一起来，并且用标准的用法。例如：有的地方用float16，有的地方用FP16；用的地方用TensorRT，有的地方用tensorrt。
-> 
-> 术语统一（注意大小写与特殊标注）：
-> 
-> Pure FP16、FP32、bfloat16、Tensor、TensorRT、CUDA、cuDNN、GPU、CPU、op(op名称不需要加 )、API(API名称与参数需要加 `paddle.*`，如`NHCW`、`axis`)、 Kernel、seed、pass、inplace、PaddlePaddle/飞桨、shape、MKLDNN、python、conv、cache、dropout、ERNIE、Windows、Mac、Linux（更多统一标准待补充...）
-> 
-> 中英文之间加空格，句尾加句号；标点符号不要中英文混用，尤其注意中英文的逗号。
-
 ## 2. 不兼容升级
 
 - `paddle.to_tensor` 将一个 python int scalar 转换为 Tensor 时，在 Windows 上的默认数据类型由 int32 变为 int64，从而与 Linux/Mac 保持对齐。([#39662](https://github.com/PaddlePaddle/Paddle/pull/39662)) 
 
 - 为了与 python3 下的除法行为保持一致，除法符号 `/` 从 rounding divide 变成 true divide，计算输出结果的数据类型从 int 切换成 float。 ([#40890](https://github.com/PaddlePaddle/Paddle/pull/40890)) 
 
-### Paddle 2.2 version
-
-import paddle
-
-a = paddle.to_tensor([327])
-
-b = paddle.to_tensor([80])
-
-a / b
-
-'''
-
+<table>
+<tr>
+<th>
+2.2
+</th>
+<th>
+2.3.0-rc0
+</th>
+</tr>
+
+<tr>
+<td>
+<pre>
+
+```python
+>>> import paddle
+>>> a = paddle.to_tensor([327])
+>>> b = paddle.to_tensor([80])
+>>> a / b
 Tensor(shape=[1], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
-
- [4])
-
-'''
-
-### Paddle 2.3.0-rc0 version
-
-import paddle
-
-a = paddle.to_tensor([327])
-
-b = paddle.to_tensor([80])
-
-a / b
-
-'''
-
+      [4])
+```
+</pre>
+</td>
+<td>
+<pre>
+
+```python
+>>> import paddle
+>>> a = paddle.to_tensor([327])
+>>> b = paddle.to_tensor([80])
+>>> a / b
 Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=True,
-
- [4.08750010])
-
-'''
+      [4.08750010])
+```
+</pre>
+</td>
+</tr>
+</table>
 
 - 修正 ELU 的公式，alpha < 0 时的计算方式与原论文对齐，从而修复小部分情况下的计算结果错误。同时，由于在 alpha < 0 无法在数学上仅从输出计算反向梯度，因此 elu_ 在 alpha < 0 时将报错。([#37316](https://github.com/PaddlePaddle/Paddle/pull/37316))
 
-### Paddle 2.2 version
-
+<table>
+<tr>
+<th>
+2.2
+</th>
+<th>
+2.3.0-rc0
+</th>
+</tr>
+
+<tr>
+<td>
+<pre>
+
+```python
 # elu(x) = max(0, x) + min(0, α ∗ (e^x − 1))
-
-> > > import paddle
-
-> > > x = paddle.to_tensor([-1. ,6.])
-
-> > > m = paddle.nn.ELU(-0.2)
-
-> > > out = m(x)
-
-> > > out
-
+>>> import paddle
+>>> x = paddle.to_tensor([-1. ,6.])
+>>> m = paddle.nn.ELU(-0.2)
+>>> out = m(x)
+>>> out
 Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
-
- [ 0.         , -74.48576355])
-
-> > > out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
-
-> > > out
-
+       [ 0.         , -74.48576355])
+>>> out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
+>>> out
 Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
-
- [ 0.         , -74.48576355])
-
-### Paddle 2.3.0-rc0 version
-
+       [ 0.         , -74.48576355])
+```
+</pre>
+</td>
+<td>
+<pre>
+
+```python
 # elu(x) = x, if x > 0
-
 # elu(x) = α ∗ (e^x − 1), if x <= 0
-
-> > > import paddle
-
-> > > x = paddle.to_tensor([-1. ,6.])
-
-> > > m = paddle.nn.ELU(-0.2)
-
-> > > out = m(x)
-
-> > > out
-
+>>> import paddle
+>>> x = paddle.to_tensor([-1. ,6.])
+>>> m = paddle.nn.ELU(-0.2)
+>>> out = m(x)
+>>> out
 Tensor(shape=[2], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
-
- [0.12642412,  6.        ])
-
-> > > out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
-
+       [0.12642412,  6.        ])
+>>> out = paddle.nn.functional.elu_(x, alpha=-0.2, name=None)
 Traceback (most recent call last):
-
- File "<stdin>", line 1, in <module>
-
- File "/usr/local/lib/python3.7/dist-packages/decorator.py", line 232, in fun
-
- return caller(func, *(extras + args), **kw)
-
- File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
-
- return wrapped_func(*args, **kwargs)
-
- File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/inplace_utils.py", line 34, in __impl__
-
- return func(*args, **kwargs)
-
- File "/usr/local/lib/python3.7/dist-packages/paddle/nn/functional/activation.py", line 89, in elu_
-
- assert alpha >= 0., "elu_ only support alpha >= 0, please use elu instead."
-
+  File "<stdin>", line 1, in <module>
+  File "/usr/local/lib/python3.7/dist-packages/decorator.py", line 232, in fun
+    return caller(func, *(extras + args), **kw)
+  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
+    return wrapped_func(*args, **kwargs)
+  File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/inplace_utils.py", line 34, in __impl__
+    return func(*args, **kwargs)
+  File "/usr/local/lib/python3.7/dist-packages/paddle/nn/functional/activation.py", line 89, in elu_
+    assert alpha >= 0., "elu_ only support alpha >= 0, please use elu instead."
 AssertionError: elu_ only support alpha >= 0, please use elu instead.
+```
+</pre>
+</td>
+</tr>
+</table>
 
 ## 3. 训练框架（含分布式）
 

From 7f66707721173e0ca00f4ab8e1de1429d0b92f2b Mon Sep 17 00:00:00 2001
From: Chen Long <1300851984@qq.com>
Date: Sun, 1 May 2022 00:25:59 +0800
Subject: [PATCH 05/11] Update release_note_cn.md

---
 docs/release_note_cn.md | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/docs/release_note_cn.md b/docs/release_note_cn.md
index f3023a52264..3932d35dc0f 100644
--- a/docs/release_note_cn.md
+++ b/docs/release_note_cn.md
@@ -33,8 +33,6 @@
 
 ### 编译安装
 
-- 飞桨在 PIP 源上发布的默认安装包 CUDA 架构调整为11.2，如需安装其他 CUDA 版本的 PaddlePaddle，请移步[飞桨官网-安装](https://www.paddlepaddle.org.cn/install/quick)﻿进行下载安装。
-
 - 从 2.3.0-rc0 版本开始，飞桨对框架支持的 GPU 架构种类进行了调整和升级。
 
 ### 推理部署
@@ -51,7 +49,7 @@
 
 ### 框架架构
 
-- 这个版本中，我们在框架的执行器也做了大量工作，详情请见：[新动态图执行机制](https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/7UhIeLfrn3/0rDW-MD4RXSfkx#anchor-088a55e0-b962-11ec-a8b3-f52dfa102ded) 与 [全新静态图执行器](https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/7UhIeLfrn3/0rDW-MD4RXSfkx#anchor-e81120c0-c233-11ec-a2f2-c9306d79e3c2)。
+- 这个版本中，我们在框架的执行器也做了大量工作，详情请见：[新动态图执行机制](#%E6%96%B0%E5%8A%A8%E6%80%81%E5%9B%BE%E6%89%A7%E8%A1%8C%E6%9C%BA%E5%88%B6) 与 [全新静态图执行器](#%E5%85%A8%E6%96%B0%E9%9D%99%E6%80%81%E5%9B%BE%E6%89%A7%E8%A1%8C%E5%99%A8)。
 
 ## 2. 不兼容升级
 
@@ -2065,10 +2063,6 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 ### 编译安装
 
-- 飞桨在 PIP 源上发布的安装包 CUDA 架构调整为11.0，如需安装其他 CUDA 版本的 paddle，请到 [飞桨官网](https://www.paddlepaddle.org.cn/install/quick) 进行下载。
-
-- 飞桨2.3.0-rc0 PIP 源发布的 CUDA11.0的安装包新增了对 Ampere 架构的支持。GPU 架构为8.0或8.6的用户可以直接通过 `pip install paddlepaddle-gpu`的方式进行升级。
-
 - 从2.3.0-rc0版本开始，飞桨对框架支持的 GPU 架构种类进行了调整和升级。(更多请参考: [飞桨支持的 GPU 架构](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.3rc/install/Tables.html#gpu))
 
 备注：

From 87125f9d6260823dcc22a7b3ee38a6944f89a5e5 Mon Sep 17 00:00:00 2001
From: Chen Long <1300851984@qq.com>
Date: Sun, 1 May 2022 00:26:56 +0800
Subject: [PATCH 06/11] Update release_note_en.md

---
 docs/release_note_en.md | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/docs/release_note_en.md b/docs/release_note_en.md
index 261faad72de..e9e51ea79a8 100644
--- a/docs/release_note_en.md
+++ b/docs/release_note_en.md
@@ -3,7 +3,7 @@
 
 ## 1. **Important Updates**
 
-We are pleased to announce the release the PaddlePaddle Framework V2.3.0-rc0. This version contains the following important updates.
+We are excited to release the PaddlePaddle Framework V2.3.0-rc0. This version contains the following highlights.
 
 ### API
 
@@ -22,7 +22,7 @@ We are pleased to announce the release the PaddlePaddle Framework V2.3.0-rc0. Th
 
 ### **Paddle** HIgh reusability operator l**ibrary**
 
-- We anounce PHI as the new Paddle HIgh reusability operator library. PHI provides Primitive API, enabling kernel reuse for operator development. As a refactored functional operator library, PHI aims to solve legacy problems that harm the framework's performance and reusability, in particular on the operator development. Such problems include inefficient ways of cross using operators, unclear operator interfaces and lacking direct calls to the operator library in C++. With PHI, new operators can be easily implemented by composing functions available in the functional library. The library provides over 200 C++ operator class APIs and nearly 500 kernels. Composing new operators through these built-in functions can greatly reduce the user's development effort. PHI supports different types of hardware (e.g., GPU and XPU). In addition, PHI is extensible with plugins for accommodating third party accelerators (such as NPU) in a low cost and reusable fashion. In short, PHI supports low level operator composabilty, the reuse of kernels through Primitives, and accelerators through plugins.
+- We announce PHI as the new Paddle HIgh reusability operator library. PHI provides Primitive API, enabling kernel reuse for operator development. As a refactored functional operator library, PHI aims to solve legacy problems that harm the framework's performance and reusability, in particular on the operator development. Such problems include inefficient ways of cross using operators, unclear operator interfaces and lacking direct calls to the operator library in C++. With PHI, new operators can be easily implemented by composing functions available in the functional library. The library provides over 200 C++ operator class APIs and nearly 500 kernels. Composing new operators through these built-in functions can greatly reduce the user's development effort. PHI supports different types of hardware (e.g., GPU and XPU). In addition, PHI is extensible with plugins for accommodating third party accelerators (such as NPU) in a low cost and reusable fashion. In short, PHI supports low level operator composability, the reuse of kernels through Primitives, and accelerators through plugins.
 
 ### **Distributed Training**
 
@@ -34,8 +34,6 @@ We are pleased to announce the release the PaddlePaddle Framework V2.3.0-rc0. Th
   
 
 ### **Compile and Install**
-
-- The CUDA architecture of the installation package on the PIP source is adjusted to V11.0. If you need to install other CUDA versions please visit the [PaddlePaddle website - Installation](https://www.paddlepaddle.org.cn/install/quick) to download and install.
   
 - From version 2.3.0-rc0, PaddlePaddle upgrades GPU architectures supported.
   
@@ -56,7 +54,7 @@ We are pleased to announce the release the PaddlePaddle Framework V2.3.0-rc0. Th
 
 ### **Framework Architecture**
 
-- In this version, we did a lot of work on the framework executor. For details, please see [New Dynamic Graph Execution Mechanism](https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/7UhIeLfrn3/0rDW-MD4RXSfkx#anchor-088a55e0-b962-11ec-a8b3-f52dfa102ded) and [New Static Graph Executor](https://ku.baidu-int.com/knowledge/HFVrC7hq1Q/pKzJfZczuc/7UhIeLfrn3/0rDW-MD4RXSfkx#anchor-e81120c0-c233-11ec-a2f2-c9306d79e3c2).
+- In this version, we did a lot of work on the framework executor. For details, please see [New Dynamic Graph Execution Mechanism](#new-dynamic-graph-execution-mechanism) and [New Static Graph Executor](#new-static-graph-executor).
 
 ## **2. Incompatibility Upgrade**
 
@@ -2091,10 +2089,6 @@ In order to solve the problem that the original static graph executor of the Pad
 ## **5. Environment Adaptation**
 
 ### **Compile and Install**
-
-- The installation package CUDA released by PaddlePaddle on PIP source is adjusted to V11.0. If you need to install other CUDA versions of PaddlePaddle, please visit [PaddlePaddle official website](https://www.paddlepaddle.org.cn/install/quick) for downloading.
-  
-- PaddlePaddle 2.3.0-rc0 PIP source release of CUDA11.0 installer adds the support for Ampere architecture. Users with GPU architecture 8.0 or 8.6 can upgrade directly through `pip install paddlepaddle-gpu` .
   
 - From version 2.3.0-rc0, PaddlePaddle has adjusted and upgraded the types of GPU architectures supported by the framework. (For more information, please refer to: [GPU architectures supported by PaddlePaddle](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.3rc/install/Tables.html#gpu))
   

From 7447771f2277d197b9e0cadad841ad96ae77a061 Mon Sep 17 00:00:00 2001
From: TCChenlong <1300851984@qq.com>
Date: Thu, 12 May 2022 22:34:22 +0800
Subject: [PATCH 07/11] update guides index

---
 .../basic_concept/index_cn.rst                |  23 -
 .../basic_concept/index_en.rst                |  20 -
 .../02_paddle2.0_develop/06_device_cn.ipynb   | 405 ------------------
 docs/guides/02_paddle2.0_develop/index_cn.rst |  31 --
 docs/guides/03_VisualDL/index_cn.rst          |  15 -
 docs/guides/03_VisualDL/index_en.rst          |   8 -
 docs/guides/08_api_mapping/index_cn.rst       |  16 -
 .../autograd_cn.rst                           |  10 +-
 .../customize_cn.ipynb}                       |   0
 .../gradient_clip_cn.rst                      |   0
 .../gradient_clip_en.rst                      |   0
 .../images}/autograd_image_3-1.png            | Bin
 .../images}/autograd_image_4-1.png            | Bin
 .../images}/autograd_image_4-2.png            | Bin
 .../images}/autograd_image_4-3.png            | Bin
 .../images}/autograd_image_4-4.png            | Bin
 docs/guides/advanced/index_cn.rst             |  22 +
 docs/guides/advanced/index_en.rst             |  14 +
 .../layer_and_model_cn.md                     |   0
 .../layer_and_model_en.md                     |   0
 .../model_to_onnx_cn.rst}                     |   4 +-
 .../{03_VisualDL => advanced}/visualdl_cn.md  |   0
 .../{03_VisualDL => advanced}/visualdl_en.md  |   0
 .../visualdl_usage_cn.md                      |   0
 .../visualdl_usage_en.md                      |   0
 .../data_load_cn.ipynb}                       |   0
 .../data_preprocessing_cn.ipynb}              |   0
 .../images/Axis_2.0.png                       | Bin
 .../images/Tensor_2.0.png                     | Bin
 .../images/Tensor_broadcast.png               | Bin
 .../images/data_pipeline.png                  | Bin
 .../images/data_preprocessing.png             | Bin
 .../images/lenet.png                          | Bin
 .../images/mnist.png                          | Bin
 .../images/model.png                          | Bin
 .../images/model_develop_flow.png             | Bin
 .../images/paddle_jit_save_load_2.1.png       | Bin
 .../images/paddle_save_load_2.1.png           | Bin
 docs/guides/beginner/index_cn.rst             |  28 ++
 docs/guides/beginner/index_en.rst             |  12 +
 .../model_cn.ipynb}                           |   0
 .../model_save_load_cn.rst}                   |   0
 .../quick_start_cn.ipynb}                     |   0
 .../tensor_cn.md}                             |   0
 .../tensor_en.md}                             |   0
 .../train_eval_predict_cn.ipynb}              |   0
 .../hardware_info_cn.md                       |   0
 .../index_cn.rst                              |   0
 .../ipu_docs/index_cn.rst                     |   0
 .../ipu_docs/infer_example_cn.md              |   0
 .../ipu_docs/paddle_install_cn.md             |   0
 .../ipu_docs/train_example_cn.md              |   0
 .../npu_docs/index_cn.rst                     |   0
 .../npu_docs/paddle_install_cn.md             |   0
 .../npu_docs/train_example_cn.md              |   0
 .../rocm_docs/index_cn.rst                    |   0
 .../rocm_docs/infer_example_cn.md             |   0
 .../rocm_docs/paddle_install_cn.md            |   0
 .../rocm_docs/paddle_rocm_cn.md               |   0
 .../rocm_docs/train_example_cn.md             |   0
 .../xpu_docs/index_cn.rst                     |   0
 .../xpu_docs/inference_install_example_cn.md  |   0
 .../xpu_docs/paddle_2.0_xpu2_cn.md            |   0
 .../xpu_docs/paddle_2.0_xpu_cn.md             |   0
 .../xpu_docs/paddle_install_cn.md             |   0
 .../xpu_docs/paddle_install_xpu2_cn.md        |   0
 .../xpu_docs/train_example_cn.md              |   0
 .../xpu_docs/train_example_xpu2_cn.md         |   0
 docs/guides/index_cn.rst                      |  38 +-
 docs/guides/index_en.rst                      |  24 +-
 .../images/inference_ecosystem.png            | Bin
 .../index_cn.rst                              |   0
 .../index_en.rst                              |   0
 .../inference/images/inference.png            | Bin
 .../inference/images/paddlepaddle.png         | Bin
 .../inference/images/wechat.png               | Bin
 .../inference/inference_cn.md                 |   0
 .../mobile/mobile_index_cn.md                 |   0
 .../paddleslim/paddle_slim_cn.md              |   0
 .../paddleslim/paddle_slim_en.rst             |   0
 .../basic_usage_cn.md                         |   0
 .../basic_usage_en.md                         |   0
 .../case_analysis_cn.md                       |   0
 .../debugging_cn.md                           |   0
 .../debugging_en.md                           |   0
 .../grammar_list_cn.md                        |   0
 .../grammar_list_en.md                        |   0
 .../images/c++_error_log.png                  | Bin
 .../images/convert_cond.png                   | Bin
 .../images/dy2stat_error_log.png              | Bin
 .../images/dygraph_export.png                 | Bin
 .../images/dygraph_to_static.png              | Bin
 .../images/original_error_log.png             | Bin
 .../images/pdb_cmd.png                        | Bin
 .../images/pdb_cmd_en.png                     | Bin
 .../images/revise_suggestion.png              | Bin
 .../images/slice.png                          | Bin
 .../images/static_export.png                  | Bin
 .../images/to_static_export.png               | Bin
 .../images/to_static_train.png                | Bin
 .../index_cn.rst                              |   0
 .../index_en.rst                              |   0
 .../principle_cn.md                           |   0
 .../index_cn.rst                              |  11 +-
 .../index_en.rst                              |   0
 .../load_old_format_model_cn.rst              |   0
 .../migration_cn.rst                          |   0
 .../paddle_api_mapping_cn.rst                 |   0
 .../pytorch_api_mapping_cn.md                 |   0
 .../update_cn.md                              |   0
 .../update_en.md                              |   0
 .../guides/{07_new_op => new_op}/index_cn.rst |   0
 .../guides/{07_new_op => new_op}/index_en.rst |   0
 .../kernel_primitive_api/add_example_cn.md    |   0
 .../kernel_primitive_api/add_example_en.md    |   0
 .../api_description_cn.rst                    |   0
 .../api_description_en.rst                    |   0
 .../kernel_primitive_api/compute_api_cn.md    |   0
 .../kernel_primitive_api/compute_api_en.md    |   0
 .../kernel_primitive_api/example_cn.rst       |   0
 .../kernel_primitive_api/example_en.rst       |   0
 .../kernel_primitive_api/functor_api_cn.md    |   0
 .../kernel_primitive_api/functor_api_en.md    |   0
 .../images/compute_reduce.png                 | Bin
 .../images/example_add.png                    | Bin
 .../images/example_reduce.png                 | Bin
 .../images/io_read_data.png                   | Bin
 .../images/io_read_data_broadcast.png         | Bin
 .../images/io_read_data_broadcast_stride.png  | Bin
 .../images/io_read_data_reduce.png            | Bin
 .../images/io_read_data_stride.png            | Bin
 .../images/io_write_data.png                  | Bin
 .../images/io_write_data_stride.png           | Bin
 .../kernel_primitive_api/index_cn.rst         |   0
 .../kernel_primitive_api/index_en.rst         |   0
 .../kernel_primitive_api/io_api_cn.md         |   0
 .../kernel_primitive_api/io_api_en.md         |   0
 .../kernel_primitive_api/reduce_example_cn.md |   0
 .../kernel_primitive_api/reduce_example_en.md |   0
 .../{07_new_op => new_op}/new_custom_op_cn.md |   0
 .../{07_new_op => new_op}/new_python_op_cn.md |   0
 .../amp_cn.md                                 |   0
 .../amp_en.md                                 |   0
 .../guides/performance_improving/index_cn.rst |   9 +-
 144 files changed, 122 insertions(+), 568 deletions(-)
 delete mode 100644 docs/guides/01_paddle2.0_introduction/basic_concept/index_cn.rst
 delete mode 100644 docs/guides/01_paddle2.0_introduction/basic_concept/index_en.rst
 delete mode 100644 docs/guides/02_paddle2.0_develop/06_device_cn.ipynb
 delete mode 100644 docs/guides/02_paddle2.0_develop/index_cn.rst
 delete mode 100644 docs/guides/03_VisualDL/index_cn.rst
 delete mode 100644 docs/guides/03_VisualDL/index_en.rst
 delete mode 100644 docs/guides/08_api_mapping/index_cn.rst
 rename docs/guides/{01_paddle2.0_introduction/basic_concept => advanced}/autograd_cn.rst (97%)
 rename docs/guides/{02_paddle2.0_develop/07_customize_cn.ipynb => advanced/customize_cn.ipynb} (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept => advanced}/gradient_clip_cn.rst (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept => advanced}/gradient_clip_en.rst (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept/autograd_image => advanced/images}/autograd_image_3-1.png (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept/autograd_image => advanced/images}/autograd_image_4-1.png (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept/autograd_image => advanced/images}/autograd_image_4-2.png (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept/autograd_image => advanced/images}/autograd_image_4-3.png (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept/autograd_image => advanced/images}/autograd_image_4-4.png (100%)
 create mode 100644 docs/guides/advanced/index_cn.rst
 create mode 100644 docs/guides/advanced/index_en.rst
 rename docs/guides/{01_paddle2.0_introduction/basic_concept => advanced}/layer_and_model_cn.md (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept => advanced}/layer_and_model_en.md (100%)
 rename docs/guides/{02_paddle2.0_develop/09_model_to_onnx_cn.rst => advanced/model_to_onnx_cn.rst} (99%)
 rename docs/guides/{03_VisualDL => advanced}/visualdl_cn.md (100%)
 rename docs/guides/{03_VisualDL => advanced}/visualdl_en.md (100%)
 rename docs/guides/{03_VisualDL => advanced}/visualdl_usage_cn.md (100%)
 rename docs/guides/{03_VisualDL => advanced}/visualdl_usage_en.md (100%)
 rename docs/guides/{02_paddle2.0_develop/02_data_load_cn.ipynb => beginner/data_load_cn.ipynb} (100%)
 rename docs/guides/{02_paddle2.0_develop/03_data_preprocessing_cn.ipynb => beginner/data_preprocessing_cn.ipynb} (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept => beginner}/images/Axis_2.0.png (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept => beginner}/images/Tensor_2.0.png (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept => beginner}/images/Tensor_broadcast.png (100%)
 rename docs/guides/{02_paddle2.0_develop => beginner}/images/data_pipeline.png (100%)
 rename docs/guides/{02_paddle2.0_develop => beginner}/images/data_preprocessing.png (100%)
 rename docs/guides/{02_paddle2.0_develop => beginner}/images/lenet.png (100%)
 rename docs/guides/{02_paddle2.0_develop => beginner}/images/mnist.png (100%)
 rename docs/guides/{02_paddle2.0_develop => beginner}/images/model.png (100%)
 rename docs/guides/{02_paddle2.0_develop => beginner}/images/model_develop_flow.png (100%)
 rename docs/guides/{02_paddle2.0_develop => beginner}/images/paddle_jit_save_load_2.1.png (100%)
 rename docs/guides/{02_paddle2.0_develop => beginner}/images/paddle_save_load_2.1.png (100%)
 create mode 100644 docs/guides/beginner/index_cn.rst
 create mode 100644 docs/guides/beginner/index_en.rst
 rename docs/guides/{02_paddle2.0_develop/04_model_cn.ipynb => beginner/model_cn.ipynb} (100%)
 rename docs/guides/{02_paddle2.0_develop/08_model_save_load_cn.rst => beginner/model_save_load_cn.rst} (100%)
 rename docs/guides/{02_paddle2.0_develop/01_quick_start_cn.ipynb => beginner/quick_start_cn.ipynb} (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept/tensor_introduction_cn.md => beginner/tensor_cn.md} (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept/tensor_introduction_en.md => beginner/tensor_en.md} (100%)
 rename docs/guides/{02_paddle2.0_develop/05_train_eval_predict_cn.ipynb => beginner/train_eval_predict_cn.ipynb} (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/hardware_info_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/index_cn.rst (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/ipu_docs/index_cn.rst (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/ipu_docs/infer_example_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/ipu_docs/paddle_install_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/ipu_docs/train_example_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/npu_docs/index_cn.rst (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/npu_docs/paddle_install_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/npu_docs/train_example_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/rocm_docs/index_cn.rst (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/rocm_docs/infer_example_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/rocm_docs/paddle_install_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/rocm_docs/paddle_rocm_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/rocm_docs/train_example_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/xpu_docs/index_cn.rst (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/xpu_docs/inference_install_example_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/xpu_docs/paddle_2.0_xpu2_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/xpu_docs/paddle_2.0_xpu_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/xpu_docs/paddle_install_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/xpu_docs/paddle_install_xpu2_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/xpu_docs/train_example_cn.md (100%)
 rename docs/guides/{09_hardware_support => hardware_support}/xpu_docs/train_example_xpu2_cn.md (100%)
 rename docs/guides/{05_inference_deployment => infer}/images/inference_ecosystem.png (100%)
 rename docs/guides/{05_inference_deployment => infer}/index_cn.rst (100%)
 rename docs/guides/{05_inference_deployment => infer}/index_en.rst (100%)
 rename docs/guides/{05_inference_deployment => infer}/inference/images/inference.png (100%)
 rename docs/guides/{05_inference_deployment => infer}/inference/images/paddlepaddle.png (100%)
 rename docs/guides/{05_inference_deployment => infer}/inference/images/wechat.png (100%)
 rename docs/guides/{05_inference_deployment => infer}/inference/inference_cn.md (100%)
 rename docs/guides/{05_inference_deployment => infer}/mobile/mobile_index_cn.md (100%)
 rename docs/guides/{05_inference_deployment => infer}/paddleslim/paddle_slim_cn.md (100%)
 rename docs/guides/{05_inference_deployment => infer}/paddleslim/paddle_slim_en.rst (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/basic_usage_cn.md (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/basic_usage_en.md (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/case_analysis_cn.md (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/debugging_cn.md (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/debugging_en.md (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/grammar_list_cn.md (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/grammar_list_en.md (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/c++_error_log.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/convert_cond.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/dy2stat_error_log.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/dygraph_export.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/dygraph_to_static.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/original_error_log.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/pdb_cmd.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/pdb_cmd_en.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/revise_suggestion.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/slice.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/static_export.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/to_static_export.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/images/to_static_train.png (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/index_cn.rst (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/index_en.rst (100%)
 rename docs/guides/{04_dygraph_to_static => jit}/principle_cn.md (100%)
 rename docs/guides/{01_paddle2.0_introduction => model_convert}/index_cn.rst (55%)
 rename docs/guides/{01_paddle2.0_introduction => model_convert}/index_en.rst (100%)
 rename docs/guides/{01_paddle2.0_introduction => model_convert}/load_old_format_model_cn.rst (100%)
 rename docs/guides/{01_paddle2.0_introduction => model_convert}/migration_cn.rst (100%)
 rename docs/guides/{08_api_mapping => model_convert}/paddle_api_mapping_cn.rst (100%)
 rename docs/guides/{08_api_mapping => model_convert}/pytorch_api_mapping_cn.md (100%)
 rename docs/guides/{01_paddle2.0_introduction => model_convert}/update_cn.md (100%)
 rename docs/guides/{01_paddle2.0_introduction => model_convert}/update_en.md (100%)
 rename docs/guides/{07_new_op => new_op}/index_cn.rst (100%)
 rename docs/guides/{07_new_op => new_op}/index_en.rst (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/add_example_cn.md (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/add_example_en.md (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/api_description_cn.rst (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/api_description_en.rst (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/compute_api_cn.md (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/compute_api_en.md (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/example_cn.rst (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/example_en.rst (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/functor_api_cn.md (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/functor_api_en.md (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/images/compute_reduce.png (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/images/example_add.png (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/images/example_reduce.png (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/images/io_read_data.png (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/images/io_read_data_broadcast.png (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/images/io_read_data_broadcast_stride.png (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/images/io_read_data_reduce.png (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/images/io_read_data_stride.png (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/images/io_write_data.png (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/images/io_write_data_stride.png (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/index_cn.rst (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/index_en.rst (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/io_api_cn.md (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/io_api_en.md (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/reduce_example_cn.md (100%)
 rename docs/guides/{07_new_op => new_op}/kernel_primitive_api/reduce_example_en.md (100%)
 rename docs/guides/{07_new_op => new_op}/new_custom_op_cn.md (100%)
 rename docs/guides/{07_new_op => new_op}/new_python_op_cn.md (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept => performance_improving}/amp_cn.md (100%)
 rename docs/guides/{01_paddle2.0_introduction/basic_concept => performance_improving}/amp_en.md (100%)

diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/index_cn.rst b/docs/guides/01_paddle2.0_introduction/basic_concept/index_cn.rst
deleted file mode 100644
index fce226d57f3..00000000000
--- a/docs/guides/01_paddle2.0_introduction/basic_concept/index_cn.rst
+++ /dev/null
@@ -1,23 +0,0 @@
-###################
-Basic Concepts
-###################
-
-Here you can learn the basic concepts of PaddlePaddle:
-
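- `Introduction to Tensor <./tensor_introduction_cn.html>`_ : Introduction to Tensor, the representation of data in PaddlePaddle.
- `Layers and Models <./layer_and_model_cn.html>`_ : Introduction to the concepts of layers and models in PaddlePaddle.
- `Broadcasting <./broadcasting_cn.html>`_ : Introduction to the concept of broadcasting in PaddlePaddle.
- `Automatic Differentiation <./autograd_cn.html>`_ : Introduction to the principles of automatic differentiation in PaddlePaddle.
- `Automatic Mixed Precision Training <./amp_cn.html>`_ : Introduction to the automatic mixed precision feature in PaddlePaddle.
- `Gradient Clipping <./gradient_clip_cn.html>`_ : Introduction to gradient clipping methods in PaddlePaddle.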
-
-..  toctree::
-    :hidden:
-
-    tensor_introduction_cn.md
-    layer_and_model_cn.md
-    broadcasting_cn.rst
-    autograd_cn.rst
-    amp_cn.md
-    gradient_clip_cn.rst
-    
\ No newline at end of file
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/index_en.rst b/docs/guides/01_paddle2.0_introduction/basic_concept/index_en.rst
deleted file mode 100644
index f4d661af097..00000000000
--- a/docs/guides/01_paddle2.0_introduction/basic_concept/index_en.rst
+++ /dev/null
@@ -1,20 +0,0 @@
-########################
-Basic Concept
-########################
-
-You can start by studying the basic concepts of PaddlePaddle:
-
-- `Introduction to Tensor <tensor_introduction_en.html>`_ : Introduction of Tensor, which is the representation of data in Paddle.
-- `Models and Layers <layer_and_model_en.html>`_ : Introduction to models and layers in Paddle. 
-- `Broadcasting <./broadcasting_en.html>`_ : Introduction of broadcasting.
-- `Automatic Mixed Precision Training <./amp_en.html>`_ : Introduction of Automatic Mixed Precision Training with Paddle.
-- `Gradient Clip <./gradient_clip_en.html>`_ : Introduction of gradient clip methods in Paddle.
-
-..  toctree::
-    :hidden:
-
-    tensor_introduction_en.md
-    layer_and_model_en.md
-    broadcasting_en.rst
-    amp_en.md
-    gradient_clip_en.rst
diff --git a/docs/guides/02_paddle2.0_develop/06_device_cn.ipynb b/docs/guides/02_paddle2.0_develop/06_device_cn.ipynb
deleted file mode 100644
index 569173abb51..00000000000
--- a/docs/guides/02_paddle2.0_develop/06_device_cn.ipynb
+++ /dev/null
@@ -1,405 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "cd536a25",
-   "metadata": {},
-   "source": [
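-    "# Single-Machine Multi-GPU Training\n",
-    "\n",
-    "As deep learning advances, models and datasets keep growing. Sometimes a single GPU cannot meet the memory requirements of a training task, or single-GPU training simply takes too long; in these cases multi-GPU training is needed. PaddlePaddle 2.0 adds the [paddle.distributed.spawn](../api/paddle/distributed/spawn_cn.html) function for launching single-machine multi-GPU training, while the original [paddle.distributed.launch](../api/paddle/distributed/launch_cn.html) approach is still retained.\n",
-    "\n",
-    "## 1. Launching with launch\n",
-    "\n",
-    "### 1.1 High-Level API Scenario\n",
-    "\n",
-    "When training is implemented with the high-level [paddle.Model](../api/paddle/Model_cn.html) API, launching single-machine multi-GPU training is very simple: the code needs no changes, and you only need to add the `-m paddle.distributed.launch` argument at launch time.\n",
-    "Taking MNIST as an example, the training code using the high-level API is as follows:"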
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5a2702a8",
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "import paddle\n",
-    "import numpy as np\n",
-    "from paddle.vision.transforms import ToTensor\n",
-    "\n",
-    "# Load the training and test datasets\n",
-    "train_dataset = paddle.vision.datasets.MNIST(mode='train', transform=ToTensor())\n",
-    "test_dataset = paddle.vision.datasets.MNIST(mode='test', transform=ToTensor())\n",
-    "\n",
-    "# Build the network with Sequential\n",
-    "mnist = paddle.nn.Sequential(\n",
-    "    paddle.nn.Flatten(1, -1), \n",
-    "    paddle.nn.Linear(784, 512), \n",
-    "    paddle.nn.ReLU(), \n",
-    "    paddle.nn.Dropout(0.2), \n",
-    "    paddle.nn.Linear(512, 10)\n",
-    ")\n",
-    "\n",
-    "# Wrap the network with paddle.Model\n",
-    "model = paddle.Model(mnist)\n",
-    "\n",
-    "# Configure the optimizer, loss, and metrics with Model.prepare\n",
-    "model.prepare(optimizer=paddle.optimizer.Adam(parameters=model.parameters()), \n",
-    "              loss=paddle.nn.CrossEntropyLoss(), \n",
-    "              metrics=paddle.metric.Accuracy())\n",
-    "\n",
-    "# Train the model with Model.fit\n",
-    "model.fit(train_dataset, \n",
-    "          epochs=5, \n",
-    "          batch_size=64,\n",
-    "          verbose=1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "847c28b9",
-   "metadata": {},
-   "source": [
-    "Save the code above as train.py. The commands for launching training with the high-level API are as follows:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3db288b6",
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# Single-machine single-GPU launch; uses GPU 0 by default\n",
-    "! python train.py\n",
-    "# Single-machine multi-GPU launch; uses all currently visible GPUs by default\n",
-    "! python -m paddle.distributed.launch train.py\n",
-    "# Single-machine multi-GPU launch; select GPUs 0 and 1 with --gpus\n",
-    "! python -m paddle.distributed.launch --gpus='0,1' train.py\n",
-    "# Single-machine multi-GPU launch; select GPUs 0 and 1 with CUDA_VISIBLE_DEVICES\n",
-    "%env CUDA_VISIBLE_DEVICES=0,1\n",
-    "! python -m paddle.distributed.launch train.py"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3fcc2fdc-795d-400b-8f40-a69d88595558",
-   "metadata": {},
-   "source": [
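-    "TODO: add a paragraph explaining what happens after launching this way and how the work is distributed across the GPUs.\n",
-    "A common troubleshooting workflow should also be described here; if an FAQ exists, add a link to it.\n",
-    "\n",
-    "(To be added)"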
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "17552b14",
-   "metadata": {},
-   "source": [
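-    "### 1.2 Basic API Scenario\n",
-    "\n",
-    "If training is implemented with the basic API, launching single-machine multi-GPU training requires three changes to the single-GPU code, as follows:\n"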
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d4ebc36a",
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "import paddle\n",
-    "from paddle.vision.transforms import ToTensor\n",
-    "# Change 1: import the package required for distributed training\n",
-    "import paddle.distributed as dist\n",
-    "# Load the datasets\n",
-    "train_dataset = paddle.vision.datasets.MNIST(mode='train', transform=ToTensor())\n",
-    "test_dataset = paddle.vision.datasets.MNIST(mode='test', transform=ToTensor())\n",
-    "# Change 2: initialize the parallel environment\n",
-    "dist.init_parallel_env()\n",
-    "\n",
-    "# Define the network structure\n",
-    "mnist = paddle.nn.Sequential(\n",
-    "    paddle.nn.Flatten(1, -1),\n",
-    "    paddle.nn.Linear(784, 512),\n",
-    "    paddle.nn.ReLU(),\n",
-    "    paddle.nn.Dropout(0.2),\n",
-    "    paddle.nn.Linear(512, 10)\n",
-    ")\n",
-    "# Load data with DataLoader\n",
-    "train_loader = paddle.io.DataLoader(train_dataset, batch_size=32, shuffle=True)\n",
-    "\n",
-    "# Change 3: wrap the network with paddle.DataParallel\n",
-    "mnist = paddle.DataParallel(mnist)\n",
-    "mnist.train()\n",
-    "# Set the number of epochs\n",
-    "epochs = 5\n",
-    "# Set up the optimizer\n",
-    "optim = paddle.optimizer.Adam(parameters=mnist.parameters())\n",
-    "for epoch in range(epochs):\n",
-    "    for batch_id, data in enumerate(train_loader()):\n",
-    "        x_data = data[0]            # training data\n",
-    "        y_data = data[1]            # training labels\n",
-    "        predicts = mnist(x_data)    # predictions\n",
-    "        # Compute the loss; equivalent to the loss setting in prepare\n",
-    "        loss = paddle.nn.functional.cross_entropy(predicts, y_data)\n",
-    "        # Compute the accuracy; equivalent to the metrics setting in prepare\n",
-    "        acc = paddle.metric.accuracy(predicts, y_data)\n",
-    "        # The backward pass, logging, parameter update, and gradient clearing below are all encapsulated in Model.fit()\n",
-    "        # Backward pass\n",
-    "        loss.backward()\n",
-    "        if (batch_id+1) % 1800 == 0:\n",
-    "            print(\"epoch: {}, batch_id: {}, loss is: {}, acc is: {}\".format(epoch, batch_id, loss.numpy(), acc.numpy()))\n",
-    "        # Update parameters\n",
-    "        optim.step()\n",
-    "        # Clear gradients\n",
-    "        optim.clear_grad()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "abaa5a5e",
-   "metadata": {},
-   "source": [
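-    "After making the changes, save the file as train.py and launch it the same way as with the high-level API.\n",
-    "\n",
-    "To be added:\n",
-    "\n",
-    "Is the result of the basic-API implementation exactly the same as the high-level one, with no differences at all? Are there scenarios where the basic API can be applied more flexibly? Why does the high-level API not need the extra configuration code?\n"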
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "56786ff8",
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# Single-machine multi-GPU launch; uses all currently visible GPUs by default\n",
-    "! python -m paddle.distributed.launch train.py\n",
-    "# Single-machine multi-GPU launch; select GPUs 0 and 1 with --gpus\n",
-    "! python -m paddle.distributed.launch --gpus '0,1' train.py\n",
-    "# Single-machine multi-GPU launch; select GPUs 0 and 1 with CUDA_VISIBLE_DEVICES\n",
-    "%env CUDA_VISIBLE_DEVICES=0,1\n",
-    "! python -m paddle.distributed.launch train.py"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f5045368",
-   "metadata": {},
-   "source": [
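-    "## 2. Launching with spawn\n",
-    "\n",
-    " The `launch` approach starts multiple processes on a per-file basis and requires the user to invoke `paddle.distributed.launch` at startup, which places higher demands on process management. PaddlePaddle 2.0 adds the `spawn` launch method, which gives better control over processes and is friendlier for log printing and training exit.\n",
-    " \n",
-    "(To be added: an explanation of the statements “places higher demands on process management” and “gives better control over processes and is friendlier for log printing and training exit”.)"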
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "199558fc-9c58-4ff5-8e9e-1bbddab9662d",
-   "metadata": {},
-   "source": [
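-    "### 2.1 High-Level API Scenario\n",
-    "\n",
-    "When launching multi-GPU training with `spawn`, first wrap the training process in a function and pass the hyperparameters into the training flow as that function's arguments. The code is shown below:"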
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0ed54b0d-dcbb-4c52-b2aa-63bae432d79a",
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "import paddle\n",
-    "import numpy as np\n",
-    "from paddle.vision.transforms import ToTensor\n",
-    "# When using spawn in the high-level API scenario, the paddle.distributed package must be imported\n",
-    "import paddle.distributed as dist\n",
-    "\n",
-    "def train():\n",
-    "    # Load the training and test datasets\n",
-    "    train_dataset = paddle.vision.datasets.MNIST(mode='train', transform=ToTensor())\n",
-    "    test_dataset = paddle.vision.datasets.MNIST(mode='test', transform=ToTensor())\n",
-    "\n",
-    "    # Build the network with Sequential\n",
-    "    mnist = paddle.nn.Sequential(\n",
-    "        paddle.nn.Flatten(1, -1), \n",
-    "        paddle.nn.Linear(784, 512), \n",
-    "        paddle.nn.ReLU(), \n",
-    "        paddle.nn.Dropout(0.2), \n",
-    "        paddle.nn.Linear(512, 10)\n",
-    "    )\n",
-    "\n",
-    "    # Wrap the network with paddle.Model\n",
-    "    model = paddle.Model(mnist)\n",
-    "\n",
-    "    # Configure the optimizer, loss, and metrics with Model.prepare\n",
-    "    model.prepare(optimizer=paddle.optimizer.Adam(parameters=model.parameters()), \n",
-    "                  loss=paddle.nn.CrossEntropyLoss(), \n",
-    "                  metrics=paddle.metric.Accuracy())\n",
-    "\n",
-    "    # Train the model with Model.fit\n",
-    "    model.fit(train_dataset, \n",
-    "              epochs=5, \n",
-    "              batch_size=64,\n",
-    "              verbose=1)\n",
-    "\n",
-    "\n",
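-    "# Pass in the training function, and specify the number of processes and the GPU IDs to use\n",
-    "# (In my testing, using multiple GPUs raised an error here, so only a single GPU could be used)\n",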
-    "if __name__ == '__main__':\n",
-    "    dist.spawn(train, nprocs=1, gpus='0')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c2ae28fc-5084-4393-a622-cf60d5e9df00",
-   "metadata": {},
-   "source": [
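-    "### 2.2 Basic API Scenario\n",
-    "\n",
-    "As in the high-level API scenario, when launching multi-GPU training with `spawn`, first wrap the training process in a function and pass the hyperparameters into the training flow as that function's arguments. In addition, as with `paddle.distributed.launch`, three changes are required: import the distributed package, initialize the parallel environment, and wrap the model. The code is as follows:"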
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "4cbcdcdb",
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "from __future__ import print_function\n",
-    "\n",
-    "import paddle\n",
-    "import paddle.nn as nn\n",
-    "from paddle.vision.transforms import ToTensor\n",
-    "# Change 1: import the package required for distributed training\n",
-    "import paddle.distributed as dist\n",
-    "\n",
-    "def train(print_result=False):\n",
-    "    # Load the datasets\n",
-    "    train_dataset = paddle.vision.datasets.MNIST(mode='train', transform=ToTensor())\n",
-    "    test_dataset = paddle.vision.datasets.MNIST(mode='test', transform=ToTensor())\n",
-    "    # 第2处改动，初始化并行环境\n",
-    "    dist.init_parallel_env()\n",
-    "\n",
-    "    # 定义网络结构\n",
-    "    mnist = paddle.nn.Sequential(\n",
-    "        paddle.nn.Flatten(1, -1),\n",
-    "        paddle.nn.Linear(784, 512),\n",
-    "        paddle.nn.ReLU(),\n",
-    "        paddle.nn.Dropout(0.2),\n",
-    "        paddle.nn.Linear(512, 10)\n",
-    "    )\n",
-    "    # 用 DataLoader 实现数据加载\n",
-    "    train_loader = paddle.io.DataLoader(train_dataset, batch_size=32, shuffle=True)\n",
-    "\n",
-    "    # 第3处改动，增加paddle.DataParallel封装\n",
-    "    mnist = paddle.DataParallel(mnist)\n",
-    "    mnist.train()\n",
-    "    # 设置迭代次数\n",
-    "    epochs = 5\n",
-    "    # 设置优化器\n",
-    "    optim = paddle.optimizer.Adam(parameters=mnist.parameters())\n",
-    "    for epoch in range(epochs):\n",
-    "        for batch_id, data in enumerate(train_loader()):\n",
-    "            x_data = data[0]            # 训练数据\n",
-    "            y_data = data[1]            # 训练数据标签\n",
-    "            predicts = mnist(x_data)    # 预测结果\n",
-    "            # 计算损失 等价于 prepare 中loss的设置\n",
-    "            loss = paddle.nn.functional.cross_entropy(predicts, y_data)\n",
-    "            # 计算准确率 等价于 prepare 中metrics的设置\n",
-    "            acc = paddle.metric.accuracy(predicts, y_data)\n",
-    "            # 下面的反向传播、打印训练信息、更新参数、梯度清零都被封装到 Model.fit() 中\n",
-    "            # 反向传播\n",
-    "            loss.backward()\n",
-    "            if (batch_id+1) % 1800 == 0 and print_reslut:\n",
-    "                print(\"epoch: {}, batch_id: {}, loss is: {}, acc is: {}\".format(epoch, batch_id, loss.numpy(), acc.numpy()))\n",
-    "            # 更新参数\n",
-    "            optim.step()\n",
-    "            # 梯度清零\n",
-    "            optim.clear_grad()\n",
-    "\n",
-    "# 传入训练函数、参数、指定进程数并指定当前使用的卡号\n",
-    "if __name__ == '__main__':\n",
-    "    dist.spawn(train, args=(True,), nprocs=2, gpus='4,5')"
-   ]
-  },
-  {
-   "cell_type": "raw",
-   "id": "ea540e08",
-   "metadata": {},
-   "source": [
-    "上述代码在本地运行结果如下：\n",
-    "init nccl context nranks: 2 local rank: 0 gpu id: 4 ring id: 0\n",
-    "init nccl context nranks: 2 local rank: 1 gpu id: 5 ring id: 0\n",
-    "Please NOTE: device: 5, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2\n",
-    "Please NOTE: device: 4, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2\n",
-    "device: 4, cuDNN Version: 7.6.\n",
-    "device: 5, cuDNN Version: 7.6.\n",
-    "loss: [2.041318]\n",
-    "loss: [4.749344]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1e47c613",
-   "metadata": {},
-   "source": [
-    "调用 [paddle.distributed.spawn](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/distributed/spawn_cn.html) 来启动多卡训练时，可根据需要设置参数：\n",
-    "* func：由 spawn 方法启动的进程所调用的目标函数。\n",
-    "* args：传入目标函数 func 的参数。\n",
-    "* nprocs：启动进程的数目。当仅需要使用部分可见的GPU设备进行训练时，可设置该参数指定GPU数。例如：当前机器有8张GPU卡 {0,1,2,3,4,5,6,7}，此时会使用前两张卡 {0,1}；或者当前机器通过配置环境变量 CUDA_VISIBLE_DEVICES=4,5,6,7，仅使4张GPU卡可见，此时会使用可见的前两张卡 {4,5}。若不设置该参数，默认使用所有可见的GPU设备训练。\n",
-    "* gpus：指定训练使用的GPU ID。例如 gpus='4,5' 可指定使用第4号卡和第5号卡。若不设置该参数，默认使用GPU ID序号较小的GPU。"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a132576e-7021-4fcc-a56d-d5ecb4bccebd",
-   "metadata": {},
-   "source": [
-    "# 三、总结\n",
-    "\n",
-    "待补充"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "py35-paddle1.2.0"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.4"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/docs/guides/02_paddle2.0_develop/index_cn.rst b/docs/guides/02_paddle2.0_develop/index_cn.rst
deleted file mode 100644
index a85c778bf83..00000000000
--- a/docs/guides/02_paddle2.0_develop/index_cn.rst
+++ /dev/null
@@ -1,31 +0,0 @@
-###################
-模型开发
-###################
-
-本部分将介绍飞桨框架2.0的开发流程。
-
-为了快速上手飞桨框架2.0，你可以参考 `10分钟快速上手飞桨 <./01_quick_start_cn.html>`_ ;
-
-当完成了快速上手的任务后，下面这些模块会阐述如何用飞桨框架2.0，实现深度学习过程中的每一步。具体包括：
-
-- `数据集定义与加载 <./02_data_load_cn.html>`_ : 飞桨框架数据加载的方式，主要为\ ``paddle.io.Dataset + paddle.io.DataLoader``\ ，以及飞桨内置数据集的介绍。
-- `数据预处理 <./03_data_preprocessing_cn.html>`_ : 飞桨框架数据预处理的方法，主要是\ ``paddle.vision.transform.*``\ 。
-- `模型组网 <./04_model_cn.html>`_ : 飞桨框架组网API的介绍，主要是\ ``paddle.nn.*``\ ，然后是飞桨框架组网方式的介绍，即 Sequential 的组网与 SubClass 的组网。
-- `训练与预测 <./05_train_eval_predict_cn.html>`_ : 飞桨框架训练与预测的方法，有两种方式，一种是使用高层API\ ``paddle.Model``\ 封装模型，然后调用\ ``model.fit()、model.evaluate()、model.predict()``\ 完成模型的训练与预测；另一种是用基础API完成模型的训练与预测，也就是对高层API的拆解。
-- `单机多卡训练 <./06_device_cn.html>`_ : 飞桨框架在单机单卡、单机多卡的场景下完成模型的训练与预测。
-- `自定义指标 <./07_customize_cn.html>`_ : 飞桨框架自定义指标的方法，主要包含自定义Loss、自定义Metric与自定义Callback。
-- `模型的加载与保存 <./08_model_save_load_cn.html>`_ : 飞桨框架模型的加载与保存体系介绍。
-- `模型转ONNX协议 <./09_model_to_onnx_cn.html>`_ : 飞桨框架模型转换为ONNX格式介绍。
-
-.. toctree::
-    :hidden:
-
-    01_quick_start_cn.ipynb
-    02_data_load_cn.ipynb
-    03_data_preprocessing_cn.ipynb
-    04_model_cn.ipynb
-    05_train_eval_predict_cn.rst
-    06_device_cn.rst
-    07_customize_cn.rst
-    08_model_save_load_cn.rst
-    09_model_to_onnx_cn.rst
diff --git a/docs/guides/03_VisualDL/index_cn.rst b/docs/guides/03_VisualDL/index_cn.rst
deleted file mode 100644
index d6c6a98d570..00000000000
--- a/docs/guides/03_VisualDL/index_cn.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-.. PaddlePaddle Fluid documentation master file, created by
-   sphinx-quickstart on Thu Jun  7 17:04:53 2018.
-   You can adapt this file completely to your liking, but it should at least
-   contain the root `toctree` directive.
-
-##############
-VisualDL 工具
-##############
-
-..  toctree::
-    :maxdepth: 1
-
-
-    visualdl_cn.md
-    visualdl_usage_cn.md
diff --git a/docs/guides/03_VisualDL/index_en.rst b/docs/guides/03_VisualDL/index_en.rst
deleted file mode 100644
index 840edeb8e5e..00000000000
--- a/docs/guides/03_VisualDL/index_en.rst
+++ /dev/null
@@ -1,8 +0,0 @@
-VisualDL Tools
-==========================
-
-..  toctree::
-    :maxdepth: 1
-
-    visualdl_en.md
-    visualdl_usage_en.md
diff --git a/docs/guides/08_api_mapping/index_cn.rst b/docs/guides/08_api_mapping/index_cn.rst
deleted file mode 100644
index 674633db5c0..00000000000
--- a/docs/guides/08_api_mapping/index_cn.rst
+++ /dev/null
@@ -1,16 +0,0 @@
-.. _cn_guides_others_information:
-
-##########
-算子映射
-##########
-
-你可以通过以下内容，了解飞桨算子映射信息:
-
-- `Paddle API映射表 <./paddle_api_mapping_cn.html>`_ : 说明 Paddle 1.8 版本与 Paddle 2.0 API对应关系。
-- `PyTorch API映射表 <./pytorch_api_mapping_cn.html>`_ : 说明 PyTorch 1.8 版本与 Paddle 2.0 API对应关系。
-
-..  toctree::
-    :hidden:
-
-    paddle_api_mapping_cn.rst
-    pytorch_api_mapping_cn.rst
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/autograd_cn.rst b/docs/guides/advanced/autograd_cn.rst
similarity index 97%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/autograd_cn.rst
rename to docs/guides/advanced/autograd_cn.rst
index fcf36e1d774..1bce9208dba 100644
--- a/docs/guides/01_paddle2.0_introduction/basic_concept/autograd_cn.rst
+++ b/docs/guides/advanced/autograd_cn.rst
@@ -136,7 +136,7 @@ PaddlePaddle的神经网络核心是自动微分，本篇文章主要为你介
 
 假设上面创建的\ ``x``\ 和\ ``y``\ 分别是神经网络中的参数，\ ``z``\ 为神经网络的损失值\ ``loss``\ 。
 
-.. image:: autograd_image/autograd_image_3-1.png
+.. image:: images/autograd_image_3-1.png
 
 对z调用\ ``backward()``\ ，飞桨即可以自动计算\ ``x``\ 和\ ``y``\ 的梯度，并且将他们存进\ ``grad``\ 属性中。
 
@@ -211,7 +211,7 @@ PaddlePaddle的神经网络核心是自动微分，本篇文章主要为你介
 
 飞桨的自动微分是通过\ ``trace``\ 的方式，记录\ ``前向OP``\ 的执行，并自动创建\ ``反向var``\ 和添加相应的\ ``反向OP``\ ，然后来实现反向梯度计算的。
 
-.. image:: autograd_image/autograd_image_4-1.png
+.. image:: images/autograd_image_4-1.png
 
 下面本文用一些的例子，来模拟这个过程。
 
@@ -237,7 +237,7 @@ PaddlePaddle的神经网络核心是自动微分，本篇文章主要为你介
 
 在上面代码中\ ``c.backward()``\ 执行前，你可以理解整个计算图是这样的：
 
-.. image:: autograd_image/autograd_image_4-2.png
+.. image:: images/autograd_image_4-2.png
 
 当创建\ ``Tensor``\ ，\ ``Tensor``\ 的\ ``stop_grad=False``\ 时，会自动为此\ ``Tensor``\ 创建一个\ ``反向Tensor``\ 。在此例子中，a的反向Tensor就是\ ``a_grad``\ 。在\ ``a_grad``\ 中，会记录他的反向OP，因为a没有作为任何反向op的输入，所以它的\ ``grad_op``\ 为\ ``None``\ 。
 
@@ -253,7 +253,7 @@ PaddlePaddle的神经网络核心是自动微分，本篇文章主要为你介
 
 调用\ ``backward()``\ 后，正式开始进行反向传播过程，开始自动计算微分。
 
-.. image:: autograd_image/autograd_image_4-3.png
+.. image:: images/autograd_image_4-3.png
 
 例子二：用一个稍微复杂一点的例子让你深入了解这个过程。
 
@@ -284,7 +284,7 @@ PaddlePaddle的神经网络核心是自动微分，本篇文章主要为你介
 该例子的正向和反向图构建过程即：
 
 
-.. image:: autograd_image/autograd_image_4-4.png
+.. image:: images/autograd_image_4-4.png
 
 
 
diff --git a/docs/guides/02_paddle2.0_develop/07_customize_cn.ipynb b/docs/guides/advanced/customize_cn.ipynb
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/07_customize_cn.ipynb
rename to docs/guides/advanced/customize_cn.ipynb
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/gradient_clip_cn.rst b/docs/guides/advanced/gradient_clip_cn.rst
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/gradient_clip_cn.rst
rename to docs/guides/advanced/gradient_clip_cn.rst
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/gradient_clip_en.rst b/docs/guides/advanced/gradient_clip_en.rst
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/gradient_clip_en.rst
rename to docs/guides/advanced/gradient_clip_en.rst
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/autograd_image/autograd_image_3-1.png b/docs/guides/advanced/images/autograd_image_3-1.png
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/autograd_image/autograd_image_3-1.png
rename to docs/guides/advanced/images/autograd_image_3-1.png
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/autograd_image/autograd_image_4-1.png b/docs/guides/advanced/images/autograd_image_4-1.png
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/autograd_image/autograd_image_4-1.png
rename to docs/guides/advanced/images/autograd_image_4-1.png
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/autograd_image/autograd_image_4-2.png b/docs/guides/advanced/images/autograd_image_4-2.png
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/autograd_image/autograd_image_4-2.png
rename to docs/guides/advanced/images/autograd_image_4-2.png
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/autograd_image/autograd_image_4-3.png b/docs/guides/advanced/images/autograd_image_4-3.png
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/autograd_image/autograd_image_4-3.png
rename to docs/guides/advanced/images/autograd_image_4-3.png
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/autograd_image/autograd_image_4-4.png b/docs/guides/advanced/images/autograd_image_4-4.png
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/autograd_image/autograd_image_4-4.png
rename to docs/guides/advanced/images/autograd_image_4-4.png
diff --git a/docs/guides/advanced/index_cn.rst b/docs/guides/advanced/index_cn.rst
new file mode 100644
index 00000000000..a56d0238d70
--- /dev/null
+++ b/docs/guides/advanced/index_cn.rst
@@ -0,0 +1,22 @@
+###################
+进阶用法
+###################
+
+
+- `模型可视化 <./visualdl_usage_cn.html>`_
+- `自动微分 <./autograd_cn.html>`_
+- `层与模型 <./layer_and_model_cn.html>`_
+- `自定义Loss、Metric 及 Callback <./customize_cn.html>`_
+- `梯度裁剪 <./gradient_clip_cn.html>`_
+- `模型导出ONNX协议 <./model_to_onnx_cn.html>`_
+
+..  toctree::
+    :hidden:
+
+    visualdl_usage_cn.md
+    autograd_cn.rst
+    layer_and_model_cn.md
+    customize_cn.ipynb
+    gradient_clip_cn.rst
+    model_to_onnx_cn.rst
+    
\ No newline at end of file
diff --git a/docs/guides/advanced/index_en.rst b/docs/guides/advanced/index_en.rst
new file mode 100644
index 00000000000..dcd63230c96
--- /dev/null
+++ b/docs/guides/advanced/index_en.rst
@@ -0,0 +1,14 @@
+########################
+Advanced Guides
+########################
+
+- `Model Visualization <./visualdl_usage_en.html>`_
+- `Model and Layer <./layer_and_model_en.html>`_
+- `Gradient Clip <./gradient_clip_en.html>`_
+
+..  toctree::
+    :hidden:
+
+    visualdl_usage_en.md
+    layer_and_model_en.md
+    gradient_clip_en.rst
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/layer_and_model_cn.md b/docs/guides/advanced/layer_and_model_cn.md
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/layer_and_model_cn.md
rename to docs/guides/advanced/layer_and_model_cn.md
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/layer_and_model_en.md b/docs/guides/advanced/layer_and_model_en.md
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/layer_and_model_en.md
rename to docs/guides/advanced/layer_and_model_en.md
diff --git a/docs/guides/02_paddle2.0_develop/09_model_to_onnx_cn.rst b/docs/guides/advanced/model_to_onnx_cn.rst
similarity index 99%
rename from docs/guides/02_paddle2.0_develop/09_model_to_onnx_cn.rst
rename to docs/guides/advanced/model_to_onnx_cn.rst
index d76bbc182f8..27de200f6a7 100755
--- a/docs/guides/02_paddle2.0_develop/09_model_to_onnx_cn.rst
+++ b/docs/guides/advanced/model_to_onnx_cn.rst
@@ -1,8 +1,8 @@
 .. _cn_model_to_onnx:
 
-#############
+################
 模型导出ONNX协议
-#############
+################
 
 一、简介
 ##################
diff --git a/docs/guides/03_VisualDL/visualdl_cn.md b/docs/guides/advanced/visualdl_cn.md
similarity index 100%
rename from docs/guides/03_VisualDL/visualdl_cn.md
rename to docs/guides/advanced/visualdl_cn.md
diff --git a/docs/guides/03_VisualDL/visualdl_en.md b/docs/guides/advanced/visualdl_en.md
similarity index 100%
rename from docs/guides/03_VisualDL/visualdl_en.md
rename to docs/guides/advanced/visualdl_en.md
diff --git a/docs/guides/03_VisualDL/visualdl_usage_cn.md b/docs/guides/advanced/visualdl_usage_cn.md
similarity index 100%
rename from docs/guides/03_VisualDL/visualdl_usage_cn.md
rename to docs/guides/advanced/visualdl_usage_cn.md
diff --git a/docs/guides/03_VisualDL/visualdl_usage_en.md b/docs/guides/advanced/visualdl_usage_en.md
similarity index 100%
rename from docs/guides/03_VisualDL/visualdl_usage_en.md
rename to docs/guides/advanced/visualdl_usage_en.md
diff --git a/docs/guides/02_paddle2.0_develop/02_data_load_cn.ipynb b/docs/guides/beginner/data_load_cn.ipynb
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/02_data_load_cn.ipynb
rename to docs/guides/beginner/data_load_cn.ipynb
diff --git a/docs/guides/02_paddle2.0_develop/03_data_preprocessing_cn.ipynb b/docs/guides/beginner/data_preprocessing_cn.ipynb
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/03_data_preprocessing_cn.ipynb
rename to docs/guides/beginner/data_preprocessing_cn.ipynb
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/images/Axis_2.0.png b/docs/guides/beginner/images/Axis_2.0.png
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/images/Axis_2.0.png
rename to docs/guides/beginner/images/Axis_2.0.png
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/images/Tensor_2.0.png b/docs/guides/beginner/images/Tensor_2.0.png
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/images/Tensor_2.0.png
rename to docs/guides/beginner/images/Tensor_2.0.png
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/images/Tensor_broadcast.png b/docs/guides/beginner/images/Tensor_broadcast.png
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/images/Tensor_broadcast.png
rename to docs/guides/beginner/images/Tensor_broadcast.png
diff --git a/docs/guides/02_paddle2.0_develop/images/data_pipeline.png b/docs/guides/beginner/images/data_pipeline.png
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/images/data_pipeline.png
rename to docs/guides/beginner/images/data_pipeline.png
diff --git a/docs/guides/02_paddle2.0_develop/images/data_preprocessing.png b/docs/guides/beginner/images/data_preprocessing.png
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/images/data_preprocessing.png
rename to docs/guides/beginner/images/data_preprocessing.png
diff --git a/docs/guides/02_paddle2.0_develop/images/lenet.png b/docs/guides/beginner/images/lenet.png
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/images/lenet.png
rename to docs/guides/beginner/images/lenet.png
diff --git a/docs/guides/02_paddle2.0_develop/images/mnist.png b/docs/guides/beginner/images/mnist.png
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/images/mnist.png
rename to docs/guides/beginner/images/mnist.png
diff --git a/docs/guides/02_paddle2.0_develop/images/model.png b/docs/guides/beginner/images/model.png
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/images/model.png
rename to docs/guides/beginner/images/model.png
diff --git a/docs/guides/02_paddle2.0_develop/images/model_develop_flow.png b/docs/guides/beginner/images/model_develop_flow.png
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/images/model_develop_flow.png
rename to docs/guides/beginner/images/model_develop_flow.png
diff --git a/docs/guides/02_paddle2.0_develop/images/paddle_jit_save_load_2.1.png b/docs/guides/beginner/images/paddle_jit_save_load_2.1.png
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/images/paddle_jit_save_load_2.1.png
rename to docs/guides/beginner/images/paddle_jit_save_load_2.1.png
diff --git a/docs/guides/02_paddle2.0_develop/images/paddle_save_load_2.1.png b/docs/guides/beginner/images/paddle_save_load_2.1.png
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/images/paddle_save_load_2.1.png
rename to docs/guides/beginner/images/paddle_save_load_2.1.png
diff --git a/docs/guides/beginner/index_cn.rst b/docs/guides/beginner/index_cn.rst
new file mode 100644
index 00000000000..3dc33cf0ef6
--- /dev/null
+++ b/docs/guides/beginner/index_cn.rst
@@ -0,0 +1,28 @@
+###################
+模型开发
+###################
+
+本部分将介绍飞桨框架2.0的开发流程。
+
+为了快速上手飞桨框架2.0，你可以参考 `10分钟快速上手飞桨 <./quick_start_cn.html>`_ ;
+
+当完成了快速上手的任务后，下面这些模块会阐述如何用飞桨框架2.0，实现深度学习过程中的每一步。具体包括：
+
+- `Tensor 介绍 <./tensor_cn.html>`_ : 介绍飞桨基本数据类型 ``Tensor`` 的概念与常见用法。
+- `数据集定义与加载 <./data_load_cn.html>`_ : 飞桨框架数据加载的方式，主要为\ ``paddle.io.Dataset + paddle.io.DataLoader``\ ，以及飞桨内置数据集的介绍。
+- `数据预处理 <./data_preprocessing_cn.html>`_ : 飞桨框架数据预处理的方法，主要是\ ``paddle.vision.transform.*``\ 。
+- `模型组网 <./model_cn.html>`_ : 飞桨框架组网API的介绍，主要是\ ``paddle.nn.*``\ ，然后是飞桨框架组网方式的介绍，即 Sequential 的组网与 SubClass 的组网。
+- `训练与预测 <./train_eval_predict_cn.html>`_ : 飞桨框架训练与预测的方法，有两种方式，一种是使用高层API\ ``paddle.Model``\ 封装模型，然后调用\ ``model.fit()、model.evaluate()、model.predict()``\ 完成模型的训练与预测；另一种是用基础API完成模型的训练与预测，也就是对高层API的拆解。
+- `模型的加载与保存 <./model_save_load_cn.html>`_ : 飞桨框架模型的加载与保存体系介绍。
+
+.. toctree::
+    :hidden:
+
+    quick_start_cn.ipynb
+    tensor_cn.md
+    data_load_cn.ipynb
+    data_preprocessing_cn.ipynb
+    model_cn.ipynb
+    train_eval_predict_cn.ipynb
+    model_save_load_cn.rst
+
diff --git a/docs/guides/beginner/index_en.rst b/docs/guides/beginner/index_en.rst
new file mode 100644
index 00000000000..3ad80afe15f
--- /dev/null
+++ b/docs/guides/beginner/index_en.rst
@@ -0,0 +1,12 @@
+###################
+Model Development
+###################
+
+
+- `Introduction of Tensor <./tensor_en.html>`_ : Introduce the concept and common usage of Tensor, the basic data type of PaddlePaddle.
+
+.. toctree::
+    :hidden:
+
+    tensor_en.md
+
diff --git a/docs/guides/02_paddle2.0_develop/04_model_cn.ipynb b/docs/guides/beginner/model_cn.ipynb
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/04_model_cn.ipynb
rename to docs/guides/beginner/model_cn.ipynb
diff --git a/docs/guides/02_paddle2.0_develop/08_model_save_load_cn.rst b/docs/guides/beginner/model_save_load_cn.rst
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/08_model_save_load_cn.rst
rename to docs/guides/beginner/model_save_load_cn.rst
diff --git a/docs/guides/02_paddle2.0_develop/01_quick_start_cn.ipynb b/docs/guides/beginner/quick_start_cn.ipynb
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/01_quick_start_cn.ipynb
rename to docs/guides/beginner/quick_start_cn.ipynb
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/tensor_introduction_cn.md b/docs/guides/beginner/tensor_cn.md
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/tensor_introduction_cn.md
rename to docs/guides/beginner/tensor_cn.md
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/tensor_introduction_en.md b/docs/guides/beginner/tensor_en.md
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/tensor_introduction_en.md
rename to docs/guides/beginner/tensor_en.md
diff --git a/docs/guides/02_paddle2.0_develop/05_train_eval_predict_cn.ipynb b/docs/guides/beginner/train_eval_predict_cn.ipynb
similarity index 100%
rename from docs/guides/02_paddle2.0_develop/05_train_eval_predict_cn.ipynb
rename to docs/guides/beginner/train_eval_predict_cn.ipynb
diff --git a/docs/guides/09_hardware_support/hardware_info_cn.md b/docs/guides/hardware_support/hardware_info_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/hardware_info_cn.md
rename to docs/guides/hardware_support/hardware_info_cn.md
diff --git a/docs/guides/09_hardware_support/index_cn.rst b/docs/guides/hardware_support/index_cn.rst
similarity index 100%
rename from docs/guides/09_hardware_support/index_cn.rst
rename to docs/guides/hardware_support/index_cn.rst
diff --git a/docs/guides/09_hardware_support/ipu_docs/index_cn.rst b/docs/guides/hardware_support/ipu_docs/index_cn.rst
similarity index 100%
rename from docs/guides/09_hardware_support/ipu_docs/index_cn.rst
rename to docs/guides/hardware_support/ipu_docs/index_cn.rst
diff --git a/docs/guides/09_hardware_support/ipu_docs/infer_example_cn.md b/docs/guides/hardware_support/ipu_docs/infer_example_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/ipu_docs/infer_example_cn.md
rename to docs/guides/hardware_support/ipu_docs/infer_example_cn.md
diff --git a/docs/guides/09_hardware_support/ipu_docs/paddle_install_cn.md b/docs/guides/hardware_support/ipu_docs/paddle_install_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/ipu_docs/paddle_install_cn.md
rename to docs/guides/hardware_support/ipu_docs/paddle_install_cn.md
diff --git a/docs/guides/09_hardware_support/ipu_docs/train_example_cn.md b/docs/guides/hardware_support/ipu_docs/train_example_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/ipu_docs/train_example_cn.md
rename to docs/guides/hardware_support/ipu_docs/train_example_cn.md
diff --git a/docs/guides/09_hardware_support/npu_docs/index_cn.rst b/docs/guides/hardware_support/npu_docs/index_cn.rst
similarity index 100%
rename from docs/guides/09_hardware_support/npu_docs/index_cn.rst
rename to docs/guides/hardware_support/npu_docs/index_cn.rst
diff --git a/docs/guides/09_hardware_support/npu_docs/paddle_install_cn.md b/docs/guides/hardware_support/npu_docs/paddle_install_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/npu_docs/paddle_install_cn.md
rename to docs/guides/hardware_support/npu_docs/paddle_install_cn.md
diff --git a/docs/guides/09_hardware_support/npu_docs/train_example_cn.md b/docs/guides/hardware_support/npu_docs/train_example_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/npu_docs/train_example_cn.md
rename to docs/guides/hardware_support/npu_docs/train_example_cn.md
diff --git a/docs/guides/09_hardware_support/rocm_docs/index_cn.rst b/docs/guides/hardware_support/rocm_docs/index_cn.rst
similarity index 100%
rename from docs/guides/09_hardware_support/rocm_docs/index_cn.rst
rename to docs/guides/hardware_support/rocm_docs/index_cn.rst
diff --git a/docs/guides/09_hardware_support/rocm_docs/infer_example_cn.md b/docs/guides/hardware_support/rocm_docs/infer_example_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/rocm_docs/infer_example_cn.md
rename to docs/guides/hardware_support/rocm_docs/infer_example_cn.md
diff --git a/docs/guides/09_hardware_support/rocm_docs/paddle_install_cn.md b/docs/guides/hardware_support/rocm_docs/paddle_install_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/rocm_docs/paddle_install_cn.md
rename to docs/guides/hardware_support/rocm_docs/paddle_install_cn.md
diff --git a/docs/guides/09_hardware_support/rocm_docs/paddle_rocm_cn.md b/docs/guides/hardware_support/rocm_docs/paddle_rocm_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/rocm_docs/paddle_rocm_cn.md
rename to docs/guides/hardware_support/rocm_docs/paddle_rocm_cn.md
diff --git a/docs/guides/09_hardware_support/rocm_docs/train_example_cn.md b/docs/guides/hardware_support/rocm_docs/train_example_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/rocm_docs/train_example_cn.md
rename to docs/guides/hardware_support/rocm_docs/train_example_cn.md
diff --git a/docs/guides/09_hardware_support/xpu_docs/index_cn.rst b/docs/guides/hardware_support/xpu_docs/index_cn.rst
similarity index 100%
rename from docs/guides/09_hardware_support/xpu_docs/index_cn.rst
rename to docs/guides/hardware_support/xpu_docs/index_cn.rst
diff --git a/docs/guides/09_hardware_support/xpu_docs/inference_install_example_cn.md b/docs/guides/hardware_support/xpu_docs/inference_install_example_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/xpu_docs/inference_install_example_cn.md
rename to docs/guides/hardware_support/xpu_docs/inference_install_example_cn.md
diff --git a/docs/guides/09_hardware_support/xpu_docs/paddle_2.0_xpu2_cn.md b/docs/guides/hardware_support/xpu_docs/paddle_2.0_xpu2_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/xpu_docs/paddle_2.0_xpu2_cn.md
rename to docs/guides/hardware_support/xpu_docs/paddle_2.0_xpu2_cn.md
diff --git a/docs/guides/09_hardware_support/xpu_docs/paddle_2.0_xpu_cn.md b/docs/guides/hardware_support/xpu_docs/paddle_2.0_xpu_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/xpu_docs/paddle_2.0_xpu_cn.md
rename to docs/guides/hardware_support/xpu_docs/paddle_2.0_xpu_cn.md
diff --git a/docs/guides/09_hardware_support/xpu_docs/paddle_install_cn.md b/docs/guides/hardware_support/xpu_docs/paddle_install_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/xpu_docs/paddle_install_cn.md
rename to docs/guides/hardware_support/xpu_docs/paddle_install_cn.md
diff --git a/docs/guides/09_hardware_support/xpu_docs/paddle_install_xpu2_cn.md b/docs/guides/hardware_support/xpu_docs/paddle_install_xpu2_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/xpu_docs/paddle_install_xpu2_cn.md
rename to docs/guides/hardware_support/xpu_docs/paddle_install_xpu2_cn.md
diff --git a/docs/guides/09_hardware_support/xpu_docs/train_example_cn.md b/docs/guides/hardware_support/xpu_docs/train_example_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/xpu_docs/train_example_cn.md
rename to docs/guides/hardware_support/xpu_docs/train_example_cn.md
diff --git a/docs/guides/09_hardware_support/xpu_docs/train_example_xpu2_cn.md b/docs/guides/hardware_support/xpu_docs/train_example_xpu2_cn.md
similarity index 100%
rename from docs/guides/09_hardware_support/xpu_docs/train_example_xpu2_cn.md
rename to docs/guides/hardware_support/xpu_docs/train_example_xpu2_cn.md
diff --git a/docs/guides/index_cn.rst b/docs/guides/index_cn.rst
index 0bdc6748534..f160baeb19f 100644
--- a/docs/guides/index_cn.rst
+++ b/docs/guides/index_cn.rst
@@ -8,31 +8,27 @@
 
 使用教程分为如下的模块：
 
-- `整体介绍 <./01_paddle2.0_introduction/index_cn.html>`_ : 飞桨框架2.0新特性的介绍与飞桨框架2.0升级指南的说明。
-- `模型开发 <./02_paddle2.0_develop/index_cn.html>`_ : 飞桨框架2.0模型开发全流程说明。
-- `模型可视化 <./03_VisualDL/index_cn.html>`_ : 介绍如何用VisualDL实现飞桨框架模型的可视化。
-- `动态图转静态图 <./04_dygraph_to_static/index_cn.html>`_ : 介绍飞桨框架动态图转静态图的方法。
-- `预测部署 <./05_inference_deployment/index_cn.html>`_ : 介绍如何使用训练好的模型进行预测。
-- `分布式训练 <./06_distributed_training/index_cn.html>`_ : 介绍如何使用分布式进行训练。
-- `自定义算子 <./07_new_op/index_cn.html>`_ : 介绍飞桨框架自定义算子的方法。
-- `性能调优 <./performance_improving/index_cn.html>`_ : 介绍飞桨框架性能调优的方法。
-- `算子映射 <./08_api_mapping/index_cn.html>`_ : 介绍飞桨框架API算子的映射信息。
-- `硬件支持 <./09_hardware_support/index_cn.html>`_ : 介绍飞桨框架硬件支持相关信息。
-- `参与开发 <./10_contribution/index_cn.html>`_ : 介绍如何参与飞桨框架的开发。
-- `环境变量 <./flags/flags_cn.html>`_ : 介绍飞桨相关环境变量的使用。
+- `模型开发 <./beginner/index_cn.html>`_
+- `进阶用法 <./advanced/index_cn.html>`_
+- `动态图转静态图 <./jit/index_cn.html>`_
+- `预测部署 <./infer/index_cn.html>`_ 
+- `分布式训练 <./06_distributed_training/index_cn.html>`_
+- `性能调优 <./performance_improving/index_cn.html>`_
+- `模型迁移 <./model_convert/index_cn.html>`_
+- `硬件支持 <./hardware_support/index_cn.html>`_
+- `自定义算子 <./new_op/index_cn.html>`_
+- `环境变量 <./flags/flags_cn.html>`_
 
 ..  toctree::
     :hidden:
 
-    01_paddle2.0_introduction/index_cn.rst
-    02_paddle2.0_develop/index_cn.rst
-    03_VisualDL/index_cn.rst
-    04_dygraph_to_static/index_cn.rst
-    05_inference_deployment/index_cn.rst
+    beginner/index_cn.rst
+    advanced/index_cn.rst
+    jit/index_cn.rst
+    infer/index_cn.rst
     06_distributed_training/index_cn.rst
-    07_new_op/index_cn.rst
     performance_improving/index_cn.rst
-    08_api_mapping/index_cn.rst
-    09_hardware_support/index_cn.rst
-    10_contribution/index_cn.rst
+    model_convert/index_cn.rst
+    hardware_support/index_cn.rst
+    new_op/index_cn.rst
     flags/flags_cn.rst
diff --git a/docs/guides/index_en.rst b/docs/guides/index_en.rst
index 945cdeec478..b4c988644c6 100644
--- a/docs/guides/index_en.rst
+++ b/docs/guides/index_en.rst
@@ -9,25 +9,25 @@ Please refer to  `PaddlePaddle Github <https://github.com/PaddlePaddle/Paddle>`_
 
 Let's start with studying basic concept of PaddlePaddle:
 
-- `PaddlePaddle Introduction <./01_paddle2.0_introduction/index_en.html>`_ : Introduction of the new features of PaddlePaddle 2.0 and description of the PaddlePaddle 2.0 upgrade guide.
-- `Model Visualization <./03_VisualDL/index_en.html>`_ : Introduce VisualDL, a visual tool of PaddlePaddle.
-- `Dygraph to Static Graph <./04_dygraph_to_static/index_en.html>`_ : Introduce the transformation of dygraph to static graph.
-- `Inference and Deployment <./05_inference_deployment/index_en.html>`_ : Introduce the method of using the trained model to inference.
+- `Model Development <./beginner/index_en.html>`_
+- `Advanced Guides <./advanced/index_en.html>`_
+- `Dygraph to Static Graph <./jit/index_en.html>`_ : Introduce the transformation of dygraph to static graph.
+- `Inference and Deployment <./infer/index_en.html>`_ : Introduce how to use a trained model for inference.
 - `Distributed Training <./06_distributed_training/index_en.html>`_ : Introduce how the PaddlePaddle uses distributed training
-- `Customize OP <./07_new_op/index_en.html>`_ :  Introduce how to customize OP for PaddlePaddle.
 - `Performance Improving <./performance_improving/index_en.html>`_ : Introduce how to improve performance of PaddlePaddle.
-- `Contribution <./10_contribution/index_en.html>`_: Introduce how to contribute for PaddlePaddle.
+- `Model Convert <./model_convert/index_en.html>`_ : Introduce how to convert your model to PaddlePaddle.
+- `Customize OP <./new_op/index_en.html>`_ :  Introduce how to customize OP for PaddlePaddle.
 - `FLAGS <./flags/flags_en.html>`_ : Introduce the envirenment flags in paddle.
 
 ..  toctree::
     :hidden:
     
-    01_paddle2.0_introduction/index_en.rst
-    03_VisualDL/index_en.rst
-    04_dygraph_to_static/index_en.rst
-    05_inference_deployment/index_en.rst
+    beginner/index_en.rst
+    advanced/index_en.rst
+    jit/index_en.rst
+    infer/index_en.rst
     06_distributed_training/index_en.rst
-    07_new_op/index_en.rst
     performance_improving/index_en.rst
-    10_contribution/index_en.rst
+    model_convert/index_en.rst
+    new_op/index_en.rst
     flags/flags_en.rst
diff --git a/docs/guides/05_inference_deployment/images/inference_ecosystem.png b/docs/guides/infer/images/inference_ecosystem.png
similarity index 100%
rename from docs/guides/05_inference_deployment/images/inference_ecosystem.png
rename to docs/guides/infer/images/inference_ecosystem.png
diff --git a/docs/guides/05_inference_deployment/index_cn.rst b/docs/guides/infer/index_cn.rst
similarity index 100%
rename from docs/guides/05_inference_deployment/index_cn.rst
rename to docs/guides/infer/index_cn.rst
diff --git a/docs/guides/05_inference_deployment/index_en.rst b/docs/guides/infer/index_en.rst
similarity index 100%
rename from docs/guides/05_inference_deployment/index_en.rst
rename to docs/guides/infer/index_en.rst
diff --git a/docs/guides/05_inference_deployment/inference/images/inference.png b/docs/guides/infer/inference/images/inference.png
similarity index 100%
rename from docs/guides/05_inference_deployment/inference/images/inference.png
rename to docs/guides/infer/inference/images/inference.png
diff --git a/docs/guides/05_inference_deployment/inference/images/paddlepaddle.png b/docs/guides/infer/inference/images/paddlepaddle.png
similarity index 100%
rename from docs/guides/05_inference_deployment/inference/images/paddlepaddle.png
rename to docs/guides/infer/inference/images/paddlepaddle.png
diff --git a/docs/guides/05_inference_deployment/inference/images/wechat.png b/docs/guides/infer/inference/images/wechat.png
similarity index 100%
rename from docs/guides/05_inference_deployment/inference/images/wechat.png
rename to docs/guides/infer/inference/images/wechat.png
diff --git a/docs/guides/05_inference_deployment/inference/inference_cn.md b/docs/guides/infer/inference/inference_cn.md
similarity index 100%
rename from docs/guides/05_inference_deployment/inference/inference_cn.md
rename to docs/guides/infer/inference/inference_cn.md
diff --git a/docs/guides/05_inference_deployment/mobile/mobile_index_cn.md b/docs/guides/infer/mobile/mobile_index_cn.md
similarity index 100%
rename from docs/guides/05_inference_deployment/mobile/mobile_index_cn.md
rename to docs/guides/infer/mobile/mobile_index_cn.md
diff --git a/docs/guides/05_inference_deployment/paddleslim/paddle_slim_cn.md b/docs/guides/infer/paddleslim/paddle_slim_cn.md
similarity index 100%
rename from docs/guides/05_inference_deployment/paddleslim/paddle_slim_cn.md
rename to docs/guides/infer/paddleslim/paddle_slim_cn.md
diff --git a/docs/guides/05_inference_deployment/paddleslim/paddle_slim_en.rst b/docs/guides/infer/paddleslim/paddle_slim_en.rst
similarity index 100%
rename from docs/guides/05_inference_deployment/paddleslim/paddle_slim_en.rst
rename to docs/guides/infer/paddleslim/paddle_slim_en.rst
diff --git a/docs/guides/04_dygraph_to_static/basic_usage_cn.md b/docs/guides/jit/basic_usage_cn.md
similarity index 100%
rename from docs/guides/04_dygraph_to_static/basic_usage_cn.md
rename to docs/guides/jit/basic_usage_cn.md
diff --git a/docs/guides/04_dygraph_to_static/basic_usage_en.md b/docs/guides/jit/basic_usage_en.md
similarity index 100%
rename from docs/guides/04_dygraph_to_static/basic_usage_en.md
rename to docs/guides/jit/basic_usage_en.md
diff --git a/docs/guides/04_dygraph_to_static/case_analysis_cn.md b/docs/guides/jit/case_analysis_cn.md
similarity index 100%
rename from docs/guides/04_dygraph_to_static/case_analysis_cn.md
rename to docs/guides/jit/case_analysis_cn.md
diff --git a/docs/guides/04_dygraph_to_static/debugging_cn.md b/docs/guides/jit/debugging_cn.md
similarity index 100%
rename from docs/guides/04_dygraph_to_static/debugging_cn.md
rename to docs/guides/jit/debugging_cn.md
diff --git a/docs/guides/04_dygraph_to_static/debugging_en.md b/docs/guides/jit/debugging_en.md
similarity index 100%
rename from docs/guides/04_dygraph_to_static/debugging_en.md
rename to docs/guides/jit/debugging_en.md
diff --git a/docs/guides/04_dygraph_to_static/grammar_list_cn.md b/docs/guides/jit/grammar_list_cn.md
similarity index 100%
rename from docs/guides/04_dygraph_to_static/grammar_list_cn.md
rename to docs/guides/jit/grammar_list_cn.md
diff --git a/docs/guides/04_dygraph_to_static/grammar_list_en.md b/docs/guides/jit/grammar_list_en.md
similarity index 100%
rename from docs/guides/04_dygraph_to_static/grammar_list_en.md
rename to docs/guides/jit/grammar_list_en.md
diff --git a/docs/guides/04_dygraph_to_static/images/c++_error_log.png b/docs/guides/jit/images/c++_error_log.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/c++_error_log.png
rename to docs/guides/jit/images/c++_error_log.png
diff --git a/docs/guides/04_dygraph_to_static/images/convert_cond.png b/docs/guides/jit/images/convert_cond.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/convert_cond.png
rename to docs/guides/jit/images/convert_cond.png
diff --git a/docs/guides/04_dygraph_to_static/images/dy2stat_error_log.png b/docs/guides/jit/images/dy2stat_error_log.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/dy2stat_error_log.png
rename to docs/guides/jit/images/dy2stat_error_log.png
diff --git a/docs/guides/04_dygraph_to_static/images/dygraph_export.png b/docs/guides/jit/images/dygraph_export.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/dygraph_export.png
rename to docs/guides/jit/images/dygraph_export.png
diff --git a/docs/guides/04_dygraph_to_static/images/dygraph_to_static.png b/docs/guides/jit/images/dygraph_to_static.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/dygraph_to_static.png
rename to docs/guides/jit/images/dygraph_to_static.png
diff --git a/docs/guides/04_dygraph_to_static/images/original_error_log.png b/docs/guides/jit/images/original_error_log.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/original_error_log.png
rename to docs/guides/jit/images/original_error_log.png
diff --git a/docs/guides/04_dygraph_to_static/images/pdb_cmd.png b/docs/guides/jit/images/pdb_cmd.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/pdb_cmd.png
rename to docs/guides/jit/images/pdb_cmd.png
diff --git a/docs/guides/04_dygraph_to_static/images/pdb_cmd_en.png b/docs/guides/jit/images/pdb_cmd_en.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/pdb_cmd_en.png
rename to docs/guides/jit/images/pdb_cmd_en.png
diff --git a/docs/guides/04_dygraph_to_static/images/revise_suggestion.png b/docs/guides/jit/images/revise_suggestion.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/revise_suggestion.png
rename to docs/guides/jit/images/revise_suggestion.png
diff --git a/docs/guides/04_dygraph_to_static/images/slice.png b/docs/guides/jit/images/slice.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/slice.png
rename to docs/guides/jit/images/slice.png
diff --git a/docs/guides/04_dygraph_to_static/images/static_export.png b/docs/guides/jit/images/static_export.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/static_export.png
rename to docs/guides/jit/images/static_export.png
diff --git a/docs/guides/04_dygraph_to_static/images/to_static_export.png b/docs/guides/jit/images/to_static_export.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/to_static_export.png
rename to docs/guides/jit/images/to_static_export.png
diff --git a/docs/guides/04_dygraph_to_static/images/to_static_train.png b/docs/guides/jit/images/to_static_train.png
similarity index 100%
rename from docs/guides/04_dygraph_to_static/images/to_static_train.png
rename to docs/guides/jit/images/to_static_train.png
diff --git a/docs/guides/04_dygraph_to_static/index_cn.rst b/docs/guides/jit/index_cn.rst
similarity index 100%
rename from docs/guides/04_dygraph_to_static/index_cn.rst
rename to docs/guides/jit/index_cn.rst
diff --git a/docs/guides/04_dygraph_to_static/index_en.rst b/docs/guides/jit/index_en.rst
similarity index 100%
rename from docs/guides/04_dygraph_to_static/index_en.rst
rename to docs/guides/jit/index_en.rst
diff --git a/docs/guides/04_dygraph_to_static/principle_cn.md b/docs/guides/jit/principle_cn.md
similarity index 100%
rename from docs/guides/04_dygraph_to_static/principle_cn.md
rename to docs/guides/jit/principle_cn.md
diff --git a/docs/guides/01_paddle2.0_introduction/index_cn.rst b/docs/guides/model_convert/index_cn.rst
similarity index 55%
rename from docs/guides/01_paddle2.0_introduction/index_cn.rst
rename to docs/guides/model_convert/index_cn.rst
index 5618005a3c1..37b79c9bc12 100644
--- a/docs/guides/01_paddle2.0_introduction/index_cn.rst
+++ b/docs/guides/model_convert/index_cn.rst
@@ -1,18 +1,21 @@
 ###############
-整体介绍
+模型迁移
 ###############
 
-您可以通过下面的内容，了解更多飞桨框架2.0的内容:
+您可以通过下面的内容，了解如何迁移模型到飞桨2.X:
+
 
-- `基本概念 <./basic_concept/index_cn.html>`_ : 飞桨框架2.0 基本概念的介绍。
 - `升级指南 <./update_cn.html>`_: 介绍飞桨框架2.0 的主要变化和如何升级到最新版飞桨。
 - `版本迁移工具 <./migration_cn.html>`_: 介绍飞桨框架版本转换工具的使用。
 - `兼容载入旧格式模型 <./load_old_format_model.html>`_: 介绍飞桨框架如何在2.x版本加载1.x版本保存的模型。
+- `Paddle API映射表 <./paddle_api_mapping_cn.html>`_ : 说明 Paddle 1.8 版本与 Paddle 2.0 API对应关系。
+- `PyTorch API映射表 <./pytorch_api_mapping_cn.html>`_ : 说明 PyTorch 1.8 版本与 Paddle 2.0 API对应关系。
 
 ..  toctree::
     :hidden:
 
-    basic_concept/index_cn.rst
     update_cn.md
     migration_cn.rst
     load_old_format_model.rst
+    paddle_api_mapping_cn.rst
+    pytorch_api_mapping_cn.rst
diff --git a/docs/guides/01_paddle2.0_introduction/index_en.rst b/docs/guides/model_convert/index_en.rst
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/index_en.rst
rename to docs/guides/model_convert/index_en.rst
diff --git a/docs/guides/01_paddle2.0_introduction/load_old_format_model_cn.rst b/docs/guides/model_convert/load_old_format_model_cn.rst
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/load_old_format_model_cn.rst
rename to docs/guides/model_convert/load_old_format_model_cn.rst
diff --git a/docs/guides/01_paddle2.0_introduction/migration_cn.rst b/docs/guides/model_convert/migration_cn.rst
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/migration_cn.rst
rename to docs/guides/model_convert/migration_cn.rst
diff --git a/docs/guides/08_api_mapping/paddle_api_mapping_cn.rst b/docs/guides/model_convert/paddle_api_mapping_cn.rst
similarity index 100%
rename from docs/guides/08_api_mapping/paddle_api_mapping_cn.rst
rename to docs/guides/model_convert/paddle_api_mapping_cn.rst
diff --git a/docs/guides/08_api_mapping/pytorch_api_mapping_cn.md b/docs/guides/model_convert/pytorch_api_mapping_cn.md
similarity index 100%
rename from docs/guides/08_api_mapping/pytorch_api_mapping_cn.md
rename to docs/guides/model_convert/pytorch_api_mapping_cn.md
diff --git a/docs/guides/01_paddle2.0_introduction/update_cn.md b/docs/guides/model_convert/update_cn.md
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/update_cn.md
rename to docs/guides/model_convert/update_cn.md
diff --git a/docs/guides/01_paddle2.0_introduction/update_en.md b/docs/guides/model_convert/update_en.md
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/update_en.md
rename to docs/guides/model_convert/update_en.md
diff --git a/docs/guides/07_new_op/index_cn.rst b/docs/guides/new_op/index_cn.rst
similarity index 100%
rename from docs/guides/07_new_op/index_cn.rst
rename to docs/guides/new_op/index_cn.rst
diff --git a/docs/guides/07_new_op/index_en.rst b/docs/guides/new_op/index_en.rst
similarity index 100%
rename from docs/guides/07_new_op/index_en.rst
rename to docs/guides/new_op/index_en.rst
diff --git a/docs/guides/07_new_op/kernel_primitive_api/add_example_cn.md b/docs/guides/new_op/kernel_primitive_api/add_example_cn.md
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/add_example_cn.md
rename to docs/guides/new_op/kernel_primitive_api/add_example_cn.md
diff --git a/docs/guides/07_new_op/kernel_primitive_api/add_example_en.md b/docs/guides/new_op/kernel_primitive_api/add_example_en.md
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/add_example_en.md
rename to docs/guides/new_op/kernel_primitive_api/add_example_en.md
diff --git a/docs/guides/07_new_op/kernel_primitive_api/api_description_cn.rst b/docs/guides/new_op/kernel_primitive_api/api_description_cn.rst
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/api_description_cn.rst
rename to docs/guides/new_op/kernel_primitive_api/api_description_cn.rst
diff --git a/docs/guides/07_new_op/kernel_primitive_api/api_description_en.rst b/docs/guides/new_op/kernel_primitive_api/api_description_en.rst
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/api_description_en.rst
rename to docs/guides/new_op/kernel_primitive_api/api_description_en.rst
diff --git a/docs/guides/07_new_op/kernel_primitive_api/compute_api_cn.md b/docs/guides/new_op/kernel_primitive_api/compute_api_cn.md
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/compute_api_cn.md
rename to docs/guides/new_op/kernel_primitive_api/compute_api_cn.md
diff --git a/docs/guides/07_new_op/kernel_primitive_api/compute_api_en.md b/docs/guides/new_op/kernel_primitive_api/compute_api_en.md
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/compute_api_en.md
rename to docs/guides/new_op/kernel_primitive_api/compute_api_en.md
diff --git a/docs/guides/07_new_op/kernel_primitive_api/example_cn.rst b/docs/guides/new_op/kernel_primitive_api/example_cn.rst
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/example_cn.rst
rename to docs/guides/new_op/kernel_primitive_api/example_cn.rst
diff --git a/docs/guides/07_new_op/kernel_primitive_api/example_en.rst b/docs/guides/new_op/kernel_primitive_api/example_en.rst
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/example_en.rst
rename to docs/guides/new_op/kernel_primitive_api/example_en.rst
diff --git a/docs/guides/07_new_op/kernel_primitive_api/functor_api_cn.md b/docs/guides/new_op/kernel_primitive_api/functor_api_cn.md
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/functor_api_cn.md
rename to docs/guides/new_op/kernel_primitive_api/functor_api_cn.md
diff --git a/docs/guides/07_new_op/kernel_primitive_api/functor_api_en.md b/docs/guides/new_op/kernel_primitive_api/functor_api_en.md
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/functor_api_en.md
rename to docs/guides/new_op/kernel_primitive_api/functor_api_en.md
diff --git a/docs/guides/07_new_op/kernel_primitive_api/images/compute_reduce.png b/docs/guides/new_op/kernel_primitive_api/images/compute_reduce.png
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/images/compute_reduce.png
rename to docs/guides/new_op/kernel_primitive_api/images/compute_reduce.png
diff --git a/docs/guides/07_new_op/kernel_primitive_api/images/example_add.png b/docs/guides/new_op/kernel_primitive_api/images/example_add.png
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/images/example_add.png
rename to docs/guides/new_op/kernel_primitive_api/images/example_add.png
diff --git a/docs/guides/07_new_op/kernel_primitive_api/images/example_reduce.png b/docs/guides/new_op/kernel_primitive_api/images/example_reduce.png
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/images/example_reduce.png
rename to docs/guides/new_op/kernel_primitive_api/images/example_reduce.png
diff --git a/docs/guides/07_new_op/kernel_primitive_api/images/io_read_data.png b/docs/guides/new_op/kernel_primitive_api/images/io_read_data.png
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/images/io_read_data.png
rename to docs/guides/new_op/kernel_primitive_api/images/io_read_data.png
diff --git a/docs/guides/07_new_op/kernel_primitive_api/images/io_read_data_broadcast.png b/docs/guides/new_op/kernel_primitive_api/images/io_read_data_broadcast.png
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/images/io_read_data_broadcast.png
rename to docs/guides/new_op/kernel_primitive_api/images/io_read_data_broadcast.png
diff --git a/docs/guides/07_new_op/kernel_primitive_api/images/io_read_data_broadcast_stride.png b/docs/guides/new_op/kernel_primitive_api/images/io_read_data_broadcast_stride.png
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/images/io_read_data_broadcast_stride.png
rename to docs/guides/new_op/kernel_primitive_api/images/io_read_data_broadcast_stride.png
diff --git a/docs/guides/07_new_op/kernel_primitive_api/images/io_read_data_reduce.png b/docs/guides/new_op/kernel_primitive_api/images/io_read_data_reduce.png
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/images/io_read_data_reduce.png
rename to docs/guides/new_op/kernel_primitive_api/images/io_read_data_reduce.png
diff --git a/docs/guides/07_new_op/kernel_primitive_api/images/io_read_data_stride.png b/docs/guides/new_op/kernel_primitive_api/images/io_read_data_stride.png
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/images/io_read_data_stride.png
rename to docs/guides/new_op/kernel_primitive_api/images/io_read_data_stride.png
diff --git a/docs/guides/07_new_op/kernel_primitive_api/images/io_write_data.png b/docs/guides/new_op/kernel_primitive_api/images/io_write_data.png
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/images/io_write_data.png
rename to docs/guides/new_op/kernel_primitive_api/images/io_write_data.png
diff --git a/docs/guides/07_new_op/kernel_primitive_api/images/io_write_data_stride.png b/docs/guides/new_op/kernel_primitive_api/images/io_write_data_stride.png
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/images/io_write_data_stride.png
rename to docs/guides/new_op/kernel_primitive_api/images/io_write_data_stride.png
diff --git a/docs/guides/07_new_op/kernel_primitive_api/index_cn.rst b/docs/guides/new_op/kernel_primitive_api/index_cn.rst
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/index_cn.rst
rename to docs/guides/new_op/kernel_primitive_api/index_cn.rst
diff --git a/docs/guides/07_new_op/kernel_primitive_api/index_en.rst b/docs/guides/new_op/kernel_primitive_api/index_en.rst
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/index_en.rst
rename to docs/guides/new_op/kernel_primitive_api/index_en.rst
diff --git a/docs/guides/07_new_op/kernel_primitive_api/io_api_cn.md b/docs/guides/new_op/kernel_primitive_api/io_api_cn.md
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/io_api_cn.md
rename to docs/guides/new_op/kernel_primitive_api/io_api_cn.md
diff --git a/docs/guides/07_new_op/kernel_primitive_api/io_api_en.md b/docs/guides/new_op/kernel_primitive_api/io_api_en.md
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/io_api_en.md
rename to docs/guides/new_op/kernel_primitive_api/io_api_en.md
diff --git a/docs/guides/07_new_op/kernel_primitive_api/reduce_example_cn.md b/docs/guides/new_op/kernel_primitive_api/reduce_example_cn.md
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/reduce_example_cn.md
rename to docs/guides/new_op/kernel_primitive_api/reduce_example_cn.md
diff --git a/docs/guides/07_new_op/kernel_primitive_api/reduce_example_en.md b/docs/guides/new_op/kernel_primitive_api/reduce_example_en.md
similarity index 100%
rename from docs/guides/07_new_op/kernel_primitive_api/reduce_example_en.md
rename to docs/guides/new_op/kernel_primitive_api/reduce_example_en.md
diff --git a/docs/guides/07_new_op/new_custom_op_cn.md b/docs/guides/new_op/new_custom_op_cn.md
similarity index 100%
rename from docs/guides/07_new_op/new_custom_op_cn.md
rename to docs/guides/new_op/new_custom_op_cn.md
diff --git a/docs/guides/07_new_op/new_python_op_cn.md b/docs/guides/new_op/new_python_op_cn.md
similarity index 100%
rename from docs/guides/07_new_op/new_python_op_cn.md
rename to docs/guides/new_op/new_python_op_cn.md
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/amp_cn.md b/docs/guides/performance_improving/amp_cn.md
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/amp_cn.md
rename to docs/guides/performance_improving/amp_cn.md
diff --git a/docs/guides/01_paddle2.0_introduction/basic_concept/amp_en.md b/docs/guides/performance_improving/amp_en.md
similarity index 100%
rename from docs/guides/01_paddle2.0_introduction/basic_concept/amp_en.md
rename to docs/guides/performance_improving/amp_en.md
diff --git a/docs/guides/performance_improving/index_cn.rst b/docs/guides/performance_improving/index_cn.rst
index 3ba00a463ab..13208283134 100644
--- a/docs/guides/performance_improving/index_cn.rst
+++ b/docs/guides/performance_improving/index_cn.rst
@@ -4,16 +4,13 @@
 
 你可以通过以下内容，了解飞桨框架性能调优相关的内容：
 
+- `自动混合精度训练 <./amp_cn.html>`_ : 使用飞桨框架进行自动混合精度训练。
 - `模型量化 <./quantization.html>`_ : 使用飞桨框架进行模型量化。
-
-..  toctree::
-    :hidden:
-
-    quantization.md
-
 - `模型性能分析 <./profiling_model.html>`_ : 使用飞桨性能分析器对模型进行性能调试。
 
 ..  toctree::
     :hidden:
 
+    amp_cn.md
+    quantization.md
     profiling_model.md
\ No newline at end of file

From 6fa2f3243bc98a8782706649a6b694af1381e670 Mon Sep 17 00:00:00 2001
From: TCChenlong <1300851984@qq.com>
Date: Thu, 12 May 2022 23:30:46 +0800
Subject: [PATCH 08/11] update guides index

---
 docs/guides/advanced/index_en.rst             |   2 +-
 docs/guides/advanced/visualdl_usage_cn.md     |   2 +-
 docs/guides/advanced/visualdl_usage_en.md     |   2 +-
 docs/guides/beginner/data_load_cn.ipynb       | 435 ++++++------
 docs/guides/beginner/index_cn.rst             |   2 +-
 docs/guides/beginner/index_en.rst             |   2 +-
 docs/guides/beginner/quick_start_cn.ipynb     | 495 +++++++-------
 .../beginner/train_eval_predict_cn.ipynb      | 622 +++++++++---------
 docs/guides/infer/index_en.rst                |   3 -
 docs/guides/jit/index_en.rst                  |   8 -
 docs/guides/model_convert/index_cn.rst        |   4 +-
 docs/guides/model_convert/index_en.rst        |   8 +-
 docs/guides/new_op/index_en.rst               |   6 -
 .../guides/performance_improving/index_en.rst |  10 +-
 14 files changed, 750 insertions(+), 851 deletions(-)

diff --git a/docs/guides/advanced/index_en.rst b/docs/guides/advanced/index_en.rst
index dcd63230c96..9981fbd4f0c 100644
--- a/docs/guides/advanced/index_en.rst
+++ b/docs/guides/advanced/index_en.rst
@@ -4,7 +4,7 @@ Advanced Guides
 
 - `Model Visualization <./visualdl_usage_en.html>`_
 - `Model and Layer <./layer_and_model_en.html>`_
-- `Gradient Clip./gradient_clip_en.html>`_
+- `Gradient Clip <./gradient_clip_en.html>`_
 
 ..  toctree::
     :hidden:
diff --git a/docs/guides/advanced/visualdl_usage_cn.md b/docs/guides/advanced/visualdl_usage_cn.md
index f01aa9ffdfc..e54eef5605f 100644
--- a/docs/guides/advanced/visualdl_usage_cn.md
+++ b/docs/guides/advanced/visualdl_usage_cn.md
@@ -1,4 +1,4 @@
-# VisualDL 使用指南
+# 模型可视化
 
 ### 概述
 
diff --git a/docs/guides/advanced/visualdl_usage_en.md b/docs/guides/advanced/visualdl_usage_en.md
index 6510c99f186..93d98b7bc00 100755
--- a/docs/guides/advanced/visualdl_usage_en.md
+++ b/docs/guides/advanced/visualdl_usage_en.md
@@ -1,4 +1,4 @@
-# VisualDL user guide
+# Model Visualization
 
 ## Overview
 
diff --git a/docs/guides/beginner/data_load_cn.ipynb b/docs/guides/beginner/data_load_cn.ipynb
index 12f1dbbb586..13c316d1409 100644
--- a/docs/guides/beginner/data_load_cn.ipynb
+++ b/docs/guides/beginner/data_load_cn.ipynb
@@ -2,8 +2,6 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "6baddc28",
-   "metadata": {},
    "source": [
     "# 数据集定义与加载\n",
     "\n",
@@ -19,84 +17,62 @@
     "\n",
     "\n",
     "本文以图像数据集为例介绍，文本数据集可参考 [NLP 应用实践](../../practices/nlp/index_cn.html)。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "0e6cbad6-1346-47d3-8bd1-e28cbb55c8ad",
-   "metadata": {},
    "source": [
     "## 一、定义数据集\n",
     "\n",
     "### 1.1 直接加载内置数据集\n",
     "\n",
     "飞桨框架在 [paddle.vision.datasets](../../api/paddle/vision/Overview_cn.html#api) 和 [paddle.text](../..//api/paddle/text/Overview_cn.html#api) 目录下内置了一些经典数据集可直接调用，通过以下代码可查看飞桨框架中的内置数据集。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 1,
-   "id": "4cc7b788-00c6-4e1b-a6ab-49317c861aa5",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-01-10T02:42:53.325579Z",
-     "iopub.status.busy": "2022-01-10T02:42:53.325030Z",
-     "iopub.status.idle": "2022-01-10T02:42:54.698658Z",
-     "shell.execute_reply": "2022-01-10T02:42:54.697869Z",
-     "shell.execute_reply.started": "2022-01-10T02:42:53.325539Z"
-    },
-    "scrolled": true
-   },
+   "source": [
+    "import paddle\n",
+    "print('计算机视觉（CV）相关数据集：', paddle.vision.datasets.__all__)\n",
+    "print('自然语言处理（NLP）相关数据集：', paddle.text.__all__)"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "计算机视觉（CV）相关数据集： ['DatasetFolder', 'ImageFolder', 'MNIST', 'FashionMNIST', 'Flowers', 'Cifar10', 'Cifar100', 'VOC2012']\n",
       "自然语言处理（NLP）相关数据集： ['Conll05st', 'Imdb', 'Imikolov', 'Movielens', 'UCIHousing', 'WMT14', 'WMT16', 'ViterbiDecoder', 'viterbi_decode']\n"
      ]
     }
    ],
-   "source": [
-    "import paddle\n",
-    "print('计算机视觉（CV）相关数据集：', paddle.vision.datasets.__all__)\n",
-    "print('自然语言处理（NLP）相关数据集：', paddle.text.__all__)"
-   ]
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-01-10T02:42:53.325579Z",
+     "iopub.status.busy": "2022-01-10T02:42:53.325030Z",
+     "iopub.status.idle": "2022-01-10T02:42:54.698658Z",
+     "shell.execute_reply": "2022-01-10T02:42:54.697869Z",
+     "shell.execute_reply.started": "2022-01-10T02:42:53.325539Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "9235d4f3-9e6f-4926-b003-da4eed882631",
-   "metadata": {},
    "source": [
     "从打印结果可以看到飞桨内置了 CV 领域的 MNIST、FashionMNIST、Flowers、Cifar10、Cifar100、VOC2012 数据集，以及 NLP 领域的 Conll05st、Imdb、Imikolov、Movielens、UCIHousing、WMT14、WMT16 数据集。\n",
     "\n",
     "\n",
     "以 [MNIST](../../api/paddle/vision/datasets/MNIST_cn.html) 数据集为例，加载内置数据集的代码示例如下所示。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 2,
-   "id": "5ddfd1f0-b188-4407-a331-7f7f622b805c",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-01-10T02:42:58.403305Z",
-     "iopub.status.busy": "2022-01-10T02:42:58.402126Z",
-     "iopub.status.idle": "2022-01-10T02:43:07.498070Z",
-     "shell.execute_reply": "2022-01-10T02:43:07.497331Z",
-     "shell.execute_reply.started": "2022-01-10T02:42:58.403262Z"
-    },
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "train images:  60000 , test images:  10000\n"
-     ]
-    }
-   ],
    "source": [
     "from paddle.vision.transforms import Normalize\n",
     "\n",
@@ -106,84 +82,97 @@
     "train_dataset = paddle.vision.datasets.MNIST(mode='train', transform=transform)\n",
     "test_dataset = paddle.vision.datasets.MNIST(mode='test', transform=transform)\n",
     "print('train images: ',len(train_dataset),', test images: ',len(test_dataset))"
-   ]
+   ],
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "train images:  60000 , test images:  10000\n"
+     ]
+    }
+   ],
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-01-10T02:42:58.403305Z",
+     "iopub.status.busy": "2022-01-10T02:42:58.402126Z",
+     "iopub.status.idle": "2022-01-10T02:43:07.498070Z",
+     "shell.execute_reply": "2022-01-10T02:43:07.497331Z",
+     "shell.execute_reply.started": "2022-01-10T02:42:58.403262Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "29fbad59-3234-41c6-82e6-da194972666a",
-   "metadata": {},
    "source": [
     "内置的 [MNIST](../../api/paddle/vision/datasets/MNIST_cn.html) 数据集已经划分好了训练集和测试集，通过 `mode` 字段传入 `'train'` 或 `'test'` 来区分。\n",
     "\n",
     "另外可通过 `transform` 字段传入一些对图像进行变换的操作，飞桨在 [paddle.vision.transforms](../..api/paddle/vision/Overview_cn.html#about-transforms) 下提供了一些常用的图像变换操作，如对图像进行中心裁剪、水平翻转图像和对图像进行归一化等。这里在初始化 MNIST 数据集时传入了 `Normalize` 变换对图像进行归一化，对图像进行归一化可以加快模型训练的收敛速度。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "79102100-e52e-42d6-9b17-48cdb9b59991",
-   "metadata": {},
    "source": [
     "完成数据集初始化之后，可以使用下面的代码直接对数据集进行迭代读取。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 3,
-   "id": "d1bbf911-41a1-452a-80f8-b19c4aec2939",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-01-10T02:50:30.296150Z",
-     "iopub.status.busy": "2022-01-10T02:50:30.294929Z",
-     "iopub.status.idle": "2022-01-10T02:50:30.465409Z",
-     "shell.execute_reply": "2022-01-10T02:50:30.464593Z",
-     "shell.execute_reply.started": "2022-01-10T02:50:30.296089Z"
-    },
-    "scrolled": true
-   },
+   "source": [
+    "from matplotlib import pyplot as plt\n",
+    "\n",
+    "for data in train_dataset:\n",
+    "    image, label = data\n",
+    "    print('shape of image: ',image.shape)\n",
+    "    plt.title(str(label))\n",
+    "    plt.imshow(image[0])    \n",
+    "    break"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "shape of image:  (1, 28, 28)\n"
      ]
     },
     {
+     "output_type": "display_data",
      "data": {
-      "image/png": "(base64-encoded PNG image data omitted)",
       "text/plain": [
        "<Figure size 432x288 with 1 Axes>"
-      ]
+      ],
+      "image/png": "(base64-encoded PNG image data omitted)"
      },
      "metadata": {
       "needs_background": "light"
-     },
-     "output_type": "display_data"
+     }
     }
    ],
-   "source": [
-    "from matplotlib import pyplot as plt\n",
-    "\n",
-    "for data in train_dataset:\n",
-    "    image, label = data\n",
-    "    print('shape of image: ',image.shape)\n",
-    "    plt.title(str(label))\n",
-    "    plt.imshow(image[0])    \n",
-    "    break"
-   ]
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-01-10T02:50:30.296150Z",
+     "iopub.status.busy": "2022-01-10T02:50:30.294929Z",
+     "iopub.status.idle": "2022-01-10T02:50:30.465409Z",
+     "shell.execute_reply": "2022-01-10T02:50:30.464593Z",
+     "shell.execute_reply.started": "2022-01-10T02:50:30.296089Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "95b8738b-35a3-4748-8aa6-624e17bed362",
-   "metadata": {},
    "source": [
     "### 1.2 使用 paddle.io.Dataset 自定义数据集"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "ab2f3fb4",
-   "metadata": {},
    "source": [
     "在实际的场景中，一般需要使用自有的数据来定义数据集，这时可以通过 [paddle.io.Dataset](../../api/paddle/io/Dataset_cn.html#dataset) 基类来实现自定义数据集。\n",
     "\n",
@@ -195,12 +184,18 @@
     "\n",
     "下面介绍下载 MNIST 原始数据集文件后，用 `paddle.io.Dataset` 定义数据集的代码示例。\n",
     "\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "f9487da0",
+   "source": [
+    "# 下载原始的 MNIST 数据集并解压\n",
+    "! wget https://paddle-imagenet-models-name.bj.bcebos.com/data/mnist.tar\n",
+    "! tar -xf mnist.tar"
+   ],
+   "outputs": [],
    "metadata": {
     "execution": {
      "iopub.execute_input": "2022-01-28T05:34:50.747072Z",
@@ -210,37 +205,11 @@
      "shell.execute_reply.started": "2022-01-28T05:34:50.747033Z"
     },
     "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# 下载原始的 MNIST 数据集并解压\n",
-    "! wget https://paddle-imagenet-models-name.bj.bcebos.com/data/mnist.tar\n",
-    "! tar -xf mnist.tar"
-   ]
+   }
   },
   {
    "cell_type": "code",
    "execution_count": 4,
-   "id": "1d26950f",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-01-28T05:37:13.849337Z",
-     "iopub.status.busy": "2022-01-28T05:37:13.848816Z",
-     "iopub.status.idle": "2022-01-28T05:37:13.868808Z",
-     "shell.execute_reply": "2022-01-28T05:37:13.867867Z",
-     "shell.execute_reply.started": "2022-01-28T05:37:13.849276Z"
-    },
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "train_custom_dataset images:  60000 test_custom_dataset images:  10000\n"
-     ]
-    }
-   ],
    "source": [
     "import os\n",
     "import cv2\n",
@@ -295,12 +264,29 @@
     "train_custom_dataset = MyDataset('mnist/train','mnist/train/label.txt', transform)\n",
     "test_custom_dataset = MyDataset('mnist/val','mnist/val/label.txt', transform)\n",
     "print('train_custom_dataset images: ',len(train_custom_dataset), 'test_custom_dataset images: ',len(test_custom_dataset))"
-   ]
+   ],
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "train_custom_dataset images:  60000 test_custom_dataset images:  10000\n"
+     ]
+    }
+   ],
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-01-28T05:37:13.849337Z",
+     "iopub.status.busy": "2022-01-28T05:37:13.848816Z",
+     "iopub.status.idle": "2022-01-28T05:37:13.868808Z",
+     "shell.execute_reply": "2022-01-28T05:37:13.867867Z",
+     "shell.execute_reply.started": "2022-01-28T05:37:13.849276Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "0e705d33",
-   "metadata": {},
    "source": [
     "在上面的代码中，自定义了一个数据集类 `MyDataset`，`MyDataset` 继承自 `paddle.io.Dataset` 基类 ，并且实现了 `__init__`,`__getitem__` 和 `__len__` 三个函数。\n",
     "* 在 `__init__` 函数中完成了对标签文件的读取和解析，并将所有的图像路径 `image_path` 和对应的标签 `label` 存放到一个列表 `data_list` 中。\n",
@@ -310,76 +296,62 @@
     "\n",
     "\n",
     "\n",
-    "另外，在 `__init__` 函数和 `__getitem__` 函数中还可实现一些数据预处理操作，如对图像的翻转、裁剪、归一化等操作，最终返回处理好的单条数据（样本数据、对应的标签），该操作可增加图像数据多样性，对增强模型的泛化能力带来帮助。飞桨框架在 [paddle.vision.transforms](../..api/paddle/vision/Overview_cn.html#about-transforms)  下内置了几十种图像数据处理方法，详细使用方法可参考 [数据预处理](03_data_preprocessing_cn.html) 章节。\n",
+    "另外，在 `__init__` 函数和 `__getitem__` 函数中还可实现一些数据预处理操作，如对图像的翻转、裁剪、归一化等操作，最终返回处理好的单条数据（样本数据、对应的标签），该操作可增加图像数据多样性，对增强模型的泛化能力带来帮助。飞桨框架在 [paddle.vision.transforms](../..api/paddle/vision/Overview_cn.html#about-transforms)  下内置了几十种图像数据处理方法，详细使用方法可参考 [数据预处理](data_preprocessing_cn.html) 章节。\n",
     "\n",
     "和内置数据集类似，可以使用下面的代码直接对自定义数据集进行迭代读取。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 5,
-   "id": "9d1570a3",
-   "metadata": {
-    "scrolled": true
-   },
+   "source": [
+    "for data in train_custom_dataset:\n",
+    "    image, label = data\n",
+    "    print('shape of image: ',image.shape)\n",
+    "    plt.title(str(label))\n",
+    "    plt.imshow(image[0])    \n",
+    "    break"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "shape of image:  (1, 28, 28)\n"
      ]
     },
     {
+     "output_type": "display_data",
      "data": {
-      "image/png": "(base64-encoded PNG image data omitted)",
       "text/plain": [
        "<Figure size 432x288 with 1 Axes>"
-      ]
+      ],
+      "image/png": "(base64-encoded PNG image data omitted)"
      },
      "metadata": {
       "needs_background": "light"
-     },
-     "output_type": "display_data"
+     }
     }
    ],
-   "source": [
-    "for data in train_custom_dataset:\n",
-    "    image, label = data\n",
-    "    print('shape of image: ',image.shape)\n",
-    "    plt.title(str(label))\n",
-    "    plt.imshow(image[0])    \n",
-    "    break"
-   ]
+   "metadata": {
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "de3fd19b",
-   "metadata": {},
    "source": [
     "## 二、迭代读取数据集\n",
     "\n",
     "### 2.1 使用 paddle.io.DataLoader 定义数据读取器\n",
     "\n",
     "通过前面介绍的直接迭代读取 Dataset 的方式虽然可实现对数据集的访问，但是这种访问方式只能单线程进行并且还需要手动分批次（batch）。在飞桨框架中，推荐使用 [paddle.io.DataLoader](../../api/paddle/io/DataLoader_cn.html#dataloader) API 对数据集进行多进程的读取，并且可自动完成划分 batch 的工作。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 7,
-   "id": "c3ad4116",
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "batch_id: 0, 训练数据shape: [64, 1, 28, 28], 标签数据shape: [64]\n"
-     ]
-    }
-   ],
    "source": [
     "# 定义并初始化数据读取器\n",
     "train_loader = paddle.io.DataLoader(train_custom_dataset, batch_size=64, shuffle=True, num_workers=1, drop_last=True)\n",
@@ -389,12 +361,22 @@
     "    images, labels = data\n",
     "    print(\"batch_id: {}, 训练数据shape: {}, 标签数据shape: {}\".format(batch_id, images.shape, labels.shape))\n",
     "    break"
-   ]
+   ],
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "batch_id: 0, 训练数据shape: [64, 1, 28, 28], 标签数据shape: [64]\n"
+     ]
+    }
+   ],
+   "metadata": {
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "ae21b353",
-   "metadata": {},
    "source": [
     "通过上述方法，初始化了一个数据读取器 `train_loader`，用于加载训练数据集 `custom_dataset`。在数据读取器中几个常用的字段如下：\n",
     "\n",
@@ -404,18 +386,17 @@
     "* `num_workers`：**同步/异步读取数据**，通过 `num_workers` 来设置加载数据的子进程个数，num_workers的值设为大于0时，即开启多进程方式异步加载数据，可提升数据读取速度。\n",
     "\n",
     "\n",
-    "定义好数据读取器之后，便可用 for 循环方便地迭代读取批次数据，用于模型训练了。值得注意的是，如果使用高层 API 的 [paddle.Model.fit](../../api/paddle/Model_cn.html#fit-train-data-none-eval-data-none-batch-size-1-epochs-1-eval-freq-1-log-freq-10-save-dir-none-save-freq-1-verbose-2-drop-last-false-shuffle-true-num-workers-0-callbacks-none) 读取数据集进行训练，则只需定义数据集 Dataset 即可，不需要再单独定义 DataLoader，因为 paddle.Model.fit 中实际已经封装了一部分 DataLoader 的功能，详细可参考 [模型训练、评估与推理](05_train_eval_predict_cn.html) 章节。\n",
+    "定义好数据读取器之后，便可用 for 循环方便地迭代读取批次数据，用于模型训练了。值得注意的是，如果使用高层 API 的 [paddle.Model.fit](../../api/paddle/Model_cn.html#fit-train-data-none-eval-data-none-batch-size-1-epochs-1-eval-freq-1-log-freq-10-save-dir-none-save-freq-1-verbose-2-drop-last-false-shuffle-true-num-workers-0-callbacks-none) 读取数据集进行训练，则只需定义数据集 Dataset 即可，不需要再单独定义 DataLoader，因为 paddle.Model.fit 中实际已经封装了一部分 DataLoader 的功能，详细可参考 [模型训练、评估与推理](train_eval_predict_cn.html) 章节。\n",
     "\n",
     "\n",
     "\n",
     "> 注：\n",
     "> DataLoader 实际上是通过批采样器 BatchSampler 产生的批次索引列表，并根据索引取得 Dataset 中的对应样本数据，以实现批次数据的加载。DataLoader 中定义了采样的批次大小、顺序等信息，对应字段包括 `batch_size`、`shuffle`、`drop_last`。这三个字段也可以用一个 `batch_sampler` 字段代替，并在 `batch_sampler` 中传入自定义的批采样器实例。以上两种方式二选一即可，可实现相同的效果。下面小节中介绍后一种自定义采样器的使用方法，该用法可以更灵活地定义采样规则。\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "4c387f37",
-   "metadata": {},
    "source": [
     "### 2.2 （可选）自定义采样器\n",
     "\n",
@@ -427,27 +408,12 @@
     "下面通过两段示例代码，介绍采样器的用法。\n",
     "\n",
     "首先，以 BatchSampler 为例，介绍在 DataLoader 中使用 BatchSampler 获取采样数据的方法。\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 8,
-   "id": "477c89ef",
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "BatchSampler 每轮迭代返回一个索引列表\n",
-      "[53486, 39208, 42267, 46762, 33087, 54705, 55986, 20736]\n",
-      "在 DataLoader 中使用 BatchSampler，返回索引对应的一组样本和标签数据 \n",
-      "batch_id: 0, 训练数据shape: [8, 1, 28, 28], 标签数据shape: [8]\n"
-     ]
-    }
-   ],
    "source": [
     "from paddle.io import BatchSampler\n",
     "\n",
@@ -467,62 +433,36 @@
     "    images, labels = data\n",
     "    print(\"batch_id: {}, 训练数据shape: {}, 标签数据shape: {}\".format(batch_id, images.shape, labels.shape))\n",
     "    break"
-   ]
+   ],
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "BatchSampler 每轮迭代返回一个索引列表\n",
+      "[53486, 39208, 42267, 46762, 33087, 54705, 55986, 20736]\n",
+      "在 DataLoader 中使用 BatchSampler，返回索引对应的一组样本和标签数据 \n",
+      "batch_id: 0, 训练数据shape: [8, 1, 28, 28], 标签数据shape: [8]\n"
+     ]
+    }
+   ],
+   "metadata": {
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "a503bc9a",
-   "metadata": {},
    "source": [
     "以上示例代码中，定义了一个批采样器实例 `bs`，每轮迭代会返回一个 `batch_size` 大小的索引列表（示例中一轮迭代返回 8 个索引值），数据读取器 `train_loader` 通过 `batch_sampler=bs` 字段传入批采样器，即可根据这些索引获取对应的一组样本数据。另外可以看到，`batch_size`、`shuffle`、`drop_last`这三个参数只在 BatchSampler 中设定。\n",
     "\n",
     "\n",
     "下面再通过一段代码示例，对比几个不同采样器的采样行为。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 9,
-   "id": "7d4c4622-b19a-47ad-a603-fc53c44650fa",
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "-----------------顺序采样----------------\n",
-      "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
-      "[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]\n",
-      "[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]\n",
-      "[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]\n",
-      "[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]\n",
-      "[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]\n",
-      "[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]\n",
-      "[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]\n",
-      "[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]\n",
-      "[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]\n",
-      "-----------------随机采样----------------\n",
-      "[44, 29, 37, 11, 21, 53, 65, 3, 26, 23]\n",
-      "[17, 4, 48, 84, 86, 90, 92, 76, 97, 69]\n",
-      "[35, 51, 71, 45, 25, 38, 32, 83, 22, 57]\n",
-      "[47, 55, 39, 46, 78, 61, 68, 66, 18, 41]\n",
-      "[77, 81, 15, 63, 91, 54, 24, 75, 59, 99]\n",
-      "[73, 88, 20, 43, 93, 56, 95, 60, 87, 72]\n",
-      "[70, 98, 1, 64, 0, 16, 33, 14, 80, 89]\n",
-      "[36, 40, 62, 50, 9, 34, 8, 19, 82, 6]\n",
-      "[74, 27, 30, 58, 31, 28, 12, 13, 7, 49]\n",
-      "[10, 52, 2, 94, 67, 96, 79, 42, 5, 85]\n",
-      "-----------------分布式采样----------------\n",
-      "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
-      "[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]\n",
-      "[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]\n",
-      "[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]\n",
-      "[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]\n"
-     ]
-    }
-   ],
    "source": [
     "from paddle.io import SequenceSampler, RandomSampler, BatchSampler, DistributedBatchSampler\n",
     "\n",
@@ -559,23 +499,59 @@
     "\n",
     "for index in batch_sampler:\n",
     "    print(index)"
-   ]
+   ],
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "-----------------顺序采样----------------\n",
+      "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
+      "[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]\n",
+      "[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]\n",
+      "[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]\n",
+      "[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]\n",
+      "[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]\n",
+      "[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]\n",
+      "[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]\n",
+      "[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]\n",
+      "[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]\n",
+      "-----------------随机采样----------------\n",
+      "[44, 29, 37, 11, 21, 53, 65, 3, 26, 23]\n",
+      "[17, 4, 48, 84, 86, 90, 92, 76, 97, 69]\n",
+      "[35, 51, 71, 45, 25, 38, 32, 83, 22, 57]\n",
+      "[47, 55, 39, 46, 78, 61, 68, 66, 18, 41]\n",
+      "[77, 81, 15, 63, 91, 54, 24, 75, 59, 99]\n",
+      "[73, 88, 20, 43, 93, 56, 95, 60, 87, 72]\n",
+      "[70, 98, 1, 64, 0, 16, 33, 14, 80, 89]\n",
+      "[36, 40, 62, 50, 9, 34, 8, 19, 82, 6]\n",
+      "[74, 27, 30, 58, 31, 28, 12, 13, 7, 49]\n",
+      "[10, 52, 2, 94, 67, 96, 79, 42, 5, 85]\n",
+      "-----------------分布式采样----------------\n",
+      "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
+      "[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]\n",
+      "[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]\n",
+      "[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]\n",
+      "[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]\n"
+     ]
+    }
+   ],
+   "metadata": {
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "970dad59",
-   "metadata": {},
    "source": [
     "从代码输出结果可以看出：\n",
     "* 顺序采样：按照顺序的方式输出各个样本的索引。\n",
     "* 随机采样：先将样本顺序打乱，再输出乱序后的样本索引。\n",
     "* 分布式采样：常用于分布式训练场景，将样本数据切分成多份，分别放到不同卡上训练。示例中设置了 `num_replicas=2`，样本会被划分到两张卡上，所以这里只输出一半样本的索引。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "d3a256d5-33f0-4018-bd5a-3ee6d9bff372",
-   "metadata": {},
    "source": [
     "## 三、总结\n",
     "\n",
@@ -585,10 +561,11 @@
     "\n",
     "图 1：数据集定义和加载流程\n",
     "\n",
-    "主要包括定义数据集和定义数据读取器两个步骤，另外在数据读取器中可调用采样器实现更灵活地采样。其中，在定义数据集时，本节仅对数据集进行了归一化处理，如需了解更多数据增强相关操作，可以参考 [数据预处理](03_data_preprocessing_cn.html)。 \n",
+    "主要包括定义数据集和定义数据读取器两个步骤，另外在数据读取器中可调用采样器实现更灵活地采样。其中，在定义数据集时，本节仅对数据集进行了归一化处理，如需了解更多数据增强相关操作，可以参考 [数据预处理](data_preprocessing_cn.html)。 \n",
     "\n",
-    "以上所有数据处理工作完成后，即可进入下一个任务：[模型训练、评估与推理](05_train_eval_predict_cn.html)。"
-   ]
+    "以上所有数据处理工作完成后，即可进入下一个任务：[模型训练、评估与推理](train_eval_predict_cn.html)。"
+   ],
+   "metadata": {}
   }
  ],
  "metadata": {
@@ -612,4 +589,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
\ No newline at end of file
diff --git a/docs/guides/beginner/index_cn.rst b/docs/guides/beginner/index_cn.rst
index 3dc33cf0ef6..71a37aae029 100644
--- a/docs/guides/beginner/index_cn.rst
+++ b/docs/guides/beginner/index_cn.rst
@@ -4,7 +4,7 @@
 
 本部分将介绍飞桨框架2.0的开发流程。
 
-为了快速上手飞桨框架2.0，你可以参考 `10分钟快速上手飞桨 <./01_quick_start_cn.html>`_ ;
+为了快速上手飞桨框架2.0，你可以参考 `10分钟快速上手飞桨 <./quick_start_cn.html>`_ ;
 
 当完成了快速上手的任务后，下面这些模块会阐述如何用飞桨框架2.0，实现深度学习过程中的每一步。具体包括：
 
diff --git a/docs/guides/beginner/index_en.rst b/docs/guides/beginner/index_en.rst
index 3ad80afe15f..403bb7c6dff 100644
--- a/docs/guides/beginner/index_en.rst
+++ b/docs/guides/beginner/index_en.rst
@@ -3,7 +3,7 @@ Model Development
 ###################
 
 
-- `Introduction of Tensor <./tensor_en.html>`_ : 
+- `Introduction of Tensor <./tensor_en.html>`_
 
 .. toctree::
     :hidden:
diff --git a/docs/guides/beginner/quick_start_cn.ipynb b/docs/guides/beginner/quick_start_cn.ipynb
index 3dda15b4f3a..3b06e5fcc3b 100644
--- a/docs/guides/beginner/quick_start_cn.ipynb
+++ b/docs/guides/beginner/quick_start_cn.ipynb
@@ -2,8 +2,6 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "f848b574",
-   "metadata": {},
    "source": [
     "\n",
     "# 10分钟快速上手飞桨\n",
@@ -15,12 +13,17 @@
     "如果已经安装好飞桨那么可以跳过此步骤。飞桨支持很多种安装方式，这里介绍其中一种简单的安装命令。\n",
     "\n",
     "> 注：目前飞桨支持 Python 3.6 ~ 3.9 版本，pip3 要求 20.2.2 或更高版本，请提前安装对应版本的 Python 和 pip 工具。\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 2,
-   "id": "e7d9b5cf-fffe-4086-b84d-3f58eee2f602",
+   "source": [
+    "# 使用 pip 工具安装飞桨 CPU 版\n",
+    "! python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple"
+   ],
+   "outputs": [],
    "metadata": {
     "execution": {
      "iopub.execute_input": "2021-12-20T07:51:58.100025Z",
@@ -30,17 +33,10 @@
      "shell.execute_reply.started": "2021-12-20T07:51:58.099995Z"
     },
     "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# 使用 pip 工具安装飞桨 CPU 版\n",
-    "! python3 -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple"
-   ]
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "54e32a09-1e8f-4bc6-a3bd-1eebd621fa66",
-   "metadata": {},
    "source": [
     "该命令用于安装 CPU 版本的飞桨。如果要安装其他计算平台或操作系统支持的版本，可点击 [ 快速安装]( <https://www.paddlepaddle.org.cn/install/quick>)  查看安装引导。\n",
     "\n",
@@ -49,12 +45,25 @@
     "安装完成后，需要在Python解释器中使用 import 导入飞桨，即可开始实践深度学习任务。\n",
     "\n",
     "若操作成功，会输出飞桨的版本号。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 3,
-   "id": "468426ec",
+   "source": [
+    "import paddle    \n",
+    "print(paddle.__version__)"
+   ],
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "2.2.1\n"
+     ]
+    }
+   ],
    "metadata": {
     "execution": {
      "iopub.execute_input": "2021-12-16T04:18:43.298378Z",
@@ -64,25 +73,10 @@
      "shell.execute_reply.started": "2021-12-16T04:18:43.298346Z"
     },
     "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "2.2.1\n"
-     ]
-    }
-   ],
-   "source": [
-    "import paddle    \n",
-    "print(paddle.__version__)"
-   ]
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "a3e0d5e5",
-   "metadata": {},
    "source": [
     "\n",
     "\n",
@@ -97,12 +91,17 @@
     "图 1：MNIST 数据集样例\n",
     "\n",
     "开始之前，需要使用下面的命令安装 Python 的 matplotlib 库和 numpy 库，matplotlib 库用于可视化图片，numpy 库用于处理数据。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 3,
-   "id": "0b50ba8c-93dd-42d9-8b52-35025ecb122b",
+   "source": [
+    "# 使用 pip 工具安装 matplotlib 和 numpy\n",
+    "! python3 -m pip install matplotlib numpy -i https://mirror.baidu.com/pypi/simple"
+   ],
+   "outputs": [],
    "metadata": {
     "execution": {
      "iopub.execute_input": "2021-12-16T06:31:04.356444Z",
@@ -112,80 +111,18 @@
      "shell.execute_reply.started": "2021-12-16T06:31:04.356403Z"
     },
     "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# 使用 pip 工具安装 matplotlib 和 numpy\n",
-    "! python3 -m pip install matplotlib numpy -i https://mirror.baidu.com/pypi/simple"
-   ]
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "7ba0ec0a-bc6c-43cc-8478-488cc231bca8",
-   "metadata": {},
    "source": [
     "下面是手写数字识别任务的完整代码，如果想直接运行代码，可以拷贝下面的完整代码到一个Python文件中运行。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 5,
-   "id": "93d0c8a1",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2021-12-16T06:31:11.340019Z",
-     "iopub.status.busy": "2021-12-16T06:31:11.338995Z",
-     "iopub.status.idle": "2021-12-16T06:33:13.307650Z",
-     "shell.execute_reply": "2021-12-16T06:33:13.306938Z",
-     "shell.execute_reply.started": "2021-12-16T06:31:11.339980Z"
-    },
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "The loss value printed in the log is the current step, and the metric is the average value of previous steps.\n",
-      "Epoch 1/5\n",
-      "step 938/938 [==============================] - loss: 0.0519 - acc: 0.9344 - 14ms/step          \n",
-      "Epoch 2/5\n",
-      "step 938/938 [==============================] - loss: 0.0239 - acc: 0.9767 - 14ms/step          \n",
-      "Epoch 3/5\n",
-      "step 938/938 [==============================] - loss: 0.0416 - acc: 0.9811 - 14ms/step          \n",
-      "Epoch 4/5\n",
-      "step 938/938 [==============================] - loss: 0.0084 - acc: 0.9837 - 14ms/step          \n",
-      "Epoch 5/5\n",
-      "step 938/938 [==============================] - loss: 0.0838 - acc: 0.9860 - 14ms/step          \n",
-      "Eval begin...\n",
-      "step 157/157 [==============================] - loss: 1.7577e-04 - acc: 0.9844 - 6ms/step         \n",
-      "Eval samples: 10000\n",
-      "true label: 7, pred label: 7\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "<matplotlib.image.AxesImage at 0x7feb8a585dd0>"
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    },
-    {
-     "data": {
-      "image/png": "<base64-encoded PNG omitted>",
-      "text/plain": [
-       "<Figure size 432x288 with 1 Axes>"
-      ]
-     },
-     "metadata": {
-      "needs_background": "light"
-     },
-     "output_type": "display_data"
-    }
-   ],
    "source": [
     "import paddle\n",
     "import numpy as np\n",
@@ -227,12 +164,65 @@
     "# 可视化图片\n",
     "from matplotlib import pyplot as plt\n",
     "plt.imshow(img[0])"
-   ]
+   ],
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "The loss value printed in the log is the current step, and the metric is the average value of previous steps.\n",
+      "Epoch 1/5\n",
+      "step 938/938 [==============================] - loss: 0.0519 - acc: 0.9344 - 14ms/step          \n",
+      "Epoch 2/5\n",
+      "step 938/938 [==============================] - loss: 0.0239 - acc: 0.9767 - 14ms/step          \n",
+      "Epoch 3/5\n",
+      "step 938/938 [==============================] - loss: 0.0416 - acc: 0.9811 - 14ms/step          \n",
+      "Epoch 4/5\n",
+      "step 938/938 [==============================] - loss: 0.0084 - acc: 0.9837 - 14ms/step          \n",
+      "Epoch 5/5\n",
+      "step 938/938 [==============================] - loss: 0.0838 - acc: 0.9860 - 14ms/step          \n",
+      "Eval begin...\n",
+      "step 157/157 [==============================] - loss: 1.7577e-04 - acc: 0.9844 - 6ms/step         \n",
+      "Eval samples: 10000\n",
+      "true label: 7, pred label: 7\n"
+     ]
+    },
+    {
+     "output_type": "execute_result",
+     "data": {
+      "text/plain": [
+       "<matplotlib.image.AxesImage at 0x7feb8a585dd0>"
+      ]
+     },
+     "metadata": {},
+     "execution_count": 5
+    },
+    {
+     "output_type": "display_data",
+     "data": {
+      "text/plain": [
+       "<Figure size 432x288 with 1 Axes>"
+      ],
+      "image/png": "<base64-encoded PNG omitted>"
+     },
+     "metadata": {
+      "needs_background": "light"
+     }
+    }
+   ],
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2021-12-16T06:31:11.340019Z",
+     "iopub.status.busy": "2021-12-16T06:31:11.338995Z",
+     "iopub.status.idle": "2021-12-16T06:33:13.307650Z",
+     "shell.execute_reply": "2021-12-16T06:33:13.306938Z",
+     "shell.execute_reply.started": "2021-12-16T06:31:11.339980Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "f1be5eec",
-   "metadata": {},
    "source": [
     "以上代码使用 MNIST 数据集训练并测试了 LeNet 模型，并最终成功推理出了一张手写数字图片的标签，该图片推理结果是 7 （ pred label: 7），真实标签也是7 （true label: 7）。\n",
     "\n",
@@ -245,41 +235,21 @@
     "4. 模型推理\n",
     "\n",
     "接下来逐个步骤介绍，帮助你快速掌握使用飞桨框架实践深度学习任务的方法。\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "9fdc68f2-fe82-45f0-8022-e180a5960bf7",
-   "metadata": {},
    "source": [
     "###  3.1 数据集定义与加载\n",
     "\n",
     "飞桨在 [paddle.vision.datasets](../../api/paddle/vision/Overview_cn.html#api) 下内置了计算机视觉（Computer Vision，CV）领域常见的数据集，如 MNIST、Cifar10、Cifar100、FashionMNIST 和 VOC2012 等。在本任务中，先后加载了 MNIST 训练集（`mode='train'`）和测试集（`mode='test'`），训练集用于训练模型，测试集用于评估模型效果。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 2,
-   "id": "f99c914f",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2021-12-17T06:55:43.185059Z",
-     "iopub.status.busy": "2021-12-17T06:55:43.184142Z",
-     "iopub.status.idle": "2021-12-17T06:55:53.018229Z",
-     "shell.execute_reply": "2021-12-17T06:55:53.017346Z",
-     "shell.execute_reply.started": "2021-12-17T06:55:43.185027Z"
-    },
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "60000 images in train_dataset, 10000 images in test_dataset\n"
-     ]
-    }
-   ],
    "source": [
     "import paddle\n",
     "from paddle.vision.transforms import Normalize\n",
@@ -291,12 +261,29 @@
     "\n",
     "# 打印数据集里图片数量\n",
     "print('{} images in train_dataset, {} images in test_dataset'.format(len(train_dataset), len(test_dataset)))"
-   ]
+   ],
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "60000 images in train_dataset, 10000 images in test_dataset\n"
+     ]
+    }
+   ],
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2021-12-17T06:55:43.185059Z",
+     "iopub.status.busy": "2021-12-17T06:55:43.184142Z",
+     "iopub.status.idle": "2021-12-17T06:55:53.018229Z",
+     "shell.execute_reply": "2021-12-17T06:55:53.017346Z",
+     "shell.execute_reply.started": "2021-12-17T06:55:43.185027Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "2d89cb67",
-   "metadata": {},
    "source": [
     "飞桨除了内置了 CV 领域常见的数据集，还在 [paddle.text](../../api/paddle/text/Overview_cn.html#api) 下内置了自然语言处理（Natural Language Processing，NLP）领域常见的数据集，并提供了自定义数据集与加载功能的 [paddle.io.Dataset](../../api/paddle/io/Dataset_cn.html#dataset) 和 [paddle.io.DataLoader](../../api/paddle/io/DataLoader_cn.html#dataloader) API，详细使用方法可参考『数据集定义与加载』 章节。\n",
     "\n",
@@ -305,14 +292,13 @@
     "\n",
     "\n",
     "更多参考：\n",
-    "* [数据集定义与加载](02_data_load_cn.html)\n",
-    "* [数据预处理](03_data_preprocessing_cn.html)\n"
-   ]
+    "* [数据集定义与加载](data_load_cn.html)\n",
+    "* [数据预处理](data_preprocessing_cn.html)\n"
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "6ba82de9",
-   "metadata": {},
    "source": [
     "###  3.2 模型组网\n",
     "\n",
@@ -322,26 +308,23 @@
     "『手写数字识别任务』比较简单，普通的神经网络就能达到很高的精度，在本任务中使用了飞桨内置的 LeNet 作为模型。飞桨在 [paddle.vision.models](../../api/paddle/vision/Overview_cn.html#about-models) 下内置了 CV 领域的一些经典模型，LeNet 就是其中之一，调用很方便，只需一行代码即可完成 LeNet 的网络构建和初始化。`num_classes` 字段中定义分类的类别数，因为需要对 0 ~ 9 的十类数字进行分类，所以设置为 10。\n",
     "\n",
     "另外通过 [paddle.summary](../../api/paddle/summary_cn.html#summary) 可方便地打印网络的基础结构和参数信息。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 3,
-   "id": "e8bf3841",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2021-12-17T06:59:47.978121Z",
-     "iopub.status.busy": "2021-12-17T06:59:47.977323Z",
-     "iopub.status.idle": "2021-12-17T06:59:47.990123Z",
-     "shell.execute_reply": "2021-12-17T06:59:47.989596Z",
-     "shell.execute_reply.started": "2021-12-17T06:59:47.978088Z"
-    },
-    "scrolled": true
-   },
+   "source": [
+    "# 模型组网并初始化网络\n",
+    "lenet = paddle.vision.models.LeNet(num_classes=10)\n",
+    "\n",
+    "# 可视化模型组网结构和参数\n",
+    "paddle.summary(lenet,(1, 1, 28, 28))"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "---------------------------------------------------------------------------\n",
       " Layer (type)       Input Shape          Output Shape         Param #    \n",
@@ -369,39 +352,39 @@
      ]
     },
     {
+     "output_type": "execute_result",
      "data": {
       "text/plain": [
        "{'total_params': 61610, 'trainable_params': 61610}"
       ]
      },
-     "execution_count": 3,
      "metadata": {},
-     "output_type": "execute_result"
+     "execution_count": 3
     }
    ],
-   "source": [
-    "# 模型组网并初始化网络\n",
-    "lenet = paddle.vision.models.LeNet(num_classes=10)\n",
-    "\n",
-    "# 可视化模型组网结构和参数\n",
-    "paddle.summary(lenet,(1, 1, 28, 28))"
-   ]
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2021-12-17T06:59:47.978121Z",
+     "iopub.status.busy": "2021-12-17T06:59:47.977323Z",
+     "iopub.status.idle": "2021-12-17T06:59:47.990123Z",
+     "shell.execute_reply": "2021-12-17T06:59:47.989596Z",
+     "shell.execute_reply.started": "2021-12-17T06:59:47.978088Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "67dfcc50",
-   "metadata": {},
    "source": [
     "通过飞桨的 [paddle.nn.Sequential](../../api/paddle/nn/Sequential_cn.html) 和 [paddle.nn.Layer](../../api/paddle/nn/Layer_cn.html) API 可以更灵活方便的组建自定义的神经网络，详细使用方法可参考『模型组网』章节。\n",
     "\n",
     "更多参考：\n",
-    "* [模型组网](04_model_cn.html)"
-   ]
+    "* [模型组网](model_cn.html)"
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "4902817f",
-   "metadata": {},
    "source": [
     "### 3.3 模型训练与评估\n",
     "\n",
@@ -415,26 +398,28 @@
     "\n",
     "\n",
     "因为是分类任务，这里损失函数使用常见的 [CrossEntropyLoss](../../api/paddle/nn/CrossEntropyLoss_cn.html#crossentropyloss) （交叉熵损失函数），优化器使用 [Adam](../../api/paddle/optimizer/Adam_cn.html#adam)，评价指标使用 [Accuracy](../../api/paddle/metric/Accuracy_cn.html#accuracy) 来计算模型在训练集上的精度。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 5,
-   "id": "3333a7bb",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2021-12-17T04:01:43.380962Z",
-     "iopub.status.busy": "2021-12-17T04:01:43.380575Z",
-     "iopub.status.idle": "2021-12-17T04:03:17.852495Z",
-     "shell.execute_reply": "2021-12-17T04:03:17.851918Z",
-     "shell.execute_reply.started": "2021-12-17T04:01:43.380928Z"
-    },
-    "scrolled": true
-   },
+   "source": [
+    "# 封装模型，便于进行后续的训练、评估和推理\n",
+    "model = paddle.Model(lenet)\n",
+    "\n",
+    "# 模型训练的配置准备，准备损失函数，优化器和评价指标\n",
+    "model.prepare(paddle.optimizer.Adam(parameters=model.parameters()), \n",
+    "              paddle.nn.CrossEntropyLoss(),\n",
+    "              paddle.metric.Accuracy())\n",
+    "\n",
+    "# 开始训练\n",
+    "model.fit(train_dataset, epochs=5, batch_size=64, verbose=1)"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "The loss value printed in the log is the current step, and the metric is the average value of previous steps.\n",
       "Epoch 1/5\n",
@@ -450,48 +435,44 @@
      ]
     }
    ],
-   "source": [
-    "# 封装模型，便于进行后续的训练、评估和推理\n",
-    "model = paddle.Model(lenet)\n",
-    "\n",
-    "# 模型训练的配置准备，准备损失函数，优化器和评价指标\n",
-    "model.prepare(paddle.optimizer.Adam(parameters=model.parameters()), \n",
-    "              paddle.nn.CrossEntropyLoss(),\n",
-    "              paddle.metric.Accuracy())\n",
-    "\n",
-    "# 开始训练\n",
-    "model.fit(train_dataset, epochs=5, batch_size=64, verbose=1)"
-   ]
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2021-12-17T04:01:43.380962Z",
+     "iopub.status.busy": "2021-12-17T04:01:43.380575Z",
+     "iopub.status.idle": "2021-12-17T04:03:17.852495Z",
+     "shell.execute_reply": "2021-12-17T04:03:17.851918Z",
+     "shell.execute_reply.started": "2021-12-17T04:01:43.380928Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "c6604e7c",
-   "metadata": {},
    "source": [
     "从训练过程的打印日志中，可观察到损失函数值 loss 逐渐变小，精度 acc 逐渐上升的趋势，反映出不错的训练效果。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "684e7be6",
-   "metadata": {},
    "source": [
     "#### 3.3.2 模型评估\n",
     "\n",
     "模型训练完成之后，调用 [paddle.Model.evaluate](../../api/paddle/Model_cn.html#evaluate-eval-data-batch-size-1-log-freq-10-verbose-2-num-workers-0-callbacks-none) ，使用预先定义的测试数据集，来评估训练好的模型效果，评估完成后将输出模型在测试集上的损失函数值 loss 和精度 acc。\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 6,
-   "id": "b86f0289",
-   "metadata": {
-    "scrolled": true
-   },
+   "source": [
+    "# 进行模型评估\n",
+    "model.evaluate(test_dataset, batch_size=64, verbose=1)"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "Eval begin...\n",
       "step 157/157 [==============================] - loss: 5.7177e-04 - acc: 0.9859 - 6ms/step         \n",
@@ -499,57 +480,57 @@
      ]
     },
     {
+     "output_type": "execute_result",
      "data": {
       "text/plain": [
        "{'loss': [0.00057177414], 'acc': 0.9859}"
       ]
      },
-     "execution_count": 6,
      "metadata": {},
-     "output_type": "execute_result"
+     "execution_count": 6
     }
    ],
-   "source": [
-    "# 进行模型评估\n",
-    "model.evaluate(test_dataset, batch_size=64, verbose=1)"
-   ]
+   "metadata": {
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "94c4a7af",
-   "metadata": {},
    "source": [
     "从结果可以看到，初步训练得到的模型精度在98%附近，在逐渐熟悉深度学习模型开发和训练技巧后，可以通过调整其中的训练参数来进一步提升模型的精度。\n",
     "\n",
     "更多参考：\n",
-    "* [模型训练、评估与推理](05_train_eval_predict_cn.html)"
-   ]
+    "* [模型训练、评估与推理](train_eval_predict_cn.html)"
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "e991757c",
-   "metadata": {},
    "source": [
     "### 3.4 模型推理\n",
     "\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "39ee250a",
-   "metadata": {},
    "source": [
     "#### 3.4.1 模型保存\n",
     "\n",
     "模型训练完成后，通常需要将训练好的模型参数和优化器等信息，持久化保存到参数文件中，便于后续执行推理验证。\n",
     "\n",
     "在飞桨中可通过调用 [paddle.Model.save](../../api/paddle/Model_cn.html#save-path-training-true) 保存模型。代码示例如下，其中 output 为模型保存的文件夹名称，minst 为保存的模型文件名称。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 5,
-   "id": "07bff4b4",
+   "source": [
+    "# 保存模型，文件夹会自动创建\n",
+    "model.save('./output/mnist')"
+   ],
+   "outputs": [],
    "metadata": {
     "execution": {
      "iopub.execute_input": "2021-12-17T04:11:42.623618Z",
@@ -559,17 +540,10 @@
      "shell.execute_reply.started": "2021-12-17T04:11:42.623579Z"
     },
     "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# 保存模型，文件夹会自动创建\n",
-    "model.save('./output/mnist')"
-   ]
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "0daaf2e3",
-   "metadata": {},
    "source": [
     "以上代码执行后会在`output`目录下保存两个文件，`mnist.pdopt`为优化器的参数，`mnist.pdparams`为模型的参数。\n",
     "```bash\n",
@@ -577,90 +551,86 @@
     "├── mnist.pdopt     # 优化器的参数\n",
     "└── mnist.pdparams  # 模型的参数\n",
     "```\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "e2d87664-6b04-4969-8af4-02f57356e1d4",
-   "metadata": {},
    "source": [
     "#### 3.4.2 模型加载并执行推理\n",
     "\n",
     "执行模型推理时，可调用 [paddle.Model.load](../../api/paddle/Model_cn.html#load-path-skip-mismatch-false-reset-optimizer-false) 加载模型，然后即可通过 [paddle.Model.predict_batch](../../api/paddle/Model_cn.html#predict-batch-inputs) 执行推理操作。\n",
     "\n",
     "如下示例中，针对前面创建的 `model` 网络加载保存的参数文件 `output/mnist`，并选择测试集中的一张图片 `test_dataset[0]` 作为输入，执行推理并打印结果，可以看到推理的结果与可视化图片一致。\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 10,
-   "id": "bb8328ef",
-   "metadata": {
-    "scrolled": true
-   },
+   "source": [
+    "# 加载模型\n",
+    "model.load('output/mnist')\n",
+    "\n",
+    "# 从测试集中取出一张图片\n",
+    "img, label = test_dataset[0]\n",
+    "# 将图片shape从1*28*28变为1*1*28*28，增加一个batch维度，以匹配模型输入格式要求\n",
+    "img_batch = np.expand_dims(img.astype('float32'), axis=0)\n",
+    "\n",
+    "# 执行推理并打印结果，此处predict_batch返回的是一个list，取出其中数据获得预测结果\n",
+    "out = model.predict_batch(img_batch)[0]\n",
+    "pred_label = out.argmax()\n",
+    "print('true label: {}, pred label: {}'.format(label[0], pred_label))\n",
+    "# 可视化图片\n",
+    "from matplotlib import pyplot as plt\n",
+    "plt.imshow(img[0])"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "true label: 7, pred label: 7\n"
      ]
     },
     {
+     "output_type": "execute_result",
      "data": {
       "text/plain": [
        "<matplotlib.image.AxesImage at 0x7f853e6f0e50>"
       ]
      },
-     "execution_count": 10,
      "metadata": {},
-     "output_type": "execute_result"
+     "execution_count": 10
     },
     {
+     "output_type": "display_data",
      "data": {
-      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPsAAAD4CAYAAAAq5pAIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAANiklEQVR4nO3df4wc9XnH8c8n/kV8QGtDcF3j4ISQqE4aSHWBRNDKESUFImSiJBRLtVyJ5lALElRRW0QVBalVSlEIok0aySluHESgaQBhJTSNa6W1UKljg4yxgdaEmsau8QFOaxPAP/DTP24cHXD7vWNndmft5/2SVrs7z87Oo/F9PLMzO/t1RAjA8e9tbTcAoD8IO5AEYQeSIOxAEoQdSGJ6Pxc207PiBA31c5FAKq/qZzoYBzxRrVbYbV8s6XZJ0yT9bUTcXHr9CRrSeb6wziIBFGyIdR1rXe/G254m6auSLpG0WNIy24u7fT8AvVXnM/u5kp6OiGci4qCkeyQtbaYtAE2rE/YFkn4y7vnOatrr2B6xvcn2pkM6UGNxAOro+dH4iFgZEcMRMTxDs3q9OAAd1An7LkkLxz0/vZoGYADVCftGSWfZfpftmZKulLSmmbYANK3rU28Rcdj2tZL+SWOn3lZFxLbGOgPQqFrn2SPiQUkPNtQLgB7i67JAEoQdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0kQdiAJwg4kQdiBJGoN2Wx7h6T9kl6TdDgihptoCkDzaoW98rGIeKGB9wHQQ+zGA0nUDXtI+oHtR2yPTPQC2yO2N9nedEgHai4OQLfq7sZfEBG7bJ8maa3tpyJi/fgXRMRKSSsl6WTPjZrLA9ClWlv2iNhV3Y9Kul/SuU00BaB5XYfd9pDtk44+lvRxSVubagxAs+rsxs+TdL/to+/zrYj4fiNdAWhc12GPiGcknd1gLwB6iFNvQBKEHUiCsANJEHYgCcIOJNHEhTApvPjZj3asvXP508V5nxqdV6wfPDCjWF9wd7k+e+dLHWtHNj9RnBd5sGUHkiDsQBKEHUiCsANJEHYgCcIOJEHYgSQ4zz5Ff/xH3+pY+9TQT8szn1lz4UvK5R2HX+5Yu/35j9Vc+LHrR6NndKwN3foLxXmnr3uk6XZax5YdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5JwRP8GaTnZc+M8X9i35TXpZ58+r2PthQ+W/8+c82R5Hf/0V1ysz/zg/xbrt3zgvo61i97+SnHe7718YrH+idmdr5Wv65U4WKxvODBUrC854VDXy37P964u1t87srHr927ThlinfbF3wj8otuxAEoQdSIKwA0kQdiAJwg4kQdiBJAg7kATXs0/R0Hc2FGr13vvkerPrr39pScfan5+/qLzsfy3/5v0tS97TRUdTM/2VI8X60Jbdxfop6+8t1n91Zuff25+9o/xb/MejSbfstlfZHrW9ddy0ubbX2t5e3c/pbZsA6prKbvw3JF38hmk3SFoXEWdJWlc9BzDAJg17RKyXtPcNk5dKWl09Xi3p8mbbAtC0bj+zz4uIox+onpPUcTAz2yOSRiTpBM3ucnEA6qp9ND7GrqTpeKVHRKyMiOGIGJ6hWXUXB6BL3YZ9j+35klTdjzbXEoBe6DbsayStqB6vkPRAM+0A6JVJP7Pbvltjv1x+qu2dkr4g6WZJ37Z9laRnJV3RyyZRdvi5PR1rQ/d2rknSa5O899B3Xuyio2bs+b2PFuvvn1n+8/3S3vd1rC36u2eK8x4uVo9Nk4Y9IpZ1KB2bv0IBJMXXZYEkCDuQBGEHkiDsQBKEHUiCS1zRmulnLCzWv3LjV4r1GZ5WrP/D7b/ZsXbK7oeL8x6P2LIDSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBK
cZ0drnvrDBcX6h2eVh7LedrA8HPXcJ15+yz0dz9iyA0kQdiAJwg4kQdiBJAg7kARhB5Ig7EASnGdHTx34xIc71h799G2TzF0eQej3r7uuWH/7v/1okvfPhS07kARhB5Ig7EAShB1IgrADSRB2IAnCDiTBeXb01H9f0nl7cqLL59GX/ddFxfrs7z9WrEexms+kW3bbq2yP2t46btpNtnfZ3lzdLu1tmwDqmspu/DckXTzB9Nsi4pzq9mCzbQFo2qRhj4j1kvb2oRcAPVTnAN21trdUu/lzOr3I9ojtTbY3HdKBGosDUEe3Yf+apDMlnSNpt6RbO70wIlZGxHBEDM+Y5MIGAL3TVdgjYk9EvBYRRyR9XdK5zbYFoGldhd32/HFPPylpa6fXAhgMk55nt323pCWSTrW9U9IXJC2xfY7GTmXukHR171rEIHvbSScV68t//aGOtX1HXi3OO/rFdxfrsw5sLNbxepOGPSKWTTD5jh70AqCH+LoskARhB5Ig7EAShB1IgrADSXCJK2rZftP7i/Xvnvo3HWtLt3+qOO+sBzm11iS27EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBOfZUfR/v/ORYn3Lb/9Vsf7jw4c61l76y9OL887S7mIdbw1bdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgvPsyU1f8MvF+vWf//tifZbLf0JXPra8Y+0d/8j16v3Elh1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkuA8+3HO08v/xGd/d2ex/pkTXyzW79p/WrE+7/OdtydHinOiaZNu2W0vtP1D20/Y3mb7umr6XNtrbW+v7uf0vl0A3ZrKbvxhSZ+LiMWSPiLpGtuLJd0gaV1EnCVpXfUcwICaNOwRsTsiHq0e75f0pKQFkpZKWl29bLWky3vUI4AGvKXP7LYXSfqQpA2S5kXE0R8Je07SvA7zjEgakaQTNLvrRgHUM+Wj8bZPlHSvpOsjYt/4WkSEpJhovohYGRHDETE8Q7NqNQuge1MKu+0ZGgv6XRFxXzV5j+35VX2+pNHetAigCZPuxtu2pDskPRkRXx5XWiNphaSbq/sHetIh6jn7fcXyn512Z623/+oXP1Os/+JjD9d6fzRnKp/Zz5e0XNLjtjdX027UWMi/bfsqSc9KuqInHQJoxKRhj4iHJLlD+cJm2wHQK3xdFkiCsANJEHYgCcIOJEHYgSS4xPU4MG3xezvWRu6p9/WHxauuKdYX3fnvtd4f/cOWHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeS4Dz7ceCpP+j8w76Xzd7XsTYVp//LwfILYsIfKMIAYssOJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0lwnv0Y8Opl5xbr6y67tVBlyC2MYcsOJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0lMZXz2hZK+KWmepJC0MiJut32TpM9Ker566Y0R8WCvGs3sf86fVqy/c3r359Lv2n9asT5jX/l6dq5mP3ZM5Us1hyV9LiIetX2SpEdsr61qt0XEl3rXHoCmTGV89t2SdleP99t+UtKCXjcGoFlv6TO77UWSPiRpQzXpWttbbK+yPeFvI9kesb3J9qZDOlCvWwBdm3LYbZ8o6V5J10fEPklfk3SmpHM0tuWf8AvaEbEyIoYjYniGZtXvGEBXphR22zM0FvS7IuI+SYqIPRHxWkQckfR1SeWrNQC0atKw27akOyQ9GRFfHjd9/riXfVLS1ubbA9CUqRyNP1/SckmP295cTbtR0jLb52js7MsOSVf3oD/U9BcvLi7WH/6tRcV67H68wW7QpqkcjX9IkicocU4dOIbwDTogCcIOJEHYgSQIO5AEYQeSIOxAEo4+Drl7sufGeb6wb8sDstkQ67Qv9k50qpwtO5AFYQeSIOxAEoQdSIKwA0kQdiAJwg4k0dfz7Lafl/TsuEmnSnqhbw28NYPa26D2JdFbt5rs7YyIeMdEhb6G/U0LtzdFxHBrDRQMam+D2pd
Eb93qV2/sxgNJEHYgibbDvrLl5ZcMam+D2pdEb93qS2+tfmYH0D9tb9kB9AlhB5JoJey2L7b9H7aftn1DGz10YnuH7cdtb7a9qeVeVtketb113LS5ttfa3l7dTzjGXku93WR7V7XuNtu+tKXeFtr+oe0nbG+zfV01vdV1V+irL+ut75/ZbU+T9J+SLpK0U9JGScsi4om+NtKB7R2ShiOi9S9g2P4NSS9J+mZEfKCadoukvRFxc/Uf5ZyI+JMB6e0mSS+1PYx3NVrR/PHDjEu6XNLvqsV1V+jrCvVhvbWxZT9X0tMR8UxEHJR0j6SlLfQx8CJivaS9b5i8VNLq6vFqjf2x9F2H3gZCROyOiEerx/slHR1mvNV1V+irL9oI+wJJPxn3fKcGa7z3kPQD24/YHmm7mQnMi4jd1ePnJM1rs5kJTDqMdz+9YZjxgVl33Qx/XhcH6N7sgoj4NUmXSLqm2l0dSDH2GWyQzp1OaRjvfplgmPGfa3PddTv8eV1thH2XpIXjnp9eTRsIEbGruh+VdL8GbyjqPUdH0K3uR1vu5+cGaRjviYYZ1wCsuzaHP28j7BslnWX7XbZnSrpS0poW+ngT20PVgRPZHpL0cQ3eUNRrJK2oHq+Q9ECLvbzOoAzj3WmYcbW87lof/jwi+n6TdKnGjsj/WNKfttFDh77eLemx6rat7d4k3a2x3bpDGju2cZWkUyStk7Rd0j9LmjtAvd0p6XFJWzQWrPkt9XaBxnbRt0jaXN0ubXvdFfrqy3rj67JAEhygA5Ig7EAShB1IgrADSRB2IAnCDiRB2IEk/h9BCfQTovZf9wAAAABJRU5ErkJggg==",
       "text/plain": [
        "<Figure size 432x288 with 1 Axes>"
-      ]
+      ],
+      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPsAAAD4CAYAAAAq5pAIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAANiklEQVR4nO3df4wc9XnH8c8n/kV8QGtDcF3j4ISQqE4aSHWBRNDKESUFImSiJBRLtVyJ5lALElRRW0QVBalVSlEIok0aySluHESgaQBhJTSNa6W1UKljg4yxgdaEmsau8QFOaxPAP/DTP24cHXD7vWNndmft5/2SVrs7z87Oo/F9PLMzO/t1RAjA8e9tbTcAoD8IO5AEYQeSIOxAEoQdSGJ6Pxc207PiBA31c5FAKq/qZzoYBzxRrVbYbV8s6XZJ0yT9bUTcXHr9CRrSeb6wziIBFGyIdR1rXe/G254m6auSLpG0WNIy24u7fT8AvVXnM/u5kp6OiGci4qCkeyQtbaYtAE2rE/YFkn4y7vnOatrr2B6xvcn2pkM6UGNxAOro+dH4iFgZEcMRMTxDs3q9OAAd1An7LkkLxz0/vZoGYADVCftGSWfZfpftmZKulLSmmbYANK3rU28Rcdj2tZL+SWOn3lZFxLbGOgPQqFrn2SPiQUkPNtQLgB7i67JAEoQdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0kQdiAJwg4kQdiBJGoN2Wx7h6T9kl6TdDgihptoCkDzaoW98rGIeKGB9wHQQ+zGA0nUDXtI+oHtR2yPTPQC2yO2N9nedEgHai4OQLfq7sZfEBG7bJ8maa3tpyJi/fgXRMRKSSsl6WTPjZrLA9ClWlv2iNhV3Y9Kul/SuU00BaB5XYfd9pDtk44+lvRxSVubagxAs+rsxs+TdL/to+/zrYj4fiNdAWhc12GPiGcknd1gLwB6iFNvQBKEHUiCsANJEHYgCcIOJNHEhTApvPjZj3asvXP508V5nxqdV6wfPDCjWF9wd7k+e+dLHWtHNj9RnBd5sGUHkiDsQBKEHUiCsANJEHYgCcIOJEHYgSQ4zz5Ff/xH3+pY+9TQT8szn1lz4UvK5R2HX+5Yu/35j9Vc+LHrR6NndKwN3foLxXmnr3uk6XZax5YdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5JwRP8GaTnZc+M8X9i35TXpZ58+r2PthQ+W/8+c82R5Hf/0V1ysz/zg/xbrt3zgvo61i97+SnHe7718YrH+idmdr5Wv65U4WKxvODBUrC854VDXy37P964u1t87srHr927ThlinfbF3wj8otuxAEoQdSIKwA0kQdiAJwg4kQdiBJAg7kATXs0/R0Hc2FGr13vvkerPrr39pScfan5+/qLzsfy3/5v0tS97TRUdTM/2VI8X60Jbdxfop6+8t1n91Zuff25+9o/xb/MejSbfstlfZHrW9ddy0ubbX2t5e3c/pbZsA6prKbvw3JF38hmk3SFoXEWdJWlc9BzDAJg17RKyXtPcNk5dKWl09Xi3p8mbbAtC0bj+zz4uIox+onpPUcTAz2yOSRiTpBM3ucnEA6qp9ND7GrqTpeKVHRKyMiOGIGJ6hWXUXB6BL3YZ9j+35klTdjzbXEoBe6DbsayStqB6vkPRAM+0A6JVJP7Pbvltjv1x+qu2dkr4g6WZJ37Z9laRnJV3RyyZRdvi5PR1rQ/d2rknSa5O899B3Xuyio2bs+b2PFuvvn1n+8/3S3vd1rC36u2eK8x4uVo9Nk4Y9IpZ1KB2bv0IBJMXXZYEkCDuQBGEHkiDsQBKEHUiCS1zRmulnLCzWv3LjV4r1GZ5WrP/D7b/ZsXbK7oeL8x6P2LIDSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBK
cZ0drnvrDBcX6h2eVh7LedrA8HPXcJ15+yz0dz9iyA0kQdiAJwg4kQdiBJAg7kARhB5Ig7EASnGdHTx34xIc71h799G2TzF0eQej3r7uuWH/7v/1okvfPhS07kARhB5Ig7EAShB1IgrADSRB2IAnCDiTBeXb01H9f0nl7cqLL59GX/ddFxfrs7z9WrEexms+kW3bbq2yP2t46btpNtnfZ3lzdLu1tmwDqmspu/DckXTzB9Nsi4pzq9mCzbQFo2qRhj4j1kvb2oRcAPVTnAN21trdUu/lzOr3I9ojtTbY3HdKBGosDUEe3Yf+apDMlnSNpt6RbO70wIlZGxHBEDM+Y5MIGAL3TVdgjYk9EvBYRRyR9XdK5zbYFoGldhd32/HFPPylpa6fXAhgMk55nt323pCWSTrW9U9IXJC2xfY7GTmXukHR171rEIHvbSScV68t//aGOtX1HXi3OO/rFdxfrsw5sLNbxepOGPSKWTTD5jh70AqCH+LoskARhB5Ig7EAShB1IgrADSXCJK2rZftP7i/Xvnvo3HWtLt3+qOO+sBzm11iS27EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBOfZUfR/v/ORYn3Lb/9Vsf7jw4c61l76y9OL887S7mIdbw1bdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgvPsyU1f8MvF+vWf//tifZbLf0JXPra8Y+0d/8j16v3Elh1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkuA8+3HO08v/xGd/d2ex/pkTXyzW79p/WrE+7/OdtydHinOiaZNu2W0vtP1D20/Y3mb7umr6XNtrbW+v7uf0vl0A3ZrKbvxhSZ+LiMWSPiLpGtuLJd0gaV1EnCVpXfUcwICaNOwRsTsiHq0e75f0pKQFkpZKWl29bLWky3vUI4AGvKXP7LYXSfqQpA2S5kXE0R8Je07SvA7zjEgakaQTNLvrRgHUM+Wj8bZPlHSvpOsjYt/4WkSEpJhovohYGRHDETE8Q7NqNQuge1MKu+0ZGgv6XRFxXzV5j+35VX2+pNHetAigCZPuxtu2pDskPRkRXx5XWiNphaSbq/sHetIh6jn7fcXyn512Z623/+oXP1Os/+JjD9d6fzRnKp/Zz5e0XNLjtjdX027UWMi/bfsqSc9KuqInHQJoxKRhj4iHJLlD+cJm2wHQK3xdFkiCsANJEHYgCcIOJEHYgSS4xPU4MG3xezvWRu6p9/WHxauuKdYX3fnvtd4f/cOWHUiCsANJEHYgCcIOJEHYgSQIO5AEYQeS4Dz7ceCpP+j8w76Xzd7XsTYVp//LwfILYsIfKMIAYssOJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0lwnv0Y8Opl5xbr6y67tVBlyC2MYcsOJEHYgSQIO5AEYQeSIOxAEoQdSIKwA0lMZXz2hZK+KWmepJC0MiJut32TpM9Ker566Y0R8WCvGs3sf86fVqy/c3r359Lv2n9asT5jX/l6dq5mP3ZM5Us1hyV9LiIetX2SpEdsr61qt0XEl3rXHoCmTGV89t2SdleP99t+UtKCXjcGoFlv6TO77UWSPiRpQzXpWttbbK+yPeFvI9kesb3J9qZDOlCvWwBdm3LYbZ8o6V5J10fEPklfk3SmpHM0tuWf8AvaEbEyIoYjYniGZtXvGEBXphR22zM0FvS7IuI+SYqIPRHxWkQckfR1SeWrNQC0atKw27akOyQ9GRFfHjd9/riXfVLS1ubbA9CUqRyNP1/SckmP295cTbtR0jLb52js7MsOSVf3oD/U9BcvLi7WH/6tRcV67H68wW7QpqkcjX9IkicocU4dOIbwDTogCcIOJEHYgSQIO5AEYQeSIOxAEo4+Drl7sufGeb6wb8sDstkQ67Qv9k50qpwtO5AFYQeSIOxAEoQdSIKwA0kQdiAJwg4k0dfz7Lafl/TsuEmnSnqhbw28NYPa26D2JdFbt5rs7YyIeMdEhb6G/U0LtzdFxHBrDRQMam+D2pd
Eb93qV2/sxgNJEHYgibbDvrLl5ZcMam+D2pdEb93qS2+tfmYH0D9tb9kB9AlhB5JoJey2L7b9H7aftn1DGz10YnuH7cdtb7a9qeVeVtketb113LS5ttfa3l7dTzjGXku93WR7V7XuNtu+tKXeFtr+oe0nbG+zfV01vdV1V+irL+ut75/ZbU+T9J+SLpK0U9JGScsi4om+NtKB7R2ShiOi9S9g2P4NSS9J+mZEfKCadoukvRFxc/Uf5ZyI+JMB6e0mSS+1PYx3NVrR/PHDjEu6XNLvqsV1V+jrCvVhvbWxZT9X0tMR8UxEHJR0j6SlLfQx8CJivaS9b5i8VNLq6vFqjf2x9F2H3gZCROyOiEerx/slHR1mvNV1V+irL9oI+wJJPxn3fKcGa7z3kPQD24/YHmm7mQnMi4jd1ePnJM1rs5kJTDqMdz+9YZjxgVl33Qx/XhcH6N7sgoj4NUmXSLqm2l0dSDH2GWyQzp1OaRjvfplgmPGfa3PddTv8eV1thH2XpIXjnp9eTRsIEbGruh+VdL8GbyjqPUdH0K3uR1vu5+cGaRjviYYZ1wCsuzaHP28j7BslnWX7XbZnSrpS0poW+ngT20PVgRPZHpL0cQ3eUNRrJK2oHq+Q9ECLvbzOoAzj3WmYcbW87lof/jwi+n6TdKnGjsj/WNKfttFDh77eLemx6rat7d4k3a2x3bpDGju2cZWkUyStk7Rd0j9LmjtAvd0p6XFJWzQWrPkt9XaBxnbRt0jaXN0ubXvdFfrqy3rj67JAEhygA5Ig7EAShB1IgrADSRB2IAnCDiRB2IEk/h9BCfQTovZf9wAAAABJRU5ErkJggg=="
      },
      "metadata": {
       "needs_background": "light"
-     },
-     "output_type": "display_data"
+     }
     }
    ],
-   "source": [
-    "# 加载模型\n",
-    "model.load('output/mnist')\n",
-    "\n",
-    "# 从测试集中取出一张图片\n",
-    "img, label = test_dataset[0]\n",
-    "# 将图片shape从1*28*28变为1*1*28*28，增加一个batch维度，以匹配模型输入格式要求\n",
-    "img_batch = np.expand_dims(img.astype('float32'), axis=0)\n",
-    "\n",
-    "# 执行推理并打印结果，此处predict_batch返回的是一个list，取出其中数据获得预测结果\n",
-    "out = model.predict_batch(img_batch)[0]\n",
-    "pred_label = out.argmax()\n",
-    "print('true label: {}, pred label: {}'.format(label[0], pred_label))\n",
-    "# 可视化图片\n",
-    "from matplotlib import pyplot as plt\n",
-    "plt.imshow(img[0])"
-   ]
+   "metadata": {
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "54a041fd",
-   "metadata": {},
    "source": [
     "更多参考：\n",
-    "* [模型保存与加载](08_model_save_load_cn.html)\n",
-    "* [模型训练、评估与推理](05_train_eval_predict_cn.html)\n"
-   ]
+    "* [模型保存与加载](model_save_load_cn.html)\n",
+    "* [模型训练、评估与推理](train_eval_predict_cn.html)\n"
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "07dc4cc1-4f8e-439c-a7e5-ce9a8c8d014f",
-   "metadata": {},
    "source": [
     "## 四、总结\n",
     "\n",
@@ -671,7 +641,8 @@
     "图 2：模型开发流程\n",
     "\n",
     "如果想要完成更复杂的深度学习任务，开发更强大的模型，飞桨提供了功能丰富的 API 帮助开发者完成任务，比如对数据集应用数据增强、使用更大的 CNN 模型、调优性能等。飞桨官网提供了丰富的教程与案例可供参考，欢迎一起探索深度学习的世界。"
-   ]
+   ],
+   "metadata": {}
   }
  ],
  "metadata": {
@@ -699,4 +670,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
\ No newline at end of file
diff --git a/docs/guides/beginner/train_eval_predict_cn.ipynb b/docs/guides/beginner/train_eval_predict_cn.ipynb
index 56fe653cf06..1936f6d02fa 100644
--- a/docs/guides/beginner/train_eval_predict_cn.ipynb
+++ b/docs/guides/beginner/train_eval_predict_cn.ipynb
@@ -2,8 +2,6 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "f97a8a96",
-   "metadata": {},
    "source": [
     "# 模型训练、评估与推理\n",
     "\n",
@@ -21,22 +19,26 @@
     "\n",
     "\n",
     "高层 API 如 `Model.fit` 、 `Model.evaluate` 、 `Model.predict` 等都可以通过基础 API 实现，本文先介绍高层 API 的使用方式，然后将高层 API 拆解为基础 API 介绍，方便对比学习。\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "1659a965-891d-420e-bc6e-7b8346e5d641",
-   "metadata": {},
    "source": [
     "## 一、训练前准备\n",
     "\n",
     "开始之前，需要使用下面的命令安装 Python 的 matplotlib 库，用于可视化图片。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a88a8752-bbab-4ecf-b384-4199626210df",
+   "source": [
+    "# 使用 pip 工具安装 matplotlib\n",
+    "! python3 -m pip install matplotlib -i https://mirror.baidu.com/pypi/simple"
+   ],
+   "outputs": [],
    "metadata": {
     "execution": {
      "iopub.execute_input": "2022-02-15T05:48:34.721177Z",
@@ -46,99 +48,78 @@
      "shell.execute_reply.started": "2022-02-15T05:48:34.721148Z"
     },
     "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# 使用 pip 工具安装 matplotlib\n",
-    "! python3 -m pip install matplotlib -i https://mirror.baidu.com/pypi/simple"
-   ]
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "bc9cf191-82e7-47e0-b6c5-bf9c55802b3f",
-   "metadata": {},
    "source": [
     "### 1.1 （可选）指定训练的硬件\n",
     "\n",
     "模型训练时，需要用到 CPU、 GPU 等计算处理器资源，由于飞桨框架的安装包是区分处理器类型的，默认情况下飞桨框架会根据所安装的版本自动选择对应硬件，比如安装的 GPU 版本的飞桨，则自动使用 GPU 训练模型，无需手动指定。因此一般情况下，无需执行此步骤。\n",
     "\n",
     "但是如果安装的 GPU 版本的飞桨框架，想切换到 CPU 上训练，则可通过 [paddle.device.set_device](../../api/paddle/device/set_device_cn.html#set-device) 修改。如果本机有多个 GPU 卡，也可以通过该 API 选择指定的卡进行训练，不指定的情况下则默认使用 'gpu:0'。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 1,
-   "id": "0da73e3c-adb6-49bb-9da2-637ab920f558",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-02-23T03:53:25.061855Z",
-     "iopub.status.busy": "2022-02-23T03:53:25.061069Z",
-     "iopub.status.idle": "2022-02-23T03:53:26.486095Z",
-     "shell.execute_reply": "2022-02-23T03:53:26.485309Z",
-     "shell.execute_reply.started": "2022-02-23T03:53:25.061819Z"
-    },
-    "scrolled": true
-   },
+   "source": [
+    "import paddle\n",
+    "\n",
+    "# 指定在 CPU 上训练\n",
+    "paddle.device.set_device('cpu')\n",
+    "\n",
+    "# 指定在 GPU 第 0 号卡上训练\n",
+    "# paddle.device.set_device('gpu:0')"
+   ],
    "outputs": [
     {
+     "output_type": "execute_result",
      "data": {
       "text/plain": [
        "CPUPlace"
       ]
      },
-     "execution_count": 1,
      "metadata": {},
-     "output_type": "execute_result"
+     "execution_count": 1
     }
    ],
-   "source": [
-    "import paddle\n",
-    "\n",
-    "# 指定在 CPU 上训练\n",
-    "paddle.device.set_device('cpu')\n",
-    "\n",
-    "# 指定在 GPU 第 0 号卡上训练\n",
-    "# paddle.device.set_device('gpu:0')"
-   ]
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-02-23T03:53:25.061855Z",
+     "iopub.status.busy": "2022-02-23T03:53:25.061069Z",
+     "iopub.status.idle": "2022-02-23T03:53:26.486095Z",
+     "shell.execute_reply": "2022-02-23T03:53:26.485309Z",
+     "shell.execute_reply.started": "2022-02-23T03:53:25.061819Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "28d8eeac-909d-4152-b223-89a3da9e4fc6",
-   "metadata": {},
    "source": [
     "需要注意的是，使用 `paddle.device.set_device` 时，只能使用 `CUDA_VISIBLE_DEVICES` 设置范围内的显卡，例如可以设置`export CUDA_VISIBLE_DEVICES=0,1,2` 和 `paddle.device.set_device('gpu:0')`，但是设置 `export CUDA_VISIBLE_DEVICES=1` 和 `paddle.device.set_device('gpu:0')` 时会冲突报错。\n",
     "\n",
     "\n",
     "> 注：\n",
-    "> * 本文仅以单机单卡场景为例，介绍模型训练的方法，如果需要使用单机多卡、多机多卡训练，请参考如下章节：[单机多卡训练](06_device_cn.html)、[分布式训练](../06_distributed_training/index_cn.html)。\n",
-    "> * 飞桨框架除了支持在 CPU、GPU 上训练，还支持在百度昆仑 XPU、华为昇腾 NPU 等 AI 计算处理器上训练，对应的训练指导请参考 [硬件支持](../09_hardware_support/index_cn.html) 章节。\n"
-   ]
+    "> * 本文仅以单机单卡场景为例，介绍模型训练的方法，如果需要使用单机多卡、多机多卡训练，请参考如下章节：[分布式训练](../06_distributed_training/index_cn.html)。\n",
+    "> * 飞桨框架除了支持在 CPU、GPU 上训练，还支持在百度昆仑 XPU、华为昇腾 NPU 等 AI 计算处理器上训练，对应的训练指导请参考 [硬件支持](../hardware_support/index_cn.html) 章节。\n"
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "ddc1ef31-4128-445d-9ff7-18d43f5b90b8",
-   "metadata": {},
    "source": [
     "### 1.2 准备训练用的数据集和模型\n",
     "\n",
     "模型训练前，需要先完成数据集的加载和模型组网，以 MNIST 手写数字识别任务为例，代码示例如下：\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 6,
-   "id": "c20487eb-ae05-4a35-a459-a00484f02f1b",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-02-23T04:04:26.920997Z",
-     "iopub.status.busy": "2022-02-23T04:04:26.920406Z",
-     "iopub.status.idle": "2022-02-23T04:04:30.892862Z",
-     "shell.execute_reply": "2022-02-23T04:04:30.892026Z",
-     "shell.execute_reply.started": "2022-02-23T04:04:26.920963Z"
-    },
-    "scrolled": true
-   },
-   "outputs": [],
    "source": [
     "from paddle.vision.transforms import Normalize\n",
     "\n",
@@ -155,12 +136,21 @@
     "    paddle.nn.Dropout(0.2), \n",
     "    paddle.nn.Linear(512, 10)\n",
     ")"
-   ]
+   ],
+   "outputs": [],
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-02-23T04:04:26.920997Z",
+     "iopub.status.busy": "2022-02-23T04:04:26.920406Z",
+     "iopub.status.idle": "2022-02-23T04:04:30.892862Z",
+     "shell.execute_reply": "2022-02-23T04:04:30.892026Z",
+     "shell.execute_reply.started": "2022-02-23T04:04:26.920963Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "d9d0367d",
-   "metadata": {},
    "source": [
     "\n",
     "\n",
@@ -168,22 +158,26 @@
     "\n",
     "\n",
     "以手写数字识别任务为例，使用高层 API 进行模型训练、评估与推理的步骤如下：\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "94022bff-6b80-40aa-a366-e41365099794",
-   "metadata": {},
    "source": [
     "### 2.1 使用 paddle.Model 封装模型\n",
     "\n",
     "使用高层 API 训练模型前，可使用 [paddle.Model](../../api/paddle/Model_cn.html) 将模型封装为一个实例，方便后续进行训练、评估与推理。代码如下："
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 4,
-   "id": "a7705595",
+   "source": [
+    "# 封装模型为一个 model 实例，便于进行后续的训练、评估和推理\n",
+    "model = paddle.Model(mnist)"
+   ],
+   "outputs": [],
    "metadata": {
     "execution": {
      "iopub.execute_input": "2022-02-15T05:48:42.417342Z",
@@ -193,17 +187,10 @@
      "shell.execute_reply.started": "2022-02-15T05:48:42.417318Z"
     },
     "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# 封装模型为一个 model 实例，便于进行后续的训练、评估和推理\n",
-    "model = paddle.Model(mnist)"
-   ]
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "ebab1c33-2e92-4632-b902-f8c2dea85e9d",
-   "metadata": {},
    "source": [
     "### 2.2 使用 Model.prepare 配置训练准备参数\n",
     "\n",
@@ -212,12 +199,19 @@
     "- **优化器（optimizer）**：即寻找最优解的方法，可计算和更新梯度，并根据梯度更新模型参数。飞桨框架在 [paddle.optimizer](../../api/paddle/optimizer/Overview_cn.html#paddle-optimizer) 下提供了优化器相关 API。并且需要为优化器设置合适的学习率，或者指定合适的学习率策略，飞桨框架在 [paddle.optimizer.lr](../../api/paddle/optimizer/Overview_cn.html#about-lr) 下提供了学习率策略相关的 API。\n",
     "- **损失函数（loss）**：用于评估模型的预测值和真实值的差距，模型训练过程即取得尽可能小的 loss 的过程。飞桨框架在 [paddle.nn Loss层](../../api/paddle/nn/Overview_cn.html#loss) 提供了适用不同深度学习任务的损失函数相关 API。\n",
     "- **评价指标（metrics）**：用于评估模型的好坏，不同的任务通常有不同的评价指标。飞桨框架在 [paddle.metric](../../api/paddle/metric/Overview_cn.html) 下提供了评价指标相关 API。\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 5,
-   "id": "ccefe291",
+   "source": [
+    "# 为模型训练做准备，设置优化器及其学习率，并将网络的参数传入优化器，设置损失函数和精度计算方式\n",
+    "model.prepare(optimizer=paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()), \n",
+    "              loss=paddle.nn.CrossEntropyLoss(), \n",
+    "              metrics=paddle.metric.Accuracy())"
+   ],
+   "outputs": [],
    "metadata": {
     "execution": {
      "iopub.execute_input": "2022-02-15T05:48:42.421929Z",
@@ -227,27 +221,17 @@
      "shell.execute_reply.started": "2022-02-15T05:48:42.421908Z"
     },
     "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# 为模型训练做准备，设置优化器及其学习率，并将网络的参数传入优化器，设置损失函数和精度计算方式\n",
-    "model.prepare(optimizer=paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()), \n",
-    "              loss=paddle.nn.CrossEntropyLoss(), \n",
-    "              metrics=paddle.metric.Accuracy())"
-   ]
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "6a73ec3f-8e7a-40ab-bf92-01ba7247c507",
-   "metadata": {},
    "source": [
     "示例中使用 [Adam](../../api/paddle/optimizer/Adam_cn.html#adam) 优化器，设置优化器的学习率 `learning_rate=0.001`，并传入封装好的全部模型参数 `model.parameters` 用于后续更新；使用交叉熵损失函数 [CrossEntropyLoss](../../api/paddle/nn/CrossEntropyLoss_cn.html#crossentropyloss) 用于分类任务评估；使用分类任务常用的准确率指标 [Accuracy](../../api/paddle/metric/Accuracy_cn.html#accuracy) 计算模型在训练集上的精度。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "5da3322e",
-   "metadata": {},
    "source": [
     "### 2.3 使用 Model.fit 训练模型\n",
     "\n",
@@ -258,27 +242,24 @@
     "- **训练轮次（epoch）**：训练时遍历数据集的次数，即外循环轮次。\n",
     "- **批次大小（batch_size）**：内循环中每个批次的训练样本数。\n",
     "\n",
-    "除此之外，还可以设置样本乱序（`shuffle`）、丢弃不完整的批次样本（`drop_last`）、同步/异步读取数据（`num_workers`） 等参数，另外可通过 `Callback` 参数传入回调函数，在模型训练的各个阶段进行一些自定义操作，比如收集训练过程中的一些数据和参数，详细介绍可参见 [自定义 Callback](07_customize_cn.html) 章节。"
-   ]
+    "除此之外，还可以设置样本乱序（`shuffle`）、丢弃不完整的批次样本（`drop_last`）、同步/异步读取数据（`num_workers`） 等参数，另外可通过 `Callback` 参数传入回调函数，在模型训练的各个阶段进行一些自定义操作，比如收集训练过程中的一些数据和参数，详细介绍可参见 [自定义 Callback](../advanced/customize_cn.html) 章节。"
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 6,
-   "id": "51021638",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-02-15T05:48:42.434747Z",
-     "iopub.status.busy": "2022-02-15T05:48:42.434575Z",
-     "iopub.status.idle": "2022-02-15T05:49:30.599976Z",
-     "shell.execute_reply": "2022-02-15T05:49:30.599106Z",
-     "shell.execute_reply.started": "2022-02-15T05:48:42.434726Z"
-    },
-    "scrolled": true
-   },
+   "source": [
+    "# 启动模型训练，指定训练数据集，设置训练轮次，设置每次数据集计算的批次大小，设置日志格式\n",
+    "model.fit(train_dataset, \n",
+    "          epochs=5, \n",
+    "          batch_size=64,\n",
+    "          verbose=1)"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "The loss value printed in the log is the current step, and the metric is the average value of previous steps.\n",
       "Epoch 1/5\n",
@@ -294,27 +275,27 @@
      ]
     }
    ],
-   "source": [
-    "# 启动模型训练，指定训练数据集，设置训练轮次，设置每次数据集计算的批次大小，设置日志格式\n",
-    "model.fit(train_dataset, \n",
-    "          epochs=5, \n",
-    "          batch_size=64,\n",
-    "          verbose=1)"
-   ]
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-02-15T05:48:42.434747Z",
+     "iopub.status.busy": "2022-02-15T05:48:42.434575Z",
+     "iopub.status.idle": "2022-02-15T05:49:30.599976Z",
+     "shell.execute_reply": "2022-02-15T05:49:30.599106Z",
+     "shell.execute_reply.started": "2022-02-15T05:48:42.434726Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "74aa5dcd-70f7-45de-b2d4-3e5527bb8458",
-   "metadata": {},
    "source": [
     "示例中传入数据集 `train_dataset` 进行迭代训练，共遍历 5 轮（`epochs=5`），每轮迭代中分批次取数据训练，每批次 64 个样本（`batch_size=64`），并打印训练过程中的日志（`verbose=1`）。\n",
     "从打印日志中可观察到损失函数 loss 值减小，精度指标 acc 值提高的趋势，说明模型训练取得了成效。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "eeff5f00",
-   "metadata": {},
    "source": [
     "### 2.4 使用 Model.evaluate 评估模型\n",
     "\n",
@@ -324,26 +305,21 @@
     "* 只包含loss， `{'loss': xxx}` \n",
     "* 包含loss和一个评估指标， `{'loss': xxx, 'metric name': xxx}` \n",
     "* 包含loss和多个评估指标， `{'loss': xxx, 'metric name1': xxx, 'metric name2': xxx}`\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 7,
-   "id": "70f670ec",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-02-15T05:49:30.601360Z",
-     "iopub.status.busy": "2022-02-15T05:49:30.600927Z",
-     "iopub.status.idle": "2022-02-15T05:49:51.509694Z",
-     "shell.execute_reply": "2022-02-15T05:49:51.509091Z",
-     "shell.execute_reply.started": "2022-02-15T05:49:30.601333Z"
-    },
-    "scrolled": true
-   },
+   "source": [
+    "# 用 evaluate 在测试集上对模型进行验证\n",
+    "eval_result = model.evaluate(test_dataset, verbose=1)\n",
+    "print(eval_result)"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "Eval begin...\n",
       "step 10000/10000 [==============================] - loss: 2.3842e-07 - acc: 0.9714 - 2ms/step          \n",
@@ -352,24 +328,26 @@
      ]
     }
    ],
-   "source": [
-    "# 用 evaluate 在测试集上对模型进行验证\n",
-    "eval_result = model.evaluate(test_dataset, verbose=1)\n",
-    "print(eval_result)"
-   ]
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-02-15T05:49:30.601360Z",
+     "iopub.status.busy": "2022-02-15T05:49:30.600927Z",
+     "iopub.status.idle": "2022-02-15T05:49:51.509694Z",
+     "shell.execute_reply": "2022-02-15T05:49:51.509091Z",
+     "shell.execute_reply.started": "2022-02-15T05:49:30.601333Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "62c8053a-bdee-4539-8df7-646919d22805",
-   "metadata": {},
    "source": [
     "示例中返回一个 loss 和 一个 acc 准确率指标的结果。在模型之前未\"见过\"的测试集上，评估出仍然有  98.1% 的准确率，验证了模型在该任务上取得不错的效果。"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "109ac763",
-   "metadata": {},
    "source": [
     "### 2.5 Running inference with Model.predict\n",
     "\n",
@@ -388,26 +366,34 @@
     "\n",
     "\n",
     "\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 8,
-   "id": "d6318f18",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-02-15T05:49:51.511077Z",
-     "iopub.status.busy": "2022-02-15T05:49:51.510653Z",
-     "iopub.status.idle": "2022-02-15T05:50:07.724522Z",
-     "shell.execute_reply": "2022-02-15T05:50:07.723742Z",
-     "shell.execute_reply.started": "2022-02-15T05:49:51.511050Z"
-    },
-    "scrolled": true
-   },
+   "source": [
+    "\n",
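+    "# Run inference on the test set with predict\n",
+    "test_result = model.predict(test_dataset)\n",
+    "# The model has a single output, so test_result has shape [1, 10000], where 10000 is the size of the test set. Print the result for the first sample; the array holds the predicted probability of each digit\n",
+    "print(len(test_result))\n",
+    "print(test_result[0][0])\n",
+    "\n",
+    "# Take one image from the test set\n",
+    "img, label = test_dataset[0]\n",
+    "\n",
+    "# Print the inference result; argmax returns the index of the highest predicted probability, which is used as the predicted label\n",
+    "pred_label = test_result[0][0].argmax()\n",
+    "print('true label: {}, pred label: {}'.format(label[0], pred_label))\n",
+    "# Visualize the image with matplotlib\n",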
+    "from matplotlib import pyplot as plt\n",
+    "plt.imshow(img[0])"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "Predict begin...\n",
       "step 10000/10000 [==============================] - 2ms/step          \n",
@@ -419,51 +405,41 @@
      ]
     },
     {
+     "output_type": "execute_result",
      "data": {
       "text/plain": [
        "<matplotlib.image.AxesImage at 0x7f95f014f4d0>"
       ]
      },
-     "execution_count": 12,
      "metadata": {},
-     "output_type": "execute_result"
+     "execution_count": 12
     },
     {
+     "output_type": "display_data",
      "data": {
-      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAADaVJREFUeJzt3X+MXOV1xvHnib1e4jU0GILrGgcnhKA6NDjVxiSCVo4IKZAgEyWhWKrlSpRFLUhQRW2Rq6iWWqUUhSC3SSM5wY1BBGgCCCtx01CrrYVKHS/I2IBpTajT2DVewLQ2AfwDn/6x19EGdt5d5ted9fl+pNXO3HPv3KPrfXzvzDszryNCAPJ5R90NAKgH4QeSIvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kNT0bu5shvvjJA10c5dAKq/rZzochzyZdVsKv+1LJa2WNE3SNyPiltL6J2lAF/jiVnYJoGBzbJz0uk1f9tueJulrki6TtFDSMtsLm308AN3VynP+xZKejYjnIuKwpHslLW1PWwA6rZXwz5P00zH3d1fLfoHtIdvDtoeP6FALuwPQTh1/tT8i1kTEYEQM9qm/07sDMEmthH+PpPlj7p9ZLQMwBbQS/i2SzrH9XtszJF0taX172gLQaU0P9UXEUds3SPpHjQ71rY2Ip9rWGYCOammcPyI2SNrQpl4AdBFv7wWSIvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBThB5Ii/EBShB9IivADSRF+ICnCDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUoQfSIrwA0kRfiCplmbptb1L0kFJb0g6GhGD7WgKQOe1FP7KxyPixTY8DoAu4rIfSKrV8IekH9p+zPZQOxoC0B2tXvZfFBF7bJ8h6WHbz0TEprErVP8pDEnSSZrZ4u4AtEtLZ/6I2FP9HpH0oKTF46yzJiIGI2KwT/2t7A5AGzUdftsDtk8+flvSJyU92a7GAHRWK5f9cyQ9aPv443w7In7Qlq4AdFzT4Y+I5ySd38ZeAHQRQ31AUoQfSIrwA0kRfiApwg8kRfiBpNrxqb4UXrr2Yw1r71n+bHHbZ0bmFOuHD/UV6/PuKddn7n6lYe3Y1qeL2yIvzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBTj/JP0x3/07Ya1zw68XN747BZ3vqRc3nX01Ya11S98vMWdT10/GjmrYW3gtl8qbjt942PtbqfncOYHkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQcEV3b2SmeHRf44q7tr51+9rkLGtZe/FD5/9BTd5SP8cu/6mJ9xof+t1i/9bwHGtYueedrxW2//+qsYv1TMxt/V0CrXovDxfrmQwPF+pKTjjS97/d//7pi/QNDW5p+7Dptjo06EPvLf1AVzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kNSEn+e3vVbSpyWNRMR51bLZku6TtEDSLklXRcQEH2qf2ga+u7lQa+2xT2ltc/3NLy9pWPuLCxeU9/2v5TkHbl3y/iY6mpzprx0r1ge27S3WT9t0f7H+azMaz3cwc1d5LoQMJnPm/5akS9+07GZJGyPiHEkbq/sAppAJwx8RmyTtf9PipZLWVbfXSbqyzX0B6LBmn/PPiYjj12TPSyrPRwWg57T8gl+Mfjig4ZvXbQ/ZHrY9fESHWt0dgDZpNvz7bM+VpOr3SKMVI2JNRAxGxGCf+pvcHYB2azb86yWtqG6vkPRQe9oB0C0Tht/2PZIelXSu7d22r5F0i6RLbO+U9InqPoApZMJx/ohY1qA0NT+YfwI6+vy+hrWB+xvXJOmNCR574LsvNdFRe+z7vY8V6x+cUf7z/fL+cxvWFvzdc8VtjxarJwbe4QckRfiBpAg/kBThB5Ii/EBShB9Iiim6UZvpZ80v1r+68qvFep+nFevfWf2
JhrXT9j5a3DYDzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBTj/KjNM384r1j/SH95pumnDpenH5/99Ktvu6dMOPMDSRF+ICnCDyRF+IGkCD+QFOEHkiL8QFKM86OjDn3qIw1rj3/u9gm2Ls/w9Ps33lisv/PffjTB4+fGmR9IivADSRF+ICnCDyRF+IGkCD+QFOEHkppwnN/2WkmfljQSEedVy1ZJulbSC9VqKyNiQ6eaxNT135c1Pr/Mcnkcf9l/XVKsz/zBE8V6FKuYzJn/W5IuHWf57RGxqPoh+MAUM2H4I2KTpP1d6AVAF7XynP8G29tsr7V9ats6AtAVzYb/65LOlrRI0l5JtzVa0faQ7WHbw0d0qMndAWi3psIfEfsi4o2IOCbpG5IWF9ZdExGDETHYN8EHNQB0T1Phtz13zN3PSHqyPe0A6JbJDPXdI2mJpNNt75b0Z5KW2F6k0dGUXZKu62CPADpgwvBHxLJxFt/RgV4wBb3j5JOL9eW/8UjD2oFjrxe3HfnS+4r1/kNbinWU8Q4/ICnCDyRF+IGkCD+QFOEHkiL8QFJ8dTdasnPVB4v1753+tw1rS3d+trht/waG8jqJMz+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJMU4P4r+73c+Wqxv++2/LtZ/fPRIw9orf3Vmcdt+7S3W0RrO/EBShB9IivADSRF+ICnCDyRF+IGkCD+QFOP8yU2f9yvF+k1fvK9Y73f5T+jqJ5Y3rL37H/i8fp048wNJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUhOO89ueL+lOSXMkhaQ1EbHa9mxJ90laIGmXpKsi4uXOtYpmeHr5n/j87+0u1j8/66Vi/e6DZxTrc77Y+PxyrLglOm0yZ/6jkr4QEQslfVTS9bYXSrpZ0saIOEfSxuo+gCliwvBHxN6IeLy6fVDSDknzJC2VtK5abZ2kKzvVJID2e1vP+W0vkPRhSZslzYmI49+z9LxGnxYAmCImHX7bsyTdL+mmiDgwthYRodHXA8bbbsj2sO3hIzrUUrMA2mdS4bfdp9Hg3x0RD1SL99meW9XnShoZb9uIWBMRgxEx2Kf+dvQMoA0mDL9tS7pD0o6I+MqY0npJK6rbKyQ91P72AHTKZD7Se6Gk5ZK2295aLVsp6RZJf2/7Gkk/kXRVZ1pES84/t1j+8zPuaunhv/alzxfr73ri0ZYeH50zYfgj4hFJblC+uL3tAOgW3uEHJEX4gaQIP5AU4QeSIvxAUoQfSIqv7j4BTFv4gYa1oXtbe+/VwrXXF+sL7vr3lh4f9eHMDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJMc5/AnjmD05tWLti5oGGtck4818Ol1eIcb+9DVMAZ34gKcIPJEX4gaQIP5AU4QeSIvxAUoQfSIpx/ing9SsWF+sbr7itUJ3Z3mZwwuDMDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJTTjOb3u+pDslzZEUktZExGrbqyRdK+mFatWVEbGhU41m9j8XTivW3zO9+bH8uw+eUaz3HSh/np9P809dk3mTz1FJX4iIx22fLOkx2w9Xtdsj4sudaw9Ap0wY/ojYK2lvdfug7R2S5nW6MQCd9bae89teIOnDkjZXi26wvc32WtvjfpeU7SHbw7aHj+hQS80CaJ9Jh9/2LEn3S7opIg5I+rqksyUt0uiVwbhvMI+INRExGBGDfepvQ8sA2mFS4bfdp9Hg3x0RD0hSROyLiDci4pikb0gqf/oEQE+ZMPy2LekOSTsi4itjls8ds9pnJD3Z/vYAdMpkXu2/UNJySdttb62WrZS0zPYijY727JJ0XUc6REv+8qWFxfqjv7WgWI+929vYDXrJZF7tf0SSxykxpg9MYbzDD0iK8ANJEX4gKcIPJEX4gaQIP5CUo4tTLJ/i2XGBL+7a/oBsNsdGHYj94w3NvwVnfiApwg8kRfiBpAg/kBThB5Ii/EBShB9Iqqvj/LZfkPSTMYtOl/R
i1xp4e3q1t17tS6K3ZrWzt7Mi4t2TWbGr4X/Lzu3hiBisrYGCXu2tV/uS6K1ZdfXGZT+QFOEHkqo7/Gtq3n9Jr/bWq31J9NasWnqr9Tk/gPrUfeYHUJNawm/7Utv/YftZ2zfX0UMjtnfZ3m57q+3hmntZa3vE9pNjls22/bDtndXvcadJq6m3Vbb3VMduq+3La+ptvu1/tv207ads31gtr/XYFfqq5bh1/bLf9jRJ/ynpEkm7JW2RtCwinu5qIw3Y3iVpMCJqHxO2/ZuSXpF0Z0ScVy27VdL+iLil+o/z1Ij4kx7pbZWkV+qeubmaUGbu2JmlJV0p6XdV47Er9HWVajhudZz5F0t6NiKei4jDku6VtLSGPnpeRGyStP9Ni5dKWlfdXqfRP56ua9BbT4iIvRHxeHX7oKTjM0vXeuwKfdWijvDPk/TTMfd3q7em/A5JP7T9mO2hupsZx5xq2nRJel7SnDqbGceEMzd305tmlu6ZY9fMjNftxgt+b3VRRPy6pMskXV9d3vakGH3O1kvDNZOaublbxplZ+ufqPHbNznjdbnWEf4+k+WPun1kt6wkRsaf6PSLpQfXe7MP7jk+SWv0eqbmfn+ulmZvHm1laPXDsemnG6zrCv0XSObbfa3uGpKslra+hj7ewPVC9ECPbA5I+qd6bfXi9pBXV7RWSHqqxl1/QKzM3N5pZWjUfu56b8Toiuv4j6XKNvuL/Y0l/WkcPDfp6n6Qnqp+n6u5N0j0avQw8otHXRq6RdJqkjZJ2SvonSbN7qLe7JG2XtE2jQZtbU28XafSSfpukrdXP5XUfu0JftRw33uEHJMULfkBShB9IivADSRF+ICnCDyRF+IGkCD+QFOEHkvp/uK0ZUt56JeQAAAAASUVORK5CYII=",
       "text/plain": [
        "<Figure size 432x288 with 1 Axes>"
-      ]
+      ],
+      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAADaVJREFUeJzt3X+MXOV1xvHnib1e4jU0GILrGgcnhKA6NDjVxiSCVo4IKZAgEyWhWKrlSpRFLUhQRW2Rq6iWWqUUhSC3SSM5wY1BBGgCCCtx01CrrYVKHS/I2IBpTajT2DVewLQ2AfwDn/6x19EGdt5d5ted9fl+pNXO3HPv3KPrfXzvzDszryNCAPJ5R90NAKgH4QeSIvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kNT0bu5shvvjJA10c5dAKq/rZzochzyZdVsKv+1LJa2WNE3SNyPiltL6J2lAF/jiVnYJoGBzbJz0uk1f9tueJulrki6TtFDSMtsLm308AN3VynP+xZKejYjnIuKwpHslLW1PWwA6rZXwz5P00zH3d1fLfoHtIdvDtoeP6FALuwPQTh1/tT8i1kTEYEQM9qm/07sDMEmthH+PpPlj7p9ZLQMwBbQS/i2SzrH9XtszJF0taX172gLQaU0P9UXEUds3SPpHjQ71rY2Ip9rWGYCOammcPyI2SNrQpl4AdBFv7wWSIvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBThB5Ii/EBShB9IivADSRF+ICnCDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUoQfSIrwA0kRfiCplmbptb1L0kFJb0g6GhGD7WgKQOe1FP7KxyPixTY8DoAu4rIfSKrV8IekH9p+zPZQOxoC0B2tXvZfFBF7bJ8h6WHbz0TEprErVP8pDEnSSZrZ4u4AtEtLZ/6I2FP9HpH0oKTF46yzJiIGI2KwT/2t7A5AGzUdftsDtk8+flvSJyU92a7GAHRWK5f9cyQ9aPv443w7In7Qlq4AdFzT4Y+I5ySd38ZeAHQRQ31AUoQfSIrwA0kRfiApwg8kRfiBpNrxqb4UXrr2Yw1r71n+bHHbZ0bmFOuHD/UV6/PuKddn7n6lYe3Y1qeL2yIvzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBTj/JP0x3/07Ya1zw68XN747BZ3vqRc3nX01Ya11S98vMWdT10/GjmrYW3gtl8qbjt942PtbqfncOYHkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQcEV3b2SmeHRf44q7tr51+9rkLGtZe/FD5/9BTd5SP8cu/6mJ9xof+t1i/9bwHGtYueedrxW2//+qsYv1TMxt/V0CrXovDxfrmQwPF+pKTjjS97/d//7pi/QNDW5p+7Dptjo06EPvLf1AVzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kNSEn+e3vVbSpyWNRMR51bLZku6TtEDSLklXRcQEH2qf2ga+u7lQa+2xT2ltc/3NLy9pWPuLCxeU9/2v5TkHbl3y/iY6mpzprx0r1ge27S3WT9t0f7H+azMaz3cwc1d5LoQMJnPm/5akS9+07GZJGyPiHEkbq/sAppAJwx8RmyTtf9PipZLWVbfXSbqyzX0B6LBmn/PPiYjj12TPSyrPRwWg57T8gl+Mfjig4ZvXbQ/ZHrY9fESHWt0dgDZpNvz7bM+VpOr3SKMVI2JNRAxGxGCf+pvcHYB2azb86yWtqG6vkPRQe9oB0C0Tht/2PZIelXSu7d22r5F0i6RLbO+U9InqPoApZMJx/ohY1qA0NT+YfwI6+vy+hrWB+xvXJOmNCR574LsvNdFRe+z7vY8V6x+cUf7z/fL+cxvWFvzdc8VtjxarJwbe4QckRfiBpAg/kBThB5Ii/EBShB9Iiim6UZvpZ80v1r+68qvFep+nFevfWf2
JhrXT9j5a3DYDzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBTj/KjNM384r1j/SH95pumnDpenH5/99Ktvu6dMOPMDSRF+ICnCDyRF+IGkCD+QFOEHkiL8QFKM86OjDn3qIw1rj3/u9gm2Ls/w9Ps33lisv/PffjTB4+fGmR9IivADSRF+ICnCDyRF+IGkCD+QFOEHkppwnN/2WkmfljQSEedVy1ZJulbSC9VqKyNiQ6eaxNT135c1Pr/Mcnkcf9l/XVKsz/zBE8V6FKuYzJn/W5IuHWf57RGxqPoh+MAUM2H4I2KTpP1d6AVAF7XynP8G29tsr7V9ats6AtAVzYb/65LOlrRI0l5JtzVa0faQ7WHbw0d0qMndAWi3psIfEfsi4o2IOCbpG5IWF9ZdExGDETHYN8EHNQB0T1Phtz13zN3PSHqyPe0A6JbJDPXdI2mJpNNt75b0Z5KW2F6k0dGUXZKu62CPADpgwvBHxLJxFt/RgV4wBb3j5JOL9eW/8UjD2oFjrxe3HfnS+4r1/kNbinWU8Q4/ICnCDyRF+IGkCD+QFOEHkiL8QFJ8dTdasnPVB4v1753+tw1rS3d+trht/waG8jqJMz+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJMU4P4r+73c+Wqxv++2/LtZ/fPRIw9orf3Vmcdt+7S3W0RrO/EBShB9IivADSRF+ICnCDyRF+IGkCD+QFOP8yU2f9yvF+k1fvK9Y73f5T+jqJ5Y3rL37H/i8fp048wNJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUhOO89ueL+lOSXMkhaQ1EbHa9mxJ90laIGmXpKsi4uXOtYpmeHr5n/j87+0u1j8/66Vi/e6DZxTrc77Y+PxyrLglOm0yZ/6jkr4QEQslfVTS9bYXSrpZ0saIOEfSxuo+gCliwvBHxN6IeLy6fVDSDknzJC2VtK5abZ2kKzvVJID2e1vP+W0vkPRhSZslzYmI49+z9LxGnxYAmCImHX7bsyTdL+mmiDgwthYRodHXA8bbbsj2sO3hIzrUUrMA2mdS4bfdp9Hg3x0RD1SL99meW9XnShoZb9uIWBMRgxEx2Kf+dvQMoA0mDL9tS7pD0o6I+MqY0npJK6rbKyQ91P72AHTKZD7Se6Gk5ZK2295aLVsp6RZJf2/7Gkk/kXRVZ1pES84/t1j+8zPuaunhv/alzxfr73ri0ZYeH50zYfgj4hFJblC+uL3tAOgW3uEHJEX4gaQIP5AU4QeSIvxAUoQfSIqv7j4BTFv4gYa1oXtbe+/VwrXXF+sL7vr3lh4f9eHMDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJMc5/AnjmD05tWLti5oGGtck4818Ol1eIcb+9DVMAZ34gKcIPJEX4gaQIP5AU4QeSIvxAUoQfSIpx/ing9SsWF+sbr7itUJ3Z3mZwwuDMDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJTTjOb3u+pDslzZEUktZExGrbqyRdK+mFatWVEbGhU41m9j8XTivW3zO9+bH8uw+eUaz3HSh/np9P809dk3mTz1FJX4iIx22fLOkx2w9Xtdsj4sudaw9Ap0wY/ojYK2lvdfug7R2S5nW6MQCd9bae89teIOnDkjZXi26wvc32WtvjfpeU7SHbw7aHj+hQS80CaJ9Jh9/2LEn3S7opIg5I+rqksyUt0uiVwbhvMI+INRExGBGDfepvQ8sA2mFS4bfdp9Hg3x0RD0hSROyLiDci4pikb0gqf/oEQE+ZMPy2LekOSTsi4itjls8ds9pnJD3Z/vYAdMpkXu2/UNJySdttb62WrZS0zPYijY727JJ0XUc6REv+8qWFxfqjv7WgWI+929vYDXrJZF7tf0SSxykxpg9MYbzDD0iK8ANJEX4gKcIPJEX4gaQIP5CUo4tTLJ/i2XGBL+7a/oBsNsdGHYj94w3NvwVnfiApwg8kRfiBpAg/kBThB5Ii/EBShB9Iqqvj/LZfkPSTMYtOl/R
i1xp4e3q1t17tS6K3ZrWzt7Mi4t2TWbGr4X/Lzu3hiBisrYGCXu2tV/uS6K1ZdfXGZT+QFOEHkqo7/Gtq3n9Jr/bWq31J9NasWnqr9Tk/gPrUfeYHUJNawm/7Utv/YftZ2zfX0UMjtnfZ3m57q+3hmntZa3vE9pNjls22/bDtndXvcadJq6m3Vbb3VMduq+3La+ptvu1/tv207ads31gtr/XYFfqq5bh1/bLf9jRJ/ynpEkm7JW2RtCwinu5qIw3Y3iVpMCJqHxO2/ZuSXpF0Z0ScVy27VdL+iLil+o/z1Ij4kx7pbZWkV+qeubmaUGbu2JmlJV0p6XdV47Er9HWVajhudZz5F0t6NiKei4jDku6VtLSGPnpeRGyStP9Ni5dKWlfdXqfRP56ua9BbT4iIvRHxeHX7oKTjM0vXeuwKfdWijvDPk/TTMfd3q7em/A5JP7T9mO2hupsZx5xq2nRJel7SnDqbGceEMzd305tmlu6ZY9fMjNftxgt+b3VRRPy6pMskXV9d3vakGH3O1kvDNZOaublbxplZ+ufqPHbNznjdbnWEf4+k+WPun1kt6wkRsaf6PSLpQfXe7MP7jk+SWv0eqbmfn+ulmZvHm1laPXDsemnG6zrCv0XSObbfa3uGpKslra+hj7ewPVC9ECPbA5I+qd6bfXi9pBXV7RWSHqqxl1/QKzM3N5pZWjUfu56b8Toiuv4j6XKNvuL/Y0l/WkcPDfp6n6Qnqp+n6u5N0j0avQw8otHXRq6RdJqkjZJ2SvonSbN7qLe7JG2XtE2jQZtbU28XafSSfpukrdXP5XUfu0JftRw33uEHJMULfkBShB9IivADSRF+ICnCDyRF+IGkCD+QFOEHkvp/uK0ZUt56JeQAAAAASUVORK5CYII="
      },
      "metadata": {
       "needs_background": "light"
-     },
-     "output_type": "display_data"
+     }
     }
    ],
-   "source": [
-    "\n",
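-    "# Run inference on the test set with predict\n",
-    "test_result = model.predict(test_dataset)\n",
-    "# The model has a single output, so test_result has shape [1, 10000], where 10000 is the size of the test set. Print the result for the first sample; the array holds the predicted probability of each digit\n",
-    "print(len(test_result))\n",
-    "print(test_result[0][0])\n",
-    "\n",
-    "# Take one image from the test set\n",
-    "img, label = test_dataset[0]\n",
-    "\n",
-    "# Print the inference result; argmax returns the index of the highest predicted probability, which is used as the predicted label\n",
-    "pred_label = test_result[0][0].argmax()\n",
-    "print('true label: {}, pred label: {}'.format(label[0], pred_label))\n",
-    "# Visualize the image with matplotlib\n",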
-    "from matplotlib import pyplot as plt\n",
-    "plt.imshow(img[0])"
-   ]
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-02-15T05:49:51.511077Z",
+     "iopub.status.busy": "2022-02-15T05:49:51.510653Z",
+     "iopub.status.idle": "2022-02-15T05:50:07.724522Z",
+     "shell.execute_reply": "2022-02-15T05:50:07.723742Z",
+     "shell.execute_reply.started": "2022-02-15T05:49:51.511050Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "a3110884-fc0f-41e5-901e-40d136eaab6c",
-   "metadata": {},
    "source": [
     "\n",
     "The example runs prediction on every sample in the test set `test_dataset`. The test set contains 10000 samples, so 10000 prediction outputs are produced.\n",
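@@ -480,22 +456,20 @@
     "* [Model.predict_batch](../api/paddle/Model_cn.html#predict-batch-inputs): runs inference on one batch of data.\n",
     "\n",
     "These three APIs expect input data with different dimensions from the three APIs introduced above; see the corresponding API documentation for details.\n"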
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
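-   "id": "9508034b",
-   "metadata": {},
    "source": [
     "## 3. Training, evaluation, and inference with the basic API\n",
     "\n",
     "Besides the high-level API, the PaddlePaddle framework also supports model training, evaluation, and inference through the basic API. Put simply, `Model.prepare`, `Model.fit`, `Model.evaluate`, and `Model.predict` are all wrappers built on the basic API. The following breaks the high-level API down into basic API calls to show how to train, evaluate, and run inference on a model with the basic API.\n"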
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "markdown",
-   "id": "c02524fd",
-   "metadata": {},
    "source": [
     "### 3.1 Model training (breaking down Model.prepare and Model.fit)\n",
     "\n",
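@@ -513,35 +487,12 @@
     "    - 3.7 Run one optimizer step, i.e. update the parameters passed to the optimizer from the current batch's gradients, following the chosen optimization algorithm\n",
     "    - 3.8 Clear the optimizer's gradients\n",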
     "    \n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 9,
-   "id": "8419b510",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-02-15T05:50:07.726048Z",
-     "iopub.status.busy": "2022-02-15T05:50:07.725599Z",
-     "iopub.status.idle": "2022-02-15T05:50:52.862759Z",
-     "shell.execute_reply": "2022-02-15T05:50:52.861931Z",
-     "shell.execute_reply.started": "2022-02-15T05:50:07.726018Z"
-    },
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "epoch: 0, batch_id: 900, loss is: [0.06991791], acc is: [0.96875]\n",
-      "epoch: 1, batch_id: 900, loss is: [0.02878829], acc is: [1.]\n",
-      "epoch: 2, batch_id: 900, loss is: [0.07192856], acc is: [0.96875]\n",
-      "epoch: 3, batch_id: 900, loss is: [0.20411499], acc is: [0.96875]\n",
-      "epoch: 4, batch_id: 900, loss is: [0.13589518], acc is: [0.96875]\n"
-     ]
-    }
-   ],
    "source": [
     "# dataset and mnist are defined the same way as in the high-level API section\n",
     "# Load the data with DataLoader\n",
@@ -580,12 +531,33 @@
     "        optim.step()\n",
     "        # Clear the gradients\n",
     "        optim.clear_grad()"
-   ]
+   ],
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "epoch: 0, batch_id: 900, loss is: [0.06991791], acc is: [0.96875]\n",
+      "epoch: 1, batch_id: 900, loss is: [0.02878829], acc is: [1.]\n",
+      "epoch: 2, batch_id: 900, loss is: [0.07192856], acc is: [0.96875]\n",
+      "epoch: 3, batch_id: 900, loss is: [0.20411499], acc is: [0.96875]\n",
+      "epoch: 4, batch_id: 900, loss is: [0.13589518], acc is: [0.96875]\n"
+     ]
+    }
+   ],
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-02-15T05:50:07.726048Z",
+     "iopub.status.busy": "2022-02-15T05:50:07.725599Z",
+     "iopub.status.idle": "2022-02-15T05:50:52.862759Z",
+     "shell.execute_reply": "2022-02-15T05:50:52.861931Z",
+     "shell.execute_reply.started": "2022-02-15T05:50:07.726018Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "00a077d3",
-   "metadata": {},
    "source": [
     "### 3.2 Model evaluation (breaking down Model.evaluate)\n",
     "\n",
@@ -595,35 +567,12 @@
     "1. The model instance is switched from `train` mode to `eval` mode\n",
     "1. No backward pass, optimizer parameter update, or gradient clearing is needed\n",
     "\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 10,
-   "id": "d27f6ec2",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-02-15T05:50:52.864342Z",
-     "iopub.status.busy": "2022-02-15T05:50:52.863912Z",
-     "iopub.status.idle": "2022-02-15T05:50:54.048046Z",
-     "shell.execute_reply": "2022-02-15T05:50:54.047104Z",
-     "shell.execute_reply.started": "2022-02-15T05:50:52.864305Z"
-    },
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "batch_id: 30, loss is: [0.23106411], acc is: [0.953125]\n",
-      "batch_id: 60, loss is: [0.4329119], acc is: [0.90625]\n",
-      "batch_id: 90, loss is: [0.07333981], acc is: [0.96875]\n",
-      "batch_id: 120, loss is: [0.00324837], acc is: [1.]\n",
-      "batch_id: 150, loss is: [0.0857158], acc is: [0.96875]\n"
-     ]
-    }
-   ],
    "source": [
     "# Load the test dataset\n",
     "test_loader = paddle.io.DataLoader(test_dataset, batch_size=64, drop_last=True)\n",
@@ -645,12 +594,33 @@
     "    # Print progress information\n",
     "    if (batch_id+1) % 30 == 0:\n",
     "        print(\"batch_id: {}, loss is: {}, acc is: {}\".format(batch_id+1, loss.numpy(), acc.numpy()))"
-   ]
+   ],
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "batch_id: 30, loss is: [0.23106411], acc is: [0.953125]\n",
+      "batch_id: 60, loss is: [0.4329119], acc is: [0.90625]\n",
+      "batch_id: 90, loss is: [0.07333981], acc is: [0.96875]\n",
+      "batch_id: 120, loss is: [0.00324837], acc is: [1.]\n",
+      "batch_id: 150, loss is: [0.0857158], acc is: [0.96875]\n"
+     ]
+    }
+   ],
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-02-15T05:50:52.864342Z",
+     "iopub.status.busy": "2022-02-15T05:50:52.863912Z",
+     "iopub.status.idle": "2022-02-15T05:50:54.048046Z",
+     "shell.execute_reply": "2022-02-15T05:50:54.047104Z",
+     "shell.execute_reply.started": "2022-02-15T05:50:52.864305Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
-   "id": "214cc6de",
-   "metadata": {},
    "source": [
     "### 3.3 Model inference (breaking down Model.predict)\n",
     "\n",
@@ -659,108 +629,126 @@
     "1. Load the test data to run inference on, and set the model to `eval` mode\n",
     "1. Read the test data and obtain the predictions\n",
     "1. Post-process the predictions"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 11,
-   "id": "1d79305f",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-02-15T05:50:54.051031Z",
-     "iopub.status.busy": "2022-02-15T05:50:54.050435Z",
-     "iopub.status.idle": "2022-02-15T05:50:55.278437Z",
-     "shell.execute_reply": "2022-02-15T05:50:55.277810Z",
-     "shell.execute_reply.started": "2022-02-15T05:50:54.050992Z"
-    },
-    "scrolled": true
-   },
+   "source": [
+    "# Load the test dataset\n",
+    "test_loader = paddle.io.DataLoader(test_dataset, batch_size=64, drop_last=True)\n",
+    "# Set the model and all of its sublayers to evaluation (inference) mode\n",
+    "mnist.eval()\n",
+    "for batch_id, data in enumerate(test_loader()):\n",
+    "    # Take the test inputs\n",
+    "    x_data = data[0]\n",
+    "    # Get the predictions\n",
+    "    predicts = mnist(x_data)\n",
+    "print(\"predict finished\")\n",
+    "\n",
+    "# Take one batch from the test set\n",
+    "img, label = next(test_loader())\n",
+    "\n",
+    "# Run inference and print the result\n",
+    "pred_label = mnist(img)[0].argmax()\n",
+    "print('true label: {}, pred label: {}'.format(label[0].item(), pred_label[0].item()))\n",
+    "# Visualize the image\n",
+    "from matplotlib import pyplot as plt\n",
+    "plt.imshow(img[0][0])"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "predict finished\n",
       "true label: 7, pred label: 7\n"
      ]
     },
     {
+     "output_type": "execute_result",
      "data": {
       "text/plain": [
        "<matplotlib.image.AxesImage at 0x7f95f009be50>"
       ]
      },
-     "execution_count": 12,
      "metadata": {},
-     "output_type": "execute_result"
+     "execution_count": 12
     },
     {
+     "output_type": "display_data",
      "data": {
-      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAADaVJREFUeJzt3X+MXOV1xvHnib1e4jU0GILrGgcnhKA6NDjVxiSCVo4IKZAgEyWhWKrlSpRFLUhQRW2Rq6iWWqUUhSC3SSM5wY1BBGgCCCtx01CrrYVKHS/I2IBpTajT2DVewLQ2AfwDn/6x19EGdt5d5ted9fl+pNXO3HPv3KPrfXzvzDszryNCAPJ5R90NAKgH4QeSIvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kNT0bu5shvvjJA10c5dAKq/rZzochzyZdVsKv+1LJa2WNE3SNyPiltL6J2lAF/jiVnYJoGBzbJz0uk1f9tueJulrki6TtFDSMtsLm308AN3VynP+xZKejYjnIuKwpHslLW1PWwA6rZXwz5P00zH3d1fLfoHtIdvDtoeP6FALuwPQTh1/tT8i1kTEYEQM9qm/07sDMEmthH+PpPlj7p9ZLQMwBbQS/i2SzrH9XtszJF0taX172gLQaU0P9UXEUds3SPpHjQ71rY2Ip9rWGYCOammcPyI2SNrQpl4AdBFv7wWSIvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBThB5Ii/EBShB9IivADSRF+ICnCDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUoQfSIrwA0kRfiCplmbptb1L0kFJb0g6GhGD7WgKQOe1FP7KxyPixTY8DoAu4rIfSKrV8IekH9p+zPZQOxoC0B2tXvZfFBF7bJ8h6WHbz0TEprErVP8pDEnSSZrZ4u4AtEtLZ/6I2FP9HpH0oKTF46yzJiIGI2KwT/2t7A5AGzUdftsDtk8+flvSJyU92a7GAHRWK5f9cyQ9aPv443w7In7Qlq4AdFzT4Y+I5ySd38ZeAHQRQ31AUoQfSIrwA0kRfiApwg8kRfiBpNrxqb4UXrr2Yw1r71n+bHHbZ0bmFOuHD/UV6/PuKddn7n6lYe3Y1qeL2yIvzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBTj/JP0x3/07Ya1zw68XN747BZ3vqRc3nX01Ya11S98vMWdT10/GjmrYW3gtl8qbjt942PtbqfncOYHkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQcEV3b2SmeHRf44q7tr51+9rkLGtZe/FD5/9BTd5SP8cu/6mJ9xof+t1i/9bwHGtYueedrxW2//+qsYv1TMxt/V0CrXovDxfrmQwPF+pKTjjS97/d//7pi/QNDW5p+7Dptjo06EPvLf1AVzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kNSEn+e3vVbSpyWNRMR51bLZku6TtEDSLklXRcQEH2qf2ga+u7lQa+2xT2ltc/3NLy9pWPuLCxeU9/2v5TkHbl3y/iY6mpzprx0r1ge27S3WT9t0f7H+azMaz3cwc1d5LoQMJnPm/5akS9+07GZJGyPiHEkbq/sAppAJwx8RmyTtf9PipZLWVbfXSbqyzX0B6LBmn/PPiYjj12TPSyrPRwWg57T8gl+Mfjig4ZvXbQ/ZHrY9fESHWt0dgDZpNvz7bM+VpOr3SKMVI2JNRAxGxGCf+pvcHYB2azb86yWtqG6vkPRQe9oB0C0Tht/2PZIelXSu7d22r5F0i6RLbO+U9InqPoApZMJx/ohY1qA0NT+YfwI6+vy+hrWB+xvXJOmNCR574LsvNdFRe+z7vY8V6x+cUf7z/fL+cxvWFvzdc8VtjxarJwbe4QckRfiBpAg/kBThB5Ii/EBShB9Iiim6UZvpZ80v1r+68qvFep+nFevfWf2
JhrXT9j5a3DYDzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBTj/KjNM384r1j/SH95pumnDpenH5/99Ktvu6dMOPMDSRF+ICnCDyRF+IGkCD+QFOEHkiL8QFKM86OjDn3qIw1rj3/u9gm2Ls/w9Ps33lisv/PffjTB4+fGmR9IivADSRF+ICnCDyRF+IGkCD+QFOEHkppwnN/2WkmfljQSEedVy1ZJulbSC9VqKyNiQ6eaxNT135c1Pr/Mcnkcf9l/XVKsz/zBE8V6FKuYzJn/W5IuHWf57RGxqPoh+MAUM2H4I2KTpP1d6AVAF7XynP8G29tsr7V9ats6AtAVzYb/65LOlrRI0l5JtzVa0faQ7WHbw0d0qMndAWi3psIfEfsi4o2IOCbpG5IWF9ZdExGDETHYN8EHNQB0T1Phtz13zN3PSHqyPe0A6JbJDPXdI2mJpNNt75b0Z5KW2F6k0dGUXZKu62CPADpgwvBHxLJxFt/RgV4wBb3j5JOL9eW/8UjD2oFjrxe3HfnS+4r1/kNbinWU8Q4/ICnCDyRF+IGkCD+QFOEHkiL8QFJ8dTdasnPVB4v1753+tw1rS3d+trht/waG8jqJMz+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJMU4P4r+73c+Wqxv++2/LtZ/fPRIw9orf3Vmcdt+7S3W0RrO/EBShB9IivADSRF+ICnCDyRF+IGkCD+QFOP8yU2f9yvF+k1fvK9Y73f5T+jqJ5Y3rL37H/i8fp048wNJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUhOO89ueL+lOSXMkhaQ1EbHa9mxJ90laIGmXpKsi4uXOtYpmeHr5n/j87+0u1j8/66Vi/e6DZxTrc77Y+PxyrLglOm0yZ/6jkr4QEQslfVTS9bYXSrpZ0saIOEfSxuo+gCliwvBHxN6IeLy6fVDSDknzJC2VtK5abZ2kKzvVJID2e1vP+W0vkPRhSZslzYmI49+z9LxGnxYAmCImHX7bsyTdL+mmiDgwthYRodHXA8bbbsj2sO3hIzrUUrMA2mdS4bfdp9Hg3x0RD1SL99meW9XnShoZb9uIWBMRgxEx2Kf+dvQMoA0mDL9tS7pD0o6I+MqY0npJK6rbKyQ91P72AHTKZD7Se6Gk5ZK2295aLVsp6RZJf2/7Gkk/kXRVZ1pES84/t1j+8zPuaunhv/alzxfr73ri0ZYeH50zYfgj4hFJblC+uL3tAOgW3uEHJEX4gaQIP5AU4QeSIvxAUoQfSIqv7j4BTFv4gYa1oXtbe+/VwrXXF+sL7vr3lh4f9eHMDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJMc5/AnjmD05tWLti5oGGtck4818Ol1eIcb+9DVMAZ34gKcIPJEX4gaQIP5AU4QeSIvxAUoQfSIpx/ing9SsWF+sbr7itUJ3Z3mZwwuDMDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJTTjOb3u+pDslzZEUktZExGrbqyRdK+mFatWVEbGhU41m9j8XTivW3zO9+bH8uw+eUaz3HSh/np9P809dk3mTz1FJX4iIx22fLOkx2w9Xtdsj4sudaw9Ap0wY/ojYK2lvdfug7R2S5nW6MQCd9bae89teIOnDkjZXi26wvc32WtvjfpeU7SHbw7aHj+hQS80CaJ9Jh9/2LEn3S7opIg5I+rqksyUt0uiVwbhvMI+INRExGBGDfepvQ8sA2mFS4bfdp9Hg3x0RD0hSROyLiDci4pikb0gqf/oEQE+ZMPy2LekOSTsi4itjls8ds9pnJD3Z/vYAdMpkXu2/UNJySdttb62WrZS0zPYijY727JJ0XUc6REv+8qWFxfqjv7WgWI+929vYDXrJZF7tf0SSxykxpg9MYbzDD0iK8ANJEX4gKcIPJEX4gaQIP5CUo4tTLJ/i2XGBL+7a/oBsNsdGHYj94w3NvwVnfiApwg8kRfiBpAg/kBThB5Ii/EBShB9Iqqvj/LZfkPSTMYtOl/R
i1xp4e3q1t17tS6K3ZrWzt7Mi4t2TWbGr4X/Lzu3hiBisrYGCXu2tV/uS6K1ZdfXGZT+QFOEHkqo7/Gtq3n9Jr/bWq31J9NasWnqr9Tk/gPrUfeYHUJNawm/7Utv/YftZ2zfX0UMjtnfZ3m57q+3hmntZa3vE9pNjls22/bDtndXvcadJq6m3Vbb3VMduq+3La+ptvu1/tv207ads31gtr/XYFfqq5bh1/bLf9jRJ/ynpEkm7JW2RtCwinu5qIw3Y3iVpMCJqHxO2/ZuSXpF0Z0ScVy27VdL+iLil+o/z1Ij4kx7pbZWkV+qeubmaUGbu2JmlJV0p6XdV47Er9HWVajhudZz5F0t6NiKei4jDku6VtLSGPnpeRGyStP9Ni5dKWlfdXqfRP56ua9BbT4iIvRHxeHX7oKTjM0vXeuwKfdWijvDPk/TTMfd3q7em/A5JP7T9mO2hupsZx5xq2nRJel7SnDqbGceEMzd305tmlu6ZY9fMjNftxgt+b3VRRPy6pMskXV9d3vakGH3O1kvDNZOaublbxplZ+ufqPHbNznjdbnWEf4+k+WPun1kt6wkRsaf6PSLpQfXe7MP7jk+SWv0eqbmfn+ulmZvHm1laPXDsemnG6zrCv0XSObbfa3uGpKslra+hj7ewPVC9ECPbA5I+qd6bfXi9pBXV7RWSHqqxl1/QKzM3N5pZWjUfu56b8Toiuv4j6XKNvuL/Y0l/WkcPDfp6n6Qnqp+n6u5N0j0avQw8otHXRq6RdJqkjZJ2SvonSbN7qLe7JG2XtE2jQZtbU28XafSSfpukrdXP5XUfu0JftRw33uEHJMULfkBShB9IivADSRF+ICnCDyRF+IGkCD+QFOEHkvp/uK0ZUt56JeQAAAAASUVORK5CYII=",
       "text/plain": [
        "<Figure size 432x288 with 1 Axes>"
-      ]
+      ],
+      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAP8AAAD8CAYAAAC4nHJkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAADaVJREFUeJzt3X+MXOV1xvHnib1e4jU0GILrGgcnhKA6NDjVxiSCVo4IKZAgEyWhWKrlSpRFLUhQRW2Rq6iWWqUUhSC3SSM5wY1BBGgCCCtx01CrrYVKHS/I2IBpTajT2DVewLQ2AfwDn/6x19EGdt5d5ted9fl+pNXO3HPv3KPrfXzvzDszryNCAPJ5R90NAKgH4QeSIvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kNT0bu5shvvjJA10c5dAKq/rZzochzyZdVsKv+1LJa2WNE3SNyPiltL6J2lAF/jiVnYJoGBzbJz0uk1f9tueJulrki6TtFDSMtsLm308AN3VynP+xZKejYjnIuKwpHslLW1PWwA6rZXwz5P00zH3d1fLfoHtIdvDtoeP6FALuwPQTh1/tT8i1kTEYEQM9qm/07sDMEmthH+PpPlj7p9ZLQMwBbQS/i2SzrH9XtszJF0taX172gLQaU0P9UXEUds3SPpHjQ71rY2Ip9rWGYCOammcPyI2SNrQpl4AdBFv7wWSIvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBThB5Ii/EBShB9IivADSRF+ICnCDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUoQfSIrwA0kRfiCplmbptb1L0kFJb0g6GhGD7WgKQOe1FP7KxyPixTY8DoAu4rIfSKrV8IekH9p+zPZQOxoC0B2tXvZfFBF7bJ8h6WHbz0TEprErVP8pDEnSSZrZ4u4AtEtLZ/6I2FP9HpH0oKTF46yzJiIGI2KwT/2t7A5AGzUdftsDtk8+flvSJyU92a7GAHRWK5f9cyQ9aPv443w7In7Qlq4AdFzT4Y+I5ySd38ZeAHQRQ31AUoQfSIrwA0kRfiApwg8kRfiBpNrxqb4UXrr2Yw1r71n+bHHbZ0bmFOuHD/UV6/PuKddn7n6lYe3Y1qeL2yIvzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBTj/JP0x3/07Ya1zw68XN747BZ3vqRc3nX01Ya11S98vMWdT10/GjmrYW3gtl8qbjt942PtbqfncOYHkiL8QFKEH0iK8ANJEX4gKcIPJEX4gaQcEV3b2SmeHRf44q7tr51+9rkLGtZe/FD5/9BTd5SP8cu/6mJ9xof+t1i/9bwHGtYueedrxW2//+qsYv1TMxt/V0CrXovDxfrmQwPF+pKTjjS97/d//7pi/QNDW5p+7Dptjo06EPvLf1AVzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kNSEn+e3vVbSpyWNRMR51bLZku6TtEDSLklXRcQEH2qf2ga+u7lQa+2xT2ltc/3NLy9pWPuLCxeU9/2v5TkHbl3y/iY6mpzprx0r1ge27S3WT9t0f7H+azMaz3cwc1d5LoQMJnPm/5akS9+07GZJGyPiHEkbq/sAppAJwx8RmyTtf9PipZLWVbfXSbqyzX0B6LBmn/PPiYjj12TPSyrPRwWg57T8gl+Mfjig4ZvXbQ/ZHrY9fESHWt0dgDZpNvz7bM+VpOr3SKMVI2JNRAxGxGCf+pvcHYB2azb86yWtqG6vkPRQe9oB0C0Tht/2PZIelXSu7d22r5F0i6RLbO+U9InqPoApZMJx/ohY1qA0NT+YfwI6+vy+hrWB+xvXJOmNCR574LsvNdFRe+z7vY8V6x+cUf7z/fL+cxvWFvzdc8VtjxarJwbe4QckRfiBpAg/kBThB5Ii/EBShB9Iiim6UZvpZ80v1r+68qvFep+nFevfWf2
JhrXT9j5a3DYDzvxAUoQfSIrwA0kRfiApwg8kRfiBpAg/kBTj/KjNM384r1j/SH95pumnDpenH5/99Ktvu6dMOPMDSRF+ICnCDyRF+IGkCD+QFOEHkiL8QFKM86OjDn3qIw1rj3/u9gm2Ls/w9Ps33lisv/PffjTB4+fGmR9IivADSRF+ICnCDyRF+IGkCD+QFOEHkppwnN/2WkmfljQSEedVy1ZJulbSC9VqKyNiQ6eaxNT135c1Pr/Mcnkcf9l/XVKsz/zBE8V6FKuYzJn/W5IuHWf57RGxqPoh+MAUM2H4I2KTpP1d6AVAF7XynP8G29tsr7V9ats6AtAVzYb/65LOlrRI0l5JtzVa0faQ7WHbw0d0qMndAWi3psIfEfsi4o2IOCbpG5IWF9ZdExGDETHYN8EHNQB0T1Phtz13zN3PSHqyPe0A6JbJDPXdI2mJpNNt75b0Z5KW2F6k0dGUXZKu62CPADpgwvBHxLJxFt/RgV4wBb3j5JOL9eW/8UjD2oFjrxe3HfnS+4r1/kNbinWU8Q4/ICnCDyRF+IGkCD+QFOEHkiL8QFJ8dTdasnPVB4v1753+tw1rS3d+trht/waG8jqJMz+QFOEHkiL8QFKEH0iK8ANJEX4gKcIPJMU4P4r+73c+Wqxv++2/LtZ/fPRIw9orf3Vmcdt+7S3W0RrO/EBShB9IivADSRF+ICnCDyRF+IGkCD+QFOP8yU2f9yvF+k1fvK9Y73f5T+jqJ5Y3rL37H/i8fp048wNJEX4gKcIPJEX4gaQIP5AU4QeSIvxAUhOO89ueL+lOSXMkhaQ1EbHa9mxJ90laIGmXpKsi4uXOtYpmeHr5n/j87+0u1j8/66Vi/e6DZxTrc77Y+PxyrLglOm0yZ/6jkr4QEQslfVTS9bYXSrpZ0saIOEfSxuo+gCliwvBHxN6IeLy6fVDSDknzJC2VtK5abZ2kKzvVJID2e1vP+W0vkPRhSZslzYmI49+z9LxGnxYAmCImHX7bsyTdL+mmiDgwthYRodHXA8bbbsj2sO3hIzrUUrMA2mdS4bfdp9Hg3x0RD1SL99meW9XnShoZb9uIWBMRgxEx2Kf+dvQMoA0mDL9tS7pD0o6I+MqY0npJK6rbKyQ91P72AHTKZD7Se6Gk5ZK2295aLVsp6RZJf2/7Gkk/kXRVZ1pES84/t1j+8zPuaunhv/alzxfr73ri0ZYeH50zYfgj4hFJblC+uL3tAOgW3uEHJEX4gaQIP5AU4QeSIvxAUoQfSIqv7j4BTFv4gYa1oXtbe+/VwrXXF+sL7vr3lh4f9eHMDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJMc5/AnjmD05tWLti5oGGtck4818Ol1eIcb+9DVMAZ34gKcIPJEX4gaQIP5AU4QeSIvxAUoQfSIpx/ing9SsWF+sbr7itUJ3Z3mZwwuDMDyRF+IGkCD+QFOEHkiL8QFKEH0iK8ANJTTjOb3u+pDslzZEUktZExGrbqyRdK+mFatWVEbGhU41m9j8XTivW3zO9+bH8uw+eUaz3HSh/np9P809dk3mTz1FJX4iIx22fLOkx2w9Xtdsj4sudaw9Ap0wY/ojYK2lvdfug7R2S5nW6MQCd9bae89teIOnDkjZXi26wvc32WtvjfpeU7SHbw7aHj+hQS80CaJ9Jh9/2LEn3S7opIg5I+rqksyUt0uiVwbhvMI+INRExGBGDfepvQ8sA2mFS4bfdp9Hg3x0RD0hSROyLiDci4pikb0gqf/oEQE+ZMPy2LekOSTsi4itjls8ds9pnJD3Z/vYAdMpkXu2/UNJySdttb62WrZS0zPYijY727JJ0XUc6REv+8qWFxfqjv7WgWI+929vYDXrJZF7tf0SSxykxpg9MYbzDD0iK8ANJEX4gKcIPJEX4gaQIP5CUo4tTLJ/i2XGBL+7a/oBsNsdGHYj94w3NvwVnfiApwg8kRfiBpAg/kBThB5Ii/EBShB9Iqqvj/LZfkPSTMYtOl/R
i1xp4e3q1t17tS6K3ZrWzt7Mi4t2TWbGr4X/Lzu3hiBisrYGCXu2tV/uS6K1ZdfXGZT+QFOEHkqo7/Gtq3n9Jr/bWq31J9NasWnqr9Tk/gPrUfeYHUJNawm/7Utv/YftZ2zfX0UMjtnfZ3m57q+3hmntZa3vE9pNjls22/bDtndXvcadJq6m3Vbb3VMduq+3La+ptvu1/tv207ads31gtr/XYFfqq5bh1/bLf9jRJ/ynpEkm7JW2RtCwinu5qIw3Y3iVpMCJqHxO2/ZuSXpF0Z0ScVy27VdL+iLil+o/z1Ij4kx7pbZWkV+qeubmaUGbu2JmlJV0p6XdV47Er9HWVajhudZz5F0t6NiKei4jDku6VtLSGPnpeRGyStP9Ni5dKWlfdXqfRP56ua9BbT4iIvRHxeHX7oKTjM0vXeuwKfdWijvDPk/TTMfd3q7em/A5JP7T9mO2hupsZx5xq2nRJel7SnDqbGceEMzd305tmlu6ZY9fMjNftxgt+b3VRRPy6pMskXV9d3vakGH3O1kvDNZOaublbxplZ+ufqPHbNznjdbnWEf4+k+WPun1kt6wkRsaf6PSLpQfXe7MP7jk+SWv0eqbmfn+ulmZvHm1laPXDsemnG6zrCv0XSObbfa3uGpKslra+hj7ewPVC9ECPbA5I+qd6bfXi9pBXV7RWSHqqxl1/QKzM3N5pZWjUfu56b8Toiuv4j6XKNvuL/Y0l/WkcPDfp6n6Qnqp+n6u5N0j0avQw8otHXRq6RdJqkjZJ2SvonSbN7qLe7JG2XtE2jQZtbU28XafSSfpukrdXP5XUfu0JftRw33uEHJMULfkBShB9IivADSRF+ICnCDyRF+IGkCD+QFOEHkvp/uK0ZUt56JeQAAAAASUVORK5CYII="
      },
      "metadata": {
       "needs_background": "light"
-     },
-     "output_type": "display_data"
+     }
     }
    ],
-   "source": [
-    "# Load the test dataset\n",
-    "test_loader = paddle.io.DataLoader(test_dataset, batch_size=64, drop_last=True)\n",
-    "# Set the model and all of its sublayers to evaluation (inference) mode\n",
-    "mnist.eval()\n",
-    "for batch_id, data in enumerate(test_loader()):\n",
-    "    # Take the test inputs\n",
-    "    x_data = data[0]\n",
-    "    # Get the predictions\n",
-    "    predicts = mnist(x_data)\n",
-    "print(\"predict finished\")\n",
-    "\n",
-    "# Take one batch from the test set\n",
-    "img, label = next(test_loader())\n",
-    "\n",
-    "# Run inference and print the result\n",
-    "pred_label = mnist(img)[0].argmax()\n",
-    "print('true label: {}, pred label: {}'.format(label[0].item(), pred_label[0].item()))\n",
-    "# Visualize the image\n",
-    "from matplotlib import pyplot as plt\n",
-    "plt.imshow(img[0][0])"
-   ]
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-02-15T05:50:54.051031Z",
+     "iopub.status.busy": "2022-02-15T05:50:54.050435Z",
+     "iopub.status.idle": "2022-02-15T05:50:55.278437Z",
+     "shell.execute_reply": "2022-02-15T05:50:55.277810Z",
+     "shell.execute_reply.started": "2022-02-15T05:50:54.050992Z"
+    },
+    "scrolled": true
+   }
   },
   {
    "cell_type": "markdown",
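-   "id": "924b3c12-90e7-41e9-9aef-205866c02ee5",
-   "metadata": {},
    "source": [
     "## 4. Summary\n",
     "\n",
-    "This section introduced how to train, evaluate, and run inference on a model with the high-level API in the PaddlePaddle framework, and broke the high-level calls down into their basic API equivalents. Note that the inference shown here is only for validating model quality. In production, you can use PaddlePaddle's inference and deployment tools, which cover deployment on servers, mobile devices, web pages and mini programs, and other environments; see the [Inference Deployment](../05_inference_deployment/index_cn.html) chapter for details.\n",
+    "This section introduced how to train, evaluate, and run inference on a model with the high-level API in the PaddlePaddle framework, and broke the high-level calls down into their basic API equivalents. Note that the inference shown here is only for validating model quality. In production, you can use PaddlePaddle's inference and deployment tools, which cover deployment on servers, mobile devices, web pages and mini programs, and other environments; see the [Inference Deployment](../infer/index_cn.html) chapter for details.\n",
     "\n",
     "The high-level API and the basic API in PaddlePaddle are not mutually exclusive and can be combined, which helps developers iterate on algorithms more conveniently. Example code follows:\n",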
     "\n"
-   ]
+   ],
+   "metadata": {}
   },
   {
    "cell_type": "code",
    "execution_count": 10,
-   "id": "50b8953f-9df0-4fa3-b420-f5987ad0cf3b",
-   "metadata": {
-    "execution": {
-     "iopub.execute_input": "2022-02-23T04:10:14.866170Z",
-     "iopub.status.busy": "2022-02-23T04:10:14.865665Z",
-     "iopub.status.idle": "2022-02-23T04:11:59.489596Z",
-     "shell.execute_reply": "2022-02-23T04:11:59.488958Z",
-     "shell.execute_reply.started": "2022-02-23T04:10:14.866136Z"
-    },
-    "scrolled": true
-   },
+   "source": [
+    "from  paddle.vision.models import LeNet\n",
+    "\n",
+    "class FaceNet(paddle.nn.Layer):\n",
+    "    def __init__(self):\n",
+    "        super().__init__()\n",
+    "        # 使用高层API组网\n",
+    "        self.backbone = LeNet()\n",
+    "        # 使用基础API组网\n",
+    "        self.outLayer1 = paddle.nn.Sequential(\n",
+    "            paddle.nn.Linear(10, 512),\n",
+    "            paddle.nn.ReLU(),\n",
+    "            paddle.nn.Dropout(0.2)\n",
+    "        )\n",
+    "        self.outLayer2 = paddle.nn.Linear(512, 10)\n",
+    "    \n",
+    "    def forward(self, inputs):\n",
+    "        out = self.backbone(inputs)\n",
+    "        out = self.outLayer1(out)\n",
+    "        out = self.outLayer2(out)\n",
+    "        return out\n",
+    "# 使用高层API封装网络\n",
+    "model = paddle.Model(FaceNet())\n",
+    "# 使用基础API定义优化器\n",
+    "optim = paddle.optimizer.Adam(learning_rate=1e-3, parameters=model.parameters())\n",
+    "# 使用高层API封装优化器和损失函数\n",
+    "model.prepare(optim, paddle.nn.CrossEntropyLoss(), metrics=paddle.metric.Accuracy())\n",
+    "# 使用高层API训练网络\n",
+    "model.fit(train_dataset, test_dataset, epochs=5, batch_size=64, verbose=1)"
+   ],
    "outputs": [
     {
-     "name": "stdout",
      "output_type": "stream",
+     "name": "stdout",
      "text": [
       "The loss value printed in the log is the current step, and the metric is the average value of previous steps.\n",
       "Epoch 1/5\n",
@@ -791,36 +779,16 @@
      ]
     }
    ],
-   "source": [
-    "from  paddle.vision.models import LeNet\n",
-    "\n",
-    "class FaceNet(paddle.nn.Layer):\n",
-    "    def __init__(self):\n",
-    "        super().__init__()\n",
-    "        # 使用高层API组网\n",
-    "        self.backbone = LeNet()\n",
-    "        # 使用基础API组网\n",
-    "        self.outLayer1 = paddle.nn.Sequential(\n",
-    "            paddle.nn.Linear(10, 512),\n",
-    "            paddle.nn.ReLU(),\n",
-    "            paddle.nn.Dropout(0.2)\n",
-    "        )\n",
-    "        self.outLayer2 = paddle.nn.Linear(512, 10)\n",
-    "    \n",
-    "    def forward(self, inputs):\n",
-    "        out = self.backbone(inputs)\n",
-    "        out = self.outLayer1(out)\n",
-    "        out = self.outLayer2(out)\n",
-    "        return out\n",
-    "# 使用高层API封装网络\n",
-    "model = paddle.Model(FaceNet())\n",
-    "# 使用基础API定义优化器\n",
-    "optim = paddle.optimizer.Adam(learning_rate=1e-3, parameters=model.parameters())\n",
-    "# 使用高层API封装优化器和损失函数\n",
-    "model.prepare(optim, paddle.nn.CrossEntropyLoss(), metrics=paddle.metric.Accuracy())\n",
-    "# 使用高层API训练网络\n",
-    "model.fit(train_dataset, test_dataset, epochs=5, batch_size=64, verbose=1)"
-   ]
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2022-02-23T04:10:14.866170Z",
+     "iopub.status.busy": "2022-02-23T04:10:14.865665Z",
+     "iopub.status.idle": "2022-02-23T04:11:59.489596Z",
+     "shell.execute_reply": "2022-02-23T04:11:59.488958Z",
+     "shell.execute_reply.started": "2022-02-23T04:10:14.866136Z"
+    },
+    "scrolled": true
+   }
   }
  ],
  "metadata": {
@@ -844,4 +812,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 5
-}
+}
\ No newline at end of file
diff --git a/docs/guides/infer/index_en.rst b/docs/guides/infer/index_en.rst
index 29ac0e214ba..9718091a8d5 100644
--- a/docs/guides/infer/index_en.rst
+++ b/docs/guides/infer/index_en.rst
@@ -2,12 +2,9 @@
 Deploy Inference Model
 #######################
 
-- `Server side Deployment <inference/index_en.html>`_ : This section illustrates the method how to deploy and release the trained models on the servers
-
 - `Model Compression <paddleslim/paddle_slim_en.html>`_ : Introduce the features and usage of PaddleSlim which is a toolkit for model compression.
 
 ..  toctree::
     :hidden:
 
-    inference/index_en.rst 
     paddleslim/paddle_slim_en.rst
diff --git a/docs/guides/jit/index_en.rst b/docs/guides/jit/index_en.rst
index 3dcdc63ceb8..8a9f6a5e096 100644
--- a/docs/guides/jit/index_en.rst
+++ b/docs/guides/jit/index_en.rst
@@ -10,23 +10,15 @@ While dygraph has usability and debug benefits and static graph yields performan
 
 We introduce the transformation of dygraph to static graph in the following links:
 
-- `Basic Usage <basic_usage_en.html>`_ : Introduce the basic usage for @to_static.
 
 - `Supported Grammars <grammar_list_en.html>`_ : Introduce supported grammars and unsupported grammars .
 
-- `Predictive Model Export Tutorial <export_model_en.html>`_ : Introduce the tutorial for exporting predictive model.
-
-- `Case analysis of InputSpec <export_model_en.html>`_ : Introduce the common case studies of @to_static.
-
 - `Error Debugging Experience <debugging_en.html>`_ : Introduce the debugging methods when using @to_static.
 
 
 ..  toctree::
     :hidden:
 
-    basic_usage_en.rst
     grammar_list_en.md
-    export_model_en.md
-    case_analysis_en.md
     debugging_en.md
 
diff --git a/docs/guides/model_convert/index_cn.rst b/docs/guides/model_convert/index_cn.rst
index 37b79c9bc12..675d360202b 100644
--- a/docs/guides/model_convert/index_cn.rst
+++ b/docs/guides/model_convert/index_cn.rst
@@ -7,7 +7,7 @@
 
 - `升级指南 <./update_cn.html>`_: 介绍飞桨框架2.0 的主要变化和如何升级到最新版飞桨。
 - `版本迁移工具 <./migration_cn.html>`_: 介绍飞桨框架版本转换工具的使用。
-- `兼容载入旧格式模型 <./load_old_format_model.html>`_: 介绍飞桨框架如何在2.x版本加载1.x版本保存的模型。
+- `兼容载入旧格式模型 <./load_old_format_model_cn.html>`_: 介绍飞桨框架如何在2.x版本加载1.x版本保存的模型。
 - `Paddle API映射表 <./paddle_api_mapping_cn.html>`_ : 说明 Paddle 1.8 版本与 Paddle 2.0 API对应关系。
 - `PyTorch API映射表 <./pytorch_api_mapping_cn.html>`_ : 说明 PyTorch 1.8 版本与 Paddle 2.0 API对应关系。
 
@@ -16,6 +16,6 @@
 
     update_cn.md
     migration_cn.rst
-    load_old_format_model.rst
+    load_old_format_model_cn.rst
     paddle_api_mapping_cn.rst
     pytorch_api_mapping_cn.rst
diff --git a/docs/guides/model_convert/index_en.rst b/docs/guides/model_convert/index_en.rst
index 68b81d51d8d..e1452214534 100644
--- a/docs/guides/model_convert/index_en.rst
+++ b/docs/guides/model_convert/index_en.rst
@@ -1,16 +1,12 @@
 #####################
-Introduction
+Model Convert
 #####################
 
-Introduction of Paddle 2.
+This section introduces how to convert your model to Paddle 2.x.
 
-For more information, you can view these pages:
-
-- `basic concept <./basic_concept/index_en.html>`_ : introduction of PaddlePaddle 2.0  basic concept.
 - `update <./update_en.html>`_ : update guides for paddle 2.0.
 
 ..  toctree::
     :hidden:
     
-    basic_concept/index_en.rst
     update_en.md
diff --git a/docs/guides/new_op/index_en.rst b/docs/guides/new_op/index_en.rst
index 5fe0b2c77c8..d012ad22030 100644
--- a/docs/guides/new_op/index_en.rst
+++ b/docs/guides/new_op/index_en.rst
@@ -7,15 +7,9 @@ This section will guide you on how to use the custom operator mechanism of Paddl
 1. C++ operator: The writing method is relatively simple, does not involve the internal concept of the framework, does not need to recompile the paddle framework, and is used as an external module.
 2. Python operator: use Python to implement forward and backward methods, then used in network.
 
-- `Custom C++ Operator <./new_custom_op_cn.html>`_
-
-- `Custom Python Operator <./new_python_op_cn.html>`_
-
 - `Kernel Primitives API <./kernel_primitive_api/index_en.html>`_ : Introduce the block-level CUDA functions provided by PaddlePaddle to speed up operator development.
 
 .. toctree::
    :hidden:
 
-   new_op_en.md
-   op_notes_en.md
    kernel_primitive_api/index_en.rst
diff --git a/docs/guides/performance_improving/index_en.rst b/docs/guides/performance_improving/index_en.rst
index b0743bc8acd..cda6e67b286 100644
--- a/docs/guides/performance_improving/index_en.rst
+++ b/docs/guides/performance_improving/index_en.rst
@@ -1,7 +1,11 @@
-###############
-Practice Improving
-###############
+#######################
+Performance Improving
+#######################
+
+
+- `AMP <./amp_en.html>`_
 
 ..  toctree::
     :maxdepth: 1
 
+    amp_en.md
\ No newline at end of file

From 513f84dd9605a46baee9675a49873d5d0c546068 Mon Sep 17 00:00:00 2001
From: TCChenlong <1300851984@qq.com>
Date: Fri, 13 May 2022 11:17:42 +0800
Subject: [PATCH 09/11] update guides index

---
 docs/guides/advanced/index_cn.rst | 2 +-
 docs/guides/advanced/index_en.rst | 6 +++---
 docs/guides/index_cn.rst          | 2 +-
 docs/guides/index_en.rst          | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/guides/advanced/index_cn.rst b/docs/guides/advanced/index_cn.rst
index a56d0238d70..f32e09b1a6c 100644
--- a/docs/guides/advanced/index_cn.rst
+++ b/docs/guides/advanced/index_cn.rst
@@ -1,5 +1,5 @@
 ###################
-进阶用法
+模型开发更多用法
 ###################
 
 
diff --git a/docs/guides/advanced/index_en.rst b/docs/guides/advanced/index_en.rst
index 9981fbd4f0c..95abcbc139c 100644
--- a/docs/guides/advanced/index_en.rst
+++ b/docs/guides/advanced/index_en.rst
@@ -1,6 +1,6 @@
-########################
-Advanced Guides
-########################
+##################################
+More Uses for Model Development
+##################################
 
 - `Model Visualization <./visualdl_usage_en.html>`_
 - `Model and Layer <./layer_and_model_en.html>`_
diff --git a/docs/guides/index_cn.rst b/docs/guides/index_cn.rst
index f160baeb19f..e1923626b78 100644
--- a/docs/guides/index_cn.rst
+++ b/docs/guides/index_cn.rst
@@ -9,7 +9,7 @@
 使用教程分为如下的模块：
 
 - `模型开发 <./beginner/index_cn.html>`_
-- `进阶用法 <./advanced/index_cn.html>`_
+- `模型开发更多用法 <./advanced/index_cn.html>`_
 - `动态图转静态图 <./jit/index_cn.html>`_
 - `预测部署 <./infer/index_cn.html>`_ 
 - `分布式训练 <./06_distributed_training/index_cn.html>`_
diff --git a/docs/guides/index_en.rst b/docs/guides/index_en.rst
index b4c988644c6..fc890700395 100644
--- a/docs/guides/index_en.rst
+++ b/docs/guides/index_en.rst
@@ -10,7 +10,7 @@ Please refer to  `PaddlePaddle Github <https://github.com/PaddlePaddle/Paddle>`_
 Let's start with studying basic concept of PaddlePaddle:
 
 - `Model Development <./beginner/index_en.html>`_
-- `Advanced Guides <./advanced/index_en.html>`_
+- `More Uses for Model Development <./advanced/index_en.html>`_
 - `Dygraph to Static Graph <./jit/index_en.html>`_ : Introduce the transformation of dygraph to static graph.
 - `Inference and Deployment <./infer/index_en.html>`_ : Introduce the method of using the trained model to inference.
 - `Distributed Training <./06_distributed_training/index_en.html>`_ : Introduce how the PaddlePaddle uses distributed training

From ddca9ba9a57bd5e39ff5fb1be720ff12465ea2cb Mon Sep 17 00:00:00 2001
From: TCChenlong <1300851984@qq.com>
Date: Fri, 13 May 2022 12:42:15 +0800
Subject: [PATCH 10/11] fix api docs bugs test=document_fix

---
 docs/dev_guides/index_cn.rst                     |   2 ++
 docs/dev_guides/index_en.rst                     |   3 +++
 .../kernel_primitive_api/add_example_cn.md       |   0
 .../kernel_primitive_api/add_example_en.md       |   0
 .../kernel_primitive_api/api_description_cn.rst  |   0
 .../kernel_primitive_api/api_description_en.rst  |   0
 .../kernel_primitive_api/compute_api_cn.md       |   0
 .../kernel_primitive_api/compute_api_en.md       |   0
 .../kernel_primitive_api/example_cn.rst          |   0
 .../kernel_primitive_api/example_en.rst          |   0
 .../kernel_primitive_api/functor_api_cn.md       |   0
 .../kernel_primitive_api/functor_api_en.md       |   0
 .../images/compute_reduce.png                    | Bin
 .../kernel_primitive_api/images/example_add.png  | Bin
 .../images/example_reduce.png                    | Bin
 .../kernel_primitive_api/images/io_read_data.png | Bin
 .../images/io_read_data_broadcast.png            | Bin
 .../images/io_read_data_broadcast_stride.png     | Bin
 .../images/io_read_data_reduce.png               | Bin
 .../images/io_read_data_stride.png               | Bin
 .../images/io_write_data.png                     | Bin
 .../images/io_write_data_stride.png              | Bin
 .../kernel_primitive_api/index_cn.rst            |   0
 .../kernel_primitive_api/index_en.rst            |   0
 .../kernel_primitive_api/io_api_cn.md            |   0
 .../kernel_primitive_api/io_api_en.md            |   0
 .../kernel_primitive_api/reduce_example_cn.md    |   0
 .../kernel_primitive_api/reduce_example_en.md    |   0
 docs/guides/beginner/index_cn.rst                |   2 +-
 docs/guides/index_cn.rst                         |   2 +-
 docs/guides/index_en.rst                         |   2 --
 docs/guides/new_op/index_cn.rst                  |   6 ++----
 docs/guides/new_op/index_en.rst                  |  15 ---------------
 33 files changed, 9 insertions(+), 23 deletions(-)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/add_example_cn.md (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/add_example_en.md (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/api_description_cn.rst (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/api_description_en.rst (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/compute_api_cn.md (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/compute_api_en.md (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/example_cn.rst (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/example_en.rst (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/functor_api_cn.md (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/functor_api_en.md (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/images/compute_reduce.png (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/images/example_add.png (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/images/example_reduce.png (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/images/io_read_data.png (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/images/io_read_data_broadcast.png (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/images/io_read_data_broadcast_stride.png (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/images/io_read_data_reduce.png (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/images/io_read_data_stride.png (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/images/io_write_data.png (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/images/io_write_data_stride.png (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/index_cn.rst (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/index_en.rst (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/io_api_cn.md (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/io_api_en.md (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/reduce_example_cn.md (100%)
 rename docs/{guides/new_op => dev_guides}/kernel_primitive_api/reduce_example_en.md (100%)
 delete mode 100644 docs/guides/new_op/index_en.rst

diff --git a/docs/dev_guides/index_cn.rst b/docs/dev_guides/index_cn.rst
index 7ada18508ad..766a84f57ea 100644
--- a/docs/dev_guides/index_cn.rst
+++ b/docs/dev_guides/index_cn.rst
@@ -11,6 +11,7 @@
 - `Git 操作指南 <./git_guides/index_cn.html>`_ : Git 操作相关说明与Paddle CI 手册。
 - `编译安装 <https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/fromsource.html>`_ : 如何从源码编译安装Paddle。
 - `API开发指南 <./api_contributing_guides/api_contributing_guides_cn.html>`_ : API开发相关说明。
+- `Kernel Primitives API <./kernel_primitive_api/index_cn.html>`_ : 介绍 PaddlePaddle 为加快算子开发提供的 Block 级 CUDA 函数。
 - `曙光开发指南 <./sugon/index_cn.html>`_ : 曙光开发相关说明。
 - `自定义新硬件接入指南 <./custom_device_docs/index_cn.html>`_: 介绍如何通过自定义硬件功能为飞桨接入新硬件后端。
 - `文档贡献指南 <./docs_contributing_guides_cn.html>`_ : 飞桨文档贡献指南。
@@ -22,6 +23,7 @@
     style_guides_cn.md
     git_guides/index_cn.rst
     api_contributing_guides/api_contributing_guides_cn.rst
+    kernel_primitive_api/index_cn.rst
     sugon/index_cn.rst
     custom_device_docs/index_cn.rst
     docs_contributing_guides_cn.md
diff --git a/docs/dev_guides/index_en.rst b/docs/dev_guides/index_en.rst
index 6bc18c7ae8f..a85b8b295a3 100644
--- a/docs/dev_guides/index_en.rst
+++ b/docs/dev_guides/index_en.rst
@@ -6,10 +6,13 @@ We very much welcome you to participate in the construction of the paddle. The f
 
 Similarly, if you feel that this document is missing, or that the description is unclear, we also welcome you to contribute to this series of documents.
 
+- `Kernel Primitives API <./kernel_primitive_api/index_en.html>`_ : Introduce the block-level CUDA functions provided by PaddlePaddle to speed up operator development.
 - `custom_device_docs <./custom_device_docs/index_en.html>`_ : Contribution guidelines overview.
 
 
 ..  toctree::
     :hidden:
 
+
+    kernel_primitive_api/index_en.rst
     custom_device_docs/index_en.rst
diff --git a/docs/guides/new_op/kernel_primitive_api/add_example_cn.md b/docs/dev_guides/kernel_primitive_api/add_example_cn.md
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/add_example_cn.md
rename to docs/dev_guides/kernel_primitive_api/add_example_cn.md
diff --git a/docs/guides/new_op/kernel_primitive_api/add_example_en.md b/docs/dev_guides/kernel_primitive_api/add_example_en.md
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/add_example_en.md
rename to docs/dev_guides/kernel_primitive_api/add_example_en.md
diff --git a/docs/guides/new_op/kernel_primitive_api/api_description_cn.rst b/docs/dev_guides/kernel_primitive_api/api_description_cn.rst
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/api_description_cn.rst
rename to docs/dev_guides/kernel_primitive_api/api_description_cn.rst
diff --git a/docs/guides/new_op/kernel_primitive_api/api_description_en.rst b/docs/dev_guides/kernel_primitive_api/api_description_en.rst
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/api_description_en.rst
rename to docs/dev_guides/kernel_primitive_api/api_description_en.rst
diff --git a/docs/guides/new_op/kernel_primitive_api/compute_api_cn.md b/docs/dev_guides/kernel_primitive_api/compute_api_cn.md
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/compute_api_cn.md
rename to docs/dev_guides/kernel_primitive_api/compute_api_cn.md
diff --git a/docs/guides/new_op/kernel_primitive_api/compute_api_en.md b/docs/dev_guides/kernel_primitive_api/compute_api_en.md
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/compute_api_en.md
rename to docs/dev_guides/kernel_primitive_api/compute_api_en.md
diff --git a/docs/guides/new_op/kernel_primitive_api/example_cn.rst b/docs/dev_guides/kernel_primitive_api/example_cn.rst
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/example_cn.rst
rename to docs/dev_guides/kernel_primitive_api/example_cn.rst
diff --git a/docs/guides/new_op/kernel_primitive_api/example_en.rst b/docs/dev_guides/kernel_primitive_api/example_en.rst
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/example_en.rst
rename to docs/dev_guides/kernel_primitive_api/example_en.rst
diff --git a/docs/guides/new_op/kernel_primitive_api/functor_api_cn.md b/docs/dev_guides/kernel_primitive_api/functor_api_cn.md
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/functor_api_cn.md
rename to docs/dev_guides/kernel_primitive_api/functor_api_cn.md
diff --git a/docs/guides/new_op/kernel_primitive_api/functor_api_en.md b/docs/dev_guides/kernel_primitive_api/functor_api_en.md
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/functor_api_en.md
rename to docs/dev_guides/kernel_primitive_api/functor_api_en.md
diff --git a/docs/guides/new_op/kernel_primitive_api/images/compute_reduce.png b/docs/dev_guides/kernel_primitive_api/images/compute_reduce.png
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/images/compute_reduce.png
rename to docs/dev_guides/kernel_primitive_api/images/compute_reduce.png
diff --git a/docs/guides/new_op/kernel_primitive_api/images/example_add.png b/docs/dev_guides/kernel_primitive_api/images/example_add.png
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/images/example_add.png
rename to docs/dev_guides/kernel_primitive_api/images/example_add.png
diff --git a/docs/guides/new_op/kernel_primitive_api/images/example_reduce.png b/docs/dev_guides/kernel_primitive_api/images/example_reduce.png
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/images/example_reduce.png
rename to docs/dev_guides/kernel_primitive_api/images/example_reduce.png
diff --git a/docs/guides/new_op/kernel_primitive_api/images/io_read_data.png b/docs/dev_guides/kernel_primitive_api/images/io_read_data.png
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/images/io_read_data.png
rename to docs/dev_guides/kernel_primitive_api/images/io_read_data.png
diff --git a/docs/guides/new_op/kernel_primitive_api/images/io_read_data_broadcast.png b/docs/dev_guides/kernel_primitive_api/images/io_read_data_broadcast.png
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/images/io_read_data_broadcast.png
rename to docs/dev_guides/kernel_primitive_api/images/io_read_data_broadcast.png
diff --git a/docs/guides/new_op/kernel_primitive_api/images/io_read_data_broadcast_stride.png b/docs/dev_guides/kernel_primitive_api/images/io_read_data_broadcast_stride.png
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/images/io_read_data_broadcast_stride.png
rename to docs/dev_guides/kernel_primitive_api/images/io_read_data_broadcast_stride.png
diff --git a/docs/guides/new_op/kernel_primitive_api/images/io_read_data_reduce.png b/docs/dev_guides/kernel_primitive_api/images/io_read_data_reduce.png
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/images/io_read_data_reduce.png
rename to docs/dev_guides/kernel_primitive_api/images/io_read_data_reduce.png
diff --git a/docs/guides/new_op/kernel_primitive_api/images/io_read_data_stride.png b/docs/dev_guides/kernel_primitive_api/images/io_read_data_stride.png
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/images/io_read_data_stride.png
rename to docs/dev_guides/kernel_primitive_api/images/io_read_data_stride.png
diff --git a/docs/guides/new_op/kernel_primitive_api/images/io_write_data.png b/docs/dev_guides/kernel_primitive_api/images/io_write_data.png
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/images/io_write_data.png
rename to docs/dev_guides/kernel_primitive_api/images/io_write_data.png
diff --git a/docs/guides/new_op/kernel_primitive_api/images/io_write_data_stride.png b/docs/dev_guides/kernel_primitive_api/images/io_write_data_stride.png
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/images/io_write_data_stride.png
rename to docs/dev_guides/kernel_primitive_api/images/io_write_data_stride.png
diff --git a/docs/guides/new_op/kernel_primitive_api/index_cn.rst b/docs/dev_guides/kernel_primitive_api/index_cn.rst
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/index_cn.rst
rename to docs/dev_guides/kernel_primitive_api/index_cn.rst
diff --git a/docs/guides/new_op/kernel_primitive_api/index_en.rst b/docs/dev_guides/kernel_primitive_api/index_en.rst
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/index_en.rst
rename to docs/dev_guides/kernel_primitive_api/index_en.rst
diff --git a/docs/guides/new_op/kernel_primitive_api/io_api_cn.md b/docs/dev_guides/kernel_primitive_api/io_api_cn.md
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/io_api_cn.md
rename to docs/dev_guides/kernel_primitive_api/io_api_cn.md
diff --git a/docs/guides/new_op/kernel_primitive_api/io_api_en.md b/docs/dev_guides/kernel_primitive_api/io_api_en.md
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/io_api_en.md
rename to docs/dev_guides/kernel_primitive_api/io_api_en.md
diff --git a/docs/guides/new_op/kernel_primitive_api/reduce_example_cn.md b/docs/dev_guides/kernel_primitive_api/reduce_example_cn.md
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/reduce_example_cn.md
rename to docs/dev_guides/kernel_primitive_api/reduce_example_cn.md
diff --git a/docs/guides/new_op/kernel_primitive_api/reduce_example_en.md b/docs/dev_guides/kernel_primitive_api/reduce_example_en.md
similarity index 100%
rename from docs/guides/new_op/kernel_primitive_api/reduce_example_en.md
rename to docs/dev_guides/kernel_primitive_api/reduce_example_en.md
diff --git a/docs/guides/beginner/index_cn.rst b/docs/guides/beginner/index_cn.rst
index 71a37aae029..6de9626265a 100644
--- a/docs/guides/beginner/index_cn.rst
+++ b/docs/guides/beginner/index_cn.rst
@@ -1,5 +1,5 @@
 ###################
-模型开发
+模型开发入门
 ###################
 
 本部分将介绍飞桨框架2.0的开发流程。
diff --git a/docs/guides/index_cn.rst b/docs/guides/index_cn.rst
index e1923626b78..c20007ced31 100644
--- a/docs/guides/index_cn.rst
+++ b/docs/guides/index_cn.rst
@@ -8,7 +8,7 @@
 
 使用教程分为如下的模块：
 
-- `模型开发 <./beginner/index_cn.html>`_
+- `模型开发入门 <./beginner/index_cn.html>`_
 - `模型开发更多用法 <./advanced/index_cn.html>`_
 - `动态图转静态图 <./jit/index_cn.html>`_
 - `预测部署 <./infer/index_cn.html>`_ 
diff --git a/docs/guides/index_en.rst b/docs/guides/index_en.rst
index fc890700395..3f84a2970c7 100644
--- a/docs/guides/index_en.rst
+++ b/docs/guides/index_en.rst
@@ -16,7 +16,6 @@ Let's start with studying basic concept of PaddlePaddle:
 - `Distributed Training <./06_distributed_training/index_en.html>`_ : Introduce how the PaddlePaddle uses distributed training
 - `Performance Improving <./performance_improving/index_en.html>`_ : Introduce how to improve performance of PaddlePaddle.
 - `Model Convert <./model_convert/index_en.html>`_ : Introduce how to convert your model to PaddlePaddle.
-- `Customize OP <./new_op/index_en.html>`_ :  Introduce how to customize OP for PaddlePaddle.
 - `FLAGS <./flags/flags_en.html>`_ : Introduce the envirenment flags in paddle.
 
 ..  toctree::
@@ -29,5 +28,4 @@ Let's start with studying basic concept of PaddlePaddle:
     06_distributed_training/index_en.rst
     performance_improving/index_en.rst
     model_convert/index_en.rst
-    new_op/index_en.rst
     flags/flags_en.rst
diff --git a/docs/guides/new_op/index_cn.rst b/docs/guides/new_op/index_cn.rst
index 43701ad20f1..0c58c07093f 100644
--- a/docs/guides/new_op/index_cn.rst
+++ b/docs/guides/new_op/index_cn.rst
@@ -2,7 +2,7 @@
 自定义算子
 #############
 
-本部分将指导您如何使用飞桨的自定义算子（Operator，简称Op）机制，包括以下两类：
+介绍如何使用飞桨的自定义算子（Operator，简称Op）机制，包括以下两类：
 
 1. C++算子：编写方法较为简洁，不涉及框架内部概念，无需重新编译飞桨框架，以外接模块的方式使用的算子
 2. Python算子：使用Python编写实现前向（forward）和反向（backward）方法，在模型组网中使用的自定义API
@@ -11,12 +11,10 @@
 
 - `自定义Python算子 <./new_python_op_cn.html>`_
 
-- `Kernel Primitives API <./kernel_primitive_api/index_cn.html>`_ : 介绍 PaddlePaddle 为加快算子开发提供的 Block 级 CUDA 函数。
 
 .. toctree::
    :hidden:
 
-   op_notes_cn.md
    new_custom_op_cn.md
    new_python_op_cn.md
-   kernel_primitive_api/index_cn.rst
+
diff --git a/docs/guides/new_op/index_en.rst b/docs/guides/new_op/index_en.rst
deleted file mode 100644
index d012ad22030..00000000000
--- a/docs/guides/new_op/index_en.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-###################
-Write New Operators
-###################
-
-This section will guide you on how to use the custom operator mechanism of Paddle, including the following two categories:
-
-1. C++ operator: The writing method is relatively simple, does not involve the internal concept of the framework, does not need to recompile the paddle framework, and is used as an external module.
-2. Python operator: use Python to implement forward and backward methods, then used in network.
-
-- `Kernel Primitives API <./kernel_primitive_api/index_en.html>`_ : Introduce the block-level CUDA functions provided by PaddlePaddle to speed up operator development.
-
-.. toctree::
-   :hidden:
-
-   kernel_primitive_api/index_en.rst

From e758f33349d0af3e58751eb8d1a991e5be1353ab Mon Sep 17 00:00:00 2001
From: TCChenlong <1300851984@qq.com>
Date: Fri, 13 May 2022 16:28:42 +0800
Subject: [PATCH 11/11] update release note

---
 docs/release_note_cn.md | 269 ++++++++++++++++++++++++++++++++++------
 docs/release_note_en.md | 257 ++++++++++++++++++++++++++++++++------
 2 files changed, 450 insertions(+), 76 deletions(-)

diff --git a/docs/release_note_cn.md b/docs/release_note_cn.md
index 3932d35dc0f..7a3df48682f 100644
--- a/docs/release_note_cn.md
+++ b/docs/release_note_cn.md
@@ -1,9 +1,9 @@
 
-# 2.3.0-rc0 Release Note
+# 2.3.0 Release Note
 
 ## 1. 重要更新
 
-我们很高兴地发布飞桨框架 2.3.0-rc0 版本，本版本包含如下重要更新。
+我们很高兴地发布飞桨框架 2.3.0 版本，本版本包含如下重要更新。
 
 ### API
 
@@ -33,7 +33,7 @@
 
 ### 编译安装
 
-- 从 2.3.0-rc0 版本开始，飞桨对框架支持的 GPU 架构种类进行了调整和升级。
+- 从 2.3.0 版本开始，飞桨对框架支持的 GPU 架构种类进行了调整和升级。
 
 ### 推理部署
 
@@ -53,6 +53,8 @@
 
 ## 2. 不兼容升级
 
+- 预编译安装包中移除CUDA sm35 ARCH： 受到包体积大小的影响，在预编译的安装包中移除了 CUDA sm35 架构。 ([#41754](https://github.com/PaddlePaddle/Paddle/pull/41754))
+
 - `paddle.to_tensor` 将一个 python int scalar 转换为 Tensor 时，在 Windows 上的默认数据类型由 int32 变为 int64，从而与 Linux/Mac 保持对齐。([#39662](https://github.com/PaddlePaddle/Paddle/pull/39662)) 
 
 - 为了与 python3 下的除法行为保持一致，除法符号 `/` 从 rounding divide 变成 true divide，计算输出结果的数据类型从 int 切换成 float。 ([#40890](https://github.com/PaddlePaddle/Paddle/pull/40890)) 
@@ -63,7 +65,7 @@
 2.2
 </th>
 <th>
-2.3.0-rc0
+2.3.0
 </th>
 </tr>
 
@@ -105,7 +107,7 @@ Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=True,
 2.2
 </th>
 <th>
-2.3.0-rc0
+2.3.0
 </th>
 </tr>
 
@@ -402,7 +404,7 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - 新增学习率类 API
   
-  - 新增 `paddle.optimizer.lr.MultiplicativeDecay`，提供 `lambda` 函数设置学习率的策略。([#38250](https://github.com/PaddlePaddle/Paddle/pull/38250))
+  - 新增 `paddle.optimizer.lr.MultiplicativeDecay`，提供 `lambda` 函数设置学习率的策略。([#38250](https://github.com/PaddlePaddle/Paddle/pull/38250))
 
 - 新增分布式相关 API
   
@@ -416,6 +418,12 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - 新增 `paddle.incubate.multiprocessing`模块，支持 Tensor（CPU/GPU）在 python 进程间传输。([#37302](https://github.com/PaddlePaddle/Paddle/pull/37302), [#41339](https://github.com/PaddlePaddle/Paddle/pull/41339)) 
 
+- 新增 `paddle.incubate.autotune.set_config` API，支持多版本 Kernel 自动选择、混合精度数据布局自动转换、DataLoader 的 num_workers 自动选择，以自动提升模型性能。([#42301](https://github.com/PaddlePaddle/Paddle/pull/42301))
+
+- 新增 `paddle.incubate.nn.FusedMultiTransformer` 和 `paddle.incubate.nn.functional.fused_multi_transformer` API，可将多层 transformer 融合到一个 op 中，提升模型推理性能，注意：仅支持前向推理。([#42311](https://github.com/PaddlePaddle/Paddle/pull/42311))
+
+- 新增动静统一的 einsum_v2 op，兼容原有 python 端 `paddle.einsum` 实现的同时支持动转静导出和更加完备的 Infershape 推导。([#42495](https://github.com/PaddlePaddle/Paddle/pull/42495), [#42327](https://github.com/PaddlePaddle/Paddle/pull/42327), [#42397](https://github.com/PaddlePaddle/Paddle/pull/42397), [#42105](https://github.com/PaddlePaddle/Paddle/pull/42105))
+
 #### IR(Intermediate Representation)
 
 - 动态图转静态图
@@ -503,7 +511,7 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 - **自定义算子机制与 Phi 整合并完善**：支持在自定义算子编写时调用 Phi 自动生成的200余个C++运算类 API，降低自定义算子开发成本，并进行一系列问题修复。([#37122](https://github.com/PaddlePaddle/Paddle/pull/37122), [#37276](https://github.com/PaddlePaddle/Paddle/pull/37276), [#37281](https://github.com/PaddlePaddle/Paddle/pull/37281), [#37262](https://github.com/PaddlePaddle/Paddle/pull/37281), [#37415](https://github.com/PaddlePaddle/Paddle/pull/37415), [#37423](https://github.com/PaddlePaddle/Paddle/pull/37423), [#37583](https://github.com/PaddlePaddle/Paddle/pull/37683), [#38776](https://github.com/PaddlePaddle/Paddle/pull/38776), [#39353](https://github.com/PaddlePaddle/Paddle/pull/39353), [#41072](https://github.com/PaddlePaddle/Paddle/pull/41072))
 
 - **算子规模化迁移改写**：迁移了约250个高频算子的前、反向算子内核 Kernel 至新算子库，改写为函数式，支持在 C++端通过调用多个基础 Kernel 函数封装，快速组合实现高性能算子；同时，添加相应的 yaml 算子定义，并接入新动态图执行体系，提升 python API 调度性能。迁移改写的算子包括：
-  
+
   - sqrt （[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
   
   - square（[#40727](https://github.com/PaddlePaddle/Paddle/pull/40727)）
@@ -1006,6 +1014,8 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
   
   - hard_sigmoid ([#40626](https://github.com/PaddlePaddle/Paddle/pull/40626))
 
+  - exp, det, assign, gaussian_random, matrix_rank, eye, deformable_conv ([#41755](https://github.com/PaddlePaddle/Paddle/pull/41755), [#41737](https://github.com/PaddlePaddle/Paddle/pull/41737))
+
 #### 新动态图执行机制
 
 针对飞桨原动态图执行机制的调度性能、二次开发能力差的问题，我们重构了动态图的底层执行机制。通过全新的调用执行方式，配合 Phi 算子库进行高效的运行时执行，对于 Phi 算子库支持的算子，切换到新动态图模式能体验到调度性能有较大幅度的提升。但是由于整体框架执行机制升级的工作量巨大，且该部分工作耦合了大量 Phi 算子库的工作， 因此在这个版本下我们仍未默认使用该执行方式。如果想要试用可以通过设置环境变量 `FLAGS_enable_eager_mode=1` 来切换使用。具体包括如下内容：
@@ -1034,28 +1044,38 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - **动态图重构后支持 inplace 策略**：输入与输出为同一个 Tensor。
   
-  - - 为动态图重构中间态适配 inplace 策略。([#40400](https://github.com/PaddlePaddle/Paddle/pull/40400))
-    
-    - 为动态图重构最终态适配 inplace 策略。([#40695](https://github.com/PaddlePaddle/Paddle/pull/40695))
+  - 为动态图重构中间态适配 inplace 策略。([#40400](https://github.com/PaddlePaddle/Paddle/pull/40400))
     
-    - 动态图重构后，为 PyLayer 功能添加 inplace 策略。([#41043](https://github.com/PaddlePaddle/Paddle/pull/41043))
-    
-    - 动态图重构后，为 Tensor 的 setitem 功能添加 inplace 策略。([#40915](https://github.com/PaddlePaddle/Paddle/pull/40915))
-    
-    - 动态图重构后添加`_reset_grad_inplace_version`接口，将 Tensor 的梯度的 inplace version 置为0。([#41101](https://github.com/PaddlePaddle/Paddle/pull/41101))
-    
-    - 反向计算过程中如果不需要前向 Tensor 的值（no need buffer 属性），则不需要对该 Tensor 进行 inplace version 的检测操作。 为 no_need_buffer 的 Tensor 跳过 inplace version 的检查。([#41350](https://github.com/PaddlePaddle/Paddle/pull/41350))
-    
-    - 统一动态图重构后与重构前对 inplace version 检查的报错信息。([#41209](https://github.com/PaddlePaddle/Paddle/pull/41209))
+  - 为动态图重构最终态适配 inplace 策略。([#40695](https://github.com/PaddlePaddle/Paddle/pull/40695))
+
+  - 动态图重构后，为 PyLayer 功能添加 inplace 策略。([#41043](https://github.com/PaddlePaddle/Paddle/pull/41043))
+
+  - 动态图重构后，为 Tensor 的 setitem 功能添加 inplace 策略。([#40915](https://github.com/PaddlePaddle/Paddle/pull/40915))
+
+  - 动态图重构后添加`_reset_grad_inplace_version`接口，将 Tensor 的梯度的 inplace version 置为0。([#41101](https://github.com/PaddlePaddle/Paddle/pull/41101))
+
+  - 反向计算过程中如果不需要前向 Tensor 的值（no need buffer 属性），则不需要对该 Tensor 进行 inplace version 的检测操作。 为 no_need_buffer 的 Tensor 跳过 inplace version 的检查。([#41350](https://github.com/PaddlePaddle/Paddle/pull/41350))
+
+  - 统一动态图重构后与重构前对 inplace version 检查的报错信息。([#41209](https://github.com/PaddlePaddle/Paddle/pull/41209))
 
 - **动态图重构后支持 view 策略**：输入与输出 Tensor 共享底层数据。
   
-  - - 为动态图重构中间态适配 view 机制。包括`reshape`、`squeeze`、`unsqueeze`、`flatten` API。([#40830](https://github.com/PaddlePaddle/Paddle/pull/40830))
+  - 为动态图重构中间态适配 view 机制。包括`reshape`、`squeeze`、`unsqueeze`、`flatten` API。([#40830](https://github.com/PaddlePaddle/Paddle/pull/40830))
     
-    - 为动态图重构最终态适配 view 机制。包括`reshape` API。([#40891](https://github.com/PaddlePaddle/Paddle/pull/40891))
+  - 为动态图重构最终态适配 view 机制。包括`reshape` API。([#40891](https://github.com/PaddlePaddle/Paddle/pull/40891))
 
-#### 全新静态图执行器
+- **添加支持新动态图 eager Tensor 在 python 端的 weakref**。([#41797](https://github.com/PaddlePaddle/Paddle/pull/41797))
+
+- **增强新动态图 DoubleGrad 功能**，支持基础的 DoubleGrad 功能。([#41893](https://github.com/PaddlePaddle/Paddle/pull/41893), [#41894](https://github.com/PaddlePaddle/Paddle/pull/41894), [#41895](https://github.com/PaddlePaddle/Paddle/pull/41895))
+
+- **新增 `core.eager.StringTensor` 接口**，支持在 python 端构造 StringTensor 以及使用 StringTensor 相关 API。([#41039](https://github.com/PaddlePaddle/Paddle/pull/41039))
+
+- **为 `core.eager.Tensor` 新增 `_grad_name` 和 `_grad_value` API**，返回梯度的名称和值。([#41990](https://github.com/PaddlePaddle/Paddle/pull/41990))
+
+- **为动态图中间态添加对 no_need_buffer 属性的处理**。在 inplace 反向检查操作中，会跳过具有 no_need_buffer 属性的 Tensor 的检查。([#41720](https://github.com/PaddlePaddle/Paddle/pull/41720))
 
+
+#### 全新静态图执行器
 为了解决飞桨原静态图执行器在部分场景下调度性能不够理想，不便于扩展多 stream 等问题，我们实现了全新的性能优越，易于扩展的静态图执行器，充分利用了多 stream、多线程的异步调度能力。新执行器相当于原执行器是兼容升级，目前已在单机单卡场景下默认使用，用户不需要在训练代码中做任何修改即可自动使用。当然，我们也提供了接口来切换回原执行器，用户可以通过设置环境变量 `FLAGS_USE_STANDALONE_EXECUTOR=false` 来切换回原执行器。([#41179](https://github.com/PaddlePaddle/Paddle/pull/41179)) 主要内容如下：
 
 - 基础组件：用于执行器中多线程算子调度的高性能线程池 ([#35470](https://github.com/PaddlePaddle/Paddle/pull/35470), [#35930](https://github.com/PaddlePaddle/Paddle/pull/35930), [#36030](https://github.com/PaddlePaddle/Paddle/pull/36030), [#36480](https://github.com/PaddlePaddle/Paddle/pull/36480), [#36688](https://github.com/PaddlePaddle/Paddle/pull/36688), [#36740](https://github.com/PaddlePaddle/Paddle/pull/36740), [#38335](https://github.com/PaddlePaddle/Paddle/pull/38335), [#40770](https://github.com/PaddlePaddle/Paddle/pull/40770)) 及线程协同组件 ([#38779](https://github.com/PaddlePaddle/Paddle/pull/38779), [#40876](https://github.com/PaddlePaddle/Paddle/pull/40876), [#40912](https://github.com/PaddlePaddle/Paddle/pull/40912))，算子执行后及时地显存回收 ([#37642](https://github.com/PaddlePaddle/Paddle/pull/37642), [#39617](https://github.com/PaddlePaddle/Paddle/pull/39617), [#40859](https://github.com/PaddlePaddle/Paddle/pull/40859))，并行执行器新依赖分析算法 ([#37231](https://github.com/PaddlePaddle/Paddle/pull/37231)) 等。
@@ -1066,6 +1086,11 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - 增强多线程场景下调试和报错功能，将子线程的报错捕获到主线程中统一抛出，以提升用户体验。([#36692](https://github.com/PaddlePaddle/Paddle/pull/36692)，[#36802](https://github.com/PaddlePaddle/Paddle/pull/36802))
 
+- 修复新执行器通信流重置 Allocator 中 stream 缓存信息的问题，减少跨 stream 场景下的 RecordStream 开销，优化后 DeepFM 模型性能提升约8%。([#42046](https://github.com/PaddlePaddle/Paddle/pull/42046))
+
+- 优化新执行器算子间的依赖分析方法，提升运行性能；为 send/recv 通信算子建立正确依赖以支持流水线并行。([#42009](https://github.com/PaddlePaddle/Paddle/pull/42009))
+
+
 #### 分布式训练
 
 - 集合通信多机多卡训练基础功能
@@ -1147,6 +1172,8 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
   - 统一参数服务器下，重构通信、存储等各个模块基类，提升各个模块的易二次开发性。([#41207](https://github.com/PaddlePaddle/Paddle/pull/41207), [#41022](https://github.com/PaddlePaddle/Paddle/pull/41022), [#40702](https://github.com/PaddlePaddle/Paddle/pull/40702), [#39341](https://github.com/PaddlePaddle/Paddle/pull/39341) [#39377](https://github.com/PaddlePaddle/Paddle/pull/39377), [#39191](https://github.com/PaddlePaddle/Paddle/pull/39191), [#39064](https://github.com/PaddlePaddle/Paddle/pull/39064))
   
   - 统一参数服务器下，新增评估指标模块，支持 AUC/WuAUC/MaskAuc 等评估指标计算及可自定义扩展。 ([#38789](https://github.com/PaddlePaddle/Paddle/pull/38789)) 
+  
+  - 支持在昆仑2芯片上的 XPU 参数服务器训练。 ([#41917](https://github.com/PaddlePaddle/Paddle/pull/41917), [#42266](https://github.com/PaddlePaddle/Paddle/pull/42266), [#41916](https://github.com/PaddlePaddle/Paddle/pull/41916))
 
 #### Profiler
 
@@ -1182,6 +1209,90 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
   
   - Profiler 支持分级。（[#39926](https://github.com/PaddlePaddle/Paddle/pull/39926)）
 
+- 修改新动态图下 op 的打点名称和类型。（[#41771](https://github.com/PaddlePaddle/Paddle/pull/41771/)）
+
+- 添加 Kernel 表单，以及优化表单内容的展示方式。 （[#41989](https://github.com/PaddlePaddle/Paddle/pull/41989)）
+
+- 消除 Profiler 关闭情况下对模型前向计算造成性能下降的影响。（[#42142](https://github.com/PaddlePaddle/Paddle/pull/42142)）
+
+#### CINN 编译器接入
+
+飞桨的编译器功能在逐步丰富中，针对 CINN（[GitHub - PaddlePaddle/CINN: Compiler Infrastructure for Neural Networks](https://github.com/PaddlePaddle/CINN)） 的变更，Paddle 侧接入也进行了相对应的更改，以适配编译器 CINN 的功能。其中主要包括增加Paddle-CINN 运行流程的子图管理相关功能，显存和速度性能的优化、开发过程发现的 bug 修复。
+
+- 功能开发：
+  
+  - 子图 op 相关：
+    
+    - 添加从计算图中找到并生成 CINN 子图的功能。（[#36345](https://github.com/PaddlePaddle/Paddle/pull/36345)）
+    
+    - 新增 cinn_launch op 作为运行时接入 CINN 的入口，负责调度 CINN 对子图进行编译、初始化数据空间、调度生成 Kernel 的执行。（[#36600](https://github.com/PaddlePaddle/Paddle/pull/36600)）
+    
+    - 为 cinn_launch op 的 Kernel 实现添加辅助类 CinnLaunchContext 管理子图编译、运行的中间数据，提升可扩展性和代码可读性。（[#37938](https://github.com/PaddlePaddle/Paddle/pull/37938)）
+    
+    - 为 CINN 子图添加额外的 fetch 结点，从而保证 CINN 外部结点能取到待fetch变量的值。（[#37172](https://github.com/PaddlePaddle/Paddle/pull/37172), [#37190](https://github.com/PaddlePaddle/Paddle/pull/37190)）
+    
+    - 添加对 CINN 子图符号化的功能，符号化用于拓扑排序子图并返回 CINN 执行序列。（[#36417](https://github.com/PaddlePaddle/Paddle/pull/36417)）
+    
+    - 新增 CinnCompiler 类，用于调用 CINN 编译模型中可使用 CINN 算子替换的子图。 （[#36562](https://github.com/PaddlePaddle/Paddle/pull/36562), [#36975](https://github.com/PaddlePaddle/Paddle/pull/36975)）
+    
+    - 为 CINN 符号化类新增获取子图 fetch 变量名的接口，防止编译优化中将 fetch 变量融合消除。（[#37218](https://github.com/PaddlePaddle/Paddle/pull/37218)）
+  
+  - 程序开发检查、debug、API 变更相关：
+    
+    - 同步更新 CINN 中 NetBuilder API 名称的变化。（[#40392](https://github.com/PaddlePaddle/Paddle/pull/40392)）
+    
+    - 为 Paddle-CINN 添加必要的用于 debug 的日志信息。（[#36867](https://github.com/PaddlePaddle/Paddle/pull/36867)）
+    
+    - 添加 Paddle desc 与 CINN desc 互转函数。（[#36100](https://github.com/PaddlePaddle/Paddle/pull/36100)）
+    
+    - 相比 Paddle，CINN 中实现的算子可能存在未使用到某些输入变量，因此在 cinn_launch op 中去除对输入变量必须被使用的检查。（[#37119](https://github.com/PaddlePaddle/Paddle/pull/37119)）
+    
+    - 新增 cinn_instruction_run op 用于调用 CINN 执行单个生成指令，便于 Paddle 侧构建 Graph 调度运行子图。（[#39435](https://github.com/PaddlePaddle/Paddle/pull/39435), [#39576](https://github.com/PaddlePaddle/Paddle/pull/39576)）
+    
+    - 在 Paddle 中添加编译 CINN 所需的 CUDA/CUBLAS/MKL/CINN pass 应用等控制宏。（[#37066](https://github.com/PaddlePaddle/Paddle/pull/37066), [#36660](https://github.com/PaddlePaddle/Paddle/pull/36660)）
+    
+    - 增加 FLAGS_allow_cinn_ops 和 FLAGS_deny_cinn_ops 两个控制标记，用于控制 Paddle 训练中使用 CINN 算子代替原生算子的种类。（[#36842](https://github.com/PaddlePaddle/Paddle/pull/36842)）
+
+- 性能优化：
+  
+  - 速度优化
+    
+    - 优化 CinnCacheKey 的计算耗时。（[#37786](https://github.com/PaddlePaddle/Paddle/pull/37786), [#37317](https://github.com/PaddlePaddle/Paddle/pull/37317)）
+    
+    - 缓存 CINN 编译子图的变量 scope，降低运行参数构造开销。（[#37983](https://github.com/PaddlePaddle/Paddle/pull/37983)）
+    
+    - 子图编译时接入 CINN 自动调优，支持通过 flag 启用，便于后续进一步调优训练性能。（[#41795](https://github.com/PaddlePaddle/Paddle/pull/41795)）
+    
+    - 重构子图编译时对编译结果的正确性校验，避免运行时重复检查，降低调度开销。（[#41777](https://github.com/PaddlePaddle/Paddle/pull/41777)）
+    
+    - 在 Paddle-CINN 训练功能中默认启用 TransposeFolding 和 GemmRewriter 优化 pass。（[#41084](https://github.com/PaddlePaddle/Paddle/pull/41084)）
+    
+    - 将 Paddle 中创建的 cuda stream 传入 CINN，使得 Paddle 和 CINN 执行计算时共用同一个 CUDA stream。（[#37337](https://github.com/PaddlePaddle/Paddle/pull/37337)）
+    
+    - 将 CINN 优化 pass 应用逻辑从 Paddle 中移动到 CINN 中。（[#42047](https://github.com/PaddlePaddle/Paddle/pull/42047), [#42070](https://github.com/PaddlePaddle/Paddle/pull/42070)）
+  
+  - 显存优化
+    
+    - 为 cinn_launch op 添加 NoNeedBufferVars 声明无须 buffer 的输入变量列表，以便显存优化提前释放无效空间。（[#38367](https://github.com/PaddlePaddle/Paddle/pull/38367)）
+    
+    - 传入子图外部变量的引用计数信息，便于 cinn_launch 内子图复用显存优化 pass，降低使用 CINN 的显存开销。（[#39209](https://github.com/PaddlePaddle/Paddle/pull/39209), [#39622](https://github.com/PaddlePaddle/Paddle/pull/39622)）
+    
+    - 添加 CINN 编译生成的可执行指令集合转换为 Paddle Graph 的功能，支持复用 Paddle 调度器及显存优化 pass，进一步降低使用 CINN 的显存开销。（[#39724](https://github.com/PaddlePaddle/Paddle/pull/39724), [#39911](https://github.com/PaddlePaddle/Paddle/pull/39911)）
+    
+    - 添加 cinn_instruction_run op 的 Kernel 支持根据编译结果推断的数据类型动态申请空间。（[#40920](https://github.com/PaddlePaddle/Paddle/pull/40920)）
+
+- 问题修复：
+  
+  - 修复并优化 CINN 子图的生成逻辑。（[#36503](https://github.com/PaddlePaddle/Paddle/pull/36503)）
+  
+  - 修复 Paddle-CINN 不支持无输入子图的问题。（[#40814](https://github.com/PaddlePaddle/Paddle/pull/40814)）
+  
+  - 修复由于 CINN 无法处理 batch_norm 等算子中存在的无用输出而报错的问题。（[#36996](https://github.com/PaddlePaddle/Paddle/pull/36996)）
+  
+  - 修复若干 CINN 子图划分以及符号化中存在的 bug，解决 Paddle 训练接入 CINN 全流程打通过程中遇到的问题。 ([#36739](https://github.com/PaddlePaddle/Paddle/pull/36739), [#36698](https://github.com/PaddlePaddle/Paddle/pull/36698) )
+  
+  - CINN 尚不支持控制流，添加遇控制流跳过的逻辑。（[#40812](https://github.com/PaddlePaddle/Paddle/pull/40812)）
+
 #### 其他
 
 - 模型量化
@@ -1274,6 +1385,10 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
   
   - LayerNorm ([#40418](https://github.com/PaddlePaddle/Paddle/pull/40418))
 
+- 增加基于 SSD-内存-GPU显存 的3级存储图检索引擎，支持大规模图神经网络训练。([#42472](https://github.com/PaddlePaddle/Paddle/pull/42472), [#42321](https://github.com/PaddlePaddle/Paddle/pull/42321), [#42027](https://github.com/PaddlePaddle/Paddle/pull/42027))
+
+- 增加异构多云训练通信模块 switch，实现 Send/Recv 接口，支持多云异构通信。（[#40965](https://github.com/PaddlePaddle/Paddle/pull/40965), [#40911](https://github.com/PaddlePaddle/Paddle/pull/40911)）
+
 ### （2）功能优化
 
 #### API
@@ -1346,7 +1461,13 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - 完善`paddle.amp.GradScaler`调用 check_finite_and_unscale op 的逻辑，消除该处创建 bool 变量所引入的 cudaMemcpy。([#37770](https://github.com/PaddlePaddle/Paddle/pull/37770))
 
-- 新增对 unstack 和 unique op 元素个数为0的 Tensor 增加检查。([#36021](https://github.com/PaddlePaddle/Paddle/pull/36021)) 
+- 新增对 unstack 和 unique op 元素个数为0的 Tensor 增加检查。([#36021](https://github.com/PaddlePaddle/Paddle/pull/36021))
+
+- 新增支持昆仑2的多层、双向 LSTM 功能，完善 RNN 前反向 op，支持时序类模型训练使用。([#42076](https://github.com/PaddlePaddle/Paddle/pull/42076))
+
+- 新增支持昆仑2的 bce_loss 前反向 op。([#41610](https://github.com/PaddlePaddle/Paddle/pull/41610))
+
+- 添加 `paddle.linalg.det` 的反向实现。([#36013](https://github.com/PaddlePaddle/Paddle/pull/36013))
 
 #### IR(Intermediate Representation)
 
@@ -1408,7 +1529,23 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
   
   - 利用模型中 embedding op 的拓扑关系，优化 embedding op 的合并逻辑以提升性能。[(#35942)](https://github.com/PaddlePaddle/Paddle/pull/35942) 
 
-- 通信库：重构通信库，提升通信库的易扩展性和二次开发性，支持异构通信。 ([#41398](https://github.com/PaddlePaddle/Paddle/pull/41398), [#39720](https://github.com/PaddlePaddle/Paddle/pull/39720), [#40911](https://github.com/PaddlePaddle/Paddle/pull/40911), [#40579](https://github.com/PaddlePaddle/Paddle/pull/40579), [#40629](https://github.com/PaddlePaddle/Paddle/pull/40629), [#40437](https://github.com/PaddlePaddle/Paddle/pull/40437), [#40430](https://github.com/PaddlePaddle/Paddle/pull/40430), [#40228](https://github.com/PaddlePaddle/Paddle/pull/40228), [#40181](https://github.com/PaddlePaddle/Paddle/pull/40181), [#40100](https://github.com/PaddlePaddle/Paddle/pull/40100), [#40097](https://github.com/PaddlePaddle/Paddle/pull/40097), [#39892](https://github.com/PaddlePaddle/Paddle/pull/39892), [#39384](https://github.com/PaddlePaddle/Paddle/pull/39384), [#39737](https://github.com/PaddlePaddle/Paddle/pull/39737), [#40040](https://github.com/PaddlePaddle/Paddle/pull/40040)) 
+- 通信库：重构通信库，提升通信库的易扩展性和二次开发性，支持异构通信。 ([#41398](https://github.com/PaddlePaddle/Paddle/pull/41398), [#39720](https://github.com/PaddlePaddle/Paddle/pull/39720), [#40911](https://github.com/PaddlePaddle/Paddle/pull/40911), [#40579](https://github.com/PaddlePaddle/Paddle/pull/40579), [#40629](https://github.com/PaddlePaddle/Paddle/pull/40629), [#40437](https://github.com/PaddlePaddle/Paddle/pull/40437), [#40430](https://github.com/PaddlePaddle/Paddle/pull/40430), [#40228](https://github.com/PaddlePaddle/Paddle/pull/40228), [#40181](https://github.com/PaddlePaddle/Paddle/pull/40181), [#40100](https://github.com/PaddlePaddle/Paddle/pull/40100), [#40097](https://github.com/PaddlePaddle/Paddle/pull/40097), [#39892](https://github.com/PaddlePaddle/Paddle/pull/39892), [#39384](https://github.com/PaddlePaddle/Paddle/pull/39384), [#39737](https://github.com/PaddlePaddle/Paddle/pull/39737), [#40040](https://github.com/PaddlePaddle/Paddle/pull/40040))
+
+- 支持 `paddle.incubate.distributed.models.moe`中 MoE 相关接口(`moe.GShardGate`, `moe.BaseGate`, `moe.SwitchGate`, `moe.MoELayer`, `moe.ClipGradForMOEByGlobalNorm` )的公开。([#42300](https://github.com/PaddlePaddle/Paddle/pull/42300))
+
+- 修复 `paddle.incubate.distributed.models.moe.MoELayer` 中使用 recomputing 可能报错的问题。([#42128](https://github.com/PaddlePaddle/Paddle/pull/42128))
+
+- 修复新动态图流水线并行因为数据类型不同导致的报错。([#41937](https://github.com/PaddlePaddle/Paddle/pull/41937), [#42053](https://github.com/PaddlePaddle/Paddle/pull/42053))
+
+- 修复新动态图张量模型并行因为数据类型不同导致的报错（[#41960](https://github.com/PaddlePaddle/Paddle/pull/41960)）
+
+#### 自定义算子
+
+- 增强 C++自定义算子机制对二阶反向算子编写功能，支持为二阶反向算子的梯度输入变量添加后缀作为输出使用。([#41781](https://github.com/PaddlePaddle/Paddle/pull/41781)) 
+
+- 移除 Tensor API 成员方法中对废弃的枚举类型 PlaceType 的使用，进行相应兼容处理，并添加 deprecated warning 提示。([#41882](https://github.com/PaddlePaddle/Paddle/pull/41882)) 
+
+- 为原 Tensor API 的一系列废弃接口，包括不完整构造函数、reshape、mutable_data、copy_to 方法添加 deprecated warning 提示。([#41882](https://github.com/PaddlePaddle/Paddle/pull/41882)) 
 
 #### 其他
 
@@ -1496,6 +1633,26 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - 优化 `Categorical`的 `probs`计算，简化计算逻辑，性能提升 4 ~ 5 倍。([#42178](https://github.com/PaddlePaddle/Paddle/pull/42178)) 
 
+- `paddle.sum` 性能优化，性能相比优化前提升约20%。([#42309](https://github.com/PaddlePaddle/Paddle/pull/42309))
+
+#### 自动调优
+
+新增训练全流程硬件感知性能自动调优功能，在图像分类、分割、检测和图像生成任务上与模型默认参数配置下的性能相比提升约3%～50%以上。通过 `paddle.incubate.autotune.set_config` API设置自动调优状态，当前默认关闭。自动调优具体包括三个层次：
+
+- `paddle.io.DataLoader` 新增自动调优功能，根据训练数据和设备资源选择最佳的模型 num_workers。 ([#42004](https://github.com/PaddlePaddle/Paddle/pull/42004))
+
+- 新增混合精度训练数据布局自动调优功能，根据设备类型和数据类型选择最佳数据布局，并在运行时自动转换。([#41964](https://github.com/PaddlePaddle/Paddle/pull/41964))
+
+- 新增 Conv 运行时所需 workspace size 阈值自动调整功能，根据 GPU 当前可申请显存资源情况来自动设置；基于通用的 AlgorithmCache 设计和 Kernel 计时组件，新增 Conv cuDNN 算法自动选择功能，支持数据变长模型。（[#41833](https://github.com/PaddlePaddle/Paddle/pull/41833)）
+
+#### 调度优化
+
+- 移除 `paddle.nn.ClipGradByGlobalNorm` 中的 CudaStreamSync 隐藏操作，减少执行时的调度开销，在 ptb 模型上有5%的性能提升。([#42170](https://github.com/PaddlePaddle/Paddle/pull/42170))
+
+- 优化一系列底层数据结构及原动态图执行体系中的细节实现，提升原动态图的调度性能。([#42010](https://github.com/PaddlePaddle/Paddle/pull/42010), [#42171](https://github.com/PaddlePaddle/Paddle/pull/42171), [#42224](https://github.com/PaddlePaddle/Paddle/pull/42224), [#42256](https://github.com/PaddlePaddle/Paddle/pull/42256), [#42306](https://github.com/PaddlePaddle/Paddle/pull/42306), [#42329](https://github.com/PaddlePaddle/Paddle/pull/42329), [#42340](https://github.com/PaddlePaddle/Paddle/pull/42340), [#42368](https://github.com/PaddlePaddle/Paddle/pull/42368), [#42425](https://github.com/PaddlePaddle/Paddle/pull/42425))
+
+- 简化 `paddle.distribution.Categorical`的 probs 计算逻辑，提升性能 4 到 5 倍。 ([#42178](https://github.com/PaddlePaddle/Paddle/pull/42178))
+
 ### （4）问题修复
 
 #### API
@@ -1548,7 +1705,7 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - 修复高阶微分 `gradients` 接口在指定 target_grad 时未按预期生效的问题。([#40940](https://github.com/PaddlePaddle/Paddle/pull/40940/)) 
 
-- 修复动态图 op`_BatchNormBase` 基类中修改了 default_dtype，导致后续组网参数类型错误的问题，受影响的API有 `paddle.nn.BatchNorm1D`，`paddle.nn.BatchNorm2D`，`paddle.nn.BatchNorm3D`，`paddle.nn.SyncBatchNorm`。具体原因是当 `get_default_dtype() == 'float16'` 时，通过 `set_default_dtype('float32')`修改默认参数数据类型，动态图组网的参数类型是通过 default_dtype 来创建的，因此当默认参数类型被修改后导致后续的组网参数类型错误。 ([#36376](https://github.com/PaddlePaddle/Paddle/pull/36376)) 
+- 修复动态图 op`_BatchNormBase` 基类中修改了 default_dtype，导致后续组网参数类型错误的问题，受影响的API有 `paddle.nn.BatchNorm1D`，`paddle.nn.BatchNorm2D`，`paddle.nn.BatchNorm3D`，`paddle.nn.SyncBatchNorm`。具体原因是当 `get_default_dtype() == 'float16'` 时，通过 `set_default_dtype('float32')`修改默认参数数据类型，动态图组网的参数类型是通过 default_dtype 来创建的，因此当默认参数类型被修改后导致后续的组网参数类型错误。 ([#36376](https://github.com/PaddlePaddle/Paddle/pull/36376)) 
 
 - 修复 batchnorm op 中，当数据类型为 FP32 ，且数据维度 `dims = 2，data_layout = NHWC` 时，反向 op 内中间变量未定义问题。 ([#37020](https://github.com/PaddlePaddle/Paddle/pull/37020)) 
 
@@ -1612,7 +1769,29 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - 修复 `paddle.ifftshit` , `paddle.fftshift` 计算错误问题。([#36834](https://github.com/PaddlePaddle/Paddle/pull/36834), [#36748](https://github.com/PaddlePaddle/Paddle/pull/36748)) 
 
-- 修复 `paddle.fft` 系列 API 中的 `axis` 计算错误。 ([#36321](https://github.com/PaddlePaddle/Paddle/pull/36321)) 
+- 修复 `paddle.fft` 系列 API 中的 `axis` 计算错误。 ([#36321](https://github.com/PaddlePaddle/Paddle/pull/36321))
+
+- 修复 batch_norm_grad op 在 FP16 数据类型时输出数据类型注册的 bug，该 bug 会导致部分场景下编译失败，并且对 FP16 计算精度会有一定影响。([#42461](https://github.com/PaddlePaddle/Paddle/pull/42461))
+
+- 修复 `paddle.nn.functional.pad` API 在模型动转静时，padding 为 Tensor 条件下的 Infershape 信息错误问题。([#42414](https://github.com/PaddlePaddle/Paddle/pull/42414))
+
+- 修复 `paddle.distribution.StickBreakingTransform` 输入维度超过2时异常的问题。([#41762](https://github.com/PaddlePaddle/Paddle/pull/41672))
+
+- 修复 fused_attention op 中 QK^T 计算出 nan/inf 的问题。([#42032](https://github.com/PaddlePaddle/Paddle/pull/42032))
+
+- 修复 fused_attention op 中 FusedResidualDropoutBias 在V100上计算出 nan/inf 问题。([#42398](https://github.com/PaddlePaddle/Paddle/pull/42398))
+
+- 修复 full_like op 在执行时引入的多余的 data transform 问题。([#41973](https://github.com/PaddlePaddle/Paddle/pull/41973)) 
+
+- 修复 p_norm op 在 GPU 环境上计算 nan 的问题。([#41804](https://github.com/PaddlePaddle/Paddle/pull/41804)) 
+
+- 修复 split op 在参数 sections 存在为0的 size 情况下，段错误的问题。（[#41755](https://github.com/PaddlePaddle/Paddle/pull/41755)）
+
+- 修复6个 elementwise op（pow、complex、divide_double、multiply_double、fmax、fmin）在需要 broadcast 的情况下，多卡训练时报Place(gpu:0) 不支持的问题。([#42332](https://github.com/PaddlePaddle/Paddle/pull/42332))
+
+- 修复 import paddle 时由于 PIL 版本升级导致的废弃接口报 warning 的问题。([#42307](https://github.com/PaddlePaddle/Paddle/pull/42307))
+
+- 修复静态图下 `paddle.linalg.matrix_rank`不支持 tol 为 FP64 Tensor 的问题。([#42085](https://github.com/PaddlePaddle/Paddle/pull/42085)) 
 
 #### IR(Intermediate Representation)
 
@@ -1638,7 +1817,11 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
   
   - 修复控制流 For 中返回单值时代码转换的问题。([#40683](https://github.com/PaddlePaddle/Paddle/pull/40683)) 
   
-  - 修复控制流 cond 的输入包含 LoDTensorArray 时，生成反向 op 会报错的问题。([#39585](https://github.com/PaddlePaddle/Paddle/pull/39585)) 
+  - 修复控制流 cond 的输入包含 LoDTensorArray 时，生成反向 op 会报错的问题。([#39585](https://github.com/PaddlePaddle/Paddle/pull/39585))
+  
+  - 修复 `paddle.jit.save`在导出动转静模型时丢失顶层 Layer 的 forward_pre_hook 和 forward_post_hook 的问题。([#42273](https://github.com/PaddlePaddle/Paddle/pull/42273))
+
+  - 修复 `paddle.expand`中 shape 参数包含 Tensor 在动转静时会转换报错的问题。([#41973](https://github.com/PaddlePaddle/Paddle/pull/41973))
 
 #### 分布式训练
 
@@ -1794,17 +1977,14 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - 修复 Expand_As op 计算输出 shape 时逻辑的错误。([#38677](https://github.com/PaddlePaddle/Paddle/pull/38677))
 
-- 框架功能修复
   
-  - 修复 `core.VarDesc.VarType.STRINGS` 类型的变量获取 `lod_level` 属性报错的问题，并且设置其 `lod_level` 为None。([#39077](https://github.com/PaddlePaddle/Paddle/pull/39077))
+- 修复 `core.VarDesc.VarType.STRINGS` 类型的变量获取 `lod_level` 属性报错的问题，并且设置其 `lod_level` 为None。([#39077](https://github.com/PaddlePaddle/Paddle/pull/39077))
   
-  - 修复框架功能 `Pylayer` 不支持不同 dtype 的问题。 ([#37974](https://github.com/PaddlePaddle/Paddle/pull/37974))
+- 修复框架功能 `PyLayer` 不支持不同 dtype 的问题。 ([#37974](https://github.com/PaddlePaddle/Paddle/pull/37974))
 
-- API修复
-  
-  - 修复了学习率衰减 API `paddle.optimizer.lr.PolynomialDecay` 的零除问题。 ([#38782](https://github.com/PaddlePaddle/Paddle/pull/38782)) 
-  
-  - 修复调用 DisableGlogInfo() 接口后依旧残留部分日志的问题。 ([#36356](https://github.com/PaddlePaddle/Paddle/pull/36356)) 
+- 修复了学习率衰减 API `paddle.optimizer.lr.PolynomialDecay` 的零除问题。 ([#38782](https://github.com/PaddlePaddle/Paddle/pull/38782)) 
+
+- 修复调用 DisableGlogInfo() 接口后依旧残留部分日志的问题。 ([#36356](https://github.com/PaddlePaddle/Paddle/pull/36356)) 
 
 - 修复 SimpleRNN、GRU和LSTM API CPU训练时多层RNN（dropout 设置为0时）反向计算出错的问题。 ([#37080](https://github.com/PaddlePaddle/Paddle/pull/37080)) 
 
@@ -1812,7 +1992,11 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - 使 `paddle.roll` 的 shifts 参数支持传入 Tensor。 ([#36727](https://github.com/PaddlePaddle/Paddle/pull/36727)) 
 
-- 为 fft 添加 onemkl 作为可选的计算后端。 ([#36414](https://github.com/PaddlePaddle/Paddle/pull/36414)) 
+- 为 fft 添加 onemkl 作为可选的计算后端。 ([#36414](https://github.com/PaddlePaddle/Paddle/pull/36414))
+
+- 修复 matmul_v2 和 elementwise_div 两个 op 在 bfloat16 类型下的精度问题。([#42479](https://github.com/PaddlePaddle/Paddle/pull/42479)) 
+
+- 修复显存回收时 LoDTensorArray 只清理内部 Tensor 而未清空 Array 导致的下个 step 可能出错的问题。([#42398](https://github.com/PaddlePaddle/Paddle/pull/42398)) 
 
 ## 4. 部署方向（Paddle Inference）
 
@@ -1902,6 +2086,10 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - 将 matmul 融合相关的 pass 基于不同的后端（GPU、CPU、TensorRT）拆开，支持 FC 权重的转置功能。([#39369](https://github.com/PaddlePaddle/Paddle/pull/39369))
 
+- 新增 roll、strided_slice、slice op 在动态 shape 的情况下对 TensorRT 的支持。([#41913](https://github.com/PaddlePaddle/Paddle/pull/41913), [#41573](https://github.com/PaddlePaddle/Paddle/pull/41573), [#41467](https://github.com/PaddlePaddle/Paddle/pull/41467))
+
+- 新增 div op 对 TensorRT 的支持。([#41243](https://github.com/PaddlePaddle/Paddle/pull/41243))
+
 - 量化支持
   
   - `PostTrainingQuantization` API新增支持`paddle.io.DataLoader` 对象或者 `Python Generator`的输入。([#38686](https://github.com/PaddlePaddle/Paddle/pull/38686))
@@ -1948,6 +2136,8 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 - TensorRT 动态 shape 参数自动生成接口增加文件存在性检查。([#36628](https://github.com/PaddlePaddle/Paddle/pull/36628))
 
+- 修复 MKLDNN 不支持 conv3d 的问题。([#42055](https://github.com/PaddlePaddle/Paddle/pull/42055))
+
 #### 后端能力修复
 
 - 修复预测时 cuDNN 默认算法选择配置，使用非 deterministic 策略。 ([#41491](https://github.com/PaddlePaddle/Paddle/pull/41491))
@@ -2063,7 +2253,7 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 ### 编译安装
 
-- 从2.3.0-rc0版本开始，飞桨对框架支持的 GPU 架构种类进行了调整和升级。(更多请参考: [飞桨支持的 GPU 架构](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.3rc/install/Tables.html#gpu))
+- 从2.3.0 版本开始，飞桨对框架支持的 GPU 架构种类进行了调整和升级。(更多请参考: [飞桨支持的 GPU 架构](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.3rc/install/Tables.html#gpu))
 
 备注：
 
@@ -2087,12 +2277,15 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
     
     - CUDA11 : 3.5, 5.0, 6.0, 6.1, 7.0, 7.5, 8.0。
 
+- 支持 Python 3.10，修复 Windows 下某些 PythonC API 变化导致的编译 bug。([#41180](https://github.com/PaddlePaddle/Paddle/pull/42180))
+
 - Windows 平台支持 Visual Studio 2019 编译。 ([#38719](https://github.com/PaddlePaddle/Paddle/pull/38719)) 
 
 - 消除 Windows 平台编译时出现的各种 warning。 ([#38034](https://github.com/PaddlePaddle/Paddle/pull/38034), [#37890](https://github.com/PaddlePaddle/Paddle/pull/37890), [#37442](https://github.com/PaddlePaddle/Paddle/pull/37442), [#37439](https://github.com/PaddlePaddle/Paddle/pull/37439), [#36857](https://github.com/PaddlePaddle/Paddle/pull/36857)) 
 
 - 修复底层数据结构升级引入的 jetson 编译问题。 ([#39669](https://github.com/PaddlePaddle/Paddle/pull/39669), [#39441](https://github.com/PaddlePaddle/Paddle/pull/39441))
 
+
 ### 新硬件适配
 
 - 自定义新硬件接入：提供一种插件式扩展 PaddlePaddle 硬件后端的方式。通过该功能，开发者无需为特定硬件修改 PaddlePaddle 代码，只需实现标准接口，并编译成动态链接库，则可作为插件供 PaddlePaddle 调用。降低为 PaddlePaddle 添加新硬件后端的开发难度。当前支持自定义 Runtime 接入和自定义 Kernel 接入。
@@ -2107,6 +2300,6 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 ## Thanks to our Contributors
 
-This release contains contributions from the project core team as well as :
+This release contains contributions from the project core team as well as:
 
 Adam Osewski, Allen Guo, arlesniak, chenenquan, chenyanlann, fengkuangxiaxia, fuqianya, fwenguang, guguguzi, helen88, houj04, Jacek Czaja, jakpiase, jianghaicheng, joanna.wozna.intel, joeqiao12, Leo Chen, Leo Guo, Li-fAngyU, lidanqing, Liyulingyue, Matsumoto GAO, maxhuiy, Ming-Xu Huang, Nyakku Shigure, piotrekobi, piotrekobiIntel, QingshuChen, qipengh, Skr Bang, Sylwester Fraczek, Sławomir Siwek, taixiurong, tanzhipeng, Tomasz Socha, TTerror, Webbley, yaozhixin, ykkk2333, yujun, Zhangjingyu06, zhangxiaoci, zhangyikun02, zhangyk0314, zlsh80826, zn, Zuza
diff --git a/docs/release_note_en.md b/docs/release_note_en.md
index e9e51ea79a8..4969c710d50 100644
--- a/docs/release_note_en.md
+++ b/docs/release_note_en.md
@@ -1,9 +1,9 @@
 
-# 2.3.0-rc0 Release Note
+# 2.3.0 Release Note
 
 ## 1. **Important Updates**
 
-We are excited to release the PaddlePaddle Framework V2.3.0-rc0. This version contains the following highlights.
+We are excited to release the PaddlePaddle Framework V2.3.0. This version contains the following highlights.
 
 ### API
 
@@ -15,7 +15,7 @@ We are excited to release the PaddlePaddle Framework V2.3.0-rc0. This version co
   
 - Added 9 new framework performance analysis APIs. The new performance profiling APIs, centered around Paddle.Profiler.Profiler, help users collect and analyze performance statistics during training and inference.
   
-- Added 7 APIs for device management, facilitating hardware information acquistion.
+- Added 7 APIs for device management, facilitating hardware information acquisition.
   
 - Added several visual and text domain APIs to facilitate ~~the~~ reusability of MobileNetV3, ResNeXt and other backbone networks, to achieve the fast networking.
   
@@ -35,7 +35,7 @@ We are excited to release the PaddlePaddle Framework V2.3.0-rc0. This version co
 
 ### **Compile and Install**
   
-- From version 2.3.0-rc0, PaddlePaddle upgrades GPU architectures supported.
+- From version 2.3.0, PaddlePaddle upgrades GPU architectures supported.
   
 
 ### **Inference Deployment**
@@ -49,7 +49,7 @@ We are excited to release the PaddlePaddle Framework V2.3.0-rc0. This version co
 
 - Add custom device support: provide a plug-in way to extend PaddlePaddle hardware backend.
   
-- Add training/inference support for multiple heterogeneous chips such as HUAWEI Ascend 910 / GraphCore IPU / Cambricon MLU / Kunlunxin 2.
+- Add training/inference support for multiple heterogeneous chips such as HUAWEI Ascend 910 / GraphCore IPU / Cambricon MLU / KUNLUNXIN 2.
   
 
 ### **Framework Architecture**
@@ -58,6 +58,8 @@ We are excited to release the PaddlePaddle Framework V2.3.0-rc0. This version co
 
 ## **2. Incompatibility Upgrade**
 
+- Due to binary size limitations, the CUDA sm35 architecture has been dropped from the pre-compiled binaries. ([#41754](https://github.com/PaddlePaddle/Paddle/pull/41754))
+
 - When `paddle.to_tensor` converts a python int scalar to a Tensor, the default data type on Windows changes from int32 to int64, thus alignment with Linux/Mac. ([#39662](https://github.com/PaddlePaddle/Paddle/pull/39662))
   
 - To keep consistency with division behavior under python3, the division symbol `/` has been changed from “rounding divide” to “true divide”, and the data type of the computed output has been switched from int to float. ([#40890](https://github.com/PaddlePaddle/Paddle/pull/40890))
@@ -69,7 +71,7 @@ We are excited to release the PaddlePaddle Framework V2.3.0-rc0. This version co
 2.2
 </th>
 <th>
-2.3.0-rc0
+2.3.0
 </th>
 </tr>
 
@@ -111,7 +113,7 @@ Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=True,
 2.2
 </th>
 <th>
-2.3.0-rc0
+2.3.0
 </th>
 </tr>
 
@@ -419,6 +421,12 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
   - `paddle.incubate.optimizer.functional.minimize_lbfgs`，add second-order optimizer L-BFGS.
     
 - Add `paddle.incubate.multiprocessing` module, to provide Tensor (CPU/GPU) data transfer between python processes. ([#37302](https://github.com/PaddlePaddle/Paddle/pull/37302), [#41339](https://github.com/PaddlePaddle/Paddle/pull/41339))
+
+- Add `paddle.incubate.autotune.set_config` API, to support multi-version Kernel auto-selection, mixed precision data layout auto-conversion, and num_workers auto-selection for DataLoader to automatically improve model performance. ([#42301](https://github.com/PaddlePaddle/Paddle/pull/42301))
+
+- Add the `paddle.incubate.nn.FusedMultiTransformer` and `paddle.incubate.nn.functional.fused_multi_transformer` APIs, which fuse multiple transformer layers into a single op to improve model inference performance. Note that only the forward pass is supported. ([#42311](https://github.com/PaddlePaddle/Paddle/pull/42311))
+
+- Add the einsum_v2 operator to provide a consistent interface between imperative and static modes. It is compatible with the original Python-side `paddle.einsum` implementation, while supporting dynamic-to-static export and more complete InferShape inference. ([#42495](https://github.com/PaddlePaddle/Paddle/pull/42495), [#42327](https://github.com/PaddlePaddle/Paddle/pull/42327), [#42397](https://github.com/PaddlePaddle/Paddle/pull/42397), [#42105](https://github.com/PaddlePaddle/Paddle/pull/42105))
   
 
 #### IR(Intermediate Representation)
@@ -496,7 +504,7 @@ AssertionError: elu_ only support alpha >= 0, please use elu instead.
 
 #### **Paddle HIgh reusability operator library**
 
-We anounce PHI as the new Paddle HIgh reusability operator library. PHI provides Primitive API, enabling kernel reuse for operator development. As a refactored functional operator library, PHI aims to solve legacy problems that harm the framework's performance and reusability, in particular on the operator development. Such problems include inefficient ways of cross using operators, unclear operator interfaces and lacking direct calls to the operator library in C++. With PHI, new operators can be easily implemented by composing functions available in the functional library. The library provides over 200 C++ operator class APIs and nearly 500 kernels. Composing new operators through these built-in functions can greatly reduce the user's development effort. PHI supports different types of hardware (e.g., GPU and XPU). In addition, PHI is extensible with plugins for accommodating third party accelerators (such as NPU) in a low cost and reusable fashion. In short, PHI supports low level operator composabilty, the reuse of kernels through Primitives, and accelerators through plugins.The main contents include six parts as below:
+We announce PHI as the new Paddle HIgh reusability operator library. PHI provides Primitive API, enabling kernel reuse for operator development. As a refactored functional operator library, PHI aims to solve legacy problems that harm the framework's performance and reusability, in particular on the operator development. Such problems include inefficient ways of cross using operators, unclear operator interfaces and lacking direct calls to the operator library in C++. With PHI, new operators can be easily implemented by composing functions available in the functional library. The library provides over 200 C++ operator class APIs and nearly 500 kernels. Composing new operators through these built-in functions can greatly reduce the user's development effort. PHI supports different types of hardware (e.g., GPU and XPU). In addition, PHI is extensible with plugins for accommodating third party accelerators (such as NPU) in a low cost and reusable fashion. In short, PHI supports low level operator composability, the reuse of kernels through Primitives, and accelerators through plugins. The main contents include six parts as below:
 
 - **The implementation of the operator library infrastructure, core components and mechanisms** : The directory structure of the new operator library is reasonably planned, design and implement the common base data structure of the new operator library, the new functional InferMeta and Kernel development paradigm and the corresponding registration and management components. Support the automated compilation object generation and compilation dependency generation of Kernel files, allowing developers to focus only on the functional Kernel implementation, and making the development paradigm clear and concise. ([#34425](https://github.com/PaddlePaddle/Paddle/pull/34425), [#37107](https://github.com/PaddlePaddle/Paddle/pull/37107), [#36946](https://github.com/PaddlePaddle/Paddle/pull/36946), [#36948](https://github.com/PaddlePaddle/Paddle/pull/36948), [#37876](https://github.com/PaddlePaddle/Paddle/pull/37876), [#37916](https://github.com/PaddlePaddle/Paddle/pull/37916), [#37977](https://github.com/PaddlePaddle/Paddle/pull/37977), [38078](https://github.com/PaddlePaddle/Paddle/pull/38078), [#38861](https://github.com/PaddlePaddle/Paddle/pull/38861), [#39123](https://github.com/PaddlePaddle/Paddle/pull/39123), [#39131](https://github.com/PaddlePaddle/Paddle/pull/39131), [#39748](https://github.com/PaddlePaddle/Paddle/pull/39748), [#39790](https://github.com/PaddlePaddle/Paddle/pull/39790), [#39941](https://github.com/PaddlePaddle/Paddle/pull/39941), [#40239](https://github.com/PaddlePaddle/Paddle/pull/40239), [#40635](https://github.com/PaddlePaddle/Paddle/pull/40635), [#41091](https://github.com/PaddlePaddle/Paddle/pull/41091), [#37409](https://github.com/PaddlePaddle/Paddle/pull/37409), [#37942](https://github.com/PaddlePaddle/Paddle/pull/37942), [#39002](https://github.com/PaddlePaddle/Paddle/pull/39002), [#38109](https://github.com/PaddlePaddle/Paddle/pull/38109), [#37881](https://github.com/PaddlePaddle/Paddle/pull/37881), 
[#37517](https://github.com/PaddlePaddle/Paddle/pull/37517), [#39870](https://github.com/PaddlePaddle/Paddle/pull/39870), [#40975](https://github.com/PaddlePaddle/Paddle/pull/40975), [#39475](https://github.com/PaddlePaddle/Paddle/pull/39475), [#37304](https://github.com/PaddlePaddle/Paddle/pull/37304), #36910, #37120, #37146, #37215, #37255, #37369, #38258, #38257, #38355, #38853, #38937, #38977, #38946, #39085, #39153, #39228, #38301, #38275, #38506, #38607, #38473, #38632, #38811, #38880, #38996, #38914, #39101)
   
@@ -1012,6 +1020,7 @@ We anounce PHI as the new Paddle HIgh reusability operator library. PHI provides
     
   - hard_sigmoid ([#40626](https://github.com/PaddlePaddle/Paddle/pull/40626))
     
+  - exp, det, assign, gaussian_random, matrix_rank, eye, and deformable_conv. ([#41755](https://github.com/PaddlePaddle/Paddle/pull/41755), [#41737](https://github.com/PaddlePaddle/Paddle/pull/41737))
 
 #### **New Dynamic Graph Execution Mechanism**
 
@@ -1041,25 +1050,35 @@ To improve scheduling performance and custom development capability of the dynam
     
 - **Support inplace after dynamic graph reconstruction**: input and output are the same Tensor.
   
-  - - Adapt the inplace strategy for dynamic graph reconstruction intermediate states.([#40400](https://github.com/PaddlePaddle/Paddle/pull/40400))
-      
-    - Adapt the inplace strategy to the final state of the dynamic graph reconstruction. ([#40695](https://github.com/PaddlePaddle/Paddle/pull/40695))
-      
-    - Add inplace strategy to PyLayer function after dynamical graph reconstruction. ([#41043](https://github.com/PaddlePaddle/Paddle/pull/41043))
-      
-    - Add inplace strategy for Tensor's setitem function after dynamical graph reconstruction. ([#40915](https://github.com/PaddlePaddle/Paddle/pull/40915))
-      
-    - Add `_reset_grad_inplace_version` interface after dynamic graph reconstruction, to set the inplace version of the Tensor's gradient to 0. ([#41101](https://github.com/PaddlePaddle/Paddle/pull/41101))
-      
-    - If the value of the forward Tensor is not needed during the inverse computation (no need buffer property), the inplace version detection operation is not needed for that Tensor. For Tensor with no_need_buffer, skip the inplace version check. ([#41350](https://github.com/PaddlePaddle/Paddle/pull/41350))
-      
-    - Unify error messages for inplace version checks after and before reconstruction of dynamic graphs. ([#41209](https://github.com/PaddlePaddle/Paddle/pull/41209))
+  - Adapt the inplace strategy for dynamic graph reconstruction intermediate states. ([#40400](https://github.com/PaddlePaddle/Paddle/pull/40400))
+    
+  - Adapt the inplace strategy to the final state of the dynamic graph reconstruction. ([#40695](https://github.com/PaddlePaddle/Paddle/pull/40695))
+    
+  - Add inplace strategy to PyLayer function after dynamic graph reconstruction. ([#41043](https://github.com/PaddlePaddle/Paddle/pull/41043))
+    
+  - Add inplace strategy for Tensor's setitem function after dynamic graph reconstruction. ([#40915](https://github.com/PaddlePaddle/Paddle/pull/40915))
+    
+  - Add `_reset_grad_inplace_version` interface after dynamic graph reconstruction, to set the inplace version of the Tensor's gradient to 0. ([#41101](https://github.com/PaddlePaddle/Paddle/pull/41101))
+    
+  - If the value of the forward Tensor is not needed during the backward computation (the no_need_buffer property), skip the inplace version check for that Tensor. ([#41350](https://github.com/PaddlePaddle/Paddle/pull/41350))
+    
+  - Unify error messages for inplace version checks after and before reconstruction of dynamic graphs. ([#41209](https://github.com/PaddlePaddle/Paddle/pull/41209))
       
 - **Support view strategy after dynamical graph reconstruction**: input and output Tensor share underlying data.
   
-  - - Adapt the view strategy for dynamic graph reconstruction intermediate states. Include `reshape` , `squeeze` , `unsqueeze` , and `flatten` APIs. ([#40830](https://github.com/PaddlePaddle/Paddle/pull/40830))
+  - Adapt the view strategy for dynamic graph reconstruction intermediate states, including the `reshape`, `squeeze`, `unsqueeze`, and `flatten` APIs. ([#40830](https://github.com/PaddlePaddle/Paddle/pull/40830))
       
-    - Adapt the view strategy for dynamic graph reconstruction final state. Include `reshape` API. ([#40891](https://github.com/PaddlePaddle/Paddle/pull/40891))
+  - Adapt the view strategy for dynamic graph reconstruction final state, including the `reshape` API. ([#40891](https://github.com/PaddlePaddle/Paddle/pull/40891))
+
+- **Add support for weakref on the python side of the new dynamic graph eager Tensor.** ([#41797](https://github.com/PaddlePaddle/Paddle/pull/41797))
+
+- **Enhance the new dynamic graph DoubleGrad function** to support the basic DoubleGrad feature. ([#41893](https://github.com/PaddlePaddle/Paddle/pull/41893), [#41894](https://github.com/PaddlePaddle/Paddle/pull/41894), [#41895](https://github.com/PaddlePaddle/Paddle/pull/41895))
+
+- **Add `core.eager.StringTensor` interface**, to support the construction of StringTensor on python side and the use of the StringTensor related APIs. ([#41039](https://github.com/PaddlePaddle/Paddle/pull/41039))
+
+- **Add `_grad_name` and `_grad_value` to `core.eager.Tensor`** to return the name and value of a gradient. ([#41990](https://github.com/PaddlePaddle/Paddle/pull/41990))
+
+- **Add the processing of the no_need_buffer attribute for dynamic graph intermediate state.** The Tensor with the no_need_buffer attribute is skipped in the inplace backward check operation. ([#41720](https://github.com/PaddlePaddle/Paddle/pull/41720))
       
 
 #### **New Static Graph Executor**
@@ -1073,6 +1092,11 @@ In order to solve the problem that the original static graph executor of the Pad
 - Interface compatibility: Compatible with the user interface and functionality of the original executor, such as alignment with python interface Executor.run(), support for managing Tensor in Scope, etc. This ensures that users can switch to the new executor without perception. ([#37278](https://github.com/PaddlePaddle/Paddle/pull/37278), [#37379](https://github.com/PaddlePaddle/Paddle/pull/37379), [#37445](https://github.com/PaddlePaddle/Paddle/pull/37445), [#37510](https://github.com/PaddlePaddle/Paddle/pull/37510), [#40955](https://github.com/PaddlePaddle/Paddle/pull/40955), [#41778](https://github.com/PaddlePaddle/Paddle/pull/41178), [#41058](https://github.com/PaddlePaddle/Paddle/pull/41058), [#38584](https://github.com/PaddlePaddle/Paddle/pull/38584), [#37957](https://github.com/PaddlePaddle/Paddle/pull/37957), [#37672](https://github.com/PaddlePaddle/Paddle/pull/37672), [#37474](https://github.com/PaddlePaddle/Paddle/pull/37474), [#37085](https://github.com/PaddlePaddle/Paddle/pull/37085), [#37061](https://github.com/PaddlePaddle/Paddle/pull/37061), [#36945](https://github.com/PaddlePaddle/Paddle/pull/36945))
   
 - Enhance debugging and error reporting in multi-threaded scenarios by capturing error reports from sub-threads and throwing them uniformly in the main thread. This can improve user experience. ([#36692](https://github.com/PaddlePaddle/Paddle/pull/36692)，[#36802](https://github.com/PaddlePaddle/Paddle/pull/36802))
+
+- Fix a bug in which the new executor's communication stream reset the stream cache information in the allocator, reducing RecordStream overhead in cross-stream scenarios. This improves the performance of DeepFM models by about 8%. ([#42046](https://github.com/PaddlePaddle/Paddle/pull/42046))
+
+- Optimize the dependency analysis between operators in the new executor to improve runtime performance, and establish correct dependencies for send/recv communication operators to support pipeline parallelism. ([#42009](https://github.com/PaddlePaddle/Paddle/pull/42009))
+
   
 
 #### **Distributed Training**
@@ -1157,6 +1181,7 @@ In order to solve the problem that the original static graph executor of the Pad
     
   - Add evaluation metrics module under the Unified Parameter Server, to support AUC/WuAUC/MaskAUC and other evaluation metrics calculation and customizable extensions. ([#38789](https://github.com/PaddlePaddle/Paddle/pull/38789))
     
+  - Support XPU parameter server training on KUNLUNXIN 2. ([#41917](https://github.com/PaddlePaddle/Paddle/pull/41917), [#42266](https://github.com/PaddlePaddle/Paddle/pull/42266), [#41916](https://github.com/PaddlePaddle/Paddle/pull/41916))
 
 #### Profiler
 
@@ -1192,6 +1217,89 @@ In order to solve the problem that the original static graph executor of the Pad
     
   - Profiler support for grading.（[#39926](https://github.com/PaddlePaddle/Paddle/pull/39926)）
     
+- Modify the name and type of op logging under the new dynamic graph.（[#41771](https://github.com/PaddlePaddle/Paddle/pull/41771/)）
+
+- Add Kernel running statistics to the Profiler's summary and optimize the summary.（[#41989](https://github.com/PaddlePaddle/Paddle/pull/41989)）
+
+- Remove the performance side effect on forward computation when the Profiler is off.（[#42142](https://github.com/PaddlePaddle/Paddle/pull/42142)）
+
+#### **CINN compiler adoption**
+
+With the recent development of PaddlePaddle's compiler, CINN（[GitHub - PaddlePaddle/CINN: Compiler Infrastructure for Neural Networks](https://github.com/PaddlePaddle/CINN)）, the Paddle framework has been adapted to support CINN's features. The changes include subgraph management functions for the Paddle-CINN runtime, memory and speed optimizations, and fixes for bugs found during development.
+
+- Functions developed:
+  
+  - Subgraph op related functions:
+    
+    - Add the function to find and generate CINN subgraphs from computational graphs.（[#36345](https://github.com/PaddlePaddle/Paddle/pull/36345)）
+    
+    - Add cinn_launch op as a runtime entry point to CINN. It is responsible for scheduling CINN to compile the subgraph, to initialize the data, and to execute the generated kernels.（[#36600](https://github.com/PaddlePaddle/Paddle/pull/36600)）
+    
+    - Add a helper class `CinnLaunchContext` to the kernel implementation of cinn_launch op to manage the intermediate data for compiling and running subgraphs, to improve scalability and code readability.（[#37938](https://github.com/PaddlePaddle/Paddle/pull/37938)）
+    
+    - Add additional fetch nodes to CINN subgraphs, thus ensuring that CINN external nodes can fetch the values of variables.（[#37172](https://github.com/PaddlePaddle/Paddle/pull/37172), [#37190](https://github.com/PaddlePaddle/Paddle/pull/37190)）
+    
+    - Add the function to symbolize a CINN subgraph, which is used to topologically sort the subgraphs and return the CINN execution sequence.（[#36417](https://github.com/PaddlePaddle/Paddle/pull/36417)）
+    
+    - Add `CinnCompiler` class for invoking subgraphs in the CINN compiled graph that can be replaced by using CINN operators.（[#36562](https://github.com/PaddlePaddle/Paddle/pull/36562), [#36975](https://github.com/PaddlePaddle/Paddle/pull/36975)）
+    
+    - Add the interface to CINN symbolization class to get the names of subgraph fetched variables to prevent fetched variables from being eliminated in compilation optimizations.（[#37218](https://github.com/PaddlePaddle/Paddle/pull/37218)）
+  
+  - Checking, debugging, and API changes:
+    
+    - Synchronize the update of NetBuilder API name changes in CINN.（[#40392](https://github.com/PaddlePaddle/Paddle/pull/40392)）
+    
+    - Add necessary log information to Paddle-CINN for better debugging.（[#36867](https://github.com/PaddlePaddle/Paddle/pull/36867)）
+    
+    - Add the bidirectional conversion function between Paddle desc and CINN desc.（[#36100](https://github.com/PaddlePaddle/Paddle/pull/36100)）
+    
+    - The operator implemented in CINN may not use some input variables compared to Paddle. Therefore, remove the check that the input variables must be used in the cinn_launch op.（[#37119](https://github.com/PaddlePaddle/Paddle/pull/37119)）
+    
+    - Add cinn_instruction_run op for invoking CINN to execute a single generated instruction, facilitating the construction of scheduling run subgraphs on the Paddle side.（[#39435](https://github.com/PaddlePaddle/Paddle/pull/39435), [#39576](https://github.com/PaddlePaddle/Paddle/pull/39576)）
+    
+    - Add control macros to Paddle for CUDA/CUBLAS/MKL/CINN pass application required to compile CINN.（[#37066](https://github.com/PaddlePaddle/Paddle/pull/37066), [#36660](https://github.com/PaddlePaddle/Paddle/pull/36660)）
+    
+    - Add two control flags FLAGS_allow_cinn_ops and FLAGS_deny_cinn_ops to control the categories of CINN operators used to replace native operators during Paddle training.（[#36842](https://github.com/PaddlePaddle/Paddle/pull/36842)）
+
+- Performance optimization:
+  
+  - Speed optimization
+    
+    - Optimize the computational time consumed by CinnCacheKey.（[#37786](https://github.com/PaddlePaddle/Paddle/pull/37786), [#37317](https://github.com/PaddlePaddle/Paddle/pull/37317)）
+    
+    - Cache variable scope for CINN compiled subgraphs to reduce runtime parameter construction overhead.（[#37983](https://github.com/PaddlePaddle/Paddle/pull/37983)）
+    
+    - Use CINN's auto-tuning during subgraph compilation (enabled via a flag) to further tune training performance.（[#41795](https://github.com/PaddlePaddle/Paddle/pull/41795)）
+    
+    - Refactor the correctness check of compilation results during subgraph compilation to avoid repeated checks at runtime and reduce the scheduling overhead.（[#41777](https://github.com/PaddlePaddle/Paddle/pull/41777)）
+    
+    - Enable TransposeFolding and GemmRewriter optimization passes by default in Paddle-CINN training.（[#41084](https://github.com/PaddlePaddle/Paddle/pull/41084)）
+    
+    - Pass the CUDA stream created in Paddle into CINN so that Paddle and CINN use the same CUDA stream for CUDA computation.（[#37337](https://github.com/PaddlePaddle/Paddle/pull/37337)）
+    
+    - Move CINN optimization pass application logic from Paddle to CINN.（[#42047](https://github.com/PaddlePaddle/Paddle/pull/42047), [#42070](https://github.com/PaddlePaddle/Paddle/pull/42070)）
+  
+  - Device memory optimization
+    
+    - Add NoNeedBufferVars to cinn_launch op to declare a list of input variables that do not require a buffer, so that the memory can be freed in advance.（[#38367](https://github.com/PaddlePaddle/Paddle/pull/38367)）
+    
+    - Pass in reference count information for external variables to the subgraph, so that subgraphs within cinn_launch can reuse memory optimization passes and reduce the memory overhead in using CINN.（[#39209](https://github.com/PaddlePaddle/Paddle/pull/39209), [#39622](https://github.com/PaddlePaddle/Paddle/pull/39622)）
+    
+    - Add the function to convert a collection of executable instructions generated by CINN compilation to a Paddle Graph, supporting reuse of the Paddle scheduler and memory optimization pass, further reducing the memory overhead in using CINN. （[#39724](https://github.com/PaddlePaddle/Paddle/pull/39724), [#39911](https://github.com/PaddlePaddle/Paddle/pull/39911)）
+    
+    - Add Kernel of cinn_instruction_run op, to support dynamic device memory requests based on data types inferred from compilation results.（[#40920](https://github.com/PaddlePaddle/Paddle/pull/40920)）
+
+- Bug fixing:
+  
+  - Fix and optimize the generation logic of CINN subgraphs.（[#36503](https://github.com/PaddlePaddle/Paddle/pull/36503)）
+  
+  - Fix the bug that Paddle-CINN does not support no-input subgraphs.（[#40814](https://github.com/PaddlePaddle/Paddle/pull/40814)）
+  
+  - Fix an error reported due to CINN not being able to handle useless outputs in operators such as batch_norm.（[#36996](https://github.com/PaddlePaddle/Paddle/pull/36996)）
+  
+  - Fix several bugs in CINN subgraph partitioning and symbolization, and solve problems with Paddle training accessing CINN. ([#36739](https://github.com/PaddlePaddle/Paddle/pull/36739), [#36698](https://github.com/PaddlePaddle/Paddle/pull/36698))
+  
+  - CINN does not yet support control flow. Add logic to skip control flow when encountered.（[#40812](https://github.com/PaddlePaddle/Paddle/pull/40812)）
 
 #### **Other**
 
@@ -1283,12 +1391,17 @@ In order to solve the problem that the original static graph executor of the Pad
   - conv2d ([#38507](https://github.com/PaddlePaddle/Paddle/pull/38507)，[#38938](https://github.com/PaddlePaddle/Paddle/pull/38938)，[#36284](https://github.com/PaddlePaddle/Paddle/pull/36284))
     
   - LayerNorm ([#40418](https://github.com/PaddlePaddle/Paddle/pull/40418))
-    
+
+- Add the 3-stage storage graph retrieval engine based on SSD - host memory - GPU device memory, to support large-scale graph neural network training. ([#42472](https://github.com/PaddlePaddle/Paddle/pull/42472), [#42321](https://github.com/PaddlePaddle/Paddle/pull/42321), [#42027](https://github.com/PaddlePaddle/Paddle/pull/42027))
+
+- Add a switch for the heterogeneous multi-cloud training communication module, implement the Send/Recv interface functions, and support communication across multiple heterogeneous clouds.（[#40965](https://github.com/PaddlePaddle/Paddle/pull/40965), [#40911](https://github.com/PaddlePaddle/Paddle/pull/40911)）
 
 ### **(2) Function optimization**
 
 #### API
 
+- Add backward implementation of `paddle.linalg.det`. ([#36013](https://github.com/PaddlePaddle/Paddle/pull/36013))
+
 - Add support for mixed precision training O2 mode for `paddle.Model`, i.e., support for Pure FP16 training mode of the original dynamic/static graphs. ([#36441](https://github.com/PaddlePaddle/Paddle/pull/40962441))
   
 - Support for self chain calls for `paddle.nn.Layer`. ([#36609](https://github.com/PaddlePaddle/Paddle/pull/36609))
@@ -1359,6 +1472,11 @@ In order to solve the problem that the original static graph executor of the Pad
   
 - Add check for unstack and unique op in case of input Tensor with 0 elements. ([#36021](https://github.com/PaddlePaddle/Paddle/pull/36021))
   
+- Add new multi-layer, bi-directional LSTM function that supports KUNLUNXIN 2, to improve RNN forward/backward ops, and support the use of temporal model training. ([#42076](https://github.com/PaddlePaddle/Paddle/pull/42076))
+
+- Add bce_loss forward/backward ops for KUNLUNXIN 2. ([#41610](https://github.com/PaddlePaddle/Paddle/pull/41610))
+
+- Add backward implementation of `paddle.linalg.det`. ([#36013](https://github.com/PaddlePaddle/Paddle/pull/36013))
 
 #### IR(Intermediate Representation)
 
@@ -1423,7 +1541,22 @@ In order to solve the problem that the original static graph executor of the Pad
   - Optimize the merge logic of embedding op to improve performance by exploiting the topological relationship of embedding op in the model. [(#35942)](https://github.com/PaddlePaddle/Paddle/pull/35942)
     
 - Communication library: restructure the communication library to improve the scalability and development of the communication library, and support heterogeneous communication. ([#41398](https://github.com/PaddlePaddle/Paddle/pull/41398), [#39720](https://github.com/PaddlePaddle/Paddle/pull/39720), [#40911](https://github.com/PaddlePaddle/Paddle/pull/40911), [#40579](https://github.com/PaddlePaddle/Paddle/pull/40579), [#40629](https://github.com/PaddlePaddle/Paddle/pull/40629), [#40437](https://github.com/PaddlePaddle/Paddle/pull/40437), [#40430](https://github.com/PaddlePaddle/Paddle/pull/40430), [#40228](https://github.com/PaddlePaddle/Paddle/pull/40228), [#40181](https://github.com/PaddlePaddle/Paddle/pull/40181), [#40100](https://github.com/PaddlePaddle/Paddle/pull/40100), [#40097](https://github.com/PaddlePaddle/Paddle/pull/40097), [#39892](https://github.com/PaddlePaddle/Paddle/pull/39892), [#39384](https://github.com/PaddlePaddle/Paddle/pull/39384), [#39737](https://github.com/PaddlePaddle/Paddle/pull/39737), [#40040](https://github.com/PaddlePaddle/Paddle/pull/40040))
-  
+
+- Publish the MoE-related interfaces in `paddle.incubate.distributed.models.moe` (`moe.GShardGate`, `moe.BaseGate`, `moe.SwitchGate`, `moe.MoELayer`, and `moe.ClipGradForMOEByGlobalNorm`). ([#42300](https://github.com/PaddlePaddle/Paddle/pull/42300))
+
+- Fix the error raised when using recompute in `paddle.incubate.distributed.models.moe.MoELayer`. ([#42128](https://github.com/PaddlePaddle/Paddle/pull/42128))
+
+- Fix the error raised in the new dynamic graph pipeline parallelism due to different data types. ([#41937](https://github.com/PaddlePaddle/Paddle/pull/41937), [#42053](https://github.com/PaddlePaddle/Paddle/pull/42053))
+
+- Fix the error raised in the new dynamic graph tensor model parallelism due to different data types.（[#41960](https://github.com/PaddlePaddle/Paddle/pull/41960)）
+
+#### **Custom operator**
+
+- Enhance the C++ custom operator mechanism for writing second-order gradient operators, to support adding suffixes to the gradient input variables of second-order gradient operators for use as outputs. ([#41781](https://github.com/PaddlePaddle/Paddle/pull/41781)) 
+
+- Remove the use of the deprecated enumeration type `PlaceType` from the Tensor API member methods, make it compatible, and add a deprecation warning. ([#41882](https://github.com/PaddlePaddle/Paddle/pull/41882)) 
+
+- Add deprecation warnings for a number of deprecated interfaces of the original Tensor API, including the incomplete constructor, reshape, mutable_data, and copy_to methods. ([#41882](https://github.com/PaddlePaddle/Paddle/pull/41882))
 
 #### **Other**
 
@@ -1454,6 +1587,15 @@ In order to solve the problem that the original static graph executor of the Pad
   
 - CPU parameter server streaming training optimization: support for automatic statistics of sparse parameter statistics, incremental saving of sparse parameters, etc. The training performance improves by 20%. ([#36465](https://github.com/PaddlePaddle/Paddle/pull/36465), [#36601](https://github.com/PaddlePaddle/Paddle/pull/36601), [#36734](https://github.com/PaddlePaddle/Paddle/pull/36734), [#36909](https://github.com/PaddlePaddle/Paddle/pull/36909), [#36943](https://github.com/PaddlePaddle/Paddle/pull/36943), [#37181](https://github.com/PaddlePaddle/Paddle/pull/37181), [#37194](https://github.com/PaddlePaddle/Paddle/pull/37194), [#37515](https://github.com/PaddlePaddle/Paddle/pull/37515), [#37626](https://github.com/PaddlePaddle/Paddle/pull/37626), [#37995](https://github.com/PaddlePaddle/Paddle/pull/37995), [#38582](https://github.com/PaddlePaddle/Paddle/pull/38582), [#39250](https://github.com/PaddlePaddle/Paddle/pull/39250), [#40762](https://github.com/PaddlePaddle/Paddle/pull/40762), [#41234](https://github.com/PaddlePaddle/Paddle/pull/41234), [#41320](https://github.com/PaddlePaddle/Paddle/pull/41320), [#41400](https://github.com/PaddlePaddle/Paddle/pull/41400))
   
+#### **Auto-tuning**
+
+Add hardware-aware automatic performance tuning for the full training process, with performance improvements of about 3% to 50% or more on image classification, segmentation, detection, and image generation tasks compared to the model's default configuration. The auto-tuning status is set via the `paddle.incubate.autotune.set_config` API and is currently disabled by default. Auto-tuning has three specific levels:
+
+- Add the auto-tuning function to `paddle.io.DataLoader`, to select the best num_workers based on training data and device resources. ([#42004](https://github.com/PaddlePaddle/Paddle/pull/42004))
+
+- Add mixed-precision training data layout auto-tuning feature, to select the best data layout based on device type and data type, and automatically convert it at runtime. ([#41964](https://github.com/PaddlePaddle/Paddle/pull/41964))
+
+- Add automatic tuning of the workspace size threshold required by Conv, set according to the device memory currently available for allocation on the GPU. Add automatic selection of Conv cuDNN algorithms based on the generic AlgorithmCache design and Kernel timing component, which supports models with variable-length data.（[#41833](https://github.com/PaddlePaddle/Paddle/pull/41833)）
 
 #### **Operator Optimization**
 
@@ -1512,7 +1654,14 @@ In order to solve the problem that the original static graph executor of the Pad
 - Optimize `Elementwise` computation for multivariate output, improving performance by up to 15% over pre-optimization. （[#38329](https://github.com/PaddlePaddle/Paddle/pull/38329), [#38410](https://github.com/PaddlePaddle/Paddle/pull/38410)）
   
 - Optimize `Categorical`the probs computation, simplify the computation logic, and improve the performance by 4x to 5x. ([#42178](https://github.com/PaddlePaddle/Paddle/pull/42178))
-  
+
+- Optimize the performance of `paddle.sum`, improving it by about 20%. ([#42309](https://github.com/PaddlePaddle/Paddle/pull/42309))
+
+- Remove CudaStreamSync operation from `paddle.nn.ClipGradByGlobalNorm` to reduce scheduling overhead during execution, with 5% performance improvement on ptb models. ([#42170](https://github.com/PaddlePaddle/Paddle/pull/42170))
+
+- Optimize a series of underlying data structures and detailed implementations in the original dynamic graph execution system to improve the scheduling performance of the original dynamic graph. ([#42010](https://github.com/PaddlePaddle/Paddle/pull/42010), [#42171](https://github.com/PaddlePaddle/Paddle/pull/42171), [#42224](https://github.com/PaddlePaddle/Paddle/pull/42224), [#42256](https://github.com/PaddlePaddle/Paddle/pull/42256), [#42306](https://github.com/PaddlePaddle/Paddle/pull/42306), [#42329](https://github.com/PaddlePaddle/Paddle/pull/42329), [#42340](https://github.com/PaddlePaddle/Paddle/pull/42340), [#42368](https://github.com/PaddlePaddle/Paddle/pull/42368), [#42425](https://github.com/PaddlePaddle/Paddle/pull/42425))
+
+- Simplify the probs calculation logic of `paddle.distribution.Categorical`, improving performance by 4x to 5x. ([#42178](https://github.com/PaddlePaddle/Paddle/pull/42178))
 
 ### **(4) Bug fixing**
 
@@ -1632,6 +1781,27 @@ In order to solve the problem that the original static graph executor of the Pad
   
 - Fix the `axis` computation error in `paddle.fft` series of APIs. ([#36321](https://github.com/PaddlePaddle/Paddle/pull/36321))
   
+- Fix a bug in the output data type registration of the batch_norm_grad op for the FP16 data type. The bug caused compilation failures in some scenarios and also affected FP16 computational precision. ([#42461](https://github.com/PaddlePaddle/Paddle/pull/42461))
+
+- Fix the incorrect InferShape information in the `paddle.nn.functional.pad` API when the padding is a Tensor during dynamic-to-static conversion. ([#42414](https://github.com/PaddlePaddle/Paddle/pull/42414))
+
+- Fix an exception in `paddle.distribution.StickBreakingTransform` when the input dimension exceeds 2. ([#41762](https://github.com/PaddlePaddle/Paddle/pull/41672))
+
+- Fix a bug where computing QK^T in the fused_attention op produces nan/inf. ([#42032](https://github.com/PaddlePaddle/Paddle/pull/42032))
+
+- Fix a bug where FusedResidualDropoutBias in the fused_attention op produces nan/inf on V100. ([#42398](https://github.com/PaddlePaddle/Paddle/pull/42398))
+
+- Fix a redundant data transform bug introduced by the full_like op during execution. ([#41973](https://github.com/PaddlePaddle/Paddle/pull/41973)) 
+
+- Fix a bug where the p_norm op produces nan in GPU environments. ([#41804](https://github.com/PaddlePaddle/Paddle/pull/41804))
+
+- Fix a segmentation fault in the split op when the sections parameter contains a size of 0. ([#41755](https://github.com/PaddlePaddle/Paddle/pull/41755))
+
+- Fix the bug that 6 elementwise ops (pow, complex, divide_double, multiply_double, fmax, and fmin) report that Place (gpu:0) is not supported in multi-card training when broadcasting is required. ([#42332](https://github.com/PaddlePaddle/Paddle/pull/42332))
+
+- Fix the bug that `import paddle` reports a deprecated-interface warning caused by a PIL version update. ([#42307](https://github.com/PaddlePaddle/Paddle/pull/42307))
+
+- Fix the bug that `paddle.linalg.matrix_rank` does not support an FP64 Tensor as tol in static graph mode. ([#42085](https://github.com/PaddlePaddle/Paddle/pull/42085))
 
 #### IR(Intermediate Representation)
 
@@ -1658,6 +1828,10 @@ In order to solve the problem that the original static graph executor of the Pad
   - Fix the code conversion bug when returning a single value in control flow For. ([#40683](https://github.com/PaddlePaddle/Paddle/pull/40683))
     
   - Fix the bug when generating a reverse op when the input to conditional_block op contains LoDTensorArray. ([#39585](https://github.com/PaddlePaddle/Paddle/pull/39585))
+  
+  - Fix the bug that `paddle.jit.save` loses the forward_pre_hook and forward_post_hook of the top Layer when exporting a dynamic-to-static model. ([#42273](https://github.com/PaddlePaddle/Paddle/pull/42273))
+
+  - Fix the error reported during dynamic-to-static conversion when the shape parameter of `paddle.expand` contains a Tensor. ([#41973](https://github.com/PaddlePaddle/Paddle/pull/41973))
     
 
 #### **Distributed Training**
@@ -1814,17 +1988,13 @@ In order to solve the problem that the original static graph executor of the Pad
   
 - Fix the logic error when Expand_As op computes the output shape. ([#38677](https://github.com/PaddlePaddle/Paddle/pull/38677))
   
-- Frame function fixing
-  
-  - Fix the bug that the variables of the `core.VarDesc.VarType.STRINGS` type report error when getting the `lod_level` property and setting its `lod_level` to None. ([#39077](https://github.com/PaddlePaddle/Paddle/pull/39077))
-    
-  - Fix an issue where the framework function `Pylayer` does not support different dtypes. ([#37974](https://github.com/PaddlePaddle/Paddle/pull/37974))
-    
-- API fixing
+- Fix the bug that variables of the `core.VarDesc.VarType.STRINGS` type report an error when their `lod_level` property is read, and set their `lod_level` to None. ([#39077](https://github.com/PaddlePaddle/Paddle/pull/39077))
+
+- Fix an issue where the framework function `PyLayer` does not support different dtypes. ([#37974](https://github.com/PaddlePaddle/Paddle/pull/37974))
   
-  - Fix the bug of division by zero of the learning rate decay API `paddle.optimizer.lr.PolynomialDecay`. ([#38782](https://github.com/PaddlePaddle/Paddle/pull/38782))
-    
-  - Fix the issue where some logs remained after calling the DisableGlogInfo() interface. ([#36356](https://github.com/PaddlePaddle/Paddle/pull/36356))
+- Fix the division-by-zero bug in the learning rate decay API `paddle.optimizer.lr.PolynomialDecay`. ([#38782](https://github.com/PaddlePaddle/Paddle/pull/38782))
+
+- Fix the issue where some logs remained after calling the DisableGlogInfo() interface. ([#36356](https://github.com/PaddlePaddle/Paddle/pull/36356))
     
 - Fix an error in backward of multi-layer RNN (when dropout is set to 0) in the training of SimpleRNN, GRU and LSTM API CPU. ([#37080](https://github.com/PaddlePaddle/Paddle/pull/37080))
   
@@ -1833,6 +2003,10 @@ In order to solve the problem that the original static graph executor of the Pad
 - Enable the shifts parameter of `paddle.roll` to support transfer in Tensor. ([#36727](https://github.com/PaddlePaddle/Paddle/pull/36727))
   
 - Add onemkl to fft as an optional computation backend. ([#36414](https://github.com/PaddlePaddle/Paddle/pull/36414))
+
+- Fix the bfloat16 precision bug in two ops, matmul_v2 and elementwise_div. ([#42479](https://github.com/PaddlePaddle/Paddle/pull/42479))
+
+- Fix a possible error in the next step caused by device memory recycling clearing only the internal Tensors of a LoDTensorArray and not the Array itself. ([#42398](https://github.com/PaddlePaddle/Paddle/pull/42398))
   
 
 ## **4. Deployment Direction (Paddle Inference)**
@@ -1925,6 +2099,10 @@ In order to solve the problem that the original static graph executor of the Pad
 - Add TensorRT fuse pass: preln_embedding_eltwise_layernorm_fuse_pass, preln_skip_layernorm_fuse_pass, for ERNIE-like model performance optimization. ([#39508](https://github.com/PaddlePaddle/Paddle/pull/39508))
   
 - Split matmul fusion-related passes based on different backends (GPU, CPU, TensorRT), to support transpose function for FC weights. ([#39369](https://github.com/PaddlePaddle/Paddle/pull/39369))
+
+- Add TensorRT support for the roll, strided_slice, and slice ops with dynamic shapes. ([#41913](https://github.com/PaddlePaddle/Paddle/pull/41913), [#41573](https://github.com/PaddlePaddle/Paddle/pull/41573), [#41467](https://github.com/PaddlePaddle/Paddle/pull/41467))
+
+- Add div op support for TensorRT.  ([#41243](https://github.com/PaddlePaddle/Paddle/pull/41243))
   
 - Quantization support
   
@@ -1974,6 +2152,7 @@ In order to solve the problem that the original static graph executor of the Pad
   
 - The TensorRT dynamic shape parameter automatically generate the interface, to add the file existence check. ([#36628](https://github.com/PaddlePaddle/Paddle/pull/36628))
   
+- Fix the bug that MKLDNN does not support conv3d. ([#42055](https://github.com/PaddlePaddle/Paddle/pull/42055))
 
 #### **Backend Capability Fixing**
 
@@ -2090,7 +2269,7 @@ In order to solve the problem that the original static graph executor of the Pad
 
 ### **Compile and Install**
   
-- From version 2.3.0-rc0, PaddlePaddle has adjusted and upgraded the types of GPU architectures supported by the framework. (For more information, please refer to: [GPU architectures supported by PaddlePaddle](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.3rc/install/Tables.html#gpu))
+- Starting with version 2.3.0, PaddlePaddle adjusts and upgrades the set of GPU architectures supported by the framework. (For more information, please refer to: [GPU architectures supported by PaddlePaddle](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.3rc/install/Tables.html#gpu))
   
 
 Notes:
@@ -2115,6 +2294,8 @@ Notes:
       
     - CUDA11 : 3.5, 5.0, 6.0, 6.1, 7.0, 7.5, 8.0。
       
+- Support Python 3.10. Fix compilation bugs on Windows caused by some Python C API changes. ([#41180](https://github.com/PaddlePaddle/Paddle/pull/42180))
+
 - The Windows platform supports the compilation through Visual Studio 2019. ([#38719](https://github.com/PaddlePaddle/Paddle/pull/38719))
   
 - Eliminate various warnings when compiling on the Windows platform. ([#38034](https://github.com/PaddlePaddle/Paddle/pull/38034), [#37890](https://github.com/PaddlePaddle/Paddle/pull/37890), [#37442](https://github.com/PaddlePaddle/Paddle/pull/37442), [#37439](https://github.com/PaddlePaddle/Paddle/pull/37439), [#36857](https://github.com/PaddlePaddle/Paddle/pull/36857))
@@ -2132,7 +2313,7 @@ Notes:
   
 - Support cambricon MLU chip (MLU370x4) training/inference. Support models such as ResNet50. Support static graph + dynamic graph training. Support auto-mixed precision training. Support single card, and distribute training across multiple cards, multiple machines.
   
-- Support KUNLUNXIN 2 chips (Kunlunxin AI acceleration cards R200, R300) training/inference. Support ResNet50, YoloV3, OCR-DB, SSD, MobilnetV3, UNet, BERT, Transformer, GPT-2, Wide&Deep, and DeepFM. Support static graph + dynamic graph training. Support auto-mixed precision training. Support single card, and distribute training across multiple cards, multiple machines.
+- Support KUNLUNXIN 2 chips (KUNLUNXIN AI acceleration cards R200, R300) training/inference. Support ResNet50, YoloV3, OCR-DB, SSD, MobileNetV3, UNet, BERT, Transformer, GPT-2, Wide&Deep, and DeepFM. Support static graph + dynamic graph training. Support auto-mixed precision training. Support single-card training and distributed training across multiple cards and multiple machines.
   
 
 ## Thanks to our Contributors