Commit d28bf47

TCChenlong and jzhang533 committed
Add 2 3 1 release note (PaddlePaddle#5068)
* Update release_note_cn.md
* Update release_note_en.md
* Update release_note_en.md
* Update release_note_cn.md
* Update release_note_cn.md
* Update release_note_en.md
* add 2.3.1 release note
* Update release_note_cn.md

Co-authored-by: jzhang533 <[email protected]>
1 parent 100248f commit d28bf47

2 files changed: +232 -0 lines changed

docs/release_note_cn.md

Lines changed: 116 additions & 0 deletions
@@ -1,4 +1,120 @@

# 2.3.1 Release Note

## 1. Important Updates

- Version 2.3.1 fixes known issues in version 2.3 and adds installation packages that support CUDA 11.6.

## 2. Training Framework (Including Distributed Training)

### (1) Function Optimization

#### API

- Modify the `paddle.nn.initializer.KaimingUniform` and `paddle.nn.initializer.KaimingNormal` initializers so that they support multiple types of activation functions. ([#43721](https://github.com/PaddlePaddle/Paddle/pull/43721), [#43827](https://github.com/PaddlePaddle/Paddle/pull/43827))
- Optimize data prefetching in `paddle.io.DataLoader`: `prefetch_factor` now sets the cache size for prefetched data, which avoids IO blocking when reading large blocks of data. ([#43674](https://github.com/PaddlePaddle/Paddle/pull/43674))

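To illustrate why the Kaiming initializers need to know the activation function, here is a minimal pure-Python sketch of how the scaling gain changes the initialization range. The helper names (`calculate_gain`, `kaiming_uniform_bound`, `kaiming_normal_std`) are hypothetical, and the gain table follows the commonly used convention; it is an assumption that Paddle's implementation matches it exactly.

```python
import math


def calculate_gain(nonlinearity: str, negative_slope: float = 0.0) -> float:
    """Return the conventional scaling gain for an activation (assumed table)."""
    if nonlinearity in ("linear", "sigmoid"):
        return 1.0
    if nonlinearity == "tanh":
        return 5.0 / 3.0
    if nonlinearity == "relu":
        return math.sqrt(2.0)
    if nonlinearity == "leaky_relu":
        return math.sqrt(2.0 / (1.0 + negative_slope ** 2))
    raise ValueError(f"unsupported nonlinearity: {nonlinearity}")


def kaiming_uniform_bound(fan_in: int, nonlinearity: str = "relu",
                          negative_slope: float = 0.0) -> float:
    """Weights are drawn from Uniform(-bound, bound)."""
    gain = calculate_gain(nonlinearity, negative_slope)
    return gain * math.sqrt(3.0 / fan_in)


def kaiming_normal_std(fan_in: int, nonlinearity: str = "relu",
                       negative_slope: float = 0.0) -> float:
    """Weights are drawn from Normal(0, std)."""
    gain = calculate_gain(nonlinearity, negative_slope)
    return gain / math.sqrt(fan_in)
```

With a fixed `fan_in`, a `tanh` layer gets a narrower range than a `relu` layer, which is the kind of difference the updated initializers are meant to account for.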
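#### New Dynamic Graph Execution Mechanism

- Change how optional Tensors are initialized in the new dynamic graph API logic, so that they are not destructed early and do not cause data errors. ([#42561](https://github.com/PaddlePaddle/Paddle/pull/42561))

#### New Static Graph Executor

- Defer initialization of the executor's thread pool, so that a `program` that runs only once (such as `save`, `load`, or `startup_program`) does not create a thread pool. ([#43768](https://github.com/PaddlePaddle/Paddle/pull/43768))
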
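The deferred thread-pool initialization described for the static graph executor can be sketched in plain Python. This is only an illustration of the lazy-initialization idea, not the executor's actual C++ code; the class `LazyPoolExecutor` and its methods are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LazyPoolExecutor:
    """Creates its thread pool only when asynchronous work is first submitted."""

    def __init__(self, max_workers: int = 4):
        self._max_workers = max_workers
        self._pool = None
        self._lock = threading.Lock()

    @property
    def pool_created(self) -> bool:
        return self._pool is not None

    def _get_pool(self) -> ThreadPoolExecutor:
        # The lock keeps two threads from creating two pools at once.
        with self._lock:
            if self._pool is None:
                self._pool = ThreadPoolExecutor(max_workers=self._max_workers)
            return self._pool

    def run_once(self, fn, *args):
        """One-shot work runs inline, so no pool is ever created for it."""
        return fn(*args)

    def run_async(self, fn, *args):
        """Repeated or parallel work triggers pool creation on first use."""
        return self._get_pool().submit(fn, *args)

    def shutdown(self) -> None:
        if self._pool is not None:
            self._pool.shutdown(wait=True)
            self._pool = None
```

A caller that only ever uses `run_once` pays no thread-creation cost, which mirrors the motivation given for single-run programs.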
#### Mixed Precision Training

- Disable the `state_dict` hook in `set_state_dict` of `paddle.nn.Layer`. ([#43407](https://github.com/PaddlePaddle/Paddle/pull/43407))

#### Distributed Training

- Support tensor model parallelism in `paddle.incubate.nn.functional.fused_attention` and `paddle.incubate.nn.functional.fused_feedforward`. ([#43505](https://github.com/PaddlePaddle/Paddle/pull/43505))

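#### Others

- Adjust the format of the strings printed by framework operator kernels to make automated splitting and parsing easier. ([#42931](https://github.com/PaddlePaddle/Paddle/pull/42931))
- Update the model quantization API to support the `rounding to nearest ties to even` rounding mode and the quantization value range [-128, 127]. ([#43829](https://github.com/PaddlePaddle/Paddle/pull/43829))
- Adapt quantization-aware training to support AMP mixed precision training. ([#43689](https://github.com/PaddlePaddle/Paddle/pull/43689))
- Add a `progress bar` when quantization-aware training starts, making it easy to check the progress of quantization initialization; skip the scale op when collecting out_threshold statistics to speed up initialization. ([#43454](https://github.com/PaddlePaddle/Paddle/pull/43454))
- Support `conv` and `bn` fusion in dynamic graph quantization training; support setting `skip_tensor_list` in static graph offline quantization to skip quantization of certain layers. ([#43301](https://github.com/PaddlePaddle/Paddle/pull/43301))
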
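As a small illustration of the rounding mode mentioned in the quantization item above, the sketch below quantizes a float to the int8 range [-128, 127] with round-half-to-even. It relies on Python's built-in `round`, which rounds ties to the nearest even integer; the function name `quantize_int8` and the single `scale` parameter are assumptions for illustration, not the Paddle quantization API.

```python
def quantize_int8(value: float, scale: float) -> int:
    """Quantize `value` with round-half-to-even, then clamp to [-128, 127]."""
    # Python 3's round() sends exact halves to the nearest even integer:
    # round(0.5) -> 0, round(1.5) -> 2, round(2.5) -> 2.
    quantized = round(value / scale)
    return max(-128, min(127, quantized))
```

Compared with always rounding halves away from zero, ties-to-even does not bias the quantized values in one direction when many inputs fall exactly between two levels.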
### (2) Performance Optimization

- Optimize the `paddle.incubate.nn.functional.fused_attention` and `paddle.incubate.nn.functional.fused_feedforward` operators by adding an `add_residual` attribute that controls whether the final step adds the `residual`. CAE model performance improves by 7.7%. ([#43719](https://github.com/PaddlePaddle/Paddle/pull/43719))
- Optimize the `linspace` operator by initializing its three input Tensors, `start`, `stop`, and `num`, on the CPU, which avoids a GPU -> CPU copy inside the operator. SOLOv2 model performance improves by 6%. ([#43746](https://github.com/PaddlePaddle/Paddle/pull/43746))

### (3) Bug Fixes

#### API

- Fix an error that `paddle.io.DataLoader` reports with low probability, due to multi-thread conflicts, when `return_list=True`. ([#43691](https://github.com/PaddlePaddle/Paddle/pull/43691))
- Fix the error that the `to` method reports that NoneType has no device attribute when the parameters of a `paddle.nn.Layer` include a `None` parameter. ([#43597](https://github.com/PaddlePaddle/Paddle/pull/43597))
- Fix incorrect results of the cumsum op for certain `shape`s. ([#42500](https://github.com/PaddlePaddle/Paddle/pull/42500), [#43777](https://github.com/PaddlePaddle/Paddle/pull/43777))
- Fix the bug that, in static graph mode, `Tensor.__getitem__` with a `bool` index produces an output with dimension 0 at the network-building stage. ([#43246](https://github.com/PaddlePaddle/Paddle/pull/43246))
- Fix the exception raised when `paddle.slice` and `paddle.strided_slice` handle negative arguments. ([#43432](https://github.com/PaddlePaddle/Paddle/pull/43432))
- Fix incorrect assignment results of the set_value op when the slice `step` is negative. ([#43694](https://github.com/PaddlePaddle/Paddle/pull/43694))
- Fix the bug that the C++ `copy` interface cannot copy between multiple cards. ([#43728](https://github.com/PaddlePaddle/Paddle/pull/43728))
- Fix the inference-time problem caused by attribute naming in `paddle.incubate.nn.functional.fused_attention` and `paddle.incubate.nn.functional.fused_feedforward`. ([#43505](https://github.com/PaddlePaddle/Paddle/pull/43505))
- Fix the exception in the ConditionalBlockGrad op when it handles a Tensor that does not need `grad`. ([#43034](https://github.com/PaddlePaddle/Paddle/pull/43034))
- Resolve the increase in GPU memory caused by the backward speed optimization of the C++ einsum op, and enable the backward optimization by default. ([#43397](https://github.com/PaddlePaddle/Paddle/pull/43397))
- Fix the bug that, on a single card, multi-process data loading with `paddle.io.DataLoader` does not produce fixed data when the random seed is fixed. ([#43702](https://github.com/PaddlePaddle/Paddle/pull/43702))
- Fix the CUDNN_STATUS_NOT_SUPPORT error triggered by the softmax op when the Tensor has more than 2G elements. ([#43719](https://github.com/PaddlePaddle/Paddle/pull/43719))
- Fix the bug that the trace op `Event` strings do not distinguish between operators, which makes performance analysis inconvenient. ([#42789](https://github.com/PaddlePaddle/Paddle/pull/42789))

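The slicing fixes above concern negative indices and negative steps. As a reference for the expected behavior, the sketch below uses plain Python lists, whose slice semantics NumPy-style indexing generally follows; it is an assumption that this is the behavior the fixed ops are meant to match, and the helper names are hypothetical.

```python
def normalize_index(index: int, length: int) -> int:
    """Map a possibly negative index onto the range [0, length)."""
    return index + length if index < 0 else index


def assign_with_step(data, start, stop, step, new_values):
    """Return a copy of `data` with data[start:stop:step] replaced."""
    result = list(data)
    # With a negative step the slice walks backwards: 4:0:-2 visits 4, then 2.
    result[start:stop:step] = new_values
    return result
```

For example, `assign_with_step([0, 1, 2, 3, 4, 5], 4, 0, -2, [40, 20])` writes 40 to index 4 and 20 to index 2, leaving the other elements unchanged.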
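#### Others

- Fix GPU memory overflow caused by repeated deepcopy and saving in dynamic-to-static conversion. ([#43141](https://github.com/PaddlePaddle/Paddle/pull/43141))
- Fix the bug that the upgrade of the PlaceType type used in custom operators introduces a wrong device id in multi-card scenarios. ([#43830](https://github.com/PaddlePaddle/Paddle/pull/43830))
- Optimize the timeline visualization logic of `paddle.profiler.Profiler`: events customized in Python scripts now appear in the Python folding layer instead of the C++ folding layer. ([#42790](https://github.com/PaddlePaddle/Paddle/pull/42790))
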
## 3. Deployment Direction (Paddle Inference)

### (1) New Features

#### New Functions

- Add support for PaddleSlim quantized models to the ONNX Runtime backend on CPU. ([#43774](https://github.com/PaddlePaddle/Paddle/pull/43774), [#43796](https://github.com/PaddlePaddle/Paddle/pull/43796))

### (2) Underlying Optimization

#### CPU Performance Optimization

- Remove `gpu_cpu_reshape2_matmul_fuse_pass` from the EnableMkldnn configuration to fix the ResNet50 performance degradation. ([#43750](https://github.com/PaddlePaddle/Paddle/pull/43750))

#### GPU Performance Optimization

- Add TensorRT convert support for `bilinear_interp_v2`. ([#43618](https://github.com/PaddlePaddle/Paddle/pull/43618))
- Add `matmul_scale_fuse_pass` and `multihead_matmul_fuse_pass_v3` to the GPU passes, and add unit tests. ([#43765](https://github.com/PaddlePaddle/Paddle/pull/43765))
- Add support for deferred initialization of GPU handles. ([#43661](https://github.com/PaddlePaddle/Paddle/pull/43661))

### (3) Bug Fixes

#### Framework and API Fixes

- Fix the compilation error when building together with Paddle-Lite XPU. ([#43178](https://github.com/PaddlePaddle/Paddle/pull/43178))
- Fix the false triggering of the ERNIE 3.0 pass. ([#43948](https://github.com/PaddlePaddle/Paddle/pull/43948))
- Fix the bug that the int8 quantization attributes in the multihead op cannot be read. ([#43020](https://github.com/PaddlePaddle/Paddle/pull/43020))

#### Backend Capability Fixes

- Fix the crash of the elementwise_mul and matmul ops in MKLDNN during quantized inference. ([#43725](https://github.com/PaddlePaddle/Paddle/pull/43725))
- Fix the repeated generation of TensorRT subgraph serialization files for the same model during inference. ([#42945](https://github.com/PaddlePaddle/Paddle/pull/43945), [#42633](https://github.com/PaddlePaddle/Paddle/pull/42633))
- Fix the conflict between the ONNX Runtime backend and externally used protobuf. ([#43159](https://github.com/PaddlePaddle/Paddle/pull/43159), [#43742](https://github.com/PaddlePaddle/Paddle/pull/43742))
- Fix the inference error of the ONNX Runtime backend in the Python inference library with multiple inputs. ([#43621](https://github.com/PaddlePaddle/Paddle/pull/43621))

## 4. Environment Adaptation

### Compilation and Installation

- Complete verification and adaptation for CUDA 11.6, and release CUDA 11.6 installation packages on the official website. ([#43935](https://github.com/PaddlePaddle/Paddle/pull/43935), [#44005](https://github.com/PaddlePaddle/Paddle/pull/44005))
- Fix the cub error when compiling with CUDA 11.6 on Windows. ([#43935](https://github.com/PaddlePaddle/Paddle/pull/43935), [#44005](https://github.com/PaddlePaddle/Paddle/pull/44005))
- Fix the long compilation time of the elementwise and reduce ops. ([#43202](https://github.com/PaddlePaddle/Paddle/pull/43202), [#42779](https://github.com/PaddlePaddle/Paddle/pull/42779), [#43205](https://github.com/PaddlePaddle/Paddle/pull/43205))

### New Hardware Adaptation

- Cambricon MLU supports the PaddlePaddle Profiler. ([#42115](https://github.com/PaddlePaddle/Paddle/pull/42115))
- GraphCore IPU supports displaying compilation progress. ([#42078](https://github.com/PaddlePaddle/Paddle/pull/42078))

# 2.3.0 Release Note

## 1. Important Updates

docs/release_note_en.md

Lines changed: 116 additions & 0 deletions
@@ -1,4 +1,120 @@

# 2.3.1 Release Note

## **1. Important Updates**

- V2.3.1 builds on V2.3 by fixing known issues and releasing precompiled binaries that support CUDA 11.6.

## **2. Training Framework (distributed included)**

### **(1) Function Optimization**

#### API

- Modify the two initializers `paddle.nn.initializer.KaimingUniform` and `paddle.nn.initializer.KaimingNormal` to support multiple types of activation functions. ([#43721](https://github.com/PaddlePaddle/Paddle/pull/43721), [#43827](https://github.com/PaddlePaddle/Paddle/pull/43827))
- Optimize data prefetching in `paddle.io.DataLoader`: `prefetch_factor` now sets the cache size of prefetched data, which avoids IO blocking when reading large blocks of data. ([#43674](https://github.com/PaddlePaddle/Paddle/pull/43674))

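The effect of a `prefetch_factor`-style setting can be sketched with a bounded queue: a background thread loads data ahead of the consumer, but never more than the configured number of items. This is a minimal standard-library illustration of the idea, not the `paddle.io.DataLoader` implementation; the function `prefetching_iter` is hypothetical.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the data stream


def prefetching_iter(batches, prefetch_factor: int = 2):
    """Yield items from `batches`, loading at most `prefetch_factor` ahead."""
    buffer = queue.Queue(maxsize=prefetch_factor)

    def producer():
        for batch in batches:
            buffer.put(batch)  # blocks while the buffer is full
        buffer.put(_SENTINEL)

    # Daemon thread: it will not keep the process alive if iteration stops early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is _SENTINEL:
            return
        yield item
```

A larger buffer hides more loading latency at the cost of memory, which is the trade-off a configurable cache size exposes.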
#### **New dynamic graph execution mechanism**

- Modify the initialization of optional Tensors in the new dynamic graph API logic to prevent data exceptions caused by early destruction. ([#42561](https://github.com/PaddlePaddle/Paddle/pull/42561))

#### **New static graph executor**

- Defer initialization of the executor's thread pools to avoid creating them for `program`s that execute only once (e.g., `save`, `load`, `startup_program`). ([#43768](https://github.com/PaddlePaddle/Paddle/pull/43768))

#### **Mixed precision training**

- Disable the `state_dict` hook in `set_state_dict` of `paddle.nn.Layer`. ([#43407](https://github.com/PaddlePaddle/Paddle/pull/43407))

#### **Distributed training**

- Support tensor model parallelism in `paddle.incubate.nn.functional.fused_attention` and `paddle.incubate.nn.functional.fused_feedforward`. ([#43505](https://github.com/PaddlePaddle/Paddle/pull/43505))

#### **Others**

- Adjust the print format of framework operator kernels to facilitate automated splitting and parsing. ([#42931](https://github.com/PaddlePaddle/Paddle/pull/42931))
- Update the model quantization API to support the `rounding to nearest ties to even` rounding mode and the quantization range [-128, 127]. ([#43829](https://github.com/PaddlePaddle/Paddle/pull/43829))
- Support AMP mixed precision training in quantization-aware training. ([#43689](https://github.com/PaddlePaddle/Paddle/pull/43689))
- Add a `progress bar` at the start of quantization-aware training to show the progress of quantization initialization. Skip the scale op when collecting out_threshold statistics to speed up initialization. ([#43454](https://github.com/PaddlePaddle/Paddle/pull/43454))
- Support `conv` and `bn` fusion in dynamic graph quantization training. Support setting `skip_tensor_list` in static graph offline quantization to skip quantization of some layers. ([#43301](https://github.com/PaddlePaddle/Paddle/pull/43301))

### **(2) Performance Optimization**

- Optimize the `paddle.incubate.nn.functional.fused_attention` and `paddle.incubate.nn.functional.fused_feedforward` operators. Add an `add_residual` attribute to control whether the `residual` is added in the last step. CAE model performance improves by 7.7%. ([#43719](https://github.com/PaddlePaddle/Paddle/pull/43719))
- Optimize the `linspace` operator. Initialize the three input Tensors `start`, `stop`, and `num` on the CPU to avoid a GPU->CPU copy in the operator. SOLOv2 model performance improves by 6%. ([#43746](https://github.com/PaddlePaddle/Paddle/pull/43746))

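What an `add_residual`-style switch controls can be shown with a heavily simplified sketch. The function below is hypothetical and works on plain lists; it leaves out layer normalization, dropout, and everything else the fused operators do, and only shows the optional final addition of the input.

```python
def feedforward_block(x, ffn, add_residual: bool = True):
    """Apply `ffn` to `x`, optionally adding the input back as a residual."""
    out = ffn(x)
    if add_residual:
        # Element-wise residual connection: out[i] + x[i]
        out = [o + xi for o, xi in zip(out, x)]
    return out
```

Skipping the addition when the surrounding model does not need it removes one element-wise pass over the output, which is presumably where the reported speedup comes from.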
### **(3) Bug Fix**

#### API

- Fix the error that `paddle.io.DataLoader` occasionally reports when `return_list=True` due to multi-thread conflicts. ([#43691](https://github.com/PaddlePaddle/Paddle/pull/43691))
- Fix the error that the `to` method reports that NoneType has no device attribute when a `paddle.nn.Layer` has a `None` parameter. ([#43597](https://github.com/PaddlePaddle/Paddle/pull/43597))
- Fix the bug that the cumsum op computes wrong results for some `shape`s. ([#42500](https://github.com/PaddlePaddle/Paddle/pull/42500), [#43777](https://github.com/PaddlePaddle/Paddle/pull/43777))
- Fix the bug that the output dimension of `Tensor.__getitem__` is 0 at the network-building stage when a `bool` index is used in a static graph. ([#43246](https://github.com/PaddlePaddle/Paddle/pull/43246))
- Fix the exception that occurs when `paddle.slice` and `paddle.strided_slice` handle negative arguments. ([#43432](https://github.com/PaddlePaddle/Paddle/pull/43432))
- Fix the bug that the set_value op assigns wrong results when the slice `step` is negative. ([#43694](https://github.com/PaddlePaddle/Paddle/pull/43694))
- Fix the bug that the C++ `copy` interface cannot copy between multiple cards. ([#43728](https://github.com/PaddlePaddle/Paddle/pull/43728))
- Fix the inference-stage bug caused by attribute naming in `paddle.incubate.nn.functional.fused_attention` and `paddle.incubate.nn.functional.fused_feedforward`. ([#43505](https://github.com/PaddlePaddle/Paddle/pull/43505))
- Fix an exception in the ConditionalBlockGrad op when processing a Tensor that does not require `grad`. ([#43034](https://github.com/PaddlePaddle/Paddle/pull/43034))
- Fix the device memory increase caused by the backward speed optimization of the C++ einsum op, and enable this optimization by default. ([#43397](https://github.com/PaddlePaddle/Paddle/pull/43397))
- Fix the bug that data is not deterministic when `paddle.io.DataLoader` reads data with multiple processes under a fixed random seed on a single card. ([#43702](https://github.com/PaddlePaddle/Paddle/pull/43702))
- Fix the bug that the softmax op triggers CUDNN_STATUS_NOT_SUPPORT when the Tensor has more than 2G elements. ([#43719](https://github.com/PaddlePaddle/Paddle/pull/43719))
- Fix the bug that the trace op `Event` strings are indistinguishable across operators, which makes performance analysis inconvenient. ([#42789](https://github.com/PaddlePaddle/Paddle/pull/42789))

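For the DataLoader seeding item above, one common way to make multi-process loading reproducible is to derive each worker's random stream from a base seed and the worker id. The sketch below shows that idea with the standard library only; it is an assumption for illustration and not necessarily how the linked fix works. The function `worker_samples` is hypothetical.

```python
import random


def worker_samples(base_seed: int, worker_id: int, num_samples: int = 4):
    """Return the random draws a given worker would make for a given base seed."""
    # Each worker owns a private generator, so draws do not depend on
    # scheduling order or on what other workers consume.
    rng = random.Random(base_seed + worker_id)
    return [rng.random() for _ in range(num_samples)]
```

With this scheme, rerunning with the same base seed reproduces every worker's draws, while different workers still see different streams.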
#### **Others**

- Fix device memory overflow caused by repeated deepcopy and saving in dynamic-to-static conversion. ([#43141](https://github.com/PaddlePaddle/Paddle/pull/43141))
- Fix the bug that the upgrade of the PlaceType used in custom operators introduces a wrong device id in multi-card scenarios. ([#43830](https://github.com/PaddlePaddle/Paddle/pull/43830))
- Optimize the `paddle.profiler.Profiler` timeline visualization logic: events customized in Python scripts move from the C++ folding display to the Python folding display. ([#42790](https://github.com/PaddlePaddle/Paddle/pull/42790))

## **3. Deployment Direction (Paddle Inference)**

### **(1) New Features**

#### **New functions**

- Add support for PaddleSlim quantized models in the ONNX Runtime backend on CPUs. ([#43774](https://github.com/PaddlePaddle/Paddle/pull/43774), [#43796](https://github.com/PaddlePaddle/Paddle/pull/43796))

### **(2) Underlying Optimization**

#### **CPU performance optimization**

- Remove `gpu_cpu_reshape2_matmul_fuse_pass` from the EnableMkldnn configuration to fix the ResNet50 performance degradation. ([#43750](https://github.com/PaddlePaddle/Paddle/pull/43750))

#### **GPU performance optimization**

- Add TensorRT convert support for `bilinear_interp_v2`. ([#43618](https://github.com/PaddlePaddle/Paddle/pull/43618))
- Add `matmul_scale_fuse_pass` and `multihead_matmul_fuse_pass_v3` to the GPU passes. ([#43765](https://github.com/PaddlePaddle/Paddle/pull/43765))
- Add support for deferred initialization of GPU handles. ([#43661](https://github.com/PaddlePaddle/Paddle/pull/43661))

### **(3) Bug Fixing**

#### **Framework and API fixing**

- Fix the compilation error when building with Paddle-Lite XPU. ([#43178](https://github.com/PaddlePaddle/Paddle/pull/43178))
- Fix the false triggering of the ERNIE 3.0 pass. ([#43948](https://github.com/PaddlePaddle/Paddle/pull/43948))
- Fix the bug that the int8 quantization attributes in the multihead op cannot be read. ([#43020](https://github.com/PaddlePaddle/Paddle/pull/43020))

#### **Backend capability fixing**

- Fix the bug that the elementwise_mul and matmul ops in MKLDNN crash during quantized inference. ([#43725](https://github.com/PaddlePaddle/Paddle/pull/43725))
- Fix a bug where TensorRT subgraph serialization files are repeatedly generated for the same model during inference. ([#42945](https://github.com/PaddlePaddle/Paddle/pull/43945), [#42633](https://github.com/PaddlePaddle/Paddle/pull/42633))
- Fix a conflict between the ONNX Runtime backend and externally used protobuf. ([#43159](https://github.com/PaddlePaddle/Paddle/pull/43159), [#43742](https://github.com/PaddlePaddle/Paddle/pull/43742))
- Fix an inference error in the Python prediction library when the ONNX Runtime backend is used with multiple inputs. ([#43621](https://github.com/PaddlePaddle/Paddle/pull/43621))

## **4. Environment Adaptation**

### **Compile and install**

- Complete verification and adaptation of CUDA 11.6, and release CUDA 11.6 precompiled binaries. ([#43935](https://github.com/PaddlePaddle/Paddle/pull/43935), [#44005](https://github.com/PaddlePaddle/Paddle/pull/44005))
- Fix a cub error when compiling with CUDA 11.6 on Windows. ([#43935](https://github.com/PaddlePaddle/Paddle/pull/43935), [#44005](https://github.com/PaddlePaddle/Paddle/pull/44005))
- Fix the long compilation time of the elementwise and reduce ops. ([#43202](https://github.com/PaddlePaddle/Paddle/pull/43202), [#42779](https://github.com/PaddlePaddle/Paddle/pull/42779), [#43205](https://github.com/PaddlePaddle/Paddle/pull/43205))

### **New hardware adaptation**

- Cambricon MLU supports PaddlePaddle Profiler. ([#42115](https://github.com/PaddlePaddle/Paddle/pull/42115))
- GraphCore IPU supports visualization of compilation progress. ([#42078](https://github.com/PaddlePaddle/Paddle/pull/42078))

# 2.3.0 Release Note

## 1. **Important Updates**
