Skip to content

Commit 0e26f5f

Merge pull request #6 from PaddlePaddle/develop
mgl
2 parents 45bbc6e + 4c9c3d7 commit 0e26f5f

200 files changed (+4956 additions, -2120 deletions)


doc/fluid/advanced_usage/best_practice/dist_training_gpu.rst

Lines changed: 3 additions & 3 deletions
@@ -24,8 +24,8 @@ PaddlePaddle Fluid can support high-performance ... on modern GPU [#]_ server clusters
     :header: "Tuning option", "Available values", "Configuration method"
     :widths: 3, 3, 5
-    "Communication mode", "pserver mode; NCCL2 mode (collective [#]_ )", "For configuration see: `here <../../user_guides/howto/training/cluster_howto.html#permalink-8--nccl2->`_ "
-    "Execution mode", "single process; single process with ParallelGraph; multi-process", "For configuration see: `here <../../user_guides/howto/training/cluster_howto.html#permalink-9--nccl2->`_ "
+    "Communication mode", "pserver mode; NCCL2 mode (collective [#]_ )", "For configuration see :ref:`cluster_howto`"
+    "Execution mode", "single process; single process with ParallelGraph; multi-process", "For configuration see :ref:`cluster_howto`"
     "Synchronous AllReduce", "When enabled, every call waits for the AllReduce to synchronize", "Set the environment variable :code:`FLAGS_sync_nccl_allreduce`"
     "Number of CPU threads", "int value; the number of CPU threads to use", "See the explanation later in this article"
     "Pre-allocate sufficient GPU memory", "float value between 0 and 1; the fraction of GPU memory to pre-allocate", "Set the environment variable :code:`FLAGS_fraction_of_gpu_memory_to_use`"

@@ -41,7 +41,7 @@ PaddlePaddle Fluid can support high-performance ... on modern GPU [#]_ server clusters
 Choosing the communication mode and execution mode
 +++++++++++++++++++

-For distributed GPU training, multi-process + NCCL2 (collective) mode usually delivers the best performance. Refer to `here <../../user_guides/howto/training/cluster_howto.html#permalink-8--nccl2->`_ to configure your program for multi-process NCCL2 training.
+For distributed GPU training, multi-process + NCCL2 (collective) mode usually delivers the best performance. Refer to :ref:`cluster_howto` to configure your program for multi-process NCCL2 training.

 In multi-process mode, one training process is launched for each GPU card on every server,
 and all processes in the cluster communicate with one another to complete the training, which minimizes the overhead of in-process resource contention.
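The table above only names the tuning knobs; as a concrete illustration, the sketch below is a hypothetical C++ launcher that exports the two environment-variable flags from the table before handing off to a training entry point. The flag names come from the table, while the values, the `train.py` script name, and the launcher itself are illustrative assumptions and not part of this commit.

```cpp
// Hypothetical launcher sketch: export the tuning flags discussed above,
// then replace this process with the real trainer (assumed to be train.py).
#include <cstdlib>   // setenv
#include <unistd.h>  // execlp

int main() {
    // Make every AllReduce call wait for synchronization ("Synchronous AllReduce" row).
    setenv("FLAGS_sync_nccl_allreduce", "1", /*overwrite=*/1);
    // Pre-allocate 90% of GPU memory up front ("Pre-allocate sufficient GPU memory" row).
    setenv("FLAGS_fraction_of_gpu_memory_to_use", "0.9", /*overwrite=*/1);
    // Hand off to the training script; execlp only returns on failure.
    execlp("python", "python", "train.py", static_cast<char *>(nullptr));
    return 1;
}
```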

doc/fluid/advanced_usage/deploy/inference/index_cn.rst

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ PaddlePaddle Fluid provides a C++ API for deploying models to production

     build_and_install_lib_cn.rst
     native_infer.md
+    python_infer_cn.md
     paddle_tensorrt_infer.md
     paddle_gpu_benchmark.md
     windows_cpp_inference.md

doc/fluid/advanced_usage/deploy/inference/paddle_tensorrt_infer.md

Lines changed: 26 additions & 31 deletions
@@ -12,20 +12,22 @@ NVIDIA TensorRT is a high-performance deep learning inference library that ...

 ## <a name="编译Paddle-TRT预测库">Building the Paddle-TRT inference library</a>

-**Building the inference library with Docker**
+**Building the inference library with Docker**
+
+Currently the TensorRT inference library can only be built with GPU support.

 1. Download Paddle
-
+
 ```
 git clone https://github.com/PaddlePaddle/Paddle.git
 ```
-
+
 2. Get the docker image
-
+
 ```
 nvidia-docker run --name paddle_trt -v $PWD/Paddle:/Paddle -it hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
 ```
-
+
 3. Build Paddle TensorRT

 ```
@@ -43,15 +45,15 @@ NVIDIA TensorRT is a high-performance deep learning inference library that ...
 -DWITH_PYTHON=OFF \
 -DTENSORRT_ROOT=/usr \
 -DON_INFER=ON
-
+
 # build
 make -j
 # generate the inference library
 make inference_lib_dist -j
 ```
-
+
 The directory layout of the compiled library is as follows:
-
+
 ```
 fluid_inference_install_dir
 ├── paddle

@@ -61,12 +63,12 @@ NVIDIA TensorRT is a high-performance deep learning inference library that ...
 ├── third_party
 ├── boost
 ├── install
-    └── engine3
+    └── engine3
 ```
-
+
 Under `fluid_inference_install_dir`, the paddle directory contains the headers and libs of the inference library, version.txt contains the lib version and build configuration, and third_party contains the third-party libraries the inference library depends on.

-## <a name="Paddle-TRT接口使用">Using the Paddle-TRT API</a>
+## <a name="Paddle-TRT接口使用">Using the Paddle-TRT API</a>

 [`paddle_inference_api.h`]('https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/api/paddle_inference_api.h') defines all the interfaces for using TensorRT.

@@ -89,13 +91,13 @@ void RunTensorRT(int batch_size, std::string model_dirname) {
   AnalysisConfig config(model_dirname);
   // config->SetModel(model_dirname + "/model",
   //                  model_dirname + "/params");
-
+
   config->EnableUseGpu(100, 0 /*gpu_id*/);
   config->EnableTensorRtEngine(1 << 20 /*work_space_size*/, batch_size /*max_batch_size*/);
-
+
   // 2. Create the predictor from the config
   auto predictor = CreatePaddlePredictor(config);
-  // 3. Create the input tensor
+  // 3. Create the input tensor
   int height = 224;
   int width = 224;
   float data[batch_size * 3 * height * width] = {0};

@@ -114,13 +116,13 @@ void RunTensorRT(int batch_size, std::string model_dirname) {

   const size_t num_elements = outputs.front().data.length() / sizeof(float);
   auto *data = static_cast<float *>(outputs.front().data.data());
-  for (size_t i = 0; i < num_elements; i++) {
+  for (size_t i = 0; i < num_elements; i++) {
     std::cout << "output: " << data[i] << std::endl;
   }
 }
 } // namespace paddle

-int main() {
+int main() {
   // Model download address: http://paddle-inference-dist.cdn.bcebos.com/tensorrt_test/mobilenet.tar.gz
   paddle::RunTensorRT(1, "./mobilenet");
   return 0;
@@ -133,9 +135,9 @@ int main() {
 ```
 wget http://paddle-inference-dist.cdn.bcebos.com/tensorrt_test/paddle_trt_samples.tar.gz
 ```
-
+
 The extracted directory is as follows:
-
+
 ```
 sample
 ├── CMakeLists.txt

@@ -146,12 +148,12 @@ int main() {
 │   └── params
 └── run_impl.sh
 ```
-
+
 - `mobilenet_test.cc` is the single-threaded sample program
 - `thread_mobilenet_test.cc` is the multi-threaded sample program
 - `mobilenetv1` is the model directory

-Here we assume the inference library is located at ``BASE_DIR/fluid_inference_install_dir/`` and the sample is located at ``SAMPLE_BASE_DIR/sample``
+Here we assume the inference library is located at ``BASE_DIR/fluid_inference_install_dir/`` and the sample is located at ``SAMPLE_BASE_DIR/sample``

 2. Build the sample
@@ -181,10 +183,10 @@ int main() {
 # sh run_impl.sh {path to the inference library} {name of the test program} {model directory}
 # We randomly generate 500 inputs to simulate this process; you are encouraged to experiment with real samples.
 sh run_impl.sh BASE_DIR/fluid_inference_install_dir/ fluid_generate_calib_test SAMPLE_BASE_DIR/sample/mobilenetv1
-
+
 ```
 After the run finishes, a new file named trt_calib_* will appear under the `SAMPLE_BASE_DIR/sample/build/mobilenetv1` model directory; this is the calibration table.
-
+
 ``` shell
 # Run INT8 inference
 # Copy the model files together with the calibration table to a specific location

@@ -193,7 +195,7 @@ int main() {
 ```

 ## <a name="Paddle-TRT子图运行原理">How the Paddle-TRT subgraph engine works</a>
-
+
 PaddlePaddle integrates TensorRT in the form of subgraphs. After a model is loaded, the neural network can be represented as a computation graph composed of variables and operator nodes. Paddle-TRT scans the whole graph, finds the subgraphs that TensorRT can optimize, and replaces them with TensorRT nodes. During inference, when a TensorRT node is encountered, Paddle calls the TensorRT library to execute it, while all other nodes run with Paddle's native implementation. During inference TensorRT can fuse ops horizontally and vertically, filter out redundant ops, and pick suitable kernels for specific ops on the target platform, which speeds up model inference.

 The figure below illustrates this process with a simple model:

@@ -208,12 +210,5 @@ int main() {
 <img src="https://raw.githubusercontent.com/NHZlX/FluidDoc/add_trt_doc/doc/fluid/user_guides/howto/inference/image/model_graph_trt.png" width="600">
 </p>

-
-In the original model network, the green nodes are nodes that TensorRT supports, the red nodes are variables in the network, and the yellow nodes are nodes that can only be executed by Paddle's native implementation. The green nodes in the original network are extracted and merged into a subgraph, which is replaced by a single TensorRT node, the `block-25` node in the transformed network. When this node is encountered while the network runs, Paddle calls the TensorRT library to execute it.
-
-
-
-
-
-
 
+In the original model network, the green nodes are nodes that TensorRT supports, the red nodes are variables in the network, and the yellow nodes are nodes that can only be executed by Paddle's native implementation. The green nodes in the original network are extracted and merged into a subgraph, which is replaced by a single TensorRT node, the `block-25` node in the transformed network. When this node is encountered while the network runs, Paddle calls the TensorRT library to execute it.
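Because the hunks above show only fragments of the `RunTensorRT` example, here is a condensed, self-contained sketch that stitches the visible calls together. The AnalysisConfig, EnableUseGpu, EnableTensorRtEngine and CreatePaddlePredictor calls are the ones shown in this document; the input-tensor preparation, the `Run` call, and the header path are assumptions about the unchanged surrounding code rather than part of this commit, and the config object is accessed with `.` here because it is declared by value.

```cpp
#include <iostream>
#include <string>
#include <vector>
#include "paddle_inference_api.h"  // header path depends on how the inference library is installed

void RunTensorRTSketch(int batch_size, const std::string &model_dirname) {
  // 1. Create the AnalysisConfig and enable the TensorRT subgraph engine.
  paddle::AnalysisConfig config(model_dirname);
  config.EnableUseGpu(100 /*initial GPU memory pool, MB*/, 0 /*gpu_id*/);
  config.EnableTensorRtEngine(1 << 20 /*workspace size*/, batch_size /*max_batch_size*/);

  // 2. Create the predictor from the config.
  auto predictor = paddle::CreatePaddlePredictor(config);

  // 3. Prepare an all-zero input tensor of shape [batch_size, 3, 224, 224] (assumed layout).
  int height = 224, width = 224;
  std::vector<float> input(batch_size * 3 * height * width, 0.f);
  paddle::PaddleTensor tensor;
  tensor.shape = {batch_size, 3, height, width};
  tensor.data = paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
  tensor.dtype = paddle::PaddleDType::FLOAT32;

  // 4. Run inference and print the output values.
  std::vector<paddle::PaddleTensor> outputs;
  predictor->Run({tensor}, &outputs, batch_size);
  const size_t num_elements = outputs.front().data.length() / sizeof(float);
  auto *out = static_cast<float *>(outputs.front().data.data());
  for (size_t i = 0; i < num_elements; ++i) {
    std::cout << "output: " << out[i] << std::endl;
  }
}

int main() {
  // Model download address (from the example above):
  // http://paddle-inference-dist.cdn.bcebos.com/tensorrt_test/mobilenet.tar.gz
  RunTensorRTSketch(1, "./mobilenet");
  return 0;
}
```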

doc/fluid/advanced_usage/deploy/inference/paddle_tensorrt_infer_en.md

Lines changed: 21 additions & 21 deletions
@@ -9,23 +9,25 @@ Subgraph is used in PaddlePaddle to preliminarily integrate TensorRT, which enab
 - [Paddle-TRT example compiling test](#Paddle-TRT example compiling test)
 - [Paddle-TRT INT8 usage](#Paddle-TRT_INT8 usage)
 - [Paddle-TRT subgraph operation principle](#Paddle-TRT subgraph operation principle)
-
+
 ## <a name="compile Paddle-TRT inference libraries">compile Paddle-TRT inference libraries</a>

-**Use Docker to build inference libraries**
+**Use Docker to build inference libraries**
+
+TRT inference libraries can only be compiled using GPU.

 1. Download Paddle
-
+
 ```
 git clone https://github.com/PaddlePaddle/Paddle.git
 ```
-
+
 2. Get docker image
-
+
 ```
 nvidia-docker run --name paddle_trt -v $PWD/Paddle:/Paddle -it hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
 ```
-
+
 3. Build Paddle TensorRT

 ```
@@ -41,16 +43,16 @@ Subgraph is used in PaddlePaddle to preliminarily integrate TensorRT, which enab
 -DWITH_PYTHON=OFF \
 -DTENSORRT_ROOT=/usr \
 -DON_INFER=ON
-
+
 # build
 make -j
 # generate inference library
 make inference_lib_dist -j
 ```

-## <a name="Paddle-TRT interface usage">Paddle-TRT interface usage</a>
+## <a name="Paddle-TRT interface usage">Paddle-TRT interface usage</a>

-[`paddle_inference_api.h`]('https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/api/paddle_inference_api.h') defines all APIs of TensorRT.
+[`paddle_inference_api.h`]('https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/api/paddle_inference_api.h') defines all APIs of TensorRT.

 General steps are as follows:
 1. Create appropriate AnalysisConfig.
@@ -71,13 +73,13 @@ void RunTensorRT(int batch_size, std::string model_dirname) {
   AnalysisConfig config(model_dirname);
   // config->SetModel(model_dirname + "/model",
   //                  model_dirname + "/params");
-
+
   config->EnableUseGpu(100, 0 /*gpu_id*/);
   config->EnableTensorRtEngine(1 << 20 /*work_space_size*/, batch_size /*max_batch_size*/);
-
+
   // 2. Create predictor based on config
   auto predictor = CreatePaddlePredictor(config);
-  // 3. Create input tensor
+  // 3. Create input tensor
   int height = 224;
   int width = 224;
   float data[batch_size * 3 * height * width] = {0};
@@ -96,13 +98,13 @@ void RunTensorRT(int batch_size, std::string model_dirname) {

   const size_t num_elements = outputs.front().data.length() / sizeof(float);
   auto *data = static_cast<float *>(outputs.front().data.data());
-  for (size_t i = 0; i < num_elements; i++) {
+  for (size_t i = 0; i < num_elements; i++) {
     std::cout << "output: " << data[i] << std::endl;
   }
 }
 } // namespace paddle

-int main() {
+int main() {
   // Download address of the model: http://paddle-inference-dist.cdn.bcebos.com/tensorrt_test/mobilenet.tar.gz
   paddle::RunTensorRT(1, "./mobilenet");
   return 0;
@@ -120,11 +122,11 @@ The parameters of the neural network are redundant to some extent. In many tasks
 ```shell
 cd SAMPLE_BASE_DIR/sample
 # sh run_impl.sh {the address of inference libraries} {the name of test script} {model directories}
-# We generate 500 input data to simulate the process, and it's suggested that you use real example for experiment.
+# We generate 500 input data to simulate the process, and it's suggested that you use real example for experiment.
 sh run_impl.sh BASE_DIR/fluid_inference_install_dir/ fluid_generate_calib_test SAMPLE_BASE_DIR/sample/mobilenetv1
-
+
 ```
-
+
 After the running period, there will be a new file named trt_calib_* under the `SAMPLE_BASE_DIR/sample/build/mobilenetv1` model directory, which is the calibration table.

 ``` shell
@@ -137,8 +139,8 @@ The parameters of the neural network are redundant to some extent. In many tasks
 ## <a name="Paddle-TRT subgraph operation principle">Paddle-TRT subgraph operation principle</a>

 Subgraph is used to integrate TensorRT in PaddlePaddle. After a model is loaded, the neural network can be represented as a computing graph composed of variables and computing nodes. Paddle TensorRT scans the whole graph, discovers subgraphs that can be optimized with TensorRT, and replaces them with TensorRT nodes. During inference, Paddle calls the TensorRT library to execute TensorRT nodes and calls its native implementation for the other nodes. During inference, TensorRT can fuse Ops horizontally and vertically, filter out redundant Ops, and choose appropriate kernels for specific Ops on the target platform, which speeds up model inference.
-
-A simple model illustrates the process:
+
+A simple model illustrates the process:

 **Original Network**
 <p align="center">

@@ -151,5 +153,3 @@ A simple model illustrates the process:
 </p>

 We can see in the Original Network that the green nodes represent nodes supported by TensorRT, the red nodes represent variables in the network, and the yellow nodes represent nodes that can only be executed by Paddle's native functions. The green nodes in the original network are extracted to compose a subgraph, which is replaced by a single TensorRT node and becomes the `block-25` node in the transformed network. When such nodes are encountered during runtime, the TensorRT library is called to execute them.
-
-
