Skip to content

Commit 0e26f5f

Merge pull request #6 from PaddlePaddle/develop
mgl
2 parents 45bbc6e + 4c9c3d7 commit 0e26f5f

200 files changed (+4956 additions, -2120 deletions)


doc/fluid/advanced_usage/best_practice/dist_training_gpu.rst

Lines changed: 3 additions & 3 deletions
@@ -24,8 +24,8 @@ PaddlePaddle Fluid can support high-performance ... on modern GPU [#]_ server clusters
     :header: "Tuning option", "Available values", "Configuration method"
     :widths: 3, 3, 5
-    "Communication mode", "pserver mode; NCCL2 mode (collective [#]_ )", "For configuration see: `here <../../user_guides/howto/training/cluster_howto.html#permalink-8--nccl2->`_ "
-    "Execution mode", "single process; single process with ParallelGraph; multi-process", "For configuration see: `here <../../user_guides/howto/training/cluster_howto.html#permalink-9--nccl2->`_ "
+    "Communication mode", "pserver mode; NCCL2 mode (collective [#]_ )", "For configuration see :ref:`cluster_howto`"
+    "Execution mode", "single process; single process with ParallelGraph; multi-process", "For configuration see :ref:`cluster_howto`"
     "Synchronous AllReduce", "When enabled, every call waits for the AllReduce to synchronize", "Set the environment variable :code:`FLAGS_sync_nccl_allreduce`"
     "Number of CPU threads", "int value; the number of CPU threads to use", "See the explanation later in this article"
     "Pre-allocate sufficient GPU memory", "float value between 0 and 1; the fraction of GPU memory to pre-allocate", "Set the environment variable :code:`FLAGS_fraction_of_gpu_memory_to_use`"

@@ -41,7 +41,7 @@ PaddlePaddle Fluid can support high-performance ... on modern GPU [#]_ server clusters
 Choosing the communication mode and execution mode
 +++++++++++++++++++

-For distributed GPU training, multi-process + NCCL2 (collective) mode usually delivers the best performance. Refer to `here <../../user_guides/howto/training/cluster_howto.html#permalink-8--nccl2->`_ to configure your program for multi-process NCCL2 training.
+For distributed GPU training, multi-process + NCCL2 (collective) mode usually delivers the best performance. Refer to :ref:`cluster_howto` to configure your program for multi-process NCCL2 training.

 In multi-process mode, one training process is launched for each GPU card on every server,
 and all processes in the cluster communicate with one another to complete the training, which minimizes the overhead of in-process resource contention.
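The table above only names the tuning knobs; as a concrete illustration, the sketch below is a hypothetical C++ launcher that exports the two environment-variable flags from the table before handing off to a training entry point. The flag names come from the table, while the values, the `train.py` script name, and the launcher itself are illustrative assumptions and not part of this commit.

```cpp
// Hypothetical launcher sketch: export the tuning flags discussed above,
// then replace this process with the real trainer (assumed to be train.py).
#include <cstdlib>   // setenv
#include <unistd.h>  // execlp

int main() {
    // Make every AllReduce call wait for synchronization ("Synchronous AllReduce" row).
    setenv("FLAGS_sync_nccl_allreduce", "1", /*overwrite=*/1);
    // Pre-allocate 90% of GPU memory up front ("Pre-allocate sufficient GPU memory" row).
    setenv("FLAGS_fraction_of_gpu_memory_to_use", "0.9", /*overwrite=*/1);
    // Hand off to the training script; execlp only returns on failure.
    execlp("python", "python", "train.py", static_cast<char *>(nullptr));
    return 1;
}
```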

doc/fluid/advanced_usage/deploy/inference/index_cn.rst

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ PaddlePaddle Fluid provides a C++ API for deploying models to production

     build_and_install_lib_cn.rst
     native_infer.md
+    python_infer_cn.md
     paddle_tensorrt_infer.md
     paddle_gpu_benchmark.md
     windows_cpp_inference.md

doc/fluid/advanced_usage/deploy/inference/paddle_tensorrt_infer.md

Lines changed: 26 additions & 31 deletions
@@ -12,20 +12,22 @@ NVIDIA TensorRT is a high-performance deep learning inference library that ...

 ## <a name="编译Paddle-TRT预测库">Building the Paddle-TRT inference library</a>

-**Building the inference library with Docker**
+**Building the inference library with Docker**
+
+Currently the TensorRT inference library can only be built with GPU support.

 1. Download Paddle
-
+
 ```
 git clone https://github.com/PaddlePaddle/Paddle.git
 ```
-
+
 2. Get the docker image
-
+
 ```
 nvidia-docker run --name paddle_trt -v $PWD/Paddle:/Paddle -it hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
 ```
-
+
 3. Build Paddle TensorRT

 ```
@@ -43,15 +45,15 @@ NVIDIA TensorRT is a high-performance deep learning inference library that ...
 -DWITH_PYTHON=OFF \
 -DTENSORRT_ROOT=/usr \
 -DON_INFER=ON
-
+
 # build
 make -j
 # generate the inference library
 make inference_lib_dist -j
 ```
-
+
 The directory layout of the compiled library is as follows:
-
+
 ```
 fluid_inference_install_dir
 ├── paddle

@@ -61,12 +63,12 @@ NVIDIA TensorRT is a high-performance deep learning inference library that ...
 ├── third_party
 ├── boost
 ├── install
-    └── engine3
+    └── engine3
 ```
-
+
 Under `fluid_inference_install_dir`, the paddle directory contains the headers and libs of the inference library, version.txt contains the lib version and build configuration, and third_party contains the third-party libraries the inference library depends on.

-## <a name="Paddle-TRT接口使用">Using the Paddle-TRT API</a>
+## <a name="Paddle-TRT接口使用">Using the Paddle-TRT API</a>

 [`paddle_inference_api.h`]('https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/api/paddle_inference_api.h') defines all the interfaces for using TensorRT.

@@ -89,13 +91,13 @@ void RunTensorRT(int batch_size, std::string model_dirname) {
   AnalysisConfig config(model_dirname);
   // config->SetModel(model_dirname + "/model",
   //                  model_dirname + "/params");
-
+
   config->EnableUseGpu(100, 0 /*gpu_id*/);
   config->EnableTensorRtEngine(1 << 20 /*work_space_size*/, batch_size /*max_batch_size*/);
-
+
   // 2. Create the predictor from the config
   auto predictor = CreatePaddlePredictor(config);
-  // 3. Create the input tensor
+  // 3. Create the input tensor
   int height = 224;
   int width = 224;
   float data[batch_size * 3 * height * width] = {0};

@@ -114,13 +116,13 @@ void RunTensorRT(int batch_size, std::string model_dirname) {

   const size_t num_elements = outputs.front().data.length() / sizeof(float);
   auto *data = static_cast<float *>(outputs.front().data.data());
-  for (size_t i = 0; i < num_elements; i++) {
+  for (size_t i = 0; i < num_elements; i++) {
     std::cout << "output: " << data[i] << std::endl;
   }
 }
 } // namespace paddle

-int main() {
+int main() {
   // Model download address: http://paddle-inference-dist.cdn.bcebos.com/tensorrt_test/mobilenet.tar.gz
   paddle::RunTensorRT(1, "./mobilenet");
   return 0;
@@ -133,9 +135,9 @@ int main() {
 ```
 wget http://paddle-inference-dist.cdn.bcebos.com/tensorrt_test/paddle_trt_samples.tar.gz
 ```
-
+
 The extracted directory is as follows:
-
+
 ```
 sample
 ├── CMakeLists.txt

@@ -146,12 +148,12 @@ int main() {
 │   └── params
 └── run_impl.sh
 ```
-
+
 - `mobilenet_test.cc` is the single-threaded sample program
 - `thread_mobilenet_test.cc` is the multi-threaded sample program
 - `mobilenetv1` is the model directory

-Here we assume the inference library is located at ``BASE_DIR/fluid_inference_install_dir/`` and the sample is located at ``SAMPLE_BASE_DIR/sample``
+Here we assume the inference library is located at ``BASE_DIR/fluid_inference_install_dir/`` and the sample is located at ``SAMPLE_BASE_DIR/sample``

 2. Build the sample
@@ -181,10 +183,10 @@ int main() {
 # sh run_impl.sh {path to the inference library} {name of the test program} {model directory}
 # We randomly generate 500 inputs to simulate this process; you are encouraged to experiment with real samples.
 sh run_impl.sh BASE_DIR/fluid_inference_install_dir/ fluid_generate_calib_test SAMPLE_BASE_DIR/sample/mobilenetv1
-
+
 ```
 After the run finishes, a new file named trt_calib_* will appear under the `SAMPLE_BASE_DIR/sample/build/mobilenetv1` model directory; this is the calibration table.
-
+
 ``` shell
 # Run INT8 inference
 # Copy the model files together with the calibration table to a specific location

@@ -193,7 +195,7 @@ int main() {
 ```

 ## <a name="Paddle-TRT子图运行原理">How the Paddle-TRT subgraph engine works</a>
-
+
 PaddlePaddle integrates TensorRT in the form of subgraphs. After a model is loaded, the neural network can be represented as a computation graph composed of variables and operator nodes. Paddle-TRT scans the whole graph, finds the subgraphs that TensorRT can optimize, and replaces them with TensorRT nodes. During inference, when a TensorRT node is encountered, Paddle calls the TensorRT library to execute it, while all other nodes run with Paddle's native implementation. During inference TensorRT can fuse ops horizontally and vertically, filter out redundant ops, and pick suitable kernels for specific ops on the target platform, which speeds up model inference.

 The figure below illustrates this process with a simple model:

@@ -208,12 +210,5 @@ int main() {
 <img src="https://raw.githubusercontent.com/NHZlX/FluidDoc/add_trt_doc/doc/fluid/user_guides/howto/inference/image/model_graph_trt.png" width="600">
 </p>

-
-In the original model network, the green nodes are nodes that TensorRT supports, the red nodes are variables in the network, and the yellow nodes are nodes that can only be executed by Paddle's native implementation. The green nodes in the original network are extracted and merged into a subgraph, which is replaced by a single TensorRT node, the `block-25` node in the transformed network. When this node is encountered while the network runs, Paddle calls the TensorRT library to execute it.
-
-
-
-
-
-
 
+In the original model network, the green nodes are nodes that TensorRT supports, the red nodes are variables in the network, and the yellow nodes are nodes that can only be executed by Paddle's native implementation. The green nodes in the original network are extracted and merged into a subgraph, which is replaced by a single TensorRT node, the `block-25` node in the transformed network. When this node is encountered while the network runs, Paddle calls the TensorRT library to execute it.
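Because the hunks above show only fragments of the `RunTensorRT` example, here is a condensed, self-contained sketch that stitches the visible calls together. The AnalysisConfig, EnableUseGpu, EnableTensorRtEngine and CreatePaddlePredictor calls are the ones shown in this document; the input-tensor preparation, the `Run` call, and the header path are assumptions about the unchanged surrounding code rather than part of this commit, and the config object is accessed with `.` here because it is declared by value.

```cpp
#include <iostream>
#include <string>
#include <vector>
#include "paddle_inference_api.h"  // header path depends on how the inference library is installed

void RunTensorRTSketch(int batch_size, const std::string &model_dirname) {
  // 1. Create the AnalysisConfig and enable the TensorRT subgraph engine.
  paddle::AnalysisConfig config(model_dirname);
  config.EnableUseGpu(100 /*initial GPU memory pool, MB*/, 0 /*gpu_id*/);
  config.EnableTensorRtEngine(1 << 20 /*workspace size*/, batch_size /*max_batch_size*/);

  // 2. Create the predictor from the config.
  auto predictor = paddle::CreatePaddlePredictor(config);

  // 3. Prepare an all-zero input tensor of shape [batch_size, 3, 224, 224] (assumed layout).
  int height = 224, width = 224;
  std::vector<float> input(batch_size * 3 * height * width, 0.f);
  paddle::PaddleTensor tensor;
  tensor.shape = {batch_size, 3, height, width};
  tensor.data = paddle::PaddleBuf(input.data(), input.size() * sizeof(float));
  tensor.dtype = paddle::PaddleDType::FLOAT32;

  // 4. Run inference and print the output values.
  std::vector<paddle::PaddleTensor> outputs;
  predictor->Run({tensor}, &outputs, batch_size);
  const size_t num_elements = outputs.front().data.length() / sizeof(float);
  auto *out = static_cast<float *>(outputs.front().data.data());
  for (size_t i = 0; i < num_elements; ++i) {
    std::cout << "output: " << out[i] << std::endl;
  }
}

int main() {
  // Model download address (from the example above):
  // http://paddle-inference-dist.cdn.bcebos.com/tensorrt_test/mobilenet.tar.gz
  RunTensorRTSketch(1, "./mobilenet");
  return 0;
}
```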

doc/fluid/advanced_usage/deploy/inference/paddle_tensorrt_infer_en.md

Lines changed: 21 additions & 21 deletions
@@ -9,23 +9,25 @@ Subgraph is used in PaddlePaddle to preliminarily integrate TensorRT, which enab
 - [Paddle-TRT example compiling test](#Paddle-TRT example compiling test)
 - [Paddle-TRT INT8 usage](#Paddle-TRT_INT8 usage)
 - [Paddle-TRT subgraph operation principle](#Paddle-TRT subgraph operation principle)
-
+
 ## <a name="compile Paddle-TRT inference libraries">compile Paddle-TRT inference libraries</a>

-**Use Docker to build inference libraries**
+**Use Docker to build inference libraries**
+
+TRT inference libraries can only be compiled using GPU.

 1. Download Paddle
-
+
 ```
 git clone https://github.com/PaddlePaddle/Paddle.git
 ```
-
+
 2. Get docker image
-
+
 ```
 nvidia-docker run --name paddle_trt -v $PWD/Paddle:/Paddle -it hub.baidubce.com/paddlepaddle/paddle:latest-dev /bin/bash
 ```
-
+
 3. Build Paddle TensorRT

 ```
@@ -41,16 +43,16 @@ Subgraph is used in PaddlePaddle to preliminarily integrate TensorRT, which enab
 -DWITH_PYTHON=OFF \
 -DTENSORRT_ROOT=/usr \
 -DON_INFER=ON
-
+
 # build
 make -j
 # generate inference library
 make inference_lib_dist -j
 ```

-## <a name="Paddle-TRT interface usage">Paddle-TRT interface usage</a>
+## <a name="Paddle-TRT interface usage">Paddle-TRT interface usage</a>

-[`paddle_inference_api.h`]('https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/api/paddle_inference_api.h') defines all APIs of TensorRT.
+[`paddle_inference_api.h`]('https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/api/paddle_inference_api.h') defines all APIs of TensorRT.

 General steps are as follows:
 1. Create appropriate AnalysisConfig.
@@ -71,13 +73,13 @@ void RunTensorRT(int batch_size, std::string model_dirname) {
   AnalysisConfig config(model_dirname);
   // config->SetModel(model_dirname + "/model",
   //                  model_dirname + "/params");
-
+
   config->EnableUseGpu(100, 0 /*gpu_id*/);
   config->EnableTensorRtEngine(1 << 20 /*work_space_size*/, batch_size /*max_batch_size*/);
-
+
   // 2. Create predictor based on config
   auto predictor = CreatePaddlePredictor(config);
-  // 3. Create input tensor
+  // 3. Create input tensor
   int height = 224;
   int width = 224;
   float data[batch_size * 3 * height * width] = {0};
@@ -96,13 +98,13 @@ void RunTensorRT(int batch_size, std::string model_dirname) {

   const size_t num_elements = outputs.front().data.length() / sizeof(float);
   auto *data = static_cast<float *>(outputs.front().data.data());
-  for (size_t i = 0; i < num_elements; i++) {
+  for (size_t i = 0; i < num_elements; i++) {
     std::cout << "output: " << data[i] << std::endl;
   }
 }
 } // namespace paddle

-int main() {
+int main() {
   // Download address of the model: http://paddle-inference-dist.cdn.bcebos.com/tensorrt_test/mobilenet.tar.gz
   paddle::RunTensorRT(1, "./mobilenet");
   return 0;
@@ -120,11 +122,11 @@ The parameters of the neural network are redundant to some extent. In many tasks
 ```shell
 cd SAMPLE_BASE_DIR/sample
 # sh run_impl.sh {the address of inference libraries} {the name of test script} {model directories}
-# We generate 500 input data to simulate the process, and it's suggested that you use real example for experiment.
+# We generate 500 input data to simulate the process, and it's suggested that you use real example for experiment.
 sh run_impl.sh BASE_DIR/fluid_inference_install_dir/ fluid_generate_calib_test SAMPLE_BASE_DIR/sample/mobilenetv1
-
+
 ```
-
+
 After the running period, there will be a new file named trt_calib_* under the `SAMPLE_BASE_DIR/sample/build/mobilenetv1` model directory, which is the calibration table.

 ``` shell
@@ -137,8 +139,8 @@ The parameters of the neural network are redundant to some extent. In many tasks
 ## <a name="Paddle-TRT subgraph operation principle">Paddle-TRT subgraph operation principle</a>

 Subgraph is used to integrate TensorRT in PaddlePaddle. After a model is loaded, the neural network can be represented as a computing graph composed of variables and computing nodes. Paddle TensorRT scans the whole graph, discovers subgraphs that can be optimized with TensorRT, and replaces them with TensorRT nodes. During inference, Paddle calls the TensorRT library to execute TensorRT nodes and calls its native implementation for the other nodes. During inference, TensorRT can fuse Ops horizontally and vertically, filter out redundant Ops, and choose appropriate kernels for specific Ops on the target platform, which speeds up model inference.
-
-A simple model illustrates the process:
+
+A simple model illustrates the process:

 **Original Network**
 <p align="center">

@@ -151,5 +153,3 @@ A simple model illustrates the process:
 </p>

 We can see in the Original Network that the green nodes represent nodes supported by TensorRT, the red nodes represent variables in the network, and the yellow nodes represent nodes that can only be executed by Paddle's native functions. The green nodes in the original network are extracted to compose a subgraph, which is replaced by a single TensorRT node and becomes the `block-25` node in the transformed network. When such nodes are encountered during runtime, the TensorRT library is called to execute them.
-
-
