[`paddle_inference_api.h`](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/api/paddle_inference_api.h) defines all of the APIs for using TensorRT from Paddle inference.
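Only the output-printing tail of the example survives on this page, so here is a minimal sketch of the setup that plausibly precedes it. The config class, the method signatures, and the `{batch_size, 3, 224, 224}` input shape are assumptions (they vary across Paddle releases), not the repository's authoritative code:

```c++
#include <iostream>
#include <string>
#include <vector>

#include "paddle/fluid/inference/api/paddle_inference_api.h"

namespace paddle {

void RunTensorRT(int batch_size, const std::string &model_dirname) {
  // Configure the predictor and switch on the TensorRT engine.
  // NOTE: these signatures are assumptions; check the header of your
  // installed Paddle version.
  contrib::AnalysisConfig config(model_dirname);
  config.EnableUseGpu(100 /* initial GPU memory pool, MB */, 0 /* device id */);
  config.EnableTensorRtEngine(1 << 20 /* workspace size */, batch_size);
  auto predictor = CreatePaddlePredictor(config);

  // Feed one dummy input; the MobileNet-style shape is assumed.
  std::vector<float> input(batch_size * 3 * 224 * 224, 1.0f);
  PaddleTensor tensor;
  tensor.shape = {batch_size, 3, 224, 224};
  tensor.data = PaddleBuf(input.data(), input.size() * sizeof(float));
  tensor.dtype = PaddleDType::FLOAT32;

  std::vector<PaddleTensor> inputs{tensor};
  std::vector<PaddleTensor> outputs;
  predictor->Run(inputs, &outputs, batch_size);

  // Number of floats in the first output tensor, used by the tail below.
  const size_t num_elements = outputs.front().data.length() / sizeof(float);
  // ... continued in the fragment below ...
```

The surviving tail of the example then prints that output and drives everything from `main`: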
```c++
  auto *data = static_cast<float *>(outputs.front().data.data());
  for (size_t i = 0; i < num_elements; i++) {
    std::cout << "output: " << data[i] << std::endl;
  }
}

}  // namespace paddle

int main() {
  // Download address of the model:
  // http://paddle-inference-dist.cdn.bcebos.com/tensorrt_test/mobilenet.tar.gz
  paddle::RunTensorRT(1, "./mobilenet");
  return 0;
}
```
The parameters of a neural network are redundant to some extent, so in many tasks the Float32 model can be converted into an Int8 model without a significant loss of precision. Paddle-TRT supports this conversion offline: sample inputs are run through the model to collect the value ranges of each op's inputs and outputs, and these statistics are recorded in a calibration table.
```shell
cd SAMPLE_BASE_DIR/sample
# sh run_impl.sh {the address of inference libraries} {the name of test script} {model directories}
# We generate 500 inputs to simulate the calibration process; for a real experiment you should use real data.
sh run_impl.sh BASE_DIR/fluid_inference_install_dir/ fluid_generate_calib_test SAMPLE_BASE_DIR/sample/mobilenetv1
```
After it runs, a new file named `trt_calib_*` appears under the `SAMPLE_BASE_DIR/sample/build/mobilenetv1` model directory; this file is the calibration table.
On the next run, Paddle-TRT loads the calibration table automatically and performs inference in Int8 mode.
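Beyond the sample scripts, Int8 mode can also be requested directly through the inference API. A minimal sketch, assuming a Paddle version whose `EnableTensorRtEngine` accepts a precision argument (the parameter list, and the `Precision::kInt8` enum, vary across releases):

```c++
#include <memory>
#include <string>

#include "paddle/fluid/inference/api/paddle_inference_api.h"

// Sketch only: builds a predictor that runs the TensorRT subgraphs in Int8,
// picking up the calibration table generated above from the model directory.
std::unique_ptr<paddle::PaddlePredictor> CreateInt8Predictor(
    const std::string &model_dirname) {
  paddle::contrib::AnalysisConfig config(model_dirname);
  config.EnableUseGpu(100 /* MB */, 0 /* device id */);
  // The four-argument form is an assumption; some releases expose only
  // (workspace_size, max_batch_size).
  config.EnableTensorRtEngine(
      1 << 20 /* workspace size */, 1 /* max batch size */,
      3 /* min subgraph size */,
      paddle::contrib::AnalysisConfig::Precision::kInt8);
  return paddle::CreatePaddlePredictor(config);
}
```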
Paddle integrates TensorRT through subgraphs. Once a model is loaded, the neural network is represented as a computing graph composed of variables and computing nodes. Paddle-TRT scans the whole graph, discovers subgraphs that TensorRT can optimize, and replaces each of them with a single TensorRT node. During inference, Paddle calls the TensorRT library to execute the TensorRT nodes and its own native kernels for all other nodes. TensorRT can fuse ops both horizontally and vertically to eliminate redundant ops, and it chooses an appropriate kernel for each op on the target platform, which speeds up inference.

A simple model illustrates the process:
**Original Network**
<p align="center">(image: the original computing graph)</p>

**Transformed Network**

<p align="center">(image: the transformed graph, in which the TensorRT-supported ops have been fused into a single `block-25` node)</p>
In the Original Network, the green nodes are ops supported by TensorRT, the red nodes are the variables of the network, and the yellow nodes are ops that only Paddle's native kernels can execute. The green nodes are extracted into a subgraph, which is replaced by a single TensorRT node, shown as `block-25` in the transformed network. Whenever the runtime encounters this node, it calls the TensorRT library to execute it.
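The replacement step can be sketched as a toy pass. This is illustrative only, not Paddle's actual implementation: it treats the graph as a linear chain of ops (the real pass works on a DAG) and fuses each maximal run of TensorRT-supported ops into one engine node:

```c++
#include <iostream>
#include <string>
#include <vector>

// One computing node: an op name plus whether TensorRT supports it.
struct Node {
  std::string op;
  bool trt_supported;
};

// Replace every maximal run of TensorRT-supported ops with a single
// "tensorrt_engine" node, leaving unsupported ops to Paddle's native kernels.
std::vector<Node> FuseTrtSubgraphs(const std::vector<Node> &graph) {
  std::vector<Node> result;
  size_t i = 0;
  while (i < graph.size()) {
    if (!graph[i].trt_supported) {
      result.push_back(graph[i++]);
      continue;
    }
    size_t j = i;
    while (j < graph.size() && graph[j].trt_supported) ++j;  // extend the run
    result.push_back({"tensorrt_engine(" + std::to_string(j - i) + " ops)", true});
    i = j;
  }
  return result;
}

int main() {
  std::vector<Node> graph = {{"conv2d", true},    {"relu", true},
                             {"custom_op", false}, {"conv2d", true},
                             {"pool2d", true}};
  // conv2d+relu fuse into one engine node; custom_op stays native;
  // the trailing conv2d+pool2d fuse into a second engine node.
  for (const auto &n : FuseTrtSubgraphs(graph)) std::cout << n.op << "\n";
  return 0;
}
```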