
Commit d766eeb

Authored by xta0 and Jessica Lin

[Mobile Perf Recipe] Add the benchmarking part for iOS (#1055)

* [Mobile Perf Recipe] Add the benchmarking part for iOS

Co-authored-by: Jessica Lin <[email protected]>

Parent: f8465c3

1 file changed: recipes_source/mobile_perf.rst (+56, -9 lines)
@@ -23,7 +23,7 @@ We will start with preparing to optimize your model to help decrease execution t
 
 
 Setup
-#######
+^^^^^^^
 
 First we need to install PyTorch, version at least 1.5.0, using Conda or pip.
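To make the version requirement concrete, here is a quick runtime check you can run after installing (a sketch; it only parses the major and minor components of ``torch.__version__``):

```python
import torch

# The recipe requires PyTorch 1.5.0 or newer; check the installed version.
major, minor = (int(p) for p in torch.__version__.split(".")[:2])
print((major, minor) >= (1, 5))
```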

@@ -69,7 +69,7 @@ Code your model:
 
 
 1. Fuse operators using ``torch.quantization.fuse_modules``
-#############################################################
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Do not be confused that ``fuse_modules`` is in the quantization package:
 it works for all ``torch.nn.Module``.
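As a minimal sketch of the fusion call (the three-layer module below is a hypothetical stand-in, not the recipe's model), fusing folds Batch Normalization into the convolution and absorbs the ReLU, leaving ``Identity`` placeholders behind:

```python
import torch

class ConvBNReLU(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)
        self.bn = torch.nn.BatchNorm2d(8)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

m = ConvBNReLU().eval()  # fusion for inference requires eval mode
fused = torch.quantization.fuse_modules(m, [["conv", "bn", "relu"]])

# bn and relu are replaced by Identity; conv now holds the fused op.
print(type(fused.bn).__name__, type(fused.relu).__name__)
```

In eval mode the fused model produces the same outputs as the original, just with fewer dispatched operators.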
@@ -90,7 +90,7 @@ This script will fuse Convolution, Batch Normalization and Relu in previously de
 
 
 2. Quantize your model
-#############################################################
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 You can find more about PyTorch quantization in
 `the dedicated tutorial <https://pytorch.org/blog/introduction-to-quantization-on-pytorch/>`_.
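A minimal post-training static quantization sketch, with quantization stubs and a calibration pass (the tiny ``Linear`` module is a hypothetical example; the recipe targets mobile's ``qnnpack`` backend, but ``fbgemm`` is used here so the sketch runs on a desktop development machine):

```python
import torch

class QuantizableModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # marks float -> quantized boundary
        self.fc = torch.nn.Linear(4, 2)
        self.dequant = torch.quantization.DeQuantStub()  # marks quantized -> float boundary

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = QuantizableModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)
model(torch.rand(8, 4))                    # calibration pass with representative data
torch.quantization.convert(model, inplace=True)
print(type(model.fc).__name__)             # now a quantized Linear module
```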
@@ -115,7 +115,7 @@ This code does quantization, using stub for model calibration function, you can
 
 
 3. Use torch.utils.mobile_optimizer
-#############################################################
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The torch ``mobile_optimizer`` package does several optimizations on the scripted model,
 which will help conv2d and linear operations.
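A short sketch of running the mobile optimization passes on a scripted model (the small ``Sequential`` network is a hypothetical stand-in for your own model):

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Script a tiny model, then run the mobile optimization passes on it.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
scripted = torch.jit.script(model)
optimized = optimize_for_mobile(scripted)

out = optimized(torch.rand(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 8, 30, 30])
```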
@@ -136,7 +136,7 @@ Next we call ``optimize_for_mobile`` and save model on the disk.
     torch.jit.save(torchscript_model_optimized, "model.pt")
 
 4. Prefer Using Channels Last Tensor memory format
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Channels Last (NHWC) memory format was introduced in PyTorch 1.4.0. It is supported only for four-dimensional tensors. This memory format gives better memory locality for most operators, especially convolution. Our measurements showed a 3x speedup of the MobileNetV2 model compared with the default Channels First (NCHW) format.
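Converting a tensor to Channels Last changes only the physical memory layout, not the logical shape or the values, as this small sketch shows:

```python
import torch

x = torch.rand(1, 3, 224, 224)  # default Channels First (NCHW) layout
x_cl = x.contiguous(memory_format=torch.channels_last)

print(x_cl.shape)                                             # logical shape is unchanged
print(x_cl.is_contiguous(memory_format=torch.channels_last))  # True
print(torch.equal(x, x_cl))                                   # values are identical
```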

@@ -151,10 +151,11 @@ At the moment of writing this recipe, PyTorch Android java API does not support 
 
 This conversion is zero cost if your input is already in Channels Last memory format. After it, all operators will work preserving the Channels Last memory format.
 
-5. Android. Reusing tensors for forward
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+5. Android - Reusing tensors for forward
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This part of the recipe is Android only.
 
-This recipe is Android only.
 Memory is a critical resource for Android performance, especially on old devices.
 Tensors can need a significant amount of memory.
 For example, a standard computer vision tensor contains 1*3*224*224 elements,
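The arithmetic behind that memory estimate is easy to check: a 1*3*224*224 float32 tensor occupies about 0.6 MB, which adds up quickly if a new input tensor is allocated on every forward call.

```python
import torch

# A standard vision input tensor: 1*3*224*224 float32 elements.
t = torch.rand(1, 3, 224, 224)
print(t.numel())                     # 150528 elements
print(t.numel() * t.element_size())  # 602112 bytes, i.e. ~0.6 MB per tensor
```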
@@ -208,7 +209,9 @@ this approach can give more stable measurements rather than testing inside the a
 
 
 Android - Benchmarking Setup
-#############################
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This part of the recipe is Android only.
 
 For this you first need to build the benchmark binary:
 
@@ -240,3 +243,47 @@ Now we are ready to benchmark your model:
     Running warmup runs.
     Main runs.
     Main run finished. Microseconds per iter: 121318. Iters per second: 8.24281
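The benchmark binary's warmup-then-measure structure can be sketched host-side in plain Python (the tiny ``Linear`` model is a hypothetical stand-in; on-device numbers from the binary above are what you should actually report):

```python
import time
import torch

# Warmup runs first so lazy initialization does not pollute the measurement,
# then timed main runs reported as microseconds per iteration.
model = torch.jit.script(torch.nn.Linear(16, 16).eval())
x = torch.rand(1, 16)

for _ in range(5):          # warmup runs
    model(x)

iters = 100
t0 = time.perf_counter()
for _ in range(iters):      # main runs
    model(x)
us = (time.perf_counter() - t0) / iters * 1e6
print(f"Microseconds per iter: {us:.0f}")
```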
+
+
+iOS - Benchmarking Setup
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For iOS, we'll be using our `TestApp <https://github.com/pytorch/pytorch/tree/master/ios/TestApp>`_ as the benchmarking tool.
+
+To begin with, let's apply the ``optimize_for_mobile`` method to our Python script located at `TestApp/benchmark/trace_model.py <https://github.com/pytorch/pytorch/blob/master/ios/TestApp/benchmark/trace_model.py>`_. Simply modify the code as below:
+
+::
+
+    import torch
+    import torchvision
+    from torch.utils.mobile_optimizer import optimize_for_mobile
+
+    model = torchvision.models.mobilenet_v2(pretrained=True)
+    model.eval()
+    example = torch.rand(1, 3, 224, 224)
+    traced_script_module = torch.jit.trace(model, example)
+    torchscript_model_optimized = optimize_for_mobile(traced_script_module)
+    torch.jit.save(torchscript_model_optimized, "model.pt")
+
+Now let's run ``python trace_model.py``. If everything works well, we should be able to generate our optimized model in the benchmark directory.
+
+Next, we're going to build the PyTorch libraries from source.
+
+::
+
+    BUILD_PYTORCH_MOBILE=1 IOS_ARCH=arm64 ./scripts/build_ios.sh
+
+Now that we have the optimized model and PyTorch ready, it's time to generate our XCode project and do benchmarking. To do that, we'll be using a Ruby script, ``setup.rb``, which does the heavy lifting of setting up the XCode project.
+
+::
+
+    ruby setup.rb
+
+Now open ``TestApp.xcodeproj``, plug in your iPhone, and you're ready to go. Below is an example result from an iPhone X:
+
+::
+
+    TestApp[2121:722447] Main runs
+    TestApp[2121:722447] Main run finished. Milliseconds per iter: 28.767
+    TestApp[2121:722447] Iters per second: : 34.762
+    TestApp[2121:722447] Done.
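The trace/optimize/save pipeline from ``trace_model.py`` can be sanity-checked end to end on the development machine; here is a sketch with a tiny convolutional network standing in for ``mobilenet_v2`` so no pretrained weights need downloading:

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Trace, optimize, and save a small stand-in model, then reload it
# to confirm the saved artifact is usable.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 4, 3), torch.nn.ReLU()).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
optimized = optimize_for_mobile(traced)
torch.jit.save(optimized, "model.pt")

loaded = torch.jit.load("model.pt")
print(loaded(example).shape)  # torch.Size([1, 4, 222, 222])
```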
