[Mobile Perf Recipe] Add the benchmarking part for iOS (#1055)
* [Mobile Perf Recipe] Add the benchmarking part for iOS
Co-authored-by: Jessica Lin <[email protected]>
The Channels Last (NHWC) memory format was introduced in PyTorch 1.4.0. It is supported only for four-dimensional tensors. This memory format gives better memory locality for most operators, especially convolution. Our measurements showed a 3x speedup of the MobileNetV2 model compared with the default Channels First (NCHW) format.
@@ -151,10 +151,11 @@ At the moment of writing this recipe, PyTorch Android java API does not support
This conversion is zero cost if your input is already in the Channels Last memory format. After it, all operators preserve the Channels Last memory format.
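To make the layout difference concrete, here is a minimal pure-Python sketch (not part of the recipe) that computes element strides for a four-dimensional tensor of logical shape (N, C, H, W) under both layouts. The helper names ``nchw_strides`` and ``nhwc_strides`` are hypothetical and exist only for illustration; the arithmetic is meant to mirror the strides PyTorch reports for contiguous and Channels Last tensors.

```python
def nchw_strides(n, c, h, w):
    """Strides (in elements) for the default contiguous Channels First layout."""
    # Width is innermost: stepping along W moves 1 element in memory.
    return (c * h * w, h * w, w, 1)


def nhwc_strides(n, c, h, w):
    """Strides (in elements) for Channels Last.

    The logical shape is still (N, C, H, W); only the physical order changes,
    so channels become the innermost (stride 1) dimension.
    """
    return (h * w * c, 1, w * c, c)


shape = (1, 3, 224, 224)  # the standard computer vision input used in this recipe
print(nchw_strides(*shape))  # (150528, 50176, 224, 1)
print(nhwc_strides(*shape))  # (150528, 1, 672, 3)
```

With Channels Last, all channel values of one pixel sit next to each other in memory, which is the locality that convolution kernels can benefit from.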
Memory is a critical resource for Android performance, especially on older devices.
Tensors can need a significant amount of memory.
For example, a standard computer vision tensor contains 1*3*224*224 elements,
@@ -208,7 +209,9 @@ this approach can give more stable measurements rather than testing inside the a
Android - Benchmarking Setup
-#############################
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This part of the recipe is Android only.

For this, you first need to build the benchmark binary:
@@ -240,3 +243,47 @@ Now we are ready to benchmark your model:
Running warmup runs.
Main runs.
Main run finished. Microseconds per iter: 121318. Iters per second: 8.24281
+
+
+iOS - Benchmarking Setup
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For iOS, we'll be using our `TestApp <https://github.com/pytorch/pytorch/tree/master/ios/TestApp>`_ as the benchmarking tool.
+
+To begin with, let's apply the ``optimize_for_mobile`` method to our Python script located at `TestApp/benchmark/trace_model.py <https://github.com/pytorch/pytorch/blob/master/ios/TestApp/benchmark/trace_model.py>`_. Simply modify the code as below.
+
+::
+
+    import torch
+    import torchvision
+    from torch.utils.mobile_optimizer import optimize_for_mobile
+
+    model = torchvision.models.mobilenet_v2(pretrained=True)
+Now that we have the optimized model and PyTorch ready, it's time to generate our Xcode project and run the benchmark. To do that, we'll use a Ruby script, ``setup.rb``, which does the heavy lifting of setting up the Xcode project.
+
+::
+
+    ruby setup.rb
+
+Now open ``TestApp.xcodeproj`` and plug in your iPhone, and you're ready to go. Below is an example result from an iPhone X.
+
+::
+
+    TestApp[2121:722447] Main runs
+    TestApp[2121:722447] Main run finished. Milliseconds per iter: 28.767
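Note that the Android benchmark binary reports microseconds per iteration, while TestApp reports milliseconds. The following is a minimal sketch, not part of either tool, of how the two sample log lines shown in this recipe could be normalized to a common unit. The helper name ``ms_per_iter`` is hypothetical, and the regular expression assumes the log wording stays exactly as in the samples.

```python
import re

# Matches "Microseconds per iter: 121318" or "Milliseconds per iter: 28.767",
# the wording used in the sample outputs of this recipe.
_TIMING = re.compile(r"(Micro|Milli)seconds per iter: (\d+(?:\.\d+)?)")


def ms_per_iter(log_line):
    """Return the per-iteration latency in milliseconds, or None if absent."""
    match = _TIMING.search(log_line)
    if match is None:
        return None
    unit, value = match.group(1), float(match.group(2))
    # Microseconds are converted; milliseconds are returned unchanged.
    return value / 1000.0 if unit == "Micro" else value


android = ms_per_iter(
    "Main run finished. Microseconds per iter: 121318. Iters per second: 8.24281"
)
ios = ms_per_iter(
    "TestApp[2121:722447] Main run finished. Milliseconds per iter: 28.767"
)
print(android, ios)
```

The two sample numbers come from different devices and separate runs, so they illustrate the unit conversion only and are not a like-for-like performance comparison.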