
Commit f2fbe6b

XiaobingSuper and Svetlana Karslioglu authored
update quantization tutorial by introducing x86 backend (#2081)
Co-authored-by: Svetlana Karslioglu <[email protected]>
1 parent edf145d commit f2fbe6b

File tree

6 files changed, +17 -12 lines changed


advanced_source/static_quantization_tutorial.rst

Lines changed: 5 additions & 3 deletions
@@ -458,7 +458,8 @@ quantizing for x86 architectures. This configuration does the following:
 per_channel_quantized_model = load_model(saved_model_dir + float_model_file)
 per_channel_quantized_model.eval()
 per_channel_quantized_model.fuse_model()
-per_channel_quantized_model.qconfig = torch.ao.quantization.get_default_qconfig('fbgemm')
+# The old 'fbgemm' is still available but 'x86' is the recommended default.
+per_channel_quantized_model.qconfig = torch.ao.quantization.get_default_qconfig('x86')
 print(per_channel_quantized_model.qconfig)

 torch.ao.quantization.prepare(per_channel_quantized_model, inplace=True)
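For reference, a minimal self-contained sketch of the eager-mode post-training flow this hunk configures, using the 'x86' qconfig; the toy model and random calibration inputs are illustrative, not part of the commit or the tutorial:

    import torch
    import torch.nn as nn

    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.quant = torch.ao.quantization.QuantStub()
            self.fc = nn.Linear(16, 4)
            self.dequant = torch.ao.quantization.DeQuantStub()

        def forward(self, x):
            return self.dequant(self.fc(self.quant(x)))

    model = TinyNet().eval()
    model.qconfig = torch.ao.quantization.get_default_qconfig('x86')  # requires a recent PyTorch release
    torch.ao.quantization.prepare(model, inplace=True)
    for _ in range(8):                     # calibrate observers on representative inputs
        model(torch.randn(2, 16))
    torch.ao.quantization.convert(model, inplace=True)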
@@ -534,8 +535,9 @@ We fuse modules as before
 qat_model = load_model(saved_model_dir + float_model_file)
 qat_model.fuse_model()

-optimizer = torch.optim.SGD(qat_model.parameters(), lr = 0.0001)
-qat_model.qconfig = torch.ao.quantization.get_default_qat_qconfig('fbgemm')
+optimizer = torch.optim.SGD(qat_model.parameters(), lr = 0.0001)
+# The old 'fbgemm' is still available but 'x86' is the recommended default.
+qat_model.qconfig = torch.ao.quantization.get_default_qat_qconfig('x86')

 Finally, ``prepare_qat`` performs the "fake quantization", preparing the model for quantization-aware training
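Similarly, a self-contained sketch of the quantization-aware-training setup touched by this hunk, with the 'x86' QAT qconfig; the toy model and training loop are illustrative, not from the tutorial:

    import torch
    import torch.nn as nn

    qat_model = nn.Sequential(torch.ao.quantization.QuantStub(),
                              nn.Linear(16, 4),
                              torch.ao.quantization.DeQuantStub()).train()
    qat_model.qconfig = torch.ao.quantization.get_default_qat_qconfig('x86')
    torch.ao.quantization.prepare_qat(qat_model, inplace=True)

    optimizer = torch.optim.SGD(qat_model.parameters(), lr=0.0001)
    for _ in range(10):                    # a few fake-quantized training steps
        optimizer.zero_grad()
        loss = qat_model(torch.randn(8, 16)).pow(2).mean()
        loss.backward()
        optimizer.step()

    quantized_model = torch.ao.quantization.convert(qat_model.eval())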

beginner_source/vt_tutorial.py

Lines changed: 2 additions & 2 deletions
@@ -138,8 +138,8 @@
 # Now run the code below:
 #

-# Use 'fbgemm' for server inference and 'qnnpack' for mobile inference
-backend = "fbgemm" # replaced with qnnpack causing much worse inference speed for quantized model on this notebook
+# Use 'x86' for server inference (the old 'fbgemm' is still available but 'x86' is the recommended default) and 'qnnpack' for mobile inference.
+backend = "x86" # replaced with qnnpack causing much worse inference speed for quantized model on this notebook
 model.qconfig = torch.quantization.get_default_qconfig(backend)
 torch.backends.quantized.engine = backend
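One way to guard this choice at runtime is to check which quantized engines the installed PyTorch build supports; the fallback logic and stand-in model below are illustrative, not part of the tutorial:

    import torch
    import torch.nn as nn

    # Prefer 'x86' when the build offers it; otherwise fall back to the old 'fbgemm'.
    backend = "x86" if "x86" in torch.backends.quantized.supported_engines else "fbgemm"

    model = nn.Sequential(nn.Linear(8, 8)).eval()   # stand-in for the tutorial's model
    model.qconfig = torch.quantization.get_default_qconfig(backend)
    torch.backends.quantized.engine = backend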

prototype_source/fx_graph_mode_ptq_dynamic.py

Lines changed: 3 additions & 2 deletions
@@ -18,7 +18,8 @@
 from torch.quantization.quantize_fx import prepare_fx, convert_fx

 float_model.eval()
-qconfig = get_default_qconfig("fbgemm")
+# The old 'fbgemm' is still available but 'x86' is the recommended default.
+qconfig = get_default_qconfig("x86")
 qconfig_mapping = QConfigMapping().set_global(qconfig)
 prepared_model = prepare_fx(float_model, qconfig_mapping, example_inputs)  # fuse modules and insert observers
 # no calibration is required for dynamic quantization
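As a companion to this hunk, a self-contained sketch of the prepare_fx/convert_fx skeleton; the toy model is illustrative, and default_dynamic_qconfig is used here so that, as the comment above notes, no calibration pass is needed:

    import torch
    import torch.nn as nn
    from torch.ao.quantization import default_dynamic_qconfig, QConfigMapping
    from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

    float_model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4)).eval()
    example_inputs = (torch.randn(1, 16),)

    qconfig_mapping = QConfigMapping().set_object_type(nn.Linear, default_dynamic_qconfig)
    prepared_model = prepare_fx(float_model, qconfig_mapping, example_inputs)
    quantized_model = convert_fx(prepared_model)   # no calibration needed for dynamic quantization
    print(quantized_model(torch.randn(1, 16)))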
@@ -288,4 +289,4 @@ def time_model_evaluation(model, test_data):
 # 3. Conclusion
 # -------------
 # This tutorial introduces the api for post training dynamic quantization in FX Graph Mode,
-# which dynamically quantizes the same modules as Eager Mode Quantization.
+# which dynamically quantizes the same modules as Eager Mode Quantization.

prototype_source/fx_graph_mode_ptq_static.rst

Lines changed: 5 additions & 3 deletions
@@ -17,7 +17,8 @@ tldr; The FX Graph Mode API looks like the following:
 from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
 from torch.ao.quantization import QConfigMapping
 float_model.eval()
-qconfig = get_default_qconfig("fbgemm")
+# The old 'fbgemm' is still available but 'x86' is the recommended default.
+qconfig = get_default_qconfig("x86")
 qconfig_mapping = QConfigMapping().set_global(qconfig)
 def calibrate(model, data_loader):
     model.eval()
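For reference, a self-contained sketch of the full tldr flow with the 'x86' qconfig, including a calibration pass; the toy model and calibration data are illustrative, not from the tutorial:

    import torch
    import torch.nn as nn
    from torch.ao.quantization import get_default_qconfig, QConfigMapping
    from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

    float_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3)).eval()
    example_inputs = (torch.randn(1, 3, 32, 32),)
    calibration_data = [torch.randn(1, 3, 32, 32) for _ in range(8)]

    qconfig_mapping = QConfigMapping().set_global(get_default_qconfig("x86"))
    prepared_model = prepare_fx(float_model, qconfig_mapping, example_inputs)

    def calibrate(model, data_loader):
        model.eval()
        with torch.no_grad():
            for image in data_loader:
                model(image)

    calibrate(prepared_model, calibration_data)    # populate observers with representative data
    quantized_model = convert_fx(prepared_model)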
@@ -256,7 +257,8 @@ while those for ``QConfigMapping`` can be found in the `qconfig_mapping <https:/

 .. code:: python

-    qconfig = get_default_qconfig("fbgemm")
+    # The old 'fbgemm' is still available but 'x86' is the recommended default.
+    qconfig = get_default_qconfig("x86")
     qconfig_mapping = QConfigMapping().set_global(qconfig)

 5. Prepare the Model for Post Training Static Quantization
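Beyond set_global, QConfigMapping also supports finer-grained overrides; the module type and submodule name below are illustrative placeholders, not lines from the tutorial:

    import torch.nn as nn
    from torch.ao.quantization import get_default_qconfig, QConfigMapping

    qconfig_mapping = (
        QConfigMapping()
        .set_global(get_default_qconfig("x86"))   # default for the whole model
        .set_object_type(nn.LSTM, None)           # skip quantization for a module type
        .set_module_name("fc", None)              # skip quantization for a named submodule
    )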
@@ -406,4 +408,4 @@ Running the model in AIBench (with single threading) gives the following result:

 As we can see for resnet18 both FX graph mode and eager mode quantized model get similar speedup over the floating point model,
 which is around 2-4x faster than the floating point model. But the actual speedup over floating point model may vary
-depending on model, device, build, input batch sizes, threading etc.
+depending on model, device, build, input batch sizes, threading etc.

prototype_source/fx_numeric_suite_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ def plot(xdata, ydata, xlabel, ylabel, title):

 # create quantized model
 qconfig_dict = {
-    '': torch.quantization.get_default_qconfig('fbgemm'),
+    '': torch.quantization.get_default_qconfig('x86'),  # The old 'fbgemm' is still available but 'x86' is the recommended default.
     # adjust the qconfig to make the results more interesting to explore
     'module_name': [
         # turn off quantization for the first couple of layers
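A sketch of how such a qconfig_dict could be completed; the 'module_name' entries and layer names below are assumptions for illustration, not lines from the tutorial:

    import torch

    qconfig_dict = {
        '': torch.quantization.get_default_qconfig('x86'),   # global default
        'module_name': [
            # turn off quantization for the first couple of layers (hypothetical names)
            ('conv1', None),
            ('layer1.0.conv1', None),
        ],
    }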

recipes_source/quantization.rst

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ After this, running `print_model_size(model_static_quantized)` shows the static
 A complete model definition and static quantization example is `here <https://pytorch.org/docs/stable/quantization.html#quantization-api-summary>`_. A dedicated static quantization tutorial is `here <https://pytorch.org/tutorials/advanced/static_quantization_tutorial.html>`_.

 .. note::
-    To make the model run on mobile devices which normally have arm architecture, you need to use `qnnpack` for `backend`; to run the model on computer with x86 architecture, use `fbgemm`.
+    To make the model run on mobile devices which normally have arm architecture, you need to use `qnnpack` for `backend`; to run the model on a computer with x86 architecture, use `x86` (the old `fbgemm` is still available but `x86` is the recommended default).

 4. Quantization Aware Training
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
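A small sketch of acting on the note above by picking the backend from the target platform; the platform check and toy model are illustrative, not part of the recipe:

    import platform
    import torch
    import torch.nn as nn

    # ARM (typical for mobile) -> 'qnnpack'; x86 desktops and servers -> 'x86'.
    backend = "qnnpack" if platform.machine().lower() in ("arm64", "aarch64") else "x86"

    model = nn.Sequential(nn.Linear(8, 8)).eval()
    model.qconfig = torch.ao.quantization.get_default_qconfig(backend)
    torch.backends.quantized.engine = backend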
