From 38545494bb834cf0839b87d178e6dc49eb168ba9 Mon Sep 17 00:00:00 2001
From: yiliu30
Date: Wed, 16 Aug 2023 10:36:36 +0800
Subject: [PATCH 1/4] clarify the fwk and op_name_dict/op_type_dict

Signed-off-by: yiliu30
---
 docs/source/tuning_strategies.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/source/tuning_strategies.md b/docs/source/tuning_strategies.md
index 695bc40cc02..106a0aa7e39 100644
--- a/docs/source/tuning_strategies.md
+++ b/docs/source/tuning_strategies.md
@@ -44,7 +44,7 @@ Tuning Strategies
 ## Introduction
 
 Intel® Neural Compressor aims to help users quickly deploy
-the low-precision inference solution on popular Deep Learning frameworks such as TensorFlow, PyTorch, ONNX, and MXNet. With built-in strategies, it automatically optimizes low-precision recipes for deep learning models to achieve optimal product objectives, such as inference performance and memory usage, with expected accuracy criteria. Currently, several strategies, including `O0`, `Basic`, `MSE`, `MSE_V2`, `HAWQ_V2`, `Bayesian`, `Exhaustive`, `Random`, `SigOpt`, `TPE`, etc are supported. By default, the `Basic` strategy is used for tuning.
+the low-precision inference solution on popular Deep Learning frameworks such as TensorFlow, PyTorch, ONNX, and MXNet. With built-in strategies, it automatically optimizes low-precision recipes for deep learning models to achieve optimal product objectives, such as inference performance and memory usage, with expected accuracy criteria. Currently, several tuning strategies, including `auto`, `O0`, `O1`, `Basic`, `MSE`, `MSE_V2`, `HAWQ_V2`, `Bayesian`, `Exhaustive`, `Random`, `SigOpt`, `TPE`, etc. are supported. By default, [`quant_level="auto"`](./tuning_strategies.md#auto) is used for tuning.
 
 ## Strategy Design
 Before tuning, the `tuning space` was constructed according to the framework capability and user configuration. Then the selected strategy generates the next quantization configuration according to its traverse process and the previous tuning record. The tuning process stops when meeting the exit policy. The function of strategies is shown
@@ -59,6 +59,8 @@
 Intel® Neural Compressor supports multiple quantization modes such as Post Training Static Quantization (PTQ static), Post Training Dynamic Quantization (PTQ dynamic), Quantization Aware Training, etc. One operator (OP) with a specific quantization mode has multiple ways to quantize, for example it may have multiple quantization scheme(symmetric/asymmetric), calibration algorithm(Min-Max/KL Divergence), etc. We use the `framework capability` to represent the methods that we have already supported. The `tuning space` includes all tuning items and their options. For example, the tuning items and options of the `Conv2D` (PyTorch) supported by Intel® Neural Compressor are as follows:
 
 To incorporate the human experience and reduce the tuning time, user can reduce the tuning space by specifying the `op_name_dict` and `op_type_dict` in `PostTrainingQuantConfig` (`QuantizationAwareTrainingConfig`). Before tuning, the strategy will merge these configurations with framework capability to create the final tuning space.
 
+> Note: Any options in the `op_name_dict` and `op_type_dict` that are not included in the `framework capability` will be ignored by the strategy.
+
 ### Exit Policy
 User can control the tuning process by setting the exit policy by specifying the `timeout`, and `max_trials` fields in the `TuningCriterion`.
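To make the knobs this patch documents concrete, here is a minimal sketch of passing `op_name_dict` and `op_type_dict` to `PostTrainingQuantConfig`; the op type, the name pattern, and the option values below are illustrative assumptions, not taken from the patch:

```python
from neural_compressor.config import PostTrainingQuantConfig

# Narrow the tuning options for one op type (hypothetical values).
op_type_dict = {
    "Conv2d": {
        "weight": {"scheme": ["sym"], "algorithm": ["minmax"]},
        "activation": {"algorithm": ["minmax"]},
    }
}

# Pin a (hypothetical) op-name pattern to fp32 so those ops are excluded
# from quantization entirely.
op_name_dict = {
    "layer4.*": {
        "weight": {"dtype": ["fp32"]},
        "activation": {"dtype": ["fp32"]},
    }
}

conf = PostTrainingQuantConfig(op_type_dict=op_type_dict, op_name_dict=op_name_dict)
# Per the note added above, any option not present in the framework
# capability is silently ignored by the strategy.
```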
"auto" means automatic tuning. +"alpha": "auto", a float value or a list of float values. Default is 0.5. "auto" means automatic tuning. "folding": whether to fold mul into the previous layer, where mul is required to update the input distribution during smoothing. - True: Fold inserted mul into the previous layer. IPEX will only insert mul for layers can do folding. - False: Allow inserting mul to update the input distribution and no folding. IPEX (version>=2.1) can fuse inserted mul automatically. For Stock PyTorch, setting folding=False will convert the model to a QDQ model. + +To find the best `alpha`, users can utilize the [auto-tuning]((./tuning_strategies.md)) feature. Compares to setting the alpha to `"auto"`, this tuning process uses the evaluation result on the entire dataset as the metric to find the best `alpha`. To use this feature, users need to provide a list of scalers between 0 and 1 for the `alpha` item. Here is an example: + +```python +import numpy as np +conf = PostTrainingQuantConfig( + quant_level='auto', # quant_level can also be 1 + ... + recipes={"smooth_quant": True, + "smooth_quant_args": {"alpha": np.arange(0.1, 0.5, 0.05).tolist(),} + ... + }) +``` + ## Supported Framework Matrix | Framework | Alpha | Folding | diff --git a/docs/source/tuning_strategies.md b/docs/source/tuning_strategies.md index 106a0aa7e39..45da3681b58 100644 --- a/docs/source/tuning_strategies.md +++ b/docs/source/tuning_strategies.md @@ -54,12 +54,12 @@ below: ### Tuning Space -Intel® Neural Compressor supports multiple quantization modes such as Post Training Static Quantization (PTQ static), Post Training Dynamic Quantization (PTQ dynamic), Quantization Aware Training, etc. One operator (OP) with a specific quantization mode has multiple ways to quantize, for example it may have multiple quantization scheme(symmetric/asymmetric), calibration algorithm(Min-Max/KL Divergence), etc. We use the `framework capability` to represent the methods that we have already supported. The `tuning space` includes all tuning items and their options. For example, the tuning items and options of the `Conv2D` (PyTorch) supported by Intel® Neural Compressor are as follows: +Intel® Neural Compressor supports multiple quantization modes such as Post Training Static Quantization (PTQ static), Post Training Dynamic Quantization (PTQ dynamic), Quantization Aware Training, etc. One operator (OP) with a specific quantization mode has multiple ways to quantize, for example it may have multiple quantization scheme(symmetric/asymmetric), calibration algorithm(Min-Max/KL Divergence), etc. We use the [`framework capability`](./framework_yaml.md) to represent the methods that we have already supported. The `tuning space` includes all tuning items and their options. For example, the tuning items and options of the `Conv2D` (PyTorch) supported by Intel® Neural Compressor are as follows: ![Conv2D_PyTorch_Cap](./imgs/Conv2D_PyTorch_Cap.png "Conv2D PyTorch Capability") To incorporate the human experience and reduce the tuning time, user can reduce the tuning space by specifying the `op_name_dict` and `op_type_dict` in `PostTrainingQuantConfig` (`QuantizationAwareTrainingConfig`). Before tuning, the strategy will merge these configurations with framework capability to create the final tuning space. -> Note: Any options in the `op_name_dict` and `op_type_dict` that are not included in the `framework capability` will be ignored by the strategy. 
From 4a43c270aaf765d6ed9a98bfdb7d60341ec47c1a Mon Sep 17 00:00:00 2001
From: yiliu30
Date: Wed, 16 Aug 2023 11:47:00 +0800
Subject: [PATCH 3/4] fixed spelling error

Signed-off-by: yiliu30
---
 .azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt | 1 +
 docs/source/smooth_quant.md                               | 2 +-
 docs/source/tuning_strategies.md                          | 2 +-
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/.azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt b/.azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
index 8314ec2c153..d6daafe4d94 100644
--- a/.azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
+++ b/.azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
@@ -2710,3 +2710,4 @@ xgb
 xgboost
 hpo
 HPO
+arange
\ No newline at end of file

diff --git a/docs/source/smooth_quant.md b/docs/source/smooth_quant.md
index 015828d2ff2..9905e04c1c8 100644
--- a/docs/source/smooth_quant.md
+++ b/docs/source/smooth_quant.md
@@ -357,7 +357,7 @@ smooth_quant_args description:
 - False: Allow inserting mul to update the input distribution and no folding. IPEX (version>=2.1) can fuse inserted mul automatically. For Stock PyTorch, setting folding=False will convert the model to a QDQ model.
 
-To find the best `alpha`, users can utilize the [auto-tuning](./tuning_strategies.md) feature. Compared to setting `alpha` to `"auto"`, this tuning process uses the evaluation result on the entire dataset as the metric to find the best `alpha`. To use this feature, users need to provide a list of scalers between 0 and 1 for the `alpha` item. Here is an example:
+To find the best `alpha`, users can utilize the [auto-tuning](./tuning_strategies.md) feature. Compared to setting `alpha` to `"auto"`, this tuning process uses the evaluation result on the entire dataset as the metric to find the best `alpha`. To use this feature, users need to provide a list of scalars between 0 and 1 for the `alpha` item. Here is an example:
 
 ```python
 import numpy as np

diff --git a/docs/source/tuning_strategies.md b/docs/source/tuning_strategies.md
index 45da3681b58..5b92798fbbf 100644
--- a/docs/source/tuning_strategies.md
+++ b/docs/source/tuning_strategies.md
@@ -179,7 +179,7 @@ flowchart TD
 
 > `*` INC will detect the block pattern for [transformer-like](https://arxiv.org/abs/1706.03762) model by default.
 
-> For [smooth quantization](./smooth_quant.md), user can tuning the its's alpha by providing a list of scalers for `alpha` item. The tuning process will take place at the **start** of the tuning procedure. For detailed usage, please refer to the [smooth quantization example](./smooth_quant.md#Example).
+> For [smooth quantization](./smooth_quant.md), user can tuning the smooth quantization's alpha by providing a list of scalars for `alpha` item. The tuning process will take place at the **start** of the tuning procedure. For detailed usage, please refer to the [smooth quantization example](./smooth_quant.md#Example).
 
 **1.** Default quantization
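As a quick sanity check of the candidate grid built by the docs' example (note that `np.arange` excludes its stop value, so 0.5 itself is never tried):

```python
import numpy as np

alphas = np.arange(0.1, 0.5, 0.05).tolist()
print([round(a, 2) for a in alphas])
# [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45]
```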
From 9ae7e2c892dc275b878828e8b5c14052018671ab Mon Sep 17 00:00:00 2001
From: yiliu30
Date: Wed, 16 Aug 2023 15:24:14 +0800
Subject: [PATCH 4/4] fix typos

Signed-off-by: yiliu30
---
 docs/source/tuning_strategies.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/tuning_strategies.md b/docs/source/tuning_strategies.md
index 5b92798fbbf..860749fc1d5 100644
--- a/docs/source/tuning_strategies.md
+++ b/docs/source/tuning_strategies.md
@@ -179,7 +179,7 @@ flowchart TD
 
 > `*` INC will detect the block pattern for [transformer-like](https://arxiv.org/abs/1706.03762) model by default.
 
-> For [smooth quantization](./smooth_quant.md), user can tuning the smooth quantization's alpha by providing a list of scalars for `alpha` item. The tuning process will take place at the **start** of the tuning procedure. For detailed usage, please refer to the [smooth quantization example](./smooth_quant.md#Example).
+> For [smooth quantization](./smooth_quant.md), users can tune the smooth quantization alpha by providing a list of scalars for the `alpha` item. The tuning process will take place at the **start stage** of the tuning procedure. For detailed usage, please refer to the [smooth quantization example](./smooth_quant.md#Example).
 
 **1.** Default quantization
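For contrast with the list-based alpha search finalized above, the `"auto"` setting described in the smooth quant docs needs no candidate list; a minimal sketch:

```python
from neural_compressor.config import PostTrainingQuantConfig

# "auto" asks smooth quant to determine alpha automatically, instead of
# evaluating a user-supplied list at the start stage of tuning.
conf_auto = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": "auto"}}
)
```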