Commit aa4770d

Update the strategy docs (#1154)
Signed-off-by: yiliu30 <[email protected]>
1 parent: b9ce61a

File tree

3 files changed: +23 -3 lines

.azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt

Lines changed: 1 addition & 0 deletions

@@ -2713,3 +2713,4 @@ xgb
 xgboost
 hpo
 HPO
+arange

docs/source/smooth_quant.md

Lines changed: 15 additions & 1 deletion

@@ -350,12 +350,26 @@ conf = PostTrainingQuantConfig(recipes=recipes)
 ```
 smooth_quant_args description:
 
-"alpha": "auto" or a float value. Default is 0.5. "auto" means automatic tuning.
+"alpha": "auto", a float value, or a list of float values. Default is 0.5. "auto" means automatic tuning.
 
 "folding": whether to fold mul into the previous layer, where mul is required to update the input distribution during smoothing.
 - True: Fold inserted mul into the previous layer. IPEX will only insert mul for layers can do folding.
 - False: Allow inserting mul to update the input distribution and no folding. IPEX (version>=2.1) can fuse inserted mul automatically. For Stock PyTorch, setting folding=False will convert the model to a QDQ model.
 
+
+To find the best `alpha`, users can utilize the [auto-tuning](./tuning_strategies.md) feature. Compared with setting `alpha` to `"auto"`, this tuning process uses the evaluation result on the entire dataset as the metric to find the best `alpha`. To use this feature, users need to provide a list of scalars between 0 and 1 for the `alpha` item. Here is an example:
+
+```python
+import numpy as np
+conf = PostTrainingQuantConfig(
+    quant_level='auto',  # quant_level can also be 1
+    ...
+    recipes={"smooth_quant": True,
+             "smooth_quant_args": {"alpha": np.arange(0.1, 0.5, 0.05).tolist()},
+    ...
+    })
+```
+
 ## Supported Framework Matrix
 
 | Framework | Alpha | Folding |
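
For context, a minimal sketch of how the alpha-list configuration above might be driven end to end, assuming Intel® Neural Compressor 2.x's `quantization.fit` API and user-provided `model`, `calib_dataloader`, and an `eval_func` that scores a candidate model on the full evaluation dataset (the `evaluate_accuracy` helper below is hypothetical):

```python
import numpy as np
from neural_compressor import PostTrainingQuantConfig, quantization


def eval_func(candidate_model):
    # Assumed user-provided: run the candidate model over the entire
    # evaluation dataset and return a single scalar (e.g., top-1 accuracy).
    # The tuning strategy uses this value to pick the best smooth-quant alpha.
    return evaluate_accuracy(candidate_model, val_dataloader)  # hypothetical helper


conf = PostTrainingQuantConfig(
    quant_level="auto",
    recipes={
        "smooth_quant": True,
        # Candidate alphas between 0 and 1 that the strategy will evaluate.
        "smooth_quant_args": {"alpha": np.arange(0.1, 0.5, 0.05).tolist()},
    },
)

# `model` and `calib_dataloader` are assumed to be defined by the user.
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader, eval_func=eval_func)
```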

docs/source/tuning_strategies.md

Lines changed: 7 additions & 2 deletions

@@ -44,7 +44,7 @@ Tuning Strategies
 ## Introduction
 
 Intel® Neural Compressor aims to help users quickly deploy
-the low-precision inference solution on popular Deep Learning frameworks such as TensorFlow, PyTorch, ONNX, and MXNet. With built-in strategies, it automatically optimizes low-precision recipes for deep learning models to achieve optimal product objectives, such as inference performance and memory usage, with expected accuracy criteria. Currently, several strategies, including `O0`, `Basic`, `MSE`, `MSE_V2`, `HAWQ_V2`, `Bayesian`, `Exhaustive`, `Random`, `SigOpt`, `TPE`, etc are supported. By default, the `Basic` strategy is used for tuning.
+the low-precision inference solution on popular Deep Learning frameworks such as TensorFlow, PyTorch, ONNX, and MXNet. With built-in strategies, it automatically optimizes low-precision recipes for deep learning models to achieve optimal product objectives, such as inference performance and memory usage, with expected accuracy criteria. Currently, several tuning strategies, including `auto`, `O0`, `O1`, `Basic`, `MSE`, `MSE_V2`, `HAWQ_V2`, `Bayesian`, `Exhaustive`, `Random`, `SigOpt`, `TPE`, etc., are supported. By default, [`quant_level="auto"`](./tuning_strategies.md#auto) is used for tuning.
 
 ## Strategy Design
 Before tuning, the `tuning space` was constructed according to the framework capability and user configuration. Then the selected strategy generates the next quantization configuration according to its traverse process and the previous tuning record. The tuning process stops when meeting the exit policy. The function of strategies is shown
@@ -54,11 +54,13 @@ below:
 
 ### Tuning Space
 
-Intel® Neural Compressor supports multiple quantization modes such as Post Training Static Quantization (PTQ static), Post Training Dynamic Quantization (PTQ dynamic), Quantization Aware Training, etc. One operator (OP) with a specific quantization mode has multiple ways to quantize, for example it may have multiple quantization scheme(symmetric/asymmetric), calibration algorithm(Min-Max/KL Divergence), etc. We use the `framework capability` to represent the methods that we have already supported. The `tuning space` includes all tuning items and their options. For example, the tuning items and options of the `Conv2D` (PyTorch) supported by Intel® Neural Compressor are as follows:
+Intel® Neural Compressor supports multiple quantization modes such as Post Training Static Quantization (PTQ static), Post Training Dynamic Quantization (PTQ dynamic), Quantization Aware Training, etc. One operator (OP) with a specific quantization mode has multiple ways to quantize; for example, it may have multiple quantization schemes (symmetric/asymmetric), calibration algorithms (Min-Max/KL Divergence), etc. We use the [`framework capability`](./framework_yaml.md) to represent the methods that we have already supported. The `tuning space` includes all tuning items and their options. For example, the tuning items and options of the `Conv2D` (PyTorch) supported by Intel® Neural Compressor are as follows:
 ![Conv2D_PyTorch_Cap](./imgs/Conv2D_PyTorch_Cap.png "Conv2D PyTorch Capability")
 
 To incorporate the human experience and reduce the tuning time, user can reduce the tuning space by specifying the `op_name_dict` and `op_type_dict` in `PostTrainingQuantConfig` (`QuantizationAwareTrainingConfig`). Before tuning, the strategy will merge these configurations with framework capability to create the final tuning space.
 
+> Note: Any options in the `op_name_dict` and `op_type_dict` that are not included in the [`framework capability`](./framework_yaml.md) will be ignored by the strategy.
+
 ### Exit Policy
 User can control the tuning process by setting the exit policy by specifying the `timeout`, and `max_trials` fields in the `TuningCriterion`.
 
@@ -177,6 +179,9 @@ flowchart TD
 
 > `*` INC will detect the block pattern for [transformer-like](https://arxiv.org/abs/1706.03762) model by default.
 
+> For [smooth quantization](./smooth_quant.md), users can tune the smooth quantization alpha by providing a list of scalars for the `alpha` item. The tuning process takes place at the **start stage** of the tuning procedure. For detailed usage, please refer to the [smooth quantization example](./smooth_quant.md#Example).
+
+
 **1.** Default quantization
 
 At this stage, it attempts to quantize OPs with the default quantization configuration which is consistent with the framework's behavior.
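
For context, a minimal sketch of how the exit policy and tuning-space controls documented above might be combined, assuming Intel® Neural Compressor 2.x's `TuningCriterion` and `PostTrainingQuantConfig` APIs and user-provided `model`, `calib_dataloader`, and `eval_func`:

```python
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.config import TuningCriterion

# Exit policy: stop tuning after 3600 seconds or 50 trials, whichever comes first.
tuning_criterion = TuningCriterion(timeout=3600, max_trials=50)

conf = PostTrainingQuantConfig(
    tuning_criterion=tuning_criterion,
    # Reduce the tuning space: pin Conv2d weights to symmetric per-channel int8.
    # Options outside the framework capability are ignored by the strategy.
    op_type_dict={
        "Conv2d": {
            "weight": {
                "dtype": ["int8"],
                "scheme": ["sym"],
                "granularity": ["per_channel"],
            }
        }
    },
)

# `model`, `calib_dataloader`, and `eval_func` are assumed to be defined by the user.
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader, eval_func=eval_func)
```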
