Commit fb8e503

fix document, example readme and code (#851)
Signed-off-by: Xin He <[email protected]>
1 parent b0d4f53 commit fb8e503

File tree

4 files changed (+67 / -18 lines changed)

docs/source/smooth_quant.md

Lines changed: 15 additions & 9 deletions
@@ -309,13 +309,13 @@ conv2d/linear->conv2d/linear/layernorm/batchnorm/instancenorm/t5norm/llamanorm/g
 ```
 
 ## Validated Models
-neural_compressor: 2.1
+Neural Compressor: 2.1
 
-IPEX: 2.0
+IPEX (Intel Extension for PyTorch): 2.0
 
 Dataset: lambada
 
-task: text-generation
+Task: text-generation
 
 alpha [0.4, 0.6] is sweet spot region in SmoothQuant paper
 
@@ -351,18 +351,24 @@ smooth_quant_args description:
 
 "alpha": "auto" or a float value. Default is 0.5. "auto" means automatic tuning.
 
-"folding":
-- False: Allow inserting mul to update the input distribution and not absorbing. IPEX can fuse inserted mul automatically and folding=False is recommended. And for PyTorch FBGEMM backend, folding=False setting will only convert model to QDQ model.
-- True: Only allow inserting mul with the input scale that can be absorbed into the last layer.
-- If folding not set in config, the default value is IPEX: False (True if version<2.1), Stock PyTorch: True.
+"folding": whether to fold mul into the previous layer, where mul is required to update the input distribution during smoothing.
+- True: Fold the inserted mul into the previous layer. IPEX will only insert mul for layers that can do folding.
+- False: Allow inserting mul to update the input distribution without folding. IPEX (version>=2.1) can fuse the inserted mul automatically. For stock PyTorch, setting folding=False converts the model to a QDQ model.
 
+## Supported Framework Matrix
+
+| Framework | Alpha          | Folding                     |
+|:---------:|----------------|-----------------------------|
+| PyTorch   | [0-1] / 'auto' | False                       |
+| IPEX      | [0-1] / 'auto' | True / False (version>=2.1) |
+| ONNX      | [0-1]          | True                        |
 
 ## Reference
 
-[^1]: Jason, Wei, et al. "Emergent Abilities of Large Language Models". Published in Transactions on Machine Learning Research (2022)
+[^1]: Wei, Jason, et al. "Emergent Abilities of Large Language Models." Transactions on Machine Learning Research (2022).
 
 [^2]: Yvinec, Edouard, et al. "SPIQ: Data-Free Per-Channel Static Input Quantization." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023.
 
 [^3]: Wei, Xiuying, et al. "Outlier suppression: Pushing the limit of low-bit transformer language models." arXiv preprint arXiv:2209.13325 (2022).
 
-[^4]: Xiao, Guangxuan, et al. "Smoothquant: Accurate and efficient post-training quantization for large language models." arXiv preprint arXiv:2211.10438 (2022)..
+[^4]: Xiao, Guangxuan, et al. "Smoothquant: Accurate and efficient post-training quantization for large language models." arXiv preprint arXiv:2211.10438 (2022).
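
The alpha and folding options documented above are consumed through the `smooth_quant_args` recipe of Neural Compressor's post-training quantization config. As a rough usage sketch, assuming the Neural Compressor 2.x `PostTrainingQuantConfig`/`quantization.fit` API, with a toy model and calibration loader standing in for real ones:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from neural_compressor import PostTrainingQuantConfig, quantization

# Toy FP32 model and calibration data; stand-ins for a real model and dataset.
fp32_model = torch.nn.Sequential(
    torch.nn.Linear(16, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 4),
)
calib_dataloader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.zeros(64, dtype=torch.long)),
    batch_size=8,
)

conf = PostTrainingQuantConfig(
    recipes={
        "smooth_quant": True,
        "smooth_quant_args": {
            "alpha": 0.5,      # a float in [0, 1], or "auto" for automatic tuning
            "folding": False,  # False: keep the inserted mul; True: fold it into the previous layer
        },
    },
)
q_model = quantization.fit(fp32_model, conf, calib_dataloader=calib_dataloader)
```

With the stock PyTorch backend and folding=False, the result is a QDQ model, matching the folding description in the diff above.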

examples/pytorch/image_recognition/torchvision_models/quantization/ptq/cpu/fx/README.md

Lines changed: 43 additions & 0 deletions
@@ -33,36 +33,67 @@ train val
 ```shell
 python main.py -t -a resnet50 --pretrained /path/to/imagenet
 ```
+or
+```shell
+bash run_tuning.sh --input_model=resnet50 --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=resnet50 --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
+```
 
 ### 2. ResNet18
 
 ```shell
 python main.py -t -a resnet18 --pretrained /path/to/imagenet
 ```
+or
+```shell
+bash run_tuning.sh --input_model=resnet18 --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=resnet18 --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
+```
 
 ### 3. ResNeXt101_32x8d
 
 ```shell
 python main.py -t -a resnext101_32x8d --pretrained /path/to/imagenet
 ```
+or
+```shell
+bash run_tuning.sh --input_model=resnext101_32x8d --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=resnext101_32x8d --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
+```
 
 ### 4. InceptionV3
 
 ```shell
 python main.py -t -a inception_v3 --pretrained /path/to/imagenet
 ```
+or
+```shell
+bash run_tuning.sh --input_model=inception_v3 --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=inception_v3 --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
+```
 
 ### 5. Mobilenet_v2
 
 ```shell
 python main.py -t -a mobilenet_v2 --pretrained /path/to/imagenet
 ```
+or
+```shell
+bash run_tuning.sh --input_model=mobilenet_v2 --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=mobilenet_v2 --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
+```
 
 ### 6. Efficientnet_b0
 
 ```shell
 python main.py -t -a efficientnet_b0 --pretrained /path/to/imagenet
 ```
+or
+```shell
+bash run_tuning.sh --input_model=efficientnet_b0 --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=efficientnet_b0 --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
+```
+
 > **Note**
 >
 > To reduce tuning time and get the result faster, the `efficientnet_b0` model uses
@@ -74,6 +105,12 @@ python main.py -t -a efficientnet_b0 --pretrained /path/to/imagenet
 ```shell
 python main.py -t -a efficientnet_b3 --pretrained /path/to/imagenet
 ```
+or
+```shell
+bash run_tuning.sh --input_model=efficientnet_b3 --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=efficientnet_b3 --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
+```
+
 > **Note**
 >
 > To reduce tuning time and get the result faster, the `efficientnet_b3` model uses
@@ -83,6 +120,12 @@ python main.py -t -a efficientnet_b3 --pretrained /path/to/imagenet
 ```shell
 python main.py -t -a efficientnet_b7 --pretrained /path/to/imagenet
 ```
+or
+```shell
+bash run_tuning.sh --input_model=efficientnet_b7 --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=efficientnet_b7 --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
+```
+
 > **Note**
 >
 > To reduce tuning time and get the result faster, the `efficientnet_b7` model uses

examples/pytorch/image_recognition/torchvision_models/quantization/ptq/cpu/fx/main.py

Lines changed: 3 additions & 3 deletions
@@ -96,10 +96,10 @@
 def main():
     args = parser.parse_args()
 
-    if 'efficient' in args.arch:
-        import torchvision.models as models
-    else:
+    if 'mobilenet' in args.arch:
         import torchvision.models.quantization as models
+    else:
+        import torchvision.models as models
 
     if args.seed is not None:
         random.seed(args.seed)
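
The fix above swaps which torchvision namespace supplies the model builder: torchvision only ships quantization-ready builders for a few architectures (MobileNet among them), while architectures such as the EfficientNet family exist only under the regular `torchvision.models` namespace. A small standalone sketch of the same branch follows; the `build_model` helper and the `getattr`-based construction are illustrative, not the example's exact code:

```python
# Illustrative helper: pick the torchvision namespace based on the arch name,
# then build the model by attribute lookup.
def build_model(arch: str, pretrained: bool = False):
    if 'mobilenet' in arch:
        # e.g. mobilenet_v2 has a quantization-ready variant
        import torchvision.models.quantization as models
    else:
        # e.g. efficientnet_b0 only exists in the regular namespace
        import torchvision.models as models
    return getattr(models, arch)(pretrained=pretrained)

model = build_model('efficientnet_b0')      # resolved from torchvision.models
quant_ready = build_model('mobilenet_v2')   # resolved from torchvision.models.quantization
```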

examples/pytorch/image_recognition/torchvision_models/quantization/ptq/cpu/ipex/README.md

Lines changed: 6 additions & 6 deletions
@@ -71,8 +71,8 @@ python main.py -t -a resnet18 --ipex --pretrained /path/to/imagenet
 ```
 or
 ```shell
-bash run_tuning.sh --topology=resnet18 --dataset_location=/path/to/imagenet
-bash run_benchmark.sh --topology=resnet18 --dataset_location=/path/to/imagenet --mode=benchmark/accuracy --int8=true/false
+bash run_tuning.sh --input_model=resnet18 --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=resnet18 --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
 ```
 
 ### 2. ResNet50 With Intel PyTorch Extension
@@ -82,8 +82,8 @@ python main.py -t -a resnet50 --ipex --pretrained /path/to/imagenet
 ```
 or
 ```shell
-bash run_tuning.sh --topology=resnet50 --dataset_location=/path/to/imagenet
-bash run_benchmark.sh --topology=resnet50 --dataset_location=/path/to/imagenet --mode=benchmark/accuracy --int8=true/false
+bash run_tuning.sh --input_model=resnet50 --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=resnet50 --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
 ```
 
 ### 3. ResNext101_32x16d With Intel PyTorch Extension
@@ -93,8 +93,8 @@ python main.py -t -a resnext101_32x16d_wsl --hub --ipex --pretrained /path/to/im
 ```
 or
 ```shell
-bash run_tuning.sh --topology=resnext101_32x16d_wsl --dataset_location=/path/to/imagenet
-bash run_benchmark.sh --topology=resnext101_32x16d_wsl --dataset_location=/path/to/imagenet --mode=benchmark/accuracy --int8=true/false
+bash run_tuning.sh --input_model=resnext101_32x16d_wsl --dataset_location=/path/to/imagenet
+bash run_benchmark.sh --input_model=resnext101_32x16d_wsl --dataset_location=/path/to/imagenet --mode=performance/accuracy --int8=true/false
 ```
 
 # Saving and Loading Model
