## Mixed Precision Support Matrix
|Framework |BF16 |FP16 |
|--------------|:-----------:|:-----------:|
|TensorFlow |✔|:x:|
|PyTorch |✔|:x:|
|ONNX Runtime |✔|✔|
|MXNet |✔|:x:|
> **During quantization, BF16 conversion is enabled by default. FP16 conversion is executed if the 'device' of the config is set to 'gpu'. Please refer to this [document](./quantization_mixed_precision.md) for the workflow.**
## Get Started with Mixed Precision API
To get a BF16/FP16 model, users can use the Mixed Precision API as follows.

The supported precisions for mixed precision are BF16 and FP16. To get a pure FP16 or BF16 model, add the other precision to `excluded_precisions`.
- BF16:
```python
from neural_compressor import mix_precision
from neural_compressor.config import MixedPrecisionConfig

# `model` is the user's FP32 framework model; BF16 conversion is enabled by default.
conf = MixedPrecisionConfig()
converted_model = mix_precision.fit(model, conf=conf)
```
> **BF16/FP16 conversion may lead to an accuracy drop. Intel® Neural Compressor provides an accuracy-aware tuning function to reduce the accuracy loss, which automatically falls back converted ops to FP32 to get better accuracy. To enable this function, users only need to provide an evaluation function (or a dataloader plus a metric).**
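The accuracy impact comes from bfloat16's narrow mantissa: it keeps float32's 8 exponent bits but only 7 mantissa bits. As an illustrative sketch (the helper `to_bf16` is ours, not part of Neural Compressor; NaN/Inf handling is omitted), the round-to-nearest-even truncation can be written in NumPy:

```python
import numpy as np

def to_bf16(x):
    """Round a float32 value to bfloat16 precision, returned as float32.

    Sketch only: round-to-nearest-even on the upper 16 bits;
    NaN/Inf handling is omitted for brevity.
    """
    u = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Add 0x7FFF plus the parity of bit 16 (ties-to-even), then clear the low 16 bits.
    rounded = (u + 0x7FFF + ((u >> 16) & 1)) & 0xFFFF0000
    return rounded.astype(np.uint32).view(np.float32)

print(float(to_bf16(np.float32(0.1))))  # → 0.10009765625, a relative error of ~0.1%
```

Exactly representable values such as 1.0 survive unchanged, which is why many ops tolerate BF16 well while accuracy-sensitive ones may need the FP32 fallback described above.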
## Examples
- BF16:

There are two prerequisites for running the BF16 mixed precision examples:

1. Hardware: the CPU supports the `avx512_bf16` instruction set.
2. Software: intel-tensorflow >= [2.3.0](https://pypi.org/project/intel-tensorflow/2.3.0/) or torch >= [1.11.0](https://download.pytorch.org/whl/torch_stable.html).

If either prerequisite is not met, the program exits.
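The hardware prerequisite can be checked up front. A minimal Linux-only sketch (the helper `cpu_supports_avx512_bf16` is ours, not a Neural Compressor API) that scans the CPU flags in `/proc/cpuinfo`:

```python
def cpu_supports_avx512_bf16(cpuinfo_path="/proc/cpuinfo"):
    """Best-effort check for the avx512_bf16 CPU flag (Linux-only sketch)."""
    try:
        with open(cpuinfo_path) as f:
            return "avx512_bf16" in f.read()
    except OSError:
        # Non-Linux systems or an unreadable file: report unsupported.
        return False

if not cpu_supports_avx512_bf16():
    print("BF16 hardware prerequisite not met; the examples would exit here.")
```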
- FP16:

Currently, Intel® Neural Compressor only supports FP16 mixed precision for ONNX models.

To run the FP16 mixed precision examples, users need to set the 'device' of the config to 'gpu' and the 'backend' to 'onnxrt_cuda_ep'.
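Putting these settings together, an FP16 conversion config might look like the following sketch (a config fragment, not a runnable example: it assumes the `neural_compressor` package, a CUDA-capable GPU, and a user-supplied `onnx_model`):

```python
from neural_compressor import mix_precision
from neural_compressor.config import MixedPrecisionConfig

# Sketch: FP16 conversion is ONNX-only and requires the CUDA execution provider.
conf = MixedPrecisionConfig(
    device="gpu",
    backend="onnxrt_cuda_ep",
    excluded_precisions=["bf16"],  # exclude BF16 to request a pure FP16 model
)
converted_model = mix_precision.fit(onnx_model, conf=conf)  # `onnx_model` is the user's model
```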
0 commit comments