From 75132992d1bcadad0c7ec5db6dcace9f78d7cb50 Mon Sep 17 00:00:00 2001
From: "Sun, Xuehao"
Date: Thu, 19 Sep 2024 10:13:34 +0800
Subject: [PATCH 1/2] Update model accuracy

Signed-off-by: Sun, Xuehao
---
 docs/source/validated_model_list.md | 2834 +++++++++++----------------
 1 file changed, 1144 insertions(+), 1690 deletions(-)

diff --git a/docs/source/validated_model_list.md b/docs/source/validated_model_list.md
index fba2536cfe8..69b1f340bda 100644
--- a/docs/source/validated_model_list.md
+++ b/docs/source/validated_model_list.md
@@ -1,21 +1,18 @@
+# Validated Models
-Validated Models
-======
 Intel® Neural Compressor validated examples with multiple compression techniques. The typical examples link can be found in [example tables](https://github.com/intel/neural-compressor/blob/master/examples/README.md), and the performance/accuracy results is available here.
 1. [Validated Quantization Examples](#Validated-Quantization-Examples)
-    1.1. [TensorFlow Models with TensorFlow 2.15.0](#tensorflow-models-with-tensorflow-2150)
+    1.1. [TensorFlow Models with TensorFlow 2.16.1](#tensorflow-models-with-tensorflow-2161)
-    1.2. [PyTorch Models with Torch 2.2.1+cpu in PTQ Mode](#pytorch-models-with-torch-221cpu-in-ptq-mode)
+    1.2. [Keras Models with keras 2.15.1](#keras-models-with-keras-2151)
-    1.3. [PyTorch Models with Torch 2.2.1+cpu in QAT Mode](#pytorch-models-with-torch-221cpu-in-qat-mode)
+    1.3. [PyTorch Models with Torch 2.3.0+cpu in PTQ Mode](#pytorch-models-with-torch-230cpu-in-ptq-mode)
-    1.4. [PyTorch Models with Torch 2.0.1+cpu in WOQ Mode](#pytorch-models-with-torch-201cpu-in-woq-mode)
+    1.4. [PyTorch Models with Torch 2.3.0+cpu in QAT Mode](#pytorch-models-with-torch-230cpu-in-qat-mode)
-    1.5. [ONNX Models with ONNX Runtime 1.17.1](#onnx-models-with-onnx-runtime-1171)
-
-    1.6. [ONNX Models with ONNX Runtime 1.15.0 in WOQ Mode](#onnx-models-with-onnx-runtime-1150-in-woq-mode)
+    1.5. [ONNX Models with ONNX Runtime 1.18.1](#onnx-models-with-onnx-runtime-1181)
 2. [Validated Pruning Examples](#Validated-Pruning-Examples)
@@ -25,14 +22,14 @@ Intel® Neural Compressor validated examples with multiple compression technique
 ## Validated Quantization Examples
-System summary: Test by Intel on 3/18/2024. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 256GB (16x16GB DDR5 4800 MT/s [4800 MT/s]), BIOS 3A14.TEL2P1, microcode 0x2b0001b0,
-CentOS Stream 8, gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
+System summary: Test by Intel on 7/22/2024. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS EGSDCRB1.SYS.0081.D18.2205301336, microcode 0x2b000590,
+Ubuntu 24.04 LTS, gcc (GCC) 13.2.0 (Ubuntu 13.2.0-23ubuntu4), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
 Using 1 socket, 4 cores/instance, 14 instances and batch size 1 to benchmark most of the model.
 Performance varies by use, configuration and other factors.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks -### TensorFlow Models with TensorFlow 2.15.0 +### TensorFlow Models with TensorFlow 2.16.1 @@ -55,277 +52,297 @@ For more complete information about performance and benchmark results, visit www - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + + + + + + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + + + + + + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - - - + + + + + + + + - + - - - - - - + + + + + +
ResNet50 v1.0 pb74.11%74.27%-0.22%1720.00582.182.95x0.741120.74274-0.002181732.923578.8792.993584
ResNet50 v1.5 pb76.25%76.46%-0.28%1517.38570.652.66x0.76250.76464-0.00281535.199529.9992.896607
ResNet101 pb77.52%76.45%1.41%1058.93382.962.77x0.775240.764480.0140751048.361384.0182.729979
Inception V1 pb70.45%69.74%1.03%2080.56951.852.19x0.704540.697380.0102672079.239927.8172.241001
Inception V2 pb74.33%73.97%0.49%1587.53863.371.84x0.743260.739660.0048671644.364840.5311.95634
Inception V3 pb76.72%76.75%-0.03%1052.91434.272.42x0.76720.76746-0.000341076.103401.8852.677639
Inception V4 pb80.13%80.27%-0.18%707.41234.383.02x0.801260.80272-0.00182704.961199.283.53754
Inception ResNet V2 pb80.25%80.40%-0.18%320.37179.461.79x0.802520.80398-0.00182313.966178.2741.761143
DenseNet-161pb0.762880.762862.62E-05279.204214.0311.304503
MobileNet V1 pb71.79%70.96%1.18%4312.311512.592.85x0.71790.709560.0117544199.1311506.6812.787007
MobileNet V2 pb72.48%71.76%1.01%2287.771406.751.63x0.724840.717560.0101452170.3871445.0451.501951
VGG16 pb72.69%70.89%2.55%1367.34207.416.59x0.726920.708860.0254781388.622203.3936.827285
VGG19 pb72.67%71.01%2.33%1244.82176.797.04x0.726720.710140.0233481236.115169.7367.282574
ResNet50pb0.69090.690280.000898411.79284.5341.447244
ResNetV2 50 pb70.37%69.64%1.05%780.51582.961.34x0.703740.696420.010511779.416539.5351.444607
ResNetV2 101 pb72.64%71.87%1.08%494.43329.511.50x0.726440.71870.010769492.002295.7661.663484
ResNetV2 152 pb73.12%72.37%1.04%349.42235.481.48x0.73120.723680.010391348.385205.7171.693516
Densenet   161ViT pb76.29%76.29%0.00%282.31223.191.26x0.813920.8192-0.00645230.53132.6571.73779
SSD ResNet50 V1 pb37.91%38.00%-0.24%139.4930.994.50x0.37910.38-0.00237135.71128.7524.720054
SSD MobileNet V1 pb23.00%23.13%-0.57%1284.41756.561.70x0.229950.23127-0.005711237.695719.2991.720696
SSD ResNet50 v1 ckpt37.88%38.00%-0.31%139.5627.795.02x0.378810.38-0.00313130.53722.0465.921119
SSD MobileNet v1 ckpt22.96%23.13%-0.71%1280.88530.232.42x0.229630.23127-0.007091234.562529.3422.332258
Faster R-CNN ResNet101 pb30.32%30.39%-0.22%161.1923.806.77x0.303190.30387-0.00224144.21122.6376.370588
Faster R-CNN ResNet50 pb26.61%26.59%0.09%178.8929.206.13x0.26610.265860.000903164.55328.3765.79902
YOLOv3 pb83.28%82.35%1.12%249.3594.442.64x0.832790.823530.011244247.55981.4473.03951
BERT large SQuAD pb92.4492.99-0.58%46.5420.372.28x92.4443292.98612-0.0058349.17417.5192.806895
BERT large SQuAD (ONNX Model Zoo) pb92.3692.98-0.67%42.6520.792.05x92.3606292.98047-0.0066745.05917.552.567464
BERT base MRPCckpt85.78%86.52%-0.85%390.36212.961.83xTransformer LTpb25.81525.855-0.0015528.98815.7741.837708
VITTransformer lt MLPerf pb81.39%81.92%-0.64%230.91142.241.62x27.1296727.16596-0.0013410.2695.082.021457
-### PyTorch Models with Torch 2.2.1+cpu in PTQ Mode +### Keras Models with keras 2.15.1 @@ -346,310 +363,401 @@ For more complete information about performance and benchmark results, visit www - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - - - + + + + + + + - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ResNet18static69.59%69.76%-0.24%1989.72600.453.31xInception ResNet V2pb0.802520.80398-0.00182313.966178.2741.761143
Inception V3pb0.76720.76746-0.000341076.103401.8852.677639
MobileNet V2pb0.714920.71756-0.00368947.439779.5061.215435
ResNet101pb0.775240.764480.0140751048.361384.0182.729979
ResNet50static75.98%76.15%-0.21%1165.92303.913.84xpb0.69090.690280.000898411.79284.5341.447244
Inception V3ResNet50pb0.78070.7812-0.00064680.56498.0821.366361
ResNetV2 101pb0.726440.71870.010769492.002295.7661.663484
ResNetV2 50pb0.703740.696420.010511779.416539.5351.444607
VGG16pb0.726920.708860.0254781388.622203.3936.827285
VGG19pb0.726720.710140.0233481236.115169.7367.282574
+ +### PyTorch Models with Torch 2.3.0+cpu in PTQ Mode + + + + + + + + + + + + + + + + + + + + + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - - - - - - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - - - + + + + + + + + - - - - - - + + + + + + - - - - - + + + + +
ModelExampleAccuracyPerformance 1s4c14ins1bs
Throughput(samples/sec)
INT8FP32Accuracy Ratio
[(INT8-FP32)/FP32]
INT8FP32Performance Ratio
[INT8/FP32]
ResNet18 static69.46%69.52%-0.09%953.35302.523.15x0.695940.6976-0.002381707.522602.4692.834207
ResNeSt50EfficientNet-B3 static80.76%81.04%-0.35%365.4439.669.21x0.777780.78544-0.00975513.818360.0221.427185
ResNeXt101_32x8dPeleeNet static78.92%79.31%-0.49%548.78104.145.27x0.71830.721-0.00374837.834541.6571.546798
Efficientnet_b0ResNet50 static76.94%77.67%-0.94%636.62566.421.12x0.759840.76146-0.002131135.222311.4663.64477
Efficientnet_b3Inception V3 static77.78%78.54%-0.98%471.61358.591.32x0.694560.69522-0.00095948.026322.5522.939142
PeleenetResNeSt50 static71.83%72.10%-0.37%790.03504.441.57x0.807580.8104-0.00348406.10639.65610.24072
YOLO V3ResNeXt101_32x8d static55.10%54.93%0.31%162.9857.372.84x0.789220.79308-0.00487582.215106.7315.454976
SSD ResNet34YOLO V3 static19.4819.63-0.77%137.8911.6111.88x0.550950.549260.003077156.28760.2962.591996
Roberta base MRPC static92.97%93.59%-0.66%390.95175.442.23x0.9314080.935943-0.00485396.854176.8022.244624
CamemBERT base MRPC static88.47%89.28%-0.91%393.70174.512.26x0.8858130.892794-0.00782405.365182.8712.216672
DistilBERT base MRPC static90.30%90.27%0.04%783.37344.912.27x0.9063550.9026850.004066799.05346.4982.306074
DistilBERT base MRPC dynamic90.02%90.27%-0.28%684.20344.681.99x0.9001690.902685-0.00279705.909348.1572.027559
ALBERT base MRPC static92.63%92.63%0.00%312.48155.602.01x
Funnel   MRPCstatic91.94%92.25%-0.34%281.83179.041.57x0.9228070.9228070350.775164.3192.13472
Xlm Roberta MRPC static89.46%88.62%0.94%395.91173.592.28x0.8779660.886207-0.0093396.063175.9632.250831
Xlm Roberta MRPC dynamic88.54%88.24%0.35%373.90173.912.15x0.8854170.8823530.003472381.19175.9612.166332
BERT base MRPC static89.56%90.42%-0.95%405.08176.382.30x0.8959440.904202-0.00913402.419177.7262.264266
BERT base COLA static52.86%53.39%-0.99%395.37177.372.23x0.5347380.5338770.001612395.246177.0242.232726
BERT base STSB static87.39%88.05%-0.74%396.71173.802.28x0.8761140.880462-0.00494397.62177.2332.243487
BERT base SST-2 static91.97%92.32%-0.37%393.20173.652.26x0.9197250.923165-0.00373407.661182.9342.228459
BERT large COLA static62.80%63.35%-0.88%136.5551.822.64x0.6339250.6335320.00062147.86256.0082.640016
BERT base RTE static73.29%72.56%1.00%377.79173.842.17x0.7184120.725632-0.00995397.827177.3992.242555
BERT large MRPC static89.36%90.38%-1.12%136.7251.872.64x0.9006620.90378-0.00345146.83852.9732.77194
BERT large QNLI static90.79%91.54%-0.82%391.67173.822.25x0.9112210.915431-0.0046394.508176.9182.229892
BERT large RTE static73.29%74.01%-0.98%135.2051.902.61x0.7364620.740072-0.00488148.83755.8342.665705
BERT large RTEdynamic73.29%74.01%-0.98%117.1451.742.26xFunnel MRPCstatic0.9193830.922547-0.00343294.763187.4131.572799
BERT large SQuAD static92.2993.16-0.93%32.6116.881.93x92.3417393.15842-0.0087750.20518.692.686196
lvwerra/pegasus-samsum static 42.3242.67-0.82%93.8037.592.50x42.6716-0.00824102.73437.9932.704024
- -### PyTorch Models with Torch 2.2.1+cpu in QAT Mode +### PyTorch Models with Torch 2.3.0+cpu in QAT Mode @@ -672,738 +780,37 @@ For more complete information about performance and benchmark results, visit www - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - - - - - - - - - - - + + + + + +
ResNet18 static69.74%69.76%-0.03%1981.66598.393.31x0.697360.6976-0.000341717.593602.6482.850077
ResNet50 static76.03%76.15%-0.15%1095.95298.923.67x0.760340.76146-0.001471091.623305.833.569378
ResNeXt101_32x8d static79.31%79.31%0.00%549.02103.725.29x
BERT base MRPCstatic89.40%90.40%-1.11%375.61176.152.13x0.793080.793080584.537107.3825.443529
- -### PyTorch Models with Torch 2.0.1+cpu in WOQ Mode - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Model nameConfigurationLambada_openaiHellaswagWinograndePiqaAverage
[Mean accuracy of previous four tasks]
Wikitext
AccuracyAccuracyAccuracyAccuracyAccuracyAccuracy Ratio
[INT4/FP32]
Word_perplexity
EleutherAI/gpt-j-6bFP320.68310.49540.64090.75410.6434/10.8816
GPTQ
W4G128Asym
0.6790.48950.64330.74760.63990.994511.0999
GPTQ
W4G32Asym
0.68290.49230.64010.74860.64100.996311.0141
GPTQ
W4G128Sym
0.6850.49070.63610.74430.63900.993211.1498
GPTQ
W4G32Sym
0.69110.48990.64480.74970.64391.000811.0927
facebook/opt-6.7bFP320.67690.50490.65430.76280.6497/12.2862
GPTQ
W4G32Asym
0.68040.49840.65350.75680.64730.996212.4193
GPTQ
W4G32Sym
0.68850.49730.64330.7530.64550.993512.4607
decapoda-research/llama-7b-hfFP320.73610.56420.67090.78350.6887/9.4202
GPTQ
W4G32Asym
0.72440.56030.66140.78350.68240.99099.5881
decapoda-research/llama-13b-hfFP320.76270.59110.70090.78780.7106/8.212
GPTQ
W4G128Asym
0.75180.58430.69610.79110.70580.99328.4319
GPTQ
W4G32Asym
0.75720.58980.70560.78940.71050.99988.3429
GPTQ
W4G128Sym
0.75960.58410.69770.79050.70800.99638.4916
decapoda-research/llama-30b-hfFP320.77590.62660.72770.80960.7350/6.2384
GPTQ
W4G128Asym
0.7780.6240.72690.80470.73340.99796.4237
GPTQ
W4G32Asym
0.77060.62390.72850.80580.73220.99636.4697
GPTQ
W4G128Sym
0.78360.61950.72690.80470.73370.99836.5604
meta-llama/Llama-2-7b-chat-hfFP320.70580.57320.6480.77150.6746/11.7107
GPTQ
W4G128Asym
0.69820.56370.65270.77040.67130.995011.9702
GPTQ
W4G32Asym
0.69530.56820.65750.77580.67420.999411.9317
meta-llama/Llama-2-7b-hfFP320.73920.5670.67090.78350.6902/8.7911
GPTQ
W4G32Asym
0.73530.56420.66220.78290.68620.99428.9635
GPTQ
W4G128Sym
0.72460.56170.67560.77970.68540.99319.2799
meta-llama/Llama-2-13b-chat-hfFP320.73120.60590.71030.78350.7077/10.2213
GPTQ
W4G128Asym
0.72730.60180.70880.77420.70300.99342538.083
GPTQ
W4G32Asym
0.72830.60530.70240.77640.70310.99351889.374
GPTQ
W4G128Sym
0.7270.59970.70240.7780.70180.99162504.497
meta-llama/Llama-2-13b-hfFP320.76770.59720.69610.78780.7122/7.8984
GPTQ
W4G128Asym
0.76270.59330.6890.78510.70750.99341556.448
GPTQ
W4G32Asym
0.76750.59340.69770.78560.71110.99841514.927
GPTQ
W4G128Sym
0.75660.58990.70320.78560.70880.99531374.728
bigscience/bloom-7b1FP320.57640.46280.64560.72690.6029/30.6438
GPTQ
W4G32Sym
0.57990.45420.63610.73120.60040.995732.0626
bigscience/bloomz-7b1FP320.55930.47890.65270.76280.6134/51.7432
GPTQ
W4G32Asym
0.55250.47310.65040.76170.60940.993552.7828
databricks/dolly-v1-6bFP320.68660.50980.64330.76220.6505/11.3242
GPTQ
W4G128Asym
0.68780.50580.63930.76330.64910.997811.5514
GPTQ
W4G32Asym
0.68640.50840.65190.75680.65091.000611.4728
GPTQ
W4G128Sym
0.68760.50450.64330.75410.64740.995211.6474
databricks/dolly-v2-7bFP320.63790.52820.6140.74480.6312/16.161
GPTQ
W4G32Asym
0.63770.52280.59910.74480.62610.991916.4096
EleutherAI/gpt-neo-2.7bFP320.62240.42710.5770.7220.5871/13.9359
GPTQ
W4G128Asym
0.61230.42270.57380.72030.58230.991714.3377
GPTQ
W4G32Asym
0.6150.42590.57140.72470.58430.995114.2083
GPTQ
W4G32Sym
0.61540.42080.57770.71980.58340.993714.3121
EleutherAI/gpt-neox-20bFP320.72330.53590.66140.77530.6740/9.195
GPTQ
W4G128Asym
0.71860.53280.65350.76990.66870.99229.3463
GPTQ
W4G32Asym
0.72680.5330.6590.77150.67260.99799.2897
mosaicml/mpt-7bFP320.70560.57180.68590.79270.6890/9.9324
GPTQ
W4G128Asym
0.70060.56550.68030.79650.68570.995210.1515
mosaicml/mpt-7b-chatFP320.6550.57520.67480.78450.6724/13.5951
GPTQ
W4G128Asym
0.64720.57160.66850.7840.66780.993213.8539
mosaicml/mpt-7b-instructFP320.69180.58190.6780.79270.6861/10.8863
GPTQ
W4G128Asym
0.68640.57650.68270.78730.68320.995811.1451
mosaicml/mpt-7b-storywriterFP320.6930.54770.6630.7840.6719/9.9125
GPTQ
W4G128Asym
0.68540.54430.66610.78130.66930.996110.1137
tiiuae/falcon-rw-7bFP320.66040.54190.65980.77530.6594/11.7616
GPTQ
W4G128Asym
0.64840.53690.65750.78070.65590.994711.9411
GPTQ
W4G32Asym
0.65710.53980.65820.77640.65790.997811.8809
GPTQ
W4G128Sym
0.6520.5350.65750.76820.65320.990612.0048
tiiuae/falcon-7b-instructFP320.64370.51770.66690.78240.6527/14.5053
GPTQ
W4G128Asym
0.63010.51420.66540.78350.64830.993314.8146
GPTQ
W4G32Asym
0.63770.5170.65980.78070.64880.994114.6953
- - -### ONNX Models with ONNX Runtime 1.17.1 +### ONNX Models with ONNX Runtime 1.18.1 @@ -1424,859 +831,906 @@ For more complete information about performance and benchmark results, visit www - + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + + + + + + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + + + + + + + + + + + - - - - - - + + + + + + - + + + + + + + + + + + - - - - - - + + + + + + - + + + + + + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - + - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - - + + + + + + - + + + + + + + + + + + + + + + + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - + + + + + + - - - - - - - - - - - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - - - - - - - - - - - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - 
+ - - - - - - + + + + + + - + - - - - - - + + + + + + - - - - - - - - - - - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - + - - - - - - + + + + + + - -
ResNet50   V1.5ResNet50 V1.5 qlinearops72.16%72.29%-0.18%1666.73734.162.27x0.72180.72294-0.001581495.719715.9372.089177
ResNet50 V1.5 qdq72.19%72.29%-0.15%1658.10734.332.26x0.72130.72294-0.002271547.302717.0272.157941
ResNet50 V1.5 MLPerf qlinearops76.15%76.46%-0.41%1495.15733.592.04x0.761460.76462-0.004131365.556718.5461.900443
ResNet50 V1.5 MLPerf qdq76.12%76.46%-0.44%1661.90732.042.27x0.761280.76462-0.004371445.751718.9572.010901
ResNet50 V1.5 (ONNX Model Zoo) qlinearops74.77%74.99%-0.29%1713.86767.912.23x0.747680.74988-0.002931574.384749.3562.100983
ResNet50 V1.5 (ONNX Model Zoo) qdq74.48%74.99%-0.67%1747.21770.142.27x
MobileNet V2qlinearops65.55%66.89%-2.01%7519.954430.841.70x
MobileNet V2qdq65.60%66.89%-1.93%7572.974413.581.72x
MobileNet V2 (ONNX Model Zoo)qlinearops68.51%69.48%-1.41%7190.264019.161.79x0.747840.74988-0.002721564.149755.5772.070138
VGG16 qlinearops66.55%66.69%-0.20%613.47170.953.59x0.665520.66688-0.00204526.569162.6423.237595
VGG16 qdq66.62%66.69%-0.11%611.78186.213.29x0.666160.66688-0.00108520.089172.4213.01639
VGG16 (ONNX Model Zoo) qlinearops72.37%72.40%-0.04%619.00184.353.36x0.723660.72396-0.00041558.812162.873.431031
VGG16 (ONNX Model Zoo) qdq72.37%72.40%-0.03%623.02172.273.62x0.723640.72396-0.00044556.581176.9173.146001
MobileNet V3 MLPerf qlinearops75.51%75.74%-0.30%5711.042584.172.21x0.75510.7574-0.003045421.7162578.0762.103009
MobileNet V3 MLPerf qdq75.51%75.74%-0.30%6136.362630.212.33x0.75510.7574-0.003045382.872567.4792.096559
ShuffleNet V2 (ONNX Model Zoo) qlinearops66.13%66.36%-0.36%6820.893686.461.85x0.661260.66364-0.003596426.2213725.6861.724842
ShuffleNet V2 (ONNX Model Zoo)qdq0.662160.66364-0.002236534.243707.7351.762327
GoogleNet (ONNX Model Zoo) qlinearops67.69%67.79%-0.14%1971.181120.081.76x0.676940.67786-0.001361842.9041137.5781.620024
GoogleNet (ONNX Model Zoo) qdq67.64%67.79%-0.22%1838.281142.351.61x0.677140.67786-0.001061818.9941136.371.600706
SqueezeNet (ONNX Model Zoo) qlinearops56.49%56.87%-0.67%10163.135771.891.76x0.564860.56868-0.006729521.9855530.3621.721765
SqueezeNet (ONNX Model Zoo) qdq56.33%56.87%-0.94%10339.146002.841.72x0.564860.56868-0.006729391.0725519.7931.701345
CaffeNet (ONNX Model Zoo) qlinearops56.26%56.30%-0.07%2805.961077.802.60x0.562620.56304-0.000752949.36893.7723.299902
CaffeNet (ONNX Model Zoo) qdq56.18%56.30%-0.21%4351.65822.715.29x0.562580.56304-0.000822847.241901.1543.15955
AlexNet (ONNX Model Zoo) qlinearops54.73%54.79%-0.10%2169.83893.062.43x0.547320.54786-0.000992070.166816.7052.534778
AlexNet (ONNX Model Zoo) qdq54.74%54.79%-0.08%2232.07841.462.65x0.547120.54786-0.001352059.133844.9732.436922
ZFNet (ONNX Model Zoo) qlinearops55.83%55.96%-0.24%921.09525.211.75x0.558260.5596-0.00239858.761461.2541.861796
ZFNet (ONNX Model Zoo) qdq55.82%55.96%-0.24%925.69534.051.73x0.558680.5596-0.00164853.765457.9111.864478
Inception V1 (ONNX Model Zoo) qlinearops67.23%67.24%-0.02%1862.371161.551.60x0.672280.67242-0.000211891.3621205.9481.568361
Inception V1 (ONNX Model Zoo) qdq67.19%67.24%-0.07%1956.471262.641.55x0.672280.67242-0.000211879.2681202.1931.5632
BEiT (ONNX Model Zoo)qlinearops85.07285.284-0.00249205.151126.5871.620632
EfficientNet (ONNX Model Zoo) qlinearops77.02%77.11%-0.12%2793.231383.392.02x0.770180.77112-0.001222428.3151344.0251.806748
BEITEfficientNet (ONNX Model Zoo)qdq0.76990.77112-0.001582286.7261307.181.749358
DenseNet (ONNX Model Zoo) qlinearops85.0785.28-0.25%206.50128.131.61x0.605260.60958-0.00709626.256499.7621.253108
SSD (ONNX Model Zoo)SSD MobileNet V1 (ONNX Model Zoo)qlinearops0.22960.23022-0.002691121.431841.3241.332936
SSD MobileNet V1 (ONNX Model Zoo) qdq18.62%18.98%-1.90%56.9714.573.91x0.22960.23022-0.002691048.501798.2181.313552
DUC (ONNX Model Zoo) qlinearops81.62%81.92%-0.37%8.765.031.74x0.816220.81922-0.003669.2564.9881.855654
Ultra Face (ONNX Model Zoo) qlinearops83.33%83.65%-0.38%8780.521920.304.57x0.833250.83646-0.003848993.5761988.4594.522887
Emotion FERPlus (ONNX Model Zoo) qlinearops7.94%8.00%-0.70%6360.853067.122.07x0.079410.07997-0.0076113.7443087.5011.980159
ArcFace (ONNX Model Zoo) qlinearops99.82%99.80%0.02%449.50235.011.91x0.998170.9980.00017442.845230.7511.919147
BERT base MRPC qlinearops85.78%86.03%-0.28%511.36225.152.27x0.855390.86029-0.0057483.812219.4542.204617
BERT base MRPC qdq85.78%86.03%-0.28%484.44222.432.18x0.855390.86029-0.0057485.079218.3342.221729
BERT base MRPC integerops85.78%86.03%-0.28%728.48222.353.28x0.852940.86029-0.00854684.459218.8593.127397
DistilBERT base MRPC qdq85.05%84.56%0.58%635.93405.581.57x0.840690.84559-0.00579633.278399.3091.585935
DistilBERT base MRPC integerops85.29%84.56%0.87%1324.26405.483.27x0.855390.845590.011591388.435401.0773.461767
Roberta base MRPCMobile bert MRPC qdq88.24%89.95%-1.91%484.00223.372.17x0.855390.86275-0.00853505.624387.4311.305069
Mobile bert MRPCintegerops0.855390.86275-0.00853565.463386.391.463451
Roberta base MRPCintegerops0.909310.899510.010895702.165219.4953.199002
BERT SQuAD (ONNX Model Zoo) integerops80.2980.67-0.47%244.9399.292.47x80.2932880.67171-0.00469242.58297.7122.482622
BERT base cased MRPC (HuggingFace)MobileBERT SQuAD MLPerf (ONNX Model Zoo)integerops89.8735490.0265-0.0017151.685125.3491.210101
GPT2 lm head WikiText (ONNX Model Zoo)integerops31.9844828.996140.1030617.96410.2071.759969
BERT base uncased MRPC (HuggingFace) qlinearops90.21%90.42%-0.23%440.17214.152.06x0.90210.9042-0.00232434.653210.5822.064056
BERT base uncased MRPC (HuggingFace) integerops89.58%90.42%-0.93%715.22201.243.55x0.895770.9042-0.00932708.657210.7433.36266
Roberta base MRPC (HuggingFace) qlinearops91.00%91.38%-0.41%434.48214.202.03x0.910020.91379-0.00413431.371211.032.044122
Roberta base MRPC (HuggingFace) integerops90.85%91.38%-0.58%714.20213.543.34x0.908460.91379-0.00583711.112210.7113.374821
XLM Roberta base MRPC (HuggingFace) qlinearops89.37%90.10%-0.81%339.02214.411.58x0.893690.90102-0.00814334.877211.5611.582886
XLM Roberta base MRPC (HuggingFace) integerops89.66%90.10%-0.50%406.04215.121.89x0.896550.90102-0.00496401.993211.4291.901314
Camembert base MRPC (HuggingFace)integerops89.19%89.28%-0.10%712.67217.683.27x
MiniLM L12 H384 uncased MRPC (HuggingFace) qlinearops90.13%90.97%-0.93%1209.98588.932.05x0.892790.892790282.3213.3261.323327
MiniLM L12 H384 uncased MRPC (HuggingFace)Camembert base MRPC (HuggingFace) integerops91.07%90.97%0.10%1268.43588.052.16x0.891890.89279-0.00101707.223214.2283.301263
DistilBERT base uncased SST-2 (HuggingFace)MiniLM L12 H384 uncased MRPC   (HuggingFace) qlinearops90.71%91.06%-0.38%1253.85399.523.14x0.901260.90973-0.009311188.05578.3492.054209
DistilBERT base uncased SST-2 (HuggingFace)MiniLM L12 H384 uncased MRPC   (HuggingFace) integerops90.25%91.06%-0.88%925.68399.542.32x0.910680.909730.0010441285.128576.0372.230982
MiniLM L6 H384 uncased SST-2 (HuggingFace)DistilBERT base uncased SST-2   (HuggingFace) qlinearops89.45%90.14%-0.76%2209.721139.621.94x0.907110.91055-0.003781259.685396.5993.176218
MiniLM L6 H384 uncased SST-2 (HuggingFace)DistilBERT base uncased SST-2   (HuggingFace) integerops89.91%90.14%-0.26%2365.971137.322.08x0.902520.91055-0.00882914.626395.0852.315011
BERT base cased MRPC (HuggingFace)Albert base v2 SST-2 (HuggingFace) qlinearops87.70%88.29%-0.67%497.73214.322.32x0.920870.92317-0.00249284.622210.5211.351989
BERT base cased MRPC (HuggingFace)Albert base v2 SST-2 (HuggingFace) integerops88.19%88.29%-0.12%718.26214.323.35x0.917430.92317-0.00622284.686209.9951.35568
Electra small discriminator MRPC (HuggingFace)MiniLM L6 H384 uncased SST-2   (HuggingFace) qlinearops89.92%89.83%0.09%1951.071142.891.71x0.89450.90138-0.007632172.9831121.661.937292
Electra small discriminator MRPC (HuggingFace)MiniLM L6 H384 uncased SST-2   (HuggingFace) integerops89.27%89.83%-0.63%2198.931129.201.95x0.899080.90138-0.002552326.2651114.5722.087137
BERT mini MRPC (HuggingFace)BERT base cased MRPC (HuggingFace) qlinearops86.21%86.52%-0.35%5814.173388.021.72x0.877020.88294-0.0067494.961210.7952.348068
BERT mini MRPC (HuggingFace)integerops86.16%86.52%-0.41%6396.893445.061.86x
BART large MRPC (HuggingFace)BERT base cased MRPC (HuggingFace) integerops92.36%91.20%1.28%126.3152.282.42x0.881860.88294-0.00122714.614210.9913.386941
Spanbert SQuAD (HuggingFace)Electra small discriminator MRPC   (HuggingFace) qlinearops91.1491.98-0.91%75.8643.481.74x0.899150.898310.0009351998.7131115.1841.792272
Spanbert SQuAD (HuggingFace)Electra small discriminator MRPC   (HuggingFace) integerops91.4091.98-0.63%92.2443.512.12x0.892670.89831-0.006282202.8121121.4061.96433
Bert base multilingual cased SQuAD (HuggingFace)BERT mini MRPC (HuggingFace) qlinearops88.4289.13-0.79%79.0643.451.82x0.862130.86515-0.003495767.2293254.7921.771919
Bert base multilingual cased SQuAD (HuggingFace)BERT mini MRPC (HuggingFace) integerops88.7089.13-0.48%93.0343.232.15x0.861590.86515-0.004116354.6553424.4231.855686
DistilBert base uncased SQuAD (HuggingFace)Xlnet base cased MRPC (HuggingFace) qlinearops86.3386.86-0.62%118.6868.431.73x0.900520.89860.002137121.24395.5621.268737
DistilBert base uncased SQuAD (HuggingFace)Xlnet base cased MRPC (HuggingFace) integerops86.0586.86-0.94%186.3368.412.72x0.895830.8986-0.00308123.05595.5981.287213
BERT large uncased whole word masking SQuAD (HuggingFace)qlinearops92.3493.16-0.88%28.6713.122.19x
BERT large uncased whole word masking SQuAD (HuggingFace)BART large MRPC (HuggingFace) integerops92.9993.16-0.18%32.3213.142.46x0.92360.911970.012753126.13651.0552.470591
Roberta large SQuAD v2 (HuggingFace)DeBERTa v3 base MRPC (HuggingFace) integerops89.0489.020.02%32.3713.402.42x0.923860.922260.001735193.159153.1571.261183
LayoutLMv3 FUNSD (HuggingFace)Spanbert SQuAD (HuggingFace) qlinearops89.66%90.49%-0.91%47.6027.281.74x91.137991.97553-0.0091181.96443.361.890314
LayoutLMv3 FUNSD (HuggingFace)Spanbert SQuAD (HuggingFace) integerops89.95%90.49%-0.59%56.2627.432.05x91.3983591.97553-0.00628101.71243.3652.345486
LayoutLMv2 (HuggingFace)Bert base multilingual cased SQuAD   (HuggingFace) qlinearops80.95%81.17%-0.27%64.1438.911.65x88.420589.12625-0.0079286.32843.2691.995147
LayoutLMv2 (HuggingFace)Bert base multilingual cased SQuAD   (HuggingFace) integerops80.60%81.17%-0.71%67.0138.841.73x88.7022489.12625-0.00476101.7843.2442.353621
- -### ONNX Models with ONNX Runtime 1.15.0 in WOQ Mode - - - - - - - + + + + + + + + - - + + + + + + + + - - - - - - - + + + + + + + + - - - - + + + + + + + + - - - - - + + + + + + + + - - - - + + + + + + + + - - - - - + + + + + + + + - - - - - - - - - - + + + + + + + + - - - - - - - - - + + + + + + + + - - - - - + + + + + + + + - - - - + + + + + + + + - - - - - + + + + + + + + - - - - - - -
Model nameConfigurationLambada_openaiAccuracy Ratio
[INT4/FP32]
DistilBert base uncased SQuAD   (HuggingFace)qlinearops86.3282386.86496-0.00618120.70969.7231.731265
AccuracyPerplexityDistilBert base uncased SQuAD   (HuggingFace)integerops86.0486686.86496-0.0094203.70869.6822.923395
meta-llama/Llama-2-7b-chat-hfFP320.70583.2788/BERT large uncased whole word masking   SQuAD (HuggingFace)qlinearops92.3394593.15547-0.0087631.81312.9392.458691
GPTQ
W4G32Asym
0.70023.41240.9921BERT large uncased whole word masking   SQuAD (HuggingFace)integerops92.9922993.15547-0.0017535.82812.9372.769421
meta-llama/Llama-2-7b-hfFP320.73923.3950/Roberta large SQuAD v2 (HuggingFace)qlinearops89.0339589.016190.000217.6113.2681.327254
GPTQ
W4G32Asym
0.73123.57110.9892Roberta large SQuAD v2 (HuggingFace)integerops89.0383389.016190.00024935.85413.2592.704125
meta-llama/Llama-2-13b-chat-hfFP320.73122.9163/GPT2 WikiText (HuggingFace)qlinearops30.2529228.996140.04334313.84910.1671.362152
GPTQ
W4G128Asym
0.72402.99450.9902
meta-llama/Llama-2-13b-hfFP320.76773.0438/GPT2 WikiText (HuggingFace)integerops29.6815328.996140.02363714.63610.0881.450833
GPTQ
W4G128Asym
0.76343.11860.9944
GPTQ
W4G32Asym
0.76153.12760.9919DistilGPT2 WikiText (HuggingFace)qlinearops44.9323243.430430.03458221.79817.1291.272579
meta-llama/Llama-2-70b-chat-hfFP320.75432.6181/DistilGPT2 WikiText (HuggingFace)integerops44.619543.430430.02737923.01717.0861.347126
RTN
W4G32Asym
0.75182.64960.9967LayoutLMv3 FUNSD (HuggingFace)integerops0.900720.90487-0.0045939.498281.410643
meta-llama/Llama-2-70b-hfFP320.79642.6612/CodeBert    (HuggingFace)qlinearops0.649710.6541-0.0067175.68545.0971.678271
RTN
W4G32Sym
0.79412.72430.9971
- + CodeBert    (HuggingFace) + integerops + 0.64934 + 0.6541 + -0.00728 + 94.472 + 45.103 + 2.094584 + +
 ## Validated Pruning Examples
@@ -2617,18 +2071,18 @@ For more complete information about performance and benchmark results, visit www

 ## Validated Knowledge Distillation Examples

-| Example Name | Dataset | Student<br>(Metrics) | Teacher<br>(Metrics) | Student With Distillation<br>(Metrics Improvement) | Student With<br>Distributed Distillation<br>(Metrics Improvement) |
-|---------------------|-----------|--------------------------------------|------------------------------------|-----------------------------------------------------|-----------------------------------------------------|
-| MobileNet example | CIFAR-10 | MobileNetV2-0.35<br>(0.7965 ACC) | WideResNet40-2<br>(0.9522 ACC) | 0.8178 ACC<br>(0.0213 ACC) | 0.8235 ACC<br>(0.027 ACC) |
-| CNN example | CIFAR-100 | CNN-2<br>(0.5494 ACC) | CNN-10<br>(0.7153 ACC) | 0.5540 ACC<br>(0.0046 ACC) | 0.5523 ACC<br>(0.0029 ACC) |
-| VGG example | CIFAR-100 | VGG-8-BN<br>(0.7022 ACC) | VGG-13-BN<br>(0.7415 ACC) | 0.7025 ACC<br>(0.0003 ACC) | NA |
-| ResNet example | ImageNet | ResNet18<br>(0.6739 ACC) | ResNet50<br>(0.7399 ACC) | 0.6845 ACC<br>(0.0106 ACC) | NA |
-| BlendCnn example | MRPC | BlendCnn<br>(0.7034 ACC) | BERT-Base<br>(0.8382 ACC) | 0.7034 ACC<br>(0 ACC) | NA |
-| BiLSTM example | SST-2 | BiLSTM<br>(0.8314 ACC) | RoBERTa-Base<br>(0.9403 ACC) | 0.9048 ACC<br>(0.0734 ACC) | NA |
-|DistilBERT example | SQuAD | DistilBERT<br>(0.7323/0.8256 EM/F1) | BERT-Base<br>(0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1<br>(0.0119/0.0115 EM/F1) | NA |
-|TinyBERT example | MNLI | TinyBERT<br>(0.8018/0.8044 m/mm) | BERT-Base<br>(0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm<br>(0.0007/0.0030 m/mm) | NA |
-|BERT-3 example | QQP | BERT-3<br>(0.8626/0.8213 EM/F1) | BERT-Base<br>(0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1<br>(0.0058/0.0046 EM/F1) | NA |
-|DistilRoBERTa example| COLA | DistilRoBERTa<br>(0.6057 ACC) | RoBERTa-Large<br>(0.6455 ACC) | 0.6187 ACC<br>(0.0130 ACC) | NA |
+| Example Name | Dataset | Student<br>(Metrics) | Teacher<br>(Metrics) | Student With Distillation<br>(Metrics Improvement) | Student With<br>Distributed Distillation<br>(Metrics Improvement) |
+| --------------------- | --------- | ----------------------------------- | ---------------------------------- | -------------------------------------------------- | ------------------------------------------------------------------ |
+| MobileNet example | CIFAR-10 | MobileNetV2-0.35<br>(0.7965 ACC) | WideResNet40-2<br>(0.9522 ACC) | 0.8178 ACC<br>(0.0213 ACC) | 0.8235 ACC<br>(0.027 ACC) |
+| CNN example | CIFAR-100 | CNN-2<br>(0.5494 ACC) | CNN-10<br>(0.7153 ACC) | 0.5540 ACC<br>(0.0046 ACC) | 0.5523 ACC<br>(0.0029 ACC) |
+| VGG example | CIFAR-100 | VGG-8-BN<br>(0.7022 ACC) | VGG-13-BN<br>(0.7415 ACC) | 0.7025 ACC<br>(0.0003 ACC) | NA |
+| ResNet example | ImageNet | ResNet18<br>(0.6739 ACC) | ResNet50<br>(0.7399 ACC) | 0.6845 ACC<br>(0.0106 ACC) | NA |
+| BlendCnn example | MRPC | BlendCnn<br>(0.7034 ACC) | BERT-Base<br>(0.8382 ACC) | 0.7034 ACC<br>(0 ACC) | NA |
+| BiLSTM example | SST-2 | BiLSTM<br>(0.8314 ACC) | RoBERTa-Base<br>(0.9403 ACC) | 0.9048 ACC<br>(0.0734 ACC) | NA |
+| DistilBERT example | SQuAD | DistilBERT<br>(0.7323/0.8256 EM/F1) | BERT-Base<br>(0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1<br>(0.0119/0.0115 EM/F1) | NA |
+| TinyBERT example | MNLI | TinyBERT<br>(0.8018/0.8044 m/mm) | BERT-Base<br>(0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm<br>(0.0007/0.0030 m/mm) | NA |
+| BERT-3 example | QQP | BERT-3<br>(0.8626/0.8213 EM/F1) | BERT-Base<br>(0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1<br>(0.0058/0.0046 EM/F1) | NA |
+| DistilRoBERTa example | COLA | DistilRoBERTa<br>(0.6057 ACC) | RoBERTa-Large<br>(0.6455 ACC) | 0.6187 ACC<br>(0.0130 ACC) | NA |

 ## Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

From 7ae2226ab784452610cc73d97fc54656e8dae278 Mon Sep 17 00:00:00 2001
From: "Sun, Xuehao"
Date: Tue, 24 Sep 2024 20:21:19 +0800
Subject: [PATCH 2/2] format data

Signed-off-by: Sun, Xuehao
---
 docs/source/validated_model_list.md | 2009 ++++++++++++++-------------
 1 file changed, 1056 insertions(+), 953 deletions(-)

diff --git a/docs/source/validated_model_list.md b/docs/source/validated_model_list.md
index 69b1f340bda..eaa32c0dfdb 100644
--- a/docs/source/validated_model_list.md
+++ b/docs/source/validated_model_list.md
@@ -8,11 +8,13 @@ Intel® Neural Compressor validated examples with multiple compression technique

 1.2. [Keras Models with keras 2.15.1](#keras-models-with-keras-2151)

- 1.2. [PyTorch Models with Torch 2.3.0+cpu in PTQ Mode](#pytorch-models-with-torch-230cpu-in-ptq-mode)
+ 1.3. [PyTorch Models with Torch 2.3.0+cpu in PTQ Mode](#pytorch-models-with-torch-230cpu-in-ptq-mode)

- 1.3. [PyTorch Models with Torch 2.3.0+cpu in QAT Mode](#pytorch-models-with-torch-230cpu-in-qat-mode)
+ 1.4. [PyTorch Models with Torch 2.3.0+cpu in QAT Mode](#pytorch-models-with-torch-230cpu-in-qat-mode)

- 1.4. [ONNX Models with ONNX Runtime 1.18.1](#onnx-models-with-onnx-runtime-1181)
+ 1.5. [PyTorch Models with Torch 2.3.0+cpu in IPEX Mode](#pytorch-models-with-torch-230cpu-in-ipex-mode)
+
+ 1.6. [ONNX Models with ONNX Runtime 1.18.1](#onnx-models-with-onnx-runtime-1181)

 2. 
[Validated Pruning Examples](#Validated-Pruning-Examples) @@ -52,295 +54,314 @@ For more complete information about performance and benchmark results, visit www ResNet50 v1.0 pb - 0.74112 - 0.74274 - -0.00218 - 1732.923 - 578.879 - 2.993584 + 74.11% + 74.27% + -0.22% + 1732.92 + 578.88 + 2.99x ResNet50 v1.5 pb - 0.7625 - 0.76464 - -0.0028 - 1535.199 - 529.999 - 2.896607 + 76.25% + 76.46% + -0.28% + 1535.20 + 530.00 + 2.90x ResNet101 pb - 0.77524 - 0.76448 - 0.014075 - 1048.361 - 384.018 - 2.729979 + 77.52% + 76.45% + 1.41% + 1048.36 + 384.02 + 2.73x Inception V1 pb - 0.70454 - 0.69738 - 0.010267 - 2079.239 - 927.817 - 2.241001 + 70.45% + 69.74% + +1.03% + 2079.24 + 927.82 + 2.24x Inception V2 pb - 0.74326 - 0.73966 - 0.004867 - 1644.364 - 840.531 - 1.95634 + 74.33% + 73.97% + +0.49% + 1644.36 + 840.53 + 1.96x Inception V3 pb - 0.7672 - 0.76746 - -0.00034 - 1076.103 - 401.885 - 2.677639 + 76.72% + 76.75% + -0.03% + 1076.10 + 401.89 + 2.68x Inception V4 pb - 0.80126 - 0.80272 - -0.00182 - 704.961 + 80.13% + 80.27% + -0.18% + 704.96 199.28 - 3.53754 + 3.54x Inception ResNet V2 pb - 0.80252 - 0.80398 - -0.00182 - 313.966 - 178.274 - 1.761143 + 80.25% + 80.40% + -0.18% + 313.97 + 178.27 + 1.76x DenseNet-161 pb - 0.76288 - 0.76286 - 2.62E-05 - 279.204 - 214.031 - 1.304503 + 76.29% + 76.29% + +0.00% + 279.20 + 214.03 + 1.30x MobileNet V1 pb - 0.7179 - 0.70956 - 0.011754 - 4199.131 - 1506.681 - 2.787007 + 71.79% + 70.96% + +1.18% + 4199.13 + 1506.68 + 2.79x MobileNet V2 pb - 0.72484 - 0.71756 - 0.010145 - 2170.387 - 1445.045 - 1.501951 + 72.48% + 71.76% + +1.01% + 2170.39 + 1445.05 + 1.50x VGG16 pb - 0.72692 - 0.70886 - 0.025478 - 1388.622 - 203.393 - 6.827285 + 72.69% + 70.89% + +2.55% + 1388.62 + 203.39 + 6.83x VGG19 pb - 0.72672 - 0.71014 - 0.023348 - 1236.115 - 169.736 - 7.282574 + 72.67% + 71.01% + +2.33% + 1236.12 + 169.74 + 7.28x ResNet50 pb - 0.6909 - 0.69028 - 0.000898 + 69.09% + 69.03% + +0.09% 411.79 - 284.534 - 1.447244 + 284.53 + 1.45x ResNetV2 50 pb - 0.70374 
- 0.69642 - 0.010511 - 779.416 - 539.535 - 1.444607 + 70.37% + 69.64% + +1.05% + 779.42 + 539.54 + 1.44x ResNetV2 101 pb - 0.72644 - 0.7187 - 0.010769 - 492.002 - 295.766 - 1.663484 + 72.64% + 71.87% + +1.08% + 492.00 + 295.77 + 1.66x ResNetV2 152 pb - 0.7312 - 0.72368 - 0.010391 - 348.385 - 205.717 - 1.693516 + 73.12% + 72.37% + +1.04% + 348.39 + 205.72 + 1.69x ViT pb - 0.81392 - 0.8192 - -0.00645 + 81.39% + 81.92% + -0.64% 230.53 - 132.657 - 1.73779 + 132.66 + 1.74x SSD ResNet50 V1 pb - 0.3791 - 0.38 - -0.00237 - 135.711 - 28.752 - 4.720054 + 37.91% + 38.00% + -0.24% + 135.71 + 28.75 + 4.72x SSD MobileNet V1 pb - 0.22995 - 0.23127 - -0.00571 - 1237.695 - 719.299 - 1.720696 + 23.00% + 23.13% + -0.57% + 1237.70 + 719.30 + 1.72x SSD ResNet50 v1 ckpt - 0.37881 - 0.38 - -0.00313 - 130.537 - 22.046 - 5.921119 + 37.88% + 38.00% + -0.31% + 130.54 + 22.05 + 5.92x SSD MobileNet v1 ckpt - 0.22963 - 0.23127 - -0.00709 - 1234.562 - 529.342 - 2.332258 + 22.96% + 23.13% + -0.71% + 1234.56 + 529.34 + 2.33x Faster R-CNN ResNet101 pb - 0.30319 - 0.30387 - -0.00224 - 144.211 - 22.637 - 6.370588 + 30.32% + 30.39% + -0.22% + 144.21 + 22.64 + 6.37x Faster R-CNN ResNet50 pb - 0.2661 - 0.26586 - 0.000903 - 164.553 - 28.376 - 5.79902 + 26.61% + 26.59% + +0.09% + 164.55 + 28.38 + 5.80x YOLOv3 pb - 0.83279 - 0.82353 - 0.011244 - 247.559 - 81.447 - 3.03951 + 83.28% + 82.35% + +1.12% + 247.56 + 81.45 + 3.04x BERT large SQuAD pb - 92.44432 - 92.98612 - -0.00583 - 49.174 - 17.519 - 2.806895 + 92.44% + 92.99% + -0.58% + 49.17 + 17.52 + 2.81x BERT large SQuAD (ONNX Model Zoo) pb - 92.36062 - 92.98047 - -0.00667 - 45.059 + 92.36% + 92.98% + -0.67% + 45.06 17.55 - 2.567464 + 2.57x Transformer LT pb - 25.815 - 25.855 - -0.00155 - 28.988 - 15.774 - 1.837708 + 25.82% + 25.86% + -0.15% + 28.99 + 15.77 + 1.84x Transformer lt MLPerf pb - 27.12967 - 27.16596 - -0.00134 - 10.269 + 27.13% + 27.17% + -0.13% + 10.27 5.08 - 2.021457 + 2.02x - - + + Mask R-CNN Inception V2 + pb + 28.46% + 28.73% + -0.91% + 
195.68 + 50.72 + 3.86x + + + Mask R-CNN Inception V2 + ckpt + 28.46% + 28.73% + -0.91% + 206.14 + 47.04 + 4.38x + + ### Keras Models with keras 2.15.1 @@ -365,102 +386,102 @@ For more complete information about performance and benchmark results, visit www Inception ResNet V2 pb - 0.80252 - 0.80398 - -0.00182 - 313.966 - 178.274 - 1.761143 + 80.25% + 80.40% + -0.18% + 313.97 + 178.27 + 1.76x Inception V3 pb - 0.7672 - 0.76746 - -0.00034 - 1076.103 - 401.885 - 2.677639 + 76.72% + 76.75% + -0.03% + 1076.10 + 401.89 + 2.68x MobileNet V2 pb - 0.71492 - 0.71756 - -0.00368 - 947.439 - 779.506 - 1.215435 + 71.49% + 71.76% + -0.37% + 947.44 + 779.51 + 1.22x ResNet101 pb - 0.77524 - 0.76448 - 0.014075 - 1048.361 - 384.018 - 2.729979 + 77.52% + 76.45% + +1.41% + 1048.36 + 384.02 + 2.73x ResNet50 pb - 0.6909 - 0.69028 - 0.000898 + 69.09% + 69.03% + +0.09% 411.79 - 284.534 - 1.447244 + 284.53 + 1.45x ResNet50 pb - 0.7807 - 0.7812 - -0.00064 + 78.07% + 78.12% + -0.06% 680.56 - 498.082 - 1.366361 + 498.08 + 1.37x ResNetV2 101 pb - 0.72644 - 0.7187 - 0.010769 - 492.002 - 295.766 - 1.663484 + 72.64% + 71.87% + +1.08% + 492.00 + 295.77 + 1.66x ResNetV2 50 pb - 0.70374 - 0.69642 - 0.010511 - 779.416 - 539.535 - 1.444607 + 70.37% + 69.64% + +1.05% + 779.42 + 539.54 + 1.44x VGG16 pb - 0.72692 - 0.70886 - 0.025478 - 1388.622 - 203.393 - 6.827285 + 72.69% + 70.89% + +2.55% + 1388.62 + 203.39 + 6.83x VGG19 pb - 0.72672 - 0.71014 - 0.023348 - 1236.115 - 169.736 - 7.282574 + 72.67% + 71.01% + +2.33% + 1236.12 + 169.74 + 7.28x @@ -487,275 +508,294 @@ For more complete information about performance and benchmark results, visit www ResNet18 static - 0.69594 - 0.6976 - -0.00238 - 1707.522 - 602.469 - 2.834207 + 69.59% + 69.76% + -0.24% + 1707.52 + 602.47 + 2.83x EfficientNet-B3 static - 0.77778 - 0.78544 - -0.00975 - 513.818 - 360.022 - 1.427185 + 77.78% + 78.54% + -0.98% + 513.82 + 360.02 + 1.43x PeleeNet static - 0.7183 - 0.721 - -0.00374 - 837.834 - 541.657 - 1.546798 + 71.83% + 72.10% + 
-0.37% + 837.83 + 541.66 + 1.55x ResNet50 static - 0.75984 - 0.76146 - -0.00213 - 1135.222 - 311.466 - 3.64477 + 75.98% + 76.15% + -0.21% + 1135.22 + 311.47 + 3.64x Inception V3 static - 0.69456 - 0.69522 - -0.00095 - 948.026 - 322.552 - 2.939142 + 69.46% + 69.52% + -0.09% + 948.03 + 322.55 + 2.94x ResNeSt50 static - 0.80758 - 0.8104 - -0.00348 - 406.106 - 39.656 - 10.24072 + 80.76% + 81.04% + -0.35% + 406.11 + 39.66 + 10.24x ResNeXt101_32x8d static - 0.78922 - 0.79308 - -0.00487 - 582.215 - 106.731 - 5.454976 + 78.92% + 79.31% + -0.49% + 582.22 + 106.73 + 5.45x YOLO V3 static - 0.55095 - 0.54926 - 0.003077 - 156.287 - 60.296 - 2.591996 + 55.10% + 54.93% + +0.31% + 156.29 + 60.30 + 2.59x Roberta base MRPC static - 0.931408 - 0.935943 - -0.00485 - 396.854 - 176.802 - 2.244624 + 93.14% + 93.59% + -0.48% + 396.85 + 176.80 + 2.24x CamemBERT base MRPC static - 0.885813 - 0.892794 - -0.00782 - 405.365 - 182.871 - 2.216672 + 88.58% + 89.28% + -0.78% + 405.37 + 182.87 + 2.22x DistilBERT base MRPC static - 0.906355 - 0.902685 - 0.004066 + 90.64% + 90.27% + +0.41% 799.05 - 346.498 - 2.306074 + 346.50 + 2.31x DistilBERT base MRPC dynamic - 0.900169 - 0.902685 - -0.00279 - 705.909 - 348.157 - 2.027559 + 90.02% + 90.27% + -0.28% + 705.91 + 348.16 + 2.03x ALBERT base MRPC static - 0.922807 - 0.922807 - 0 - 350.775 - 164.319 - 2.13472 + 92.28% + 92.28% + 0.00% + 350.78 + 164.32 + 2.13x Xlm Roberta MRPC static - 0.877966 - 0.886207 - -0.0093 - 396.063 - 175.963 - 2.250831 + 87.80% + 88.62% + -0.93% + 396.06 + 175.96 + 2.25x Xlm Roberta MRPC dynamic - 0.885417 - 0.882353 - 0.003472 + 88.54% + 88.24% + +0.35% 381.19 - 175.961 - 2.166332 + 175.96 + 2.17x BERT base MRPC static - 0.895944 - 0.904202 - -0.00913 - 402.419 - 177.726 - 2.264266 + 89.59% + 90.42% + -0.91% + 402.42 + 177.73 + 2.26x BERT base COLA static - 0.534738 - 0.533877 - 0.001612 - 395.246 - 177.024 - 2.232726 + 53.47% + 53.39% + +0.16% + 395.25 + 177.02 + 2.23x BERT base STSB static - 0.876114 - 0.880462 - -0.00494 + 
87.61% + 88.05% + -0.49% 397.62 - 177.233 - 2.243487 + 177.23 + 2.24x BERT base SST-2 static - 0.919725 - 0.923165 - -0.00373 - 407.661 - 182.934 - 2.228459 + 91.97% + 92.32% + -0.37% + 407.66 + 182.93 + 2.23x BERT large COLA static - 0.633925 - 0.633532 - 0.00062 - 147.862 - 56.008 - 2.640016 + 63.39% + 63.35% + +0.06% + 147.86 + 56.01 + 2.64x BERT base RTE static - 0.718412 - 0.725632 - -0.00995 - 397.827 - 177.399 - 2.242555 + 71.84% + 72.56% + -1.00% + 397.83 + 177.40 + 2.24x BERT large MRPC static - 0.900662 - 0.90378 - -0.00345 - 146.838 - 52.973 - 2.77194 + 90.07% + 90.38% + -0.34% + 146.84 + 52.97 + 2.77x BERT large QNLI static - 0.911221 - 0.915431 - -0.0046 - 394.508 - 176.918 - 2.229892 + 91.12% + 91.54% + -0.46% + 394.51 + 176.92 + 2.23x BERT large RTE static - 0.736462 - 0.740072 - -0.00488 - 148.837 - 55.834 - 2.665705 + 73.65% + 74.01% + -0.49% + 148.84 + 55.83 + 2.67x Funnel MRPC - static - 0.919383 - 0.922547 - -0.00343 - 294.763 - 187.413 - 1.572799 + + 91.94% + 92.25% + -0.34% + 294.76 + 187.41 + 1.57x BERT large SQuAD static - 92.34173 - 93.15842 - -0.00877 - 50.205 + 92.34% + 93.16% + -0.88% + 50.21 18.69 - 2.686196 + 2.69x lvwerra/pegasus-samsum static - 42.32 - 42.6716 - -0.00824 - 102.734 - 37.993 - 2.704024 + 42.32% + 42.67% + -0.82% + 102.73 + 37.99 + 2.70x - - + + ResNet18 PT2E + static + 69.49% + 69.76% + -0.39% + 1873.51 + 1106.97 + 1.69x + + + OPT-125M PT2E + static + 37.07% + 37.90% + -2.20% + 42.09 + 29.68 + 1.42x + + ### PyTorch Models with Torch 2.3.0+cpu in QAT Mode @@ -780,32 +820,75 @@ For more complete information about performance and benchmark results, visit www ResNet18 static - 0.69736 - 0.6976 - -0.00034 - 1717.593 - 602.648 - 2.850077 + 69.74% + 69.76% + -0.03% + 1717.59 + 602.65 + 2.85x ResNet50 static - 0.76034 - 0.76146 - -0.00147 - 1091.623 + 76.03% + 76.15% + -0.15% + 1091.62 305.83 - 3.569378 + 3.57x ResNeXt101_32x8d static - 0.79308 - 0.79308 - 0 - 584.537 - 107.382 - 5.443529 + 79.31% + 79.31% + 0.00% + 584.54 + 
107.38 + 5.44x + + + + +### PyTorch Models with Torch 2.3.0+cpu in IPEX Mode + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ModelExampleAccuracyPerformance 1s4c14ins1bs
Throughput(samples/sec)
INT8FP32Accuracy Ratio
[(INT8-FP32)/FP32]
INT8FP32Performance Ratio
[INT8/FP32]
bert-large-uncased-whole-word-masking-finetuned-squadstatic93.01%93.16%-0.16%150.0522.426.69x
distilbert-base-uncased-distilled-squadstatic86.10%86.84%-0.85%1034.60151.136.85x
@@ -833,902 +916,922 @@ For more complete information about performance and benchmark results, visit www ResNet50 V1.5 qlinearops - 0.7218 - 0.72294 - -0.00158 - 1495.719 - 715.937 - 2.089177 + 72.18% + 72.29% + -0.16% + 1495.72 + 715.94 + 2.09x ResNet50 V1.5 qdq - 0.7213 - 0.72294 - -0.00227 - 1547.302 - 717.027 - 2.157941 + 72.13% + 72.29% + -0.23% + 1547.30 + 717.03 + 2.16x ResNet50 V1.5 MLPerf qlinearops - 0.76146 - 0.76462 - -0.00413 - 1365.556 - 718.546 - 1.900443 + 76.15% + 76.46% + -0.41% + 1365.56 + 718.55 + 1.90x ResNet50 V1.5 MLPerf qdq - 0.76128 - 0.76462 - -0.00437 - 1445.751 - 718.957 - 2.010901 + 76.13% + 76.46% + -0.44% + 1445.75 + 718.96 + 2.01x ResNet50 V1.5 (ONNX Model Zoo) qlinearops - 0.74768 - 0.74988 - -0.00293 - 1574.384 - 749.356 - 2.100983 + 74.77% + 74.99% + -0.29% + 1574.38 + 749.36 + 2.10x ResNet50 V1.5 (ONNX Model Zoo) qdq - 0.74784 - 0.74988 - -0.00272 - 1564.149 - 755.577 - 2.070138 + 74.78% + 74.99% + -0.27% + 1564.15 + 755.58 + 2.07x VGG16 qlinearops - 0.66552 - 0.66688 - -0.00204 - 526.569 - 162.642 - 3.237595 + 66.55% + 66.69% + -0.20% + 526.57 + 162.64 + 3.24x VGG16 qdq - 0.66616 - 0.66688 - -0.00108 - 520.089 - 172.421 - 3.01639 + 66.62% + 66.69% + -0.11% + 520.09 + 172.42 + 3.02x VGG16 (ONNX Model Zoo) qlinearops - 0.72366 - 0.72396 - -0.00041 - 558.812 + 72.37% + 72.40% + -0.04% + 558.81 162.87 - 3.431031 + 3.43x VGG16 (ONNX Model Zoo) qdq - 0.72364 - 0.72396 - -0.00044 - 556.581 - 176.917 - 3.146001 + 72.36% + 72.40% + -0.04% + 556.58 + 176.92 + 3.15x MobileNet V3 MLPerf qlinearops - 0.7551 - 0.7574 - -0.00304 - 5421.716 - 2578.076 - 2.103009 + 75.51% + 75.74% + -0.30% + 5421.72 + 2578.08 + 2.10x MobileNet V3 MLPerf qdq - 0.7551 - 0.7574 - -0.00304 + 75.51% + 75.74% + -0.30% 5382.87 - 2567.479 - 2.096559 + 2567.48 + 2.10x ShuffleNet V2 (ONNX Model Zoo) qlinearops - 0.66126 - 0.66364 - -0.00359 - 6426.221 - 3725.686 - 1.724842 + 66.13% + 66.36% + -0.36% + 6426.22 + 3725.69 + 1.72x ShuffleNet V2 (ONNX Model Zoo) qdq - 0.66216 
- 0.66364 - -0.00223 + 66.22% + 66.36% + -0.22% 6534.24 - 3707.735 - 1.762327 + 3707.74 + 1.76x GoogleNet (ONNX Model Zoo) qlinearops - 0.67694 - 0.67786 - -0.00136 - 1842.904 - 1137.578 - 1.620024 + 67.69% + 67.79% + -0.14% + 1842.90 + 1137.58 + 1.62x GoogleNet (ONNX Model Zoo) qdq - 0.67714 - 0.67786 - -0.00106 - 1818.994 + 67.71% + 67.79% + -0.11% + 1818.99 1136.37 - 1.600706 + 1.60x SqueezeNet (ONNX Model Zoo) qlinearops - 0.56486 - 0.56868 - -0.00672 - 9521.985 - 5530.362 - 1.721765 + 56.49% + 56.87% + -0.67% + 9521.99 + 5530.36 + 1.72x SqueezeNet (ONNX Model Zoo) qdq - 0.56486 - 0.56868 - -0.00672 - 9391.072 - 5519.793 - 1.701345 + 56.49% + 56.87% + -0.67% + 9391.07 + 5519.79 + 1.70x CaffeNet (ONNX Model Zoo) qlinearops - 0.56262 - 0.56304 - -0.00075 + 56.26% + 56.30% + -0.07% 2949.36 - 893.772 - 3.299902 + 893.77 + 3.30x CaffeNet (ONNX Model Zoo) qdq - 0.56258 - 0.56304 - -0.00082 - 2847.241 - 901.154 - 3.15955 + 56.26% + 56.30% + -0.08% + 2847.24 + 901.15 + 3.16x AlexNet (ONNX Model Zoo) qlinearops - 0.54732 - 0.54786 - -0.00099 - 2070.166 - 816.705 - 2.534778 + 54.73% + 54.79% + -0.10% + 2070.17 + 816.71 + 2.53x AlexNet (ONNX Model Zoo) qdq - 0.54712 - 0.54786 - -0.00135 - 2059.133 - 844.973 - 2.436922 + 54.71% + 54.79% + -0.14% + 2059.13 + 844.97 + 2.44x ZFNet (ONNX Model Zoo) qlinearops - 0.55826 - 0.5596 - -0.00239 - 858.761 - 461.254 - 1.861796 + 55.83% + 55.96% + -0.24% + 858.76 + 461.25 + 1.86x ZFNet (ONNX Model Zoo) qdq - 0.55868 - 0.5596 - -0.00164 - 853.765 - 457.911 - 1.864478 + 55.87% + 55.96% + -0.16% + 853.77 + 457.91 + 1.86x Inception V1 (ONNX Model Zoo) qlinearops - 0.67228 - 0.67242 - -0.00021 - 1891.362 - 1205.948 - 1.568361 + 67.23% + 67.24% + -0.02% + 1891.36 + 1205.95 + 1.57x Inception V1 (ONNX Model Zoo) qdq - 0.67228 - 0.67242 - -0.00021 - 1879.268 - 1202.193 - 1.5632 + 67.23% + 67.24% + -0.02% + 1879.27 + 1202.19 + 1.56x BEiT (ONNX Model Zoo) qlinearops - 85.072 - 85.284 - -0.00249 - 205.151 - 126.587 - 1.620632 + 85.07% + 85.28% + 
-0.25% + 205.15 + 126.59 + 1.62x EfficientNet (ONNX Model Zoo) qlinearops - 0.77018 - 0.77112 - -0.00122 - 2428.315 - 1344.025 - 1.806748 + 77.02% + 77.11% + -0.12% + 2428.32 + 1344.03 + 1.81x EfficientNet (ONNX Model Zoo) qdq - 0.7699 - 0.77112 - -0.00158 - 2286.726 + 76.99% + 77.11% + -0.16% + 2286.73 1307.18 - 1.749358 + 1.75x DenseNet (ONNX Model Zoo) qlinearops - 0.60526 - 0.60958 - -0.00709 - 626.256 - 499.762 - 1.253108 + 60.53% + 60.96% + -0.71% + 626.26 + 499.76 + 1.25x SSD MobileNet V1 (ONNX Model Zoo) qlinearops - 0.2296 - 0.23022 - -0.00269 - 1121.431 - 841.324 - 1.332936 + 22.96% + 23.02% + -0.27% + 1121.43 + 841.32 + 1.33x SSD MobileNet V1 (ONNX Model Zoo) qdq - 0.2296 - 0.23022 - -0.00269 - 1048.501 - 798.218 - 1.313552 + 22.96% + 23.02% + -0.27% + 1048.50 + 798.22 + 1.31x DUC (ONNX Model Zoo) qlinearops - 0.81622 - 0.81922 - -0.00366 - 9.256 - 4.988 - 1.855654 + 81.62% + 81.92% + -0.37% + 9.26 + 4.99 + 1.86x Ultra Face (ONNX Model Zoo) qlinearops - 0.83325 - 0.83646 - -0.00384 - 8993.576 - 1988.459 - 4.522887 + 83.33% + 83.65% + -0.38% + 8993.58 + 1988.46 + 4.52x Emotion FERPlus (ONNX Model Zoo) qlinearops - 0.07941 - 0.07997 - -0.007 - 6113.744 - 3087.501 - 1.980159 + 7.94% + 8.00% + -0.70% + 6113.74 + 3087.50 + 1.98x ArcFace (ONNX Model Zoo) qlinearops - 0.99817 - 0.998 - 0.00017 - 442.845 - 230.751 - 1.919147 + 99.82% + 99.80% + +0.02% + 442.85 + 230.75 + 1.92x BERT base MRPC qlinearops - 0.85539 - 0.86029 - -0.0057 - 483.812 - 219.454 - 2.204617 + 85.54% + 86.03% + -0.57% + 483.81 + 219.45 + 2.20x BERT base MRPC qdq - 0.85539 - 0.86029 - -0.0057 - 485.079 - 218.334 - 2.221729 + 85.54% + 86.03% + -0.57% + 485.08 + 218.33 + 2.22x BERT base MRPC integerops - 0.85294 - 0.86029 - -0.00854 - 684.459 - 218.859 - 3.127397 + 85.29% + 86.03% + -0.85% + 684.46 + 218.86 + 3.13x DistilBERT base MRPC qdq - 0.84069 - 0.84559 - -0.00579 - 633.278 - 399.309 - 1.585935 + 84.07% + 84.56% + -0.58% + 633.28 + 399.31 + 1.59x DistilBERT base MRPC integerops - 0.85539 
- 0.84559 - 0.01159 - 1388.435 - 401.077 - 3.461767 + 85.54% + 84.56% + +1.16% + 1388.44 + 401.08 + 3.46x Mobile bert MRPC qdq - 0.85539 - 0.86275 - -0.00853 - 505.624 - 387.431 - 1.305069 + 85.54% + 86.28% + -0.85% + 505.62 + 387.43 + 1.31x Mobile bert MRPC integerops - 0.85539 - 0.86275 - -0.00853 - 565.463 + 85.54% + 86.28% + -0.85% + 565.46 386.39 - 1.463451 + 1.46x Roberta base MRPC integerops - 0.90931 - 0.89951 - 0.010895 - 702.165 - 219.495 - 3.199002 + 90.93% + 89.95% + +1.09% + 702.17 + 219.50 + 3.20x BERT SQuAD (ONNX Model Zoo) integerops - 80.29328 - 80.67171 - -0.00469 - 242.582 - 97.712 - 2.482622 + 80.29% + 80.67% + -0.47% + 242.58 + 97.71 + 2.48x MobileBERT SQuAD MLPerf (ONNX Model Zoo) integerops - 89.87354 - 90.0265 - -0.0017 - 151.685 - 125.349 - 1.210101 + 89.87% + 90.03% + -0.17% + 151.69 + 125.35 + 1.21x GPT2 lm head WikiText (ONNX Model Zoo) integerops - 31.98448 - 28.99614 - 0.10306 - 17.964 - 10.207 - 1.759969 + 31.98% + 29.00% + +10.31% + 17.96 + 10.21 + 1.76x BERT base uncased MRPC (HuggingFace) qlinearops - 0.9021 - 0.9042 - -0.00232 - 434.653 - 210.582 - 2.064056 + 90.21% + 90.42% + -0.23% + 434.65 + 210.58 + 2.06x BERT base uncased MRPC (HuggingFace) integerops - 0.89577 - 0.9042 - -0.00932 - 708.657 - 210.743 - 3.36266 + 89.58% + 90.42% + -0.93% + 708.66 + 210.74 + 3.36x Roberta base MRPC (HuggingFace) qlinearops - 0.91002 - 0.91379 - -0.00413 - 431.371 + 91.00% + 91.38% + -0.41% + 431.37 211.03 - 2.044122 + 2.04x Roberta base MRPC (HuggingFace) integerops - 0.90846 - 0.91379 - -0.00583 - 711.112 - 210.711 - 3.374821 + 90.85% + 91.38% + -0.58% + 711.11 + 210.71 + 3.37x XLM Roberta base MRPC (HuggingFace) qlinearops - 0.89369 - 0.90102 - -0.00814 - 334.877 - 211.561 - 1.582886 + 89.37% + 90.10% + -0.81% + 334.88 + 211.56 + 1.58x XLM Roberta base MRPC (HuggingFace) integerops - 0.89655 - 0.90102 - -0.00496 - 401.993 - 211.429 - 1.901314 + 89.66% + 90.10% + -0.50% + 401.99 + 211.43 + 1.90x Camembert base MRPC (HuggingFace) qlinearops - 
0.89279 - 0.89279 - 0 - 282.3 - 213.326 - 1.323327 + 89.28% + 89.28% + 0.00% + 282.30 + 213.33 + 1.32x Camembert base MRPC (HuggingFace) integerops - 0.89189 - 0.89279 - -0.00101 - 707.223 - 214.228 - 3.301263 + 89.19% + 89.28% + -0.10% + 707.22 + 214.23 + 3.30x - MiniLM L12 H384 uncased MRPC   (HuggingFace) + MiniLM L12 H384 uncased MRPC (HuggingFace) qlinearops - 0.90126 - 0.90973 - -0.00931 + 90.13% + 90.97% + -0.93% 1188.05 - 578.349 - 2.054209 + 578.35 + 2.05x - MiniLM L12 H384 uncased MRPC   (HuggingFace) + MiniLM L12 H384 uncased MRPC (HuggingFace) integerops - 0.91068 - 0.90973 - 0.001044 - 1285.128 - 576.037 - 2.230982 + 91.07% + 90.97% + +0.10% + 1285.13 + 576.04 + 2.23x - DistilBERT base uncased SST-2   (HuggingFace) + DistilBERT base uncased SST-2 (HuggingFace) qlinearops - 0.90711 - 0.91055 - -0.00378 - 1259.685 - 396.599 - 3.176218 + 90.71% + 91.06% + -0.38% + 1259.69 + 396.60 + 3.18x - DistilBERT base uncased SST-2   (HuggingFace) + DistilBERT base uncased SST-2 (HuggingFace) integerops - 0.90252 - 0.91055 - -0.00882 - 914.626 - 395.085 - 2.315011 + 90.25% + 91.06% + -0.88% + 914.63 + 395.09 + 2.32x Albert base v2 SST-2 (HuggingFace) qlinearops - 0.92087 - 0.92317 - -0.00249 - 284.622 - 210.521 - 1.351989 + 92.09% + 92.32% + -0.25% + 284.62 + 210.52 + 1.35x Albert base v2 SST-2 (HuggingFace) integerops - 0.91743 - 0.92317 - -0.00622 - 284.686 - 209.995 - 1.35568 + 91.74% + 92.32% + -0.62% + 284.69 + 210.00 + 1.36x - MiniLM L6 H384 uncased SST-2   (HuggingFace) + MiniLM L6 H384 uncased SST-2 (HuggingFace) qlinearops - 0.8945 - 0.90138 - -0.00763 - 2172.983 + 89.45% + 90.14% + -0.76% + 2172.98 1121.66 - 1.937292 + 1.94x - MiniLM L6 H384 uncased SST-2   (HuggingFace) + MiniLM L6 H384 uncased SST-2 (HuggingFace) integerops - 0.89908 - 0.90138 - -0.00255 - 2326.265 - 1114.572 - 2.087137 + 89.91% + 90.14% + -0.26% + 2326.27 + 1114.57 + 2.09x BERT base cased MRPC (HuggingFace) qlinearops - 0.87702 - 0.88294 - -0.0067 - 494.961 - 210.795 - 2.348068 + 87.70% 
+ 88.29% + -0.67% + 494.96 + 210.80 + 2.35x BERT base cased MRPC (HuggingFace) integerops - 0.88186 - 0.88294 - -0.00122 - 714.614 - 210.991 - 3.386941 + 88.19% + 88.29% + -0.12% + 714.61 + 210.99 + 3.39x - Electra small discriminator MRPC   (HuggingFace) + Electra small discriminator MRPC (HuggingFace) qlinearops - 0.89915 - 0.89831 - 0.000935 - 1998.713 - 1115.184 - 1.792272 + 89.92% + 89.83% + +0.09% + 1998.71 + 1115.18 + 1.79x - Electra small discriminator MRPC   (HuggingFace) + Electra small discriminator MRPC (HuggingFace) integerops - 0.89267 - 0.89831 - -0.00628 - 2202.812 - 1121.406 - 1.96433 + 89.27% + 89.83% + -0.63% + 2202.81 + 1121.41 + 1.96x BERT mini MRPC (HuggingFace) qlinearops - 0.86213 - 0.86515 - -0.00349 - 5767.229 - 3254.792 - 1.771919 + 86.21% + 86.52% + -0.35% + 5767.23 + 3254.79 + 1.77x BERT mini MRPC (HuggingFace) integerops - 0.86159 - 0.86515 - -0.00411 - 6354.655 - 3424.423 - 1.855686 + 86.16% + 86.52% + -0.41% + 6354.66 + 3424.42 + 1.86x Xlnet base cased MRPC (HuggingFace) qlinearops - 0.90052 - 0.8986 - 0.002137 - 121.243 - 95.562 - 1.268737 + 90.05% + 89.86% + +0.21% + 121.24 + 95.56 + 1.27x Xlnet base cased MRPC (HuggingFace) integerops - 0.89583 - 0.8986 - -0.00308 - 123.055 - 95.598 - 1.287213 + 89.58% + 89.86% + -0.31% + 123.06 + 95.60 + 1.29x BART large MRPC (HuggingFace) integerops - 0.9236 - 0.91197 - 0.012753 - 126.136 - 51.055 - 2.470591 + 92.36% + 91.20% + +1.28% + 126.14 + 51.06 + 2.47x DeBERTa v3 base MRPC (HuggingFace) integerops - 0.92386 - 0.92226 - 0.001735 - 193.159 - 153.157 - 1.261183 + 92.39% + 92.23% + +0.17% + 193.16 + 153.16 + 1.26x Spanbert SQuAD (HuggingFace) qlinearops - 91.1379 - 91.97553 - -0.00911 - 81.964 + 91.14% + 91.98% + -0.91% + 81.96 43.36 - 1.890314 + 1.89x Spanbert SQuAD (HuggingFace) integerops - 91.39835 - 91.97553 - -0.00628 - 101.712 - 43.365 - 2.345486 + 91.40% + 91.98% + -0.63% + 101.71 + 43.37 + 2.35x - Bert base multilingual cased SQuAD   (HuggingFace) + Bert base multilingual cased SQuAD 
(HuggingFace) qlinearops - 88.4205 - 89.12625 - -0.00792 - 86.328 - 43.269 - 1.995147 + 88.42% + 89.13% + -0.79% + 86.33 + 43.27 + 2.00x - Bert base multilingual cased SQuAD   (HuggingFace) + Bert base multilingual cased SQuAD (HuggingFace) integerops - 88.70224 - 89.12625 - -0.00476 + 88.70% + 89.13% + -0.48% 101.78 - 43.244 - 2.353621 + 43.24 + 2.35x - DistilBert base uncased SQuAD   (HuggingFace) + DistilBert base uncased SQuAD (HuggingFace) qlinearops - 86.32823 - 86.86496 - -0.00618 - 120.709 - 69.723 - 1.731265 + 86.33% + 86.86% + -0.62% + 120.71 + 69.72 + 1.73x - DistilBert base uncased SQuAD   (HuggingFace) + DistilBert base uncased SQuAD (HuggingFace) integerops - 86.04866 - 86.86496 - -0.0094 - 203.708 - 69.682 - 2.923395 + 86.05% + 86.86% + -0.94% + 203.71 + 69.68 + 2.92x - BERT large uncased whole word masking   SQuAD (HuggingFace) + BERT large uncased whole word masking SQuAD (HuggingFace) qlinearops - 92.33945 - 93.15547 - -0.00876 - 31.813 - 12.939 - 2.458691 + 92.34% + 93.16% + -0.88% + 31.81 + 12.94 + 2.46x - BERT large uncased whole word masking   SQuAD (HuggingFace) + BERT large uncased whole word masking SQuAD (HuggingFace) integerops - 92.99229 - 93.15547 - -0.00175 - 35.828 - 12.937 - 2.769421 + 92.99% + 93.16% + -0.18% + 35.83 + 12.94 + 2.77x Roberta large SQuAD v2 (HuggingFace) qlinearops - 89.03395 - 89.01619 - 0.0002 + 89.03% + 89.02% + +0.02% 17.61 - 13.268 - 1.327254 + 13.27 + 1.33x Roberta large SQuAD v2 (HuggingFace) integerops - 89.03833 - 89.01619 - 0.000249 - 35.854 - 13.259 - 2.704125 + 89.04% + 89.02% + +0.02% + 35.85 + 13.26 + 2.70x GPT2 WikiText (HuggingFace) qlinearops - 30.25292 - 28.99614 - 0.043343 - 13.849 - 10.167 - 1.362152 + 30.25% + 29.00% + +4.33% + 13.85 + 10.17 + 1.36x GPT2 WikiText (HuggingFace) integerops - 29.68153 - 28.99614 - 0.023637 - 14.636 - 10.088 - 1.450833 + 29.68% + 29.00% + +2.36% + 14.64 + 10.09 + 1.45x DistilGPT2 WikiText (HuggingFace) qlinearops - 44.93232 - 43.43043 - 0.034582 - 21.798 - 17.129 - 
1.272579 + 44.93% + 43.43% + +3.46% + 21.80 + 17.13 + 1.27x DistilGPT2 WikiText (HuggingFace) integerops - 44.6195 - 43.43043 - 0.027379 - 23.017 - 17.086 - 1.347126 + 44.62% + 43.43% + +2.74% + 23.02 + 17.09 + 1.35x LayoutLMv3 FUNSD (HuggingFace) integerops - 0.90072 - 0.90487 - -0.00459 - 39.498 - 28 - 1.410643 + 90.07% + 90.49% + -0.46% + 39.50 + 28.00 + 1.41x - CodeBert    (HuggingFace) + CodeBert (HuggingFace) qlinearops - 0.64971 - 0.6541 - -0.00671 - 75.685 - 45.097 - 1.678271 + 64.97% + 65.41% + -0.67% + 75.69 + 45.10 + 1.68x - CodeBert    (HuggingFace) + CodeBert (HuggingFace) integerops - 0.64934 - 0.6541 - -0.00728 - 94.472 - 45.103 - 2.094584 + 64.93% + 65.41% + -0.73% + 94.47 + 45.10 + 2.09x + + + FCN (ONNX Model Zoo) + qlinearops + 64.54% + 64.98% + -0.67% + 25.83 + 12.90 + 2.00x + + + FCN (ONNX Model Zoo) + qdq + 64.54% + 64.98% + -0.67% + 25.97 + 12.99 + 2.00x