
Conversation

@yuwenzho
Contributor

Type of Change

Example

Description

Update the ONNXRT example for the new API.

JIRA ticket: ILITV-2468

How has this PR been tested?

Extension test on ONNX models.

Dependency Change?

No

@yuwenzho yuwenzho marked this pull request as ready for review December 7, 2022 03:28
@yuwenzho
Contributor Author

yuwenzho commented Dec 9, 2022

Hi @chensuyue, the PR is ready for the extension test.

@chensuyue
Contributor

Extension test:

  1. Please check the tuning regression.
  2. Please check the benchmark.sh API gap.

@yuwenzho yuwenzho force-pushed the new_api_onnx_example branch from d0882e3 to 15c4863 Compare December 16, 2022 03:03
@yuwenzho
Contributor Author

@chensuyue extension test:
https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3784/artifact/report.html

The performance regression is caused by switching the performance dataset from the dummy dataset to the real dataset.

@chensuyue
Contributor

Please run the extension test for the other examples as well.

@yuwenzho
Contributor Author

https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3877/
Note: the object detection models need new quantization recipe support from the Strategy team and may not pass the extension test yet.

@yuwenzho
Contributor Author


The NLP models failed due to some typos and non-working code changes.
Retest: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3883/

@yuwenzho
Contributor Author


Retest: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3890/
yolov3, yolov4 and tiny_yolov3 will not be enabled in this version because 'onnxrt.graph_optimization.level' is not supported yet.

@yuwenzho
Contributor Author


  1. ssd-12, ssd-12_qdq, faster_rcnn, faster_rcnn_qdq, mask_rcnn and mask_rcnn_qdq will be re-enabled in 2.1 once 'onnxrt.graph_optimization.level' and the quantization recipe are supported. Please ignore them in the extension test.
  2. The Hugging Face model failed with the error: "setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.", which is caused by a NumPy version update (tracked in a separate issue).

Update:

  • Removed the ssd, faster_rcnn and mask_rcnn models
  • Updated the model config JSON
  • Pinned numpy==1.23.5 in requirements.txt for the Hugging Face model

Retest: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3909/
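For context on the pinned NumPy version: the "inhomogeneous shape" error above comes from building an array out of ragged nested sequences, which NumPy 1.24+ rejects with a ValueError instead of the deprecation warning older releases emitted. A minimal reproduction (variable names are illustrative, not from the example code):

```python
import numpy as np

ragged = [[1, 2, 3], [4, 5]]  # rows of different lengths

try:
    np.array(ragged)  # NumPy >= 1.24 raises ValueError ("inhomogeneous shape")
except ValueError:
    pass  # NumPy 1.23.x only emitted a VisibleDeprecationWarning here

# Explicitly requesting an object array works across NumPy versions:
arr = np.array(ragged, dtype=object)
print(arr.shape)  # (2,)
```

Pinning numpy==1.23.5 sidesteps the hard error until the data pipeline produces uniformly shaped inputs.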

@yuwenzho
Contributor Author

Passed: bert_squad_model_zoo_dynamic, mobilebert_squad_mlperf_dynamic, mobilebert_squad_mlperf_qdq, duc, BiDAF_dynamic and the Hugging Face question answering models.

Failed: gpt2_lm_head_wikitext_model_zoo_dynamic and the Hugging Face text classification models. Retest: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3913/

@yuwenzho
Contributor Author

Passed:
bert_squad_model_zoo_dynamic, mobilebert_squad_mlperf_dynamic, mobilebert_squad_mlperf_qdq, duc, BiDAF_dynamic and the Hugging Face question answering models: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3908/artifact/report.html
gpt2_lm_head_wikitext_model_zoo_dynamic and the Hugging Face text classification models: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3919/artifact/report.html

@mengniwang95 mengniwang95 mentioned this pull request Dec 26, 2022
@chensuyue chensuyue merged commit 97c8e3b into master Dec 27, 2022
@chensuyue chensuyue deleted the new_api_onnx_example branch December 27, 2022 01:50
VincyZhang pushed a commit that referenced this pull request Feb 12, 2023
* SparseLib add vtune support

refine doc about profiling
yiliu30 added a commit that referenced this pull request Apr 5, 2025
Building on the vllm WoQ path, this PR adds support for re-quantizing FP8 weights w/ per-tensor or per-channel scaling.

---------

Co-authored-by: Yi Liu <[email protected]>
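As a rough illustration of the per-tensor vs. per-channel scaling mentioned in the commit message above, here is a hedged NumPy sketch of how such scales could be computed; the function and constant names are hypothetical, not taken from the PR:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quant_scales(w: np.ndarray, per_channel: bool) -> np.ndarray:
    """Compute scales so that w / scale fits within the FP8 E4M3 range."""
    if per_channel:
        # one scale per output channel (row-wise absolute maximum)
        amax = np.abs(w).max(axis=1, keepdims=True)
    else:
        # a single scale for the whole weight tensor
        amax = np.abs(w).max()
    return amax / FP8_E4M3_MAX

w = np.array([[0.5, -2.0], [4.0, 1.0]], dtype=np.float32)
print(quant_scales(w, per_channel=False))  # scalar scale from the global amax
print(quant_scales(w, per_channel=True))   # shape (2, 1), one scale per row
```

Per-channel scaling preserves more precision for rows with small magnitudes at the cost of storing one scale per channel, which is the usual trade-off behind offering both modes.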
mengniwang95 pushed a commit that referenced this pull request Apr 15, 2025
xin3he pushed a commit that referenced this pull request Apr 22, 2025
XuehaoSun pushed a commit that referenced this pull request May 13, 2025