merge bench code into benchgc #199


Merged
merged 40 commits into from
Sep 4, 2024

Conversation

@xurui1995 xurui1995 commented Jul 30, 2024

merge bench code into benchgc

  • add mode=P for performance testing
  • driver=pattern, case=mlp for mlp pattern
  • remove old bench code dir

issue: #172

@xurui1995 xurui1995 added the enhancement New feature or request label Jul 30, 2024
@xurui1995 xurui1995 self-assigned this Jul 30, 2024
@xurui1995 xurui1995 added the WIP work in progress label Jul 30, 2024
@xurui1995 xurui1995 force-pushed the xurui/merge_into_benchgc branch from ed55792 to c3c5441 Compare August 27, 2024 02:09
@xurui1995
Contributor Author

Added an MLP case to the correctness check script as an example; more will be added in the future.

#mlp
python3 -m benchgc --verbose 1  --driver pattern --case mlp --batch_size=32 --hidden_size_list=32x16x64 --has_bias=1x1 --act_type=noop --dtype=f32 
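As a rough illustration of what the `--hidden_size_list=32x16x64 --has_bias=1x1` flags describe, here is a minimal sketch of parsing them into per-layer specs. The helper name and the returned dict layout are hypothetical, not benchgc's actual code:

```python
def parse_mlp_flags(hidden_size_list: str, has_bias: str):
    """Turn 'x'-separated flag strings into per-layer (in, out, bias) specs.

    '32x16x64' describes two linear layers, 32->16 and 16->64;
    '1x1' enables bias on both. Hypothetical helper, not benchgc's code.
    """
    sizes = [int(s) for s in hidden_size_list.split("x")]
    bias = [b == "1" for b in has_bias.split("x")]
    assert len(bias) == len(sizes) - 1, "one bias flag per layer"
    return [
        {"in": sizes[i], "out": sizes[i + 1], "bias": bias[i]}
        for i in range(len(sizes) - 1)
    ]

layers = parse_mlp_flags("32x16x64", "1x1")
```

So the example command above benchmarks a two-layer MLP (32->16->64), both layers with bias, in f32 and with no activation.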

@xurui1995 xurui1995 requested a review from ZhennanQin September 2, 2024 05:00
@xurui1995
Contributor Author

@yifeizh2 @zhczhong @crazydemo @niuxiaog @BRUCE11111 The bench tool is now part of benchgc; please follow the new README to run benchmarks: https://github.com/intel/graph-compiler/blob/xurui/merge_into_benchgc/test/benchgc/README.md

Note: The current benchgc does not provide a DLTI attr for the MLIR module; that will be added in the next PR.

@xurui1995 xurui1995 requested a review from ciyongch September 3, 2024 05:57
* 3 : COMPARE_VERBOSE, + print threshold for comparison
* 4 : ERROR_OUTPUT_VERBOSE, + print all error data points if failed
* 5 : OUTPUT_VERBOSE, + print all result including passed tensor
* 6 : INPUT_VERBOSE, + print input torch tensors
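The levels above are cumulative: each one adds to what the previous level prints. A minimal sketch of how such levels could be modeled (the enum and helper names here are illustrative, not benchgc's actual implementation):

```python
from enum import IntEnum

class Verbose(IntEnum):
    """Cumulative verbosity levels, mirroring the list documented above.

    Each level includes everything printed by the levels below it.
    Illustrative sketch, not benchgc's actual enum.
    """
    COMPARE_VERBOSE = 3       # print thresholds used for comparison
    ERROR_OUTPUT_VERBOSE = 4  # + all error data points if failed
    OUTPUT_VERBOSE = 5        # + all results, including passed tensors
    INPUT_VERBOSE = 6         # + input torch tensors

def should_print_inputs(level: int) -> bool:
    # Inputs are only printed at the highest level
    return level >= Verbose.INPUT_VERBOSE
```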
Contributor

Do you think saving the tensor into a file will be better than printing them in the terminal?

Contributor Author

@xurui1995 xurui1995 Sep 3, 2024

I agree with you: if the tensor is large, dumping it to a file sounds better than printing it. The printing itself was not added by this PR; I only documented it in the README. I can discuss with @WangJialei-A, and maybe we could provide another option to dump. For this PR, let's keep the printing.

Contributor

@ciyongch @xurui1995 This part needs more discussion and careful design.

For debugging, we should have a more convenient and flexible way to inspect intermediate results.
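One possible shape for the "dump instead of print" idea discussed here, sketched with the standard library only (the helper name, JSON format, and threshold are assumptions, not anything this PR implements):

```python
import json
from pathlib import Path

def report_tensor(name: str, values: list, max_print: int = 16,
                  dump_dir: str = ".") -> str:
    """Print small tensors; dump large ones to a JSON file.

    Hypothetical sketch of the suggestion above: keep terminal output
    readable by writing big result tensors to disk. Returns the dump
    path, or '' if the tensor was printed inline.
    """
    if len(values) <= max_print:
        print(f"{name}: {values}")
        return ""
    path = Path(dump_dir) / f"{name}.json"
    path.write_text(json.dumps(values))
    print(f"{name}: {len(values)} elements dumped to {path}")
    return str(path)
```

A real implementation would likely use torch's own serialization for tensors; JSON is used here only to keep the sketch dependency-free.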

Comment on lines 145 to 150
assert (
len(module.operation.regions) == 1
), "Expected kernel module to have only one region"
assert (
len(module.operation.regions[0].blocks) == 1
), "Expected kernel module to have only one block"
Contributor

Why do we have such a limitation?

Contributor Author

Removed now. Here I try to find the entry among the top-level functions of a module; in practice, I have not seen a module with more than one region or block.

Contributor

Why did you remove the original get_entry?

Contributor Author

They are almost the same. With the original get_entry, you had to pass the entry name as '"entry"', which is a bit annoying. In addition, a function that looks up a FuncOp by name also covers the entry-only case.
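The lookup-by-name approach described here can be sketched as follows. The dataclasses stand in for the MLIR Python bindings' module/function objects, and the function name is hypothetical; only the control flow reflects the idea in the comment:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FuncOp:
    """Stand-in for an MLIR func.FuncOp with a symbol name."""
    name: str

@dataclass
class Module:
    """Stand-in for an MLIR module's list of top-level operations."""
    ops: List[FuncOp] = field(default_factory=list)

def get_func_by_name(module: Module, name: str = "entry") -> FuncOp:
    """Return the top-level function with the given symbol name.

    Generalizes the old get_entry(): any function can be looked up,
    with 'entry' as the default, so call sites no longer need to pass
    the quoted '"entry"' literal. Hypothetical sketch, not the PR code.
    """
    for op in module.ops:
        if op.name == name:
            return op
    raise ValueError(f"no function named {name!r} in module")
```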

Contributor

@ciyongch ciyongch left a comment

LGTM, with a minor comment.

arg.fill_param = [
"matmul",
"wei",
arglist[0].dtype,
Contributor

Shall we use a different index here for wei?

Contributor Author

Thank you for pointing this out. Previously, I tried to reuse a single matmul op's filling strategy in the MLP, but I ran into some issues. I have now aligned the MLP's filling and comparison strategies with the MLP validation script in GC v1. A separate PR may be proposed later to optimize the MLP's filling.
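To illustrate why per-argument fill parameters like `["matmul", "wei", dtype, ...]` matter, here is a dependency-free sketch of a fill strategy seeded by the (op, argument) pair, so `src` and `wei` get different but reproducible data. The function name and seeding scheme are assumptions, not GC v1's actual strategy:

```python
import random
import zlib

def fill_tensor(op: str, arg: str, size: int, seed_base: int = 0):
    """Deterministically fill an argument with op/arg-specific data.

    Seeding the RNG from (op, arg) keeps fills reproducible across runs
    while giving each argument (e.g. 'src' vs 'wei') distinct values.
    Hypothetical sketch, not benchgc's actual filling code.
    """
    seed = zlib.crc32(f"{op}:{arg}".encode()) ^ seed_base
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]
```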

@xurui1995 xurui1995 merged commit 0956de2 into main Sep 4, 2024
6 checks passed
Labels
enhancement New feature or request ready to review
Development

Successfully merging this pull request may close these issues.

Migrate benchmark code into benchgc
4 participants