merge bench code into benchgc #199

Merged: 40 commits, Sep 4, 2024

Commits:
* 4772f9a introduce benchgc for correctness check (WangJialei-A, Aug 16, 2024)
* 2124dc2 Merge branch 'main' into xurui/merge_bench_new (xurui1995, Aug 26, 2024)
* 1f5b6ba merge code (xurui1995, Aug 26, 2024)
* 2cccd04 introduce benchgc for correctness check (WangJialei-A, Aug 16, 2024)
* 1cabc2c remove print (xurui1995, Aug 27, 2024)
* c3c5441 merge code (xurui1995, Aug 27, 2024)
* e316a98 fix (xurui1995, Aug 27, 2024)
* 841e81f simplify (xurui1995, Aug 27, 2024)
* 42a50c2 merge main (xurui1995, Aug 27, 2024)
* b16a9b6 Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Aug 27, 2024)
* 1e8b074 fix format (xurui1995, Aug 27, 2024)
* 1c20184 fix format (xurui1995, Aug 27, 2024)
* 69f2e94 reorg the pattern dir (xurui1995, Aug 27, 2024)
* 8d0953c improve (xurui1995, Aug 27, 2024)
* e05d5f0 fix format (xurui1995, Aug 27, 2024)
* e96d310 fix (xurui1995, Aug 27, 2024)
* 9d03541 Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Aug 27, 2024)
* 44d591d add example (xurui1995, Aug 27, 2024)
* bae0e8b Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Aug 27, 2024)
* bc7262d Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Aug 27, 2024)
* 420e3de Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Aug 28, 2024)
* 7923184 fix some comments (xurui1995, Aug 29, 2024)
* 8e85b80 fix (xurui1995, Aug 29, 2024)
* 56f2de6 fix (xurui1995, Aug 29, 2024)
* b87b2d4 add readme (xurui1995, Aug 29, 2024)
* 8f09ed0 Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Aug 29, 2024)
* 4726c81 Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Aug 30, 2024)
* b2597b9 add mlp filling (xurui1995, Sep 2, 2024)
* 248dd12 Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Sep 2, 2024)
* 4392974 fix mlp (xurui1995, Sep 2, 2024)
* 3566b83 add case (xurui1995, Sep 2, 2024)
* 8deb44c remove old bench code (xurui1995, Sep 2, 2024)
* a0641e9 update readme (xurui1995, Sep 2, 2024)
* 5372bf0 Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Sep 2, 2024)
* 061af38 Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Sep 2, 2024)
* 66efdd2 fix register_onednn_graph_dialect (xurui1995, Sep 3, 2024)
* f211736 fix (xurui1995, Sep 3, 2024)
* 27ca472 Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Sep 4, 2024)
* e356975 update filling and cmp for mlp (xurui1995, Sep 4, 2024)
* 4cd0ac5 Merge branch 'main' into xurui/merge_into_benchgc (xurui1995, Sep 4, 2024)

Files changed:
2 changes: 1 addition & 1 deletion python/config.py.in
@@ -5,7 +5,7 @@ llvm_obj_root = "@LLVM_BINARY_DIR@"
llvm_lib_dir = "@LLVM_LIBRARY_DIR@"
shlib_ext = "@LTDL_SHLIB_EXT@"
gc_lib_dir = "@LLVM_LIBRARY_OUTPUT_INTDIR@"

GC_ENABLE_DNNL_API ="@GC_ENABLE_DNNL_API@" in ["ON", "1"]

if sys.platform.startswith("win32"):
mlir_runner_utils_dir = os.path.normpath(os.path.join(llvm_obj_root, "bin"))

5 changes: 2 additions & 3 deletions python/gc_mlir/_mlir_libs/_site_initialize_0.py
@@ -5,6 +5,7 @@
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
#
# ===-----------------------------------------------------------------------===#
from gc_mlir.config import GC_ENABLE_DNNL_API


def context_init_hook(context):
@@ -13,11 +14,9 @@ def context_init_hook(context):

register_cpuruntime_dialect(context)

try:
if GC_ENABLE_DNNL_API:
from ._gc_mlir.onednn_graph import (
register_dialect as register_onednn_graph_dialect,
)

register_onednn_graph_dialect(context)
except ModuleNotFoundError:
print("onednn_graph dialect not found")

3 changes: 3 additions & 0 deletions scripts/correctness.sh
@@ -102,5 +102,8 @@ python3 -m benchgc --verbose 0 --driver mlir --case ${CASE_DIR}/reduce.mlir || FAIL=1
# mlir
# python3 -m benchgc --verbose 0 --driver mlir --case ${CASE_DIR}/llama2.mlir || FAIL=1

#mlp
python3 -m benchgc --verbose 1 --driver pattern --case mlp --batch_size=32 --hidden_size_list=32x16x64 --has_bias=1x1 --act_type=noop --dtype=f32

set +e
exit $FAIL

1 change: 1 addition & 0 deletions test/benchgc/CMakeLists.txt
@@ -39,3 +39,4 @@ add_subdirectory("src/benchgc/mlir")
add_subdirectory("src/benchgc/linalg")
add_subdirectory("src/benchgc/tensor")
add_subdirectory("src/benchgc/arith")
add_subdirectory("src/benchgc/pattern")

192 changes: 182 additions & 10 deletions test/benchgc/README.md
@@ -2,32 +2,47 @@

## Description

Benchgc is a tool used to verify the correctness and performance of graph compiler. Benchgc accepts MLIR files based on the OneDNN graph dialect as test cases and prepares test data for them. For correctness verification, Benchgc will use PyTorch as a reference for comparison.
Benchgc is a tool used to verify the correctness and performance of the graph compiler. Benchgc accepts MLIR files as test cases and prepares test data for them. For correctness verification, Benchgc uses PyTorch as a reference for comparison.

## Prerequisite
* python >= 3.10
* torch >= 2.2
* pybind11
* Enable the MLIR Python bindings; refer to [`python/README.md`](../../python/README.md) for details

## Build and install
## Build
There are two ways to use benchgc:

* Build `.whl` and install benchgc
```
# Please execute at the top level of the project

mkdir -p build
cd build

mkdir build && cd build
cmake .. -DMLIR_DIR=$MLIR_PATH -DGC_TEST_ENABLE=ON -DGC_ENABLE_BINDINGS_PYTHON=ON -DGC_BENCH_ENABLE=ON
make -j benchgc

python -m pip install test/benchgc/dist/benchgc-*.whl

```

* Run benchgc from source code

```
# Please execute at the top level of the project

mkdir build && cd build
cmake .. -DMLIR_DIR=$MLIR_PATH -DGC_TEST_ENABLE=ON -DGC_ENABLE_BINDINGS_PYTHON=ON -DGC_BENCH_ENABLE=ON
make -j GcPythonModules
export PYTHONPATH=$(pwd)/python_packages/gc_mlir_core/:$(pwd)/../test/benchgc/src/
```

## Synopsis
```
python -m benchgc [OPTIONS] --driver [DRIVER] --case [CASE]
python -m benchgc [OPTIONS] --mode [MODE] --driver [DRIVER] --case [CASE]
```
## Flags
## Common Options
### --mode [str]
* C : correctness testing (by default)
* P : performance testing
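
For illustration (flag values borrowed from the examples later in this README), a single `add` op case can be checked for correctness or timed simply by switching `--mode`:

```
# correctness check (default mode C)
python3 -m benchgc --verbose 0 --driver linalg --case add --md 0:4x5xf32 --md 1:4x5xf32 --md 2:4x5xf32

# performance measurement of the same case
python3 -m benchgc --verbose 0 --mode P --driver linalg --case add --md 0:4x5xf32 --md 1:4x5xf32 --md 2:4x5xf32
```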

### --driver [str]
* linalg: test a single op in the linalg dialect
* mlir: provide an MLIR file and run it
@@ -38,11 +53,25 @@ python -m benchgc [OPTIONS] --driver [DRIVER] --case [CASE]
* if driver=pattern, provide the name of a pre-defined pattern, such as mlp (see the examples below)
* if driver is a dialect name, provide the op name to start a single-op test
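
Illustrative `--driver`/`--case` combinations, adapted from the examples and `scripts/correctness.sh` in this PR:

```
# single op from the linalg dialect
python3 -m benchgc --driver linalg --case add --md 0:4x5xf32 --md 1:4x5xf32 --md 2:4x5xf32

# a user-provided MLIR file
python3 -m benchgc --driver mlir --case ./test.mlir

# a pre-defined pattern
python3 -m benchgc --driver pattern --case mlp --batch_size=32 --hidden_size_list=32x16x64 --has_bias=1x1 --act_type=noop --dtype=f32
```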

### --entry [str]
* default : "entry"
* the name of the entry kernel in the input or generated MLIR

### --seed [int]
* set the seed used to generate the test data and reproduce the test

### --verbose [int]
* set the verbose level
* set the verbose level, default : 0
* 0 : NO_VERBOSE
* 1 : MODULE_VERBOSE, print the module that will be executed
* 2 : ARG_VERBOSE, + print arg information
* 3 : COMPARE_VERBOSE, + print threshold for comparison
* 4 : ERROR_OUTPUT_VERBOSE, + print all error data points if failed
* 5 : OUTPUT_VERBOSE, + print all result including passed tensor
* 6 : INPUT_VERBOSE, + print input torch tensors

Review comments on this part of the README:

* Contributor: Do you think saving the tensor into a file will be better than printing them in the terminal?
* xurui1995 (author), Sep 3, 2024: I agree with you; if the tensor is large, dumping it into a file sounds better than printing. The printing itself is not added by this PR, I just documented it in the README. I can discuss with @WangJialei-A, and maybe we could provide another option to dump. For this PR let's keep the printing.
* Contributor: @ciyongch @xurui1995 Need more discussion and careful design of this part.
* Contributor (follow-up): For debug capability, we shall have a more convenient/flexible way to get the intermediate result.

### --ir_printing (action=store_true)
* Print the IR during the pass pipeline

### --md index:SHAPExTYPE
* Describe the shape and data type of an argument
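
For example, `--md 0:4x5xf32` (as used in the examples below) declares argument 0 to be a 4x5 tensor of f32.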
@@ -97,7 +126,28 @@ module {
| Norm check | N | threshold |
| Benchdnn driver | D | driver_name:dtype:case |

## Bench Options
### --bench_kind [str]
* py : use the MLIR Python API to invoke the kernel and measure the time cost in Python
* wrapper : modify the MLIR by wrapping the kernel in a new method that calls `nanoTime()` before and after the kernel invocation, then report the difference as the time cost

### --warm_up [int]
* number of warm-up executions

### --repeat [int]
* number of times the execution is repeated
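
For example, a performance run of an MLIR case using the wrapper timing method (the same invocation appears in the perf example below):

```
python3 -m benchgc --mode P --verbose 1 --driver mlir --case=./test.mlir --bench_kind wrapper --warm_up 50 --repeat 200
```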

## Pattern Options
Each pattern has its own unique options.
### mlp
* `--batch_size`: batch size of the input
* `--hidden_size_list`: hidden sizes of the MLP, example: 32x16x64
* `--has_bias`: whether each matmul op has a bias, example: 1x0
* `--act_type`: choices=["noop", "relu"]
* `--dtype`: choices=["bf16", "f32"]
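
For instance, the mlp correctness case added to `scripts/correctness.sh` in this PR combines these options as follows:

```
python3 -m benchgc --verbose 1 --driver pattern --case mlp --batch_size=32 --hidden_size_list=32x16x64 --has_bias=1x1 --act_type=noop --dtype=f32
```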

## Example
### Correctness testing example
```
# single add op test
# using the same data filling / compare strategy as the benchdnn primitive driver if not set
@@ -254,4 +304,126 @@ p2p check: threshold: 0.0000000
(1, 0): ref: 25.1690636 res: 25.1690636 abs_diff: 0.0000000 rel_diff: 0.0000000
(1, 1): ref: -7.8600063 res: -7.8600044 abs_diff: 0.0000019 rel_diff: 0.0000002
FAIL: linalg.matmul_transpose_b
```

### Perf testing example
* single op example
```
python3 -m benchgc --verbose 1 --mode P --driver linalg --case add --md 0:4x5xf32 --md 1:4x5xf32 --md 2:4x5xf32

module {
func.func @entry(%arg0: tensor<4x5xf32>, %arg1: tensor<4x5xf32>) -> tensor<4x5xf32> attributes {llvm.emit_c_interface} {
%cst = arith.constant 0.000000e+00 : f32
%0 = tensor.empty() : tensor<4x5xf32>
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<4x5xf32>) -> tensor<4x5xf32>
%2 = linalg.add ins(%arg0, %arg1 : tensor<4x5xf32>, tensor<4x5xf32>) outs(%1 : tensor<4x5xf32>) -> tensor<4x5xf32>
return %2 : tensor<4x5xf32>
}
}

===========bench result===========
{
"args": {
"mode": "P",
"driver": "linalg",
"case": "add",
"md": [
"0:4x5xf32",
"1:4x5xf32",
"2:4x5xf32"
],
"fill": [],
"cmp": [],
"seed": 0,
"verbose": 1,
"entry": "entry",
"ir_printing": false,
"cast": "cast_signed",
"dimension": null,
"dimensions": null,
"dilations": null,
"strides": null,
"bench_kind": "py",
"warm_up": 100,
"repeat": 100
},
"compile_cost(ms)": 37.72595152258873,
"execute_cost(ms)": 0.00022314488887786865
}
```

* mlir example
```
python3 -m benchgc --mode P --verbose 1 --driver mlir --case=./test.mlir --bench_kind wrapper --warm_up 50 --repeat 200

module {
func.func @entry(%arg0: tensor<512x128xf32>) -> tensor<512x128xf32> attributes {llvm.emit_c_interface} {
%cst = arith.constant 0.000000e+00 : f32
%0 = tensor.empty() : tensor<512x128xf32>
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<512x128xf32>) -> tensor<512x128xf32>
%2 = linalg.abs ins(%arg0 : tensor<512x128xf32>) outs(%1 : tensor<512x128xf32>) -> tensor<512x128xf32>
return %2 : tensor<512x128xf32>
}
}

===========bench result===========
{
"args": {
"mode": "P",
"driver": "mlir",
"case": "/home/xurui/gc_v2/test.mlir",
"md": [],
"fill": [],
"cmp": [],
"seed": 0,
"verbose": 1,
"entry": "entry",
"ir_printing": false,
"bench_kind": "wrapper",
"warm_up": 50,
"repeat": 200
},
"compile_cost(ms)": 70.6995539367199,
"execute_cost(ms)": 0.029325044999999984
}
```
* mlp example
```
python3 -m benchgc --verbose 1 --mode P --driver pattern --case mlp --batch_size=32 --hidden_size_list=32x16x64 --has_bias=0x0 --act_type=noop --dtype=f32

module {
func.func @entry(%arg0: tensor<32x32xf32>, %arg1: tensor<32x16xf32>, %arg2: tensor<16x64xf32>) -> tensor<32x64xf32> attributes {llvm.emit_c_interface} {
%0 = tensor.empty() : tensor<32x16xf32>
%1 = linalg.matmul {cast = #linalg.type_fn<cast_signed>} ins(%arg0, %arg1 : tensor<32x32xf32>, tensor<32x16xf32>) outs(%0 : tensor<32x16xf32>) -> tensor<32x16xf32>
%2 = tensor.empty() : tensor<32x64xf32>
%3 = linalg.matmul {cast = #linalg.type_fn<cast_signed>} ins(%1, %arg2 : tensor<32x16xf32>, tensor<16x64xf32>) outs(%2 : tensor<32x64xf32>) -> tensor<32x64xf32>
return %3 : tensor<32x64xf32>
}
}

===========bench result===========
{
"args": {
"mode": "P",
"driver": "pattern",
"case": "mlp",
"md": [],
"fill": [],
"cmp": [],
"seed": 0,
"verbose": 1,
"entry": "entry",
"ir_printing": false,
"bench_kind": "py",
"warm_up": 100,
"repeat": 100,
"batch_size": 32,
"hidden_size_list": "32x16x64",
"has_bias": "0x0",
"act_type": "noop",
"dtype": "f32"
},
"compile_cost(ms)": 109.86808314919472,
"execute_cost(ms)": 0.02944003790616989
}

```