[bench] enable mlir benchmark #106

Merged: 115 commits, Aug 15, 2024
Changes from all commits (115 commits)
0781b4f
rebase
May 27, 2024
93306ef
Merge branch 'main' into longsheng/add_onednn_ops
May 27, 2024
60807a8
fix tidy
May 27, 2024
9344308
fix
May 27, 2024
eb4678f
fix
May 27, 2024
d82cc81
fix
May 27, 2024
bda491d
fix
May 28, 2024
f1a8e10
fix
May 28, 2024
e50b9fb
init
xurui1995 May 29, 2024
7515714
init
xurui1995 May 29, 2024
1ecf8a4
fix
May 29, 2024
45e64c9
Merge branch 'main' into longsheng/add_onednn_ops
May 29, 2024
50c7737
rebase
May 29, 2024
fe0c118
Merge branch 'main' into longsheng/llma2_onednn_lower
May 29, 2024
8e144c9
fix
May 29, 2024
44619b4
fix
May 29, 2024
dd98c65
update
May 29, 2024
04db05b
add flatten
May 30, 2024
4877c06
fix format
May 30, 2024
f6072f0
Merge branch 'main' into longsheng/llma2_onednn_lower
May 30, 2024
4dc1aa4
update
May 30, 2024
0bdecfe
Merge branch 'main' into xurui/add_benchmark
xurui1995 May 31, 2024
47b6551
add test
May 31, 2024
e86afaa
debug
May 31, 2024
6682794
test
May 31, 2024
0c3ff50
fix
May 31, 2024
7b4c2f9
opt
xurui1995 Jun 2, 2024
72983b7
fix
Jun 3, 2024
b14398c
Merge branch 'main' into longsheng/llma2_onednn_lower
Jun 3, 2024
d208ec4
fix
Jun 3, 2024
fc85574
test
Jun 3, 2024
dee3d54
fix
Jun 3, 2024
eb04513
fix
Jun 3, 2024
212710b
update bench
xurui1995 Jun 3, 2024
9d2e336
add license
xurui1995 Jun 3, 2024
d9e8009
update bench
xurui1995 Jun 3, 2024
7e1a4be
update test
Jun 3, 2024
eeaceb7
add bf16
xurui1995 Jun 3, 2024
efb6133
update interface
Jun 3, 2024
20d7dd0
Merge branch 'main' into longsheng/llma2_onednn_lower
Jun 3, 2024
37782b6
fix
Jun 3, 2024
fb61105
Merge branch 'main' into longsheng/llma2_onednn_lower
Jun 3, 2024
f0916fc
fix
xurui1995 Jun 3, 2024
af1bdbb
remove print
xurui1995 Jun 3, 2024
52502d2
Merge branch 'main' into xurui/add_benchmark
xurui1995 Jun 3, 2024
f7a6e1f
Merge branch 'main' into xurui/add_benchmark
xurui1995 Jun 3, 2024
ecf95e8
Merge branch 'main' into longsheng/llma2_onednn_lower
Jun 3, 2024
c3203b3
Merge remote-tracking branch 'origin/longsheng/llma2_onednn_lower' in…
xurui1995 Jun 4, 2024
5b953e5
remove useless arg
xurui1995 Jun 4, 2024
a58afd2
rename
xurui1995 Jun 4, 2024
dfb212c
update test
xurui1995 Jun 4, 2024
c3b9008
update compliler
xurui1995 Jun 5, 2024
4746b97
merge main
xurui1995 Jun 11, 2024
26be584
fix conflict
xurui1995 Jun 11, 2024
401975f
fix conflict
xurui1995 Jun 11, 2024
abd8ff4
comment out ResultsToOutParamsPass
xurui1995 Jun 11, 2024
056a0a9
fix style
xurui1995 Jun 11, 2024
8079132
fix
xurui1995 Jun 11, 2024
d504ce6
update
xurui1995 Jun 11, 2024
f6adb8e
improve conversion
xurui1995 Jun 11, 2024
47ea2fc
add disable_results_to_params option
xurui1995 Jun 12, 2024
23f2e29
update wrapper bench
xurui1995 Jun 12, 2024
e2c5893
fix mlp driver
xurui1995 Jun 12, 2024
315f48d
add tuner
xurui1995 Jun 13, 2024
4024ece
update config class
xurui1995 Jun 18, 2024
c33f532
fix tuner
xurui1995 Jun 18, 2024
7f7b042
support skip tuner of op
xurui1995 Jun 18, 2024
7ff5592
add timeout option
xurui1995 Jun 19, 2024
8c3a5fd
update example
xurui1995 Jun 27, 2024
7dff6a1
Merge branch 'main' into xurui/add_benchmark
xurui1995 Jul 2, 2024
2399d08
Merge branch 'main' into xurui/add_benchmark
xurui1995 Jul 2, 2024
4923f17
fix style and license
xurui1995 Jul 2, 2024
8c168e7
change tuner with linalg matmul
xurui1995 Jul 2, 2024
c47c33d
add linalg mlp
xurui1995 Jul 2, 2024
9da1002
fix
xurui1995 Jul 2, 2024
9bdf0b5
add more tuner option
xurui1995 Jul 5, 2024
2691727
add space percent
xurui1995 Jul 8, 2024
1b8c1c6
add batch bench
xurui1995 Jul 9, 2024
8e15377
add linalgx binding
xurui1995 Jul 12, 2024
201f942
fix ci
xurui1995 Jul 12, 2024
a58bd6f
fix style
xurui1995 Jul 12, 2024
cb66f38
add linalgx test case
xurui1995 Jul 12, 2024
b06cdda
opt the code
xurui1995 Jul 15, 2024
30945d8
restore pass for out to param
xurui1995 Jul 15, 2024
a7562d2
update readme
xurui1995 Jul 16, 2024
2b11201
Merge branch 'main' into xurui/add_benchmark
xurui1995 Jul 16, 2024
5f12a72
fix
xurui1995 Jul 18, 2024
e06b194
remove tuner code
xurui1995 Jul 18, 2024
21ea72f
update tests
xurui1995 Jul 18, 2024
87e85b0
update test case
xurui1995 Jul 18, 2024
00da905
Merge branch 'main' into xurui/add_benchmark
xurui1995 Jul 18, 2024
d47201d
Merge branch 'main' into xurui/add_benchmark
xurui1995 Jul 22, 2024
667cd13
improve set shared lib paths
xurui1995 Jul 25, 2024
f113eff
Merge branch 'main' into xurui/add_benchmark
xurui1995 Jul 25, 2024
3623d16
add missing file
xurui1995 Jul 25, 2024
1f3e6e6
Merge branch 'xurui/add_benchmark' of https://github.com/intel/graph-…
xurui1995 Jul 25, 2024
e80ac7c
Update tools/example/simple_test.py
xurui1995 Jul 26, 2024
2d8e40d
Update tools/example/simple_test.py
xurui1995 Jul 26, 2024
1f45ec1
remove new cmake vars
xurui1995 Jul 26, 2024
defda06
remove linalgx binding
xurui1995 Jul 26, 2024
4ff5baa
Merge branch 'main' into xurui/add_benchmark
xurui1995 Aug 5, 2024
61fc0d6
remove dependency
xurui1995 Aug 5, 2024
5f8ac95
merge main
xurui1995 Aug 6, 2024
524bd7d
Merge branch 'main' into xurui/add_benchmark
xurui1995 Aug 8, 2024
95b3cc8
Merge branch 'main' into xurui/add_benchmark
xurui1995 Aug 9, 2024
fd1ad8b
remove unused import
xurui1995 Aug 9, 2024
aa69883
fix path in config.py.in
xurui1995 Aug 12, 2024
27ff72c
Merge branch 'main' into xurui/add_benchmark
xurui1995 Aug 12, 2024
0c82ebd
Merge branch 'main' into xurui/add_benchmark
xurui1995 Aug 14, 2024
b466415
fix config.py.in
xurui1995 Aug 14, 2024
21bec88
fix path
xurui1995 Aug 14, 2024
0407f22
fix path
xurui1995 Aug 14, 2024
271e0c3
fix
xurui1995 Aug 15, 2024
3517083
Merge branch 'main' into xurui/add_benchmark
xurui1995 Aug 15, 2024
d6fff77
Merge branch 'main' into xurui/add_benchmark
xurui1995 Aug 15, 2024
8 changes: 7 additions & 1 deletion python/CMakeLists.txt
```
@@ -47,6 +47,7 @@ declare_mlir_python_sources(GcPythonSources.Common
   ADD_TO_PARENT GcPythonSources
   SOURCES
     __init__.py
+    graph_compiler.py
     dialects/__init__.py
     # init hooks
     _mlir_libs/_site_initialize_0.py
@@ -98,6 +99,8 @@ add_mlir_python_common_capi_library(GcPythonCAPI
   GcPythonSources
   MLIRPythonExtension.RegisterEverything
   MLIRPythonSources.Core
+  MLIRPythonSources.Dialects.linalg
+  MLIRPythonSources.ExecutionEngine
 )
 target_link_libraries(GcPythonCAPI PUBLIC GcInterface)

@@ -112,6 +115,9 @@ add_mlir_python_modules(GcPythonModules
   GcPythonSources
   MLIRPythonExtension.RegisterEverything
   MLIRPythonSources
-)
+  MLIRPythonSources.ExecutionEngine
+  COMMON_CAPI_LINK_LIBS
+  GcPythonCAPI
+)

 configure_file(config.py.in ${MLIR_BINARY_DIR}/python_packages/gc_mlir_core/gc_mlir/config.py @ONLY)
```
22 changes: 22 additions & 0 deletions python/config.py.in
```
@@ -0,0 +1,22 @@
import os
import sys

llvm_obj_root = "@LLVM_BINARY_DIR@"
llvm_lib_dir = "@LLVM_LIBRARY_DIR@"
shlib_ext = "@LTDL_SHLIB_EXT@"

if sys.platform.startswith("win32"):
    mlir_runner_utils_dir = os.path.normpath(os.path.join(llvm_obj_root, "bin"))
    shlib_prefix = ""
else:
    mlir_runner_utils_dir = llvm_lib_dir
    shlib_prefix = "lib"

MLIR_C_RUNNER_UTILS = os.path.normpath(
    os.path.join(
        mlir_runner_utils_dir, shlib_prefix + "mlir_c_runner_utils" + shlib_ext
    )
)
MLIR_RUNNER_UTILS = os.path.normpath(
    os.path.join(mlir_runner_utils_dir, shlib_prefix + "mlir_runner_utils" + shlib_ext)
)
```

Review thread on this file:

Contributor: It would be better to register just a folder path here, e.g. MLIR_SHARED_LIB_PATH, and then search for the library name in an OS-independent way in graph_compiler.py.

Contributor (author): Fixed this in a way similar to lit.site.cfg.py, but we cannot get the SHLIBEXT value defined in configure_lit_site_cfg; according to the source code, we can use LTDL_SHLIB_EXT instead:
https://github.com/llvm/llvm-project/blob/109b50808f72c228518766c3b384dd14e0dcf4ee/llvm/cmake/modules/AddLLVM.cmake#L1841-L1854
https://github.com/llvm/llvm-project/blob/109b50808f72c228518766c3b384dd14e0dcf4ee/llvm/cmake/modules/HandleLLVMOptions.cmake#L233

Contributor: We set mlir_runner_utils_dir to @MLIR_RUNNER_UTILS_DIR@ in our lit.site.cfg.py.in. Can we use that CMake variable here too?

Contributor (author): I tried, but we cannot get @MLIR_RUNNER_UTILS_DIR@ from here; it is not in the same scope.
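The platform-dependent naming logic above can be factored into a small helper for illustration. This is our own sketch (the function `build_shlib_path` is not part of the PR): Windows keeps shared libraries under `bin/` with no `lib` prefix, while Unix-likes use the library directory with a `lib` prefix.

```python
import os


def build_shlib_path(platform: str, llvm_obj_root: str, llvm_lib_dir: str,
                     shlib_ext: str, name: str) -> str:
    """Mirror the config.py.in logic: pick the shared-library directory,
    prefix, and extension based on the target platform."""
    if platform.startswith("win32"):
        lib_dir, prefix = os.path.join(llvm_obj_root, "bin"), ""
    else:
        lib_dir, prefix = llvm_lib_dir, "lib"
    return os.path.normpath(os.path.join(lib_dir, prefix + name + shlib_ext))


print(build_shlib_path("linux", "/build", "/build/lib", ".so", "mlir_runner_utils"))
# → /build/lib/libmlir_runner_utils.so
```

A helper like this is also what the reviewer's suggestion (register only a folder path, resolve the name OS-independently) would amount to.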
47 changes: 47 additions & 0 deletions python/gc_mlir/graph_compiler.py
@@ -0,0 +1,47 @@
# ===-- graph_compiler.py - DESC ------------------------------*- Python -*-===#
#
# This file is licensed under the Apache License v2.0 with LLVM Exceptions.
# See https://llvm.org/LICENSE.txt for license information.
# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
#
# ===-----------------------------------------------------------------------===#

from gc_mlir import execution_engine
from gc_mlir import ir
from gc_mlir import passmanager
from gc_mlir.config import MLIR_C_RUNNER_UTILS, MLIR_RUNNER_UTILS

__all__ = [
    "GraphCompiler",
]


class GraphCompiler:
    def __init__(
        self,
        pipeline: str = "any(gc-cpu-pipeline)",
        opt_level: int = 3,
    ):
        self.shared_libs = [MLIR_C_RUNNER_UTILS, MLIR_RUNNER_UTILS]
        self.pipeline = pipeline
        self.opt_level = opt_level

    def __call__(self, module: ir.Module, ir_printing: bool = False):
        self.compile(module, ir_printing)

    def compile(self, module: ir.Module, ir_printing: bool = False):
        pm = passmanager.PassManager.parse(self.pipeline)
        if ir_printing:
            pm.enable_ir_printing()
        pm.run(module.operation)

    def jit(self, module: ir.Module) -> execution_engine.ExecutionEngine:
        return execution_engine.ExecutionEngine(
            module, opt_level=self.opt_level, shared_libs=self.shared_libs
        )

    def compile_and_jit(
        self, module: ir.Module, ir_printing: bool = False
    ) -> execution_engine.ExecutionEngine:
        self.compile(module, ir_printing)
        return self.jit(module)
26 changes: 13 additions & 13 deletions test/mlir/test/gc/python/smoketest.py
```
@@ -3,9 +3,9 @@
 # ===============================================================================
 # RUN: %python %s | FileCheck %s

-from gc_mlir.dialects import onednn_graph, func
-from gc_mlir.passmanager import PassManager
+from gc_mlir.dialects import func, onednn_graph
+from gc_mlir.graph_compiler import GraphCompiler
 from gc_mlir.ir import *


 def run(f):
@@ -36,22 +36,22 @@ def testCreateOp():
     print(module)


-# CHECK-LABEL: TEST: testPassManager
+# CHECK-LABEL: TEST: testCompiler
 @run
-def testPassManager():
+def testCompiler():
     with Context():
         module = Module.parse(
             """
-            // CHECK: [[C0:%.+]] = arith.constant 0
-            // CHECK: [[INIT:%.+]] = tensor.empty()
-            // CHECK: [[FILLED:%.+]] = linalg.fill ins([[C0]] : bf16) outs([[INIT]] : tensor<128x256xbf16>) -> tensor<128x256xbf16>
-            // CHECK: linalg.matmul ins(%arg0, %arg1 : tensor<128x512xbf16>, tensor<512x256xbf16>) outs([[FILLED]] : tensor<128x256xbf16>) -> tensor<128x256xbf16>
-            func.func @matmul(%arg0: tensor<128x512xbf16>, %arg1: tensor<512x256xbf16>) -> tensor<128x256xbf16> {
-                %0 = onednn_graph.matmul %arg0, %arg1 : (tensor<128x512xbf16>, tensor<512x256xbf16>) -> tensor<128x256xbf16>
-                return %0 : tensor<128x256xbf16>
+            func.func @matmul(%arg0: tensor<128x512xf32>, %arg1: tensor<512x256xf32>) -> tensor<128x256xf32> {
+                %0 = onednn_graph.matmul %arg0, %arg1 : (tensor<128x512xf32>, tensor<512x256xf32>) -> tensor<128x256xf32>
+                return %0 : tensor<128x256xf32>
             }
             """
         )
-        pm = PassManager.parse("builtin.module(convert-onednn-graph-to-linalg)")
-        pm.run(module.operation)
+        compiler = GraphCompiler(
+            pipeline="builtin.module(convert-onednn-graph-to-linalg)"
+        )
+        compiler.compile(module)
+        # CHECK-NOT: onednn_graph.matmul
         print(module)
```
92 changes: 92 additions & 0 deletions tools/README.md
@@ -0,0 +1,92 @@
# Python Tools
## Pre-requisites
### Enable python binding
* Enable MLIR python binding, [README](https://github.com/intel/graph-compiler/blob/main/python/README.md)
### Set env
* **PYTHONPATH**=*${BUILD_DIR}*/python_packages/gc_mlir_core
* **LD_PRELOAD**=path/to/libiomp5.so


## Bench
The tool supports two different ways to measure the time cost; more experiments are needed to determine which one is more stable and accurate. Currently, users can choose between them via options:
* Use the MLIR Python API to invoke the kernel and measure the time cost in Python
* Modify the MLIR by wrapping the kernel in a new function that calls `nanoTime()` before and after the kernel invocation, then compute the difference as the time cost
```
func.func private @nanoTime() -> i64 attributes {llvm.emit_c_interface}
func.func public @wrapped_main(%arg0: memref<1xi64>, %arg1: tensor<128x512xbf16>, %arg2: tensor<512x256xbf16>) -> tensor<128x256xbf16> attributes {llvm.emit_c_interface} {
  %0 = call @nanoTime() : () -> i64
  %1 = call @main_entry(%arg1, %arg2) : (tensor<128x512xbf16>, tensor<512x256xbf16>) -> tensor<128x256xbf16>
  %2 = call @nanoTime() : () -> i64
  %3 = arith.subi %2, %0 : i64
  %c0 = arith.constant 0 : index
  memref.store %3, %arg0[%c0] : memref<1xi64>
  return %1 : tensor<128x256xbf16>
}
```
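For the `py` kind, the measurement reduces to warming up the kernel and then averaging wall-clock timings around each invocation. A minimal sketch of that loop in plain Python, timing a stand-in callable (the helper name `bench_py` and its defaults are ours, not the tool's actual code):

```python
import time


def bench_py(kernel, warm_up: int = 20, repeat: int = 100) -> float:
    """Return the mean execution cost in milliseconds, measured from Python."""
    for _ in range(warm_up):  # warm-up runs are executed but not measured
        kernel()
    total_ns = 0
    for _ in range(repeat):
        start = time.perf_counter_ns()
        kernel()
        total_ns += time.perf_counter_ns() - start
    return total_ns / repeat / 1e6  # ns -> ms


# Example: timing a trivial stand-in kernel.
cost_ms = bench_py(lambda: sum(range(1000)), warm_up=5, repeat=50)
print(f"execute_cost(ms): {cost_ms:.4f}")
```

The `wrapper` kind moves the two timestamps into the generated IR itself (the `nanoTime()` calls above), which avoids Python call overhead in the measured region.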

### Examples:
```
# simple version
python3 ./tools/main.py --driver=load_mlir --path=./tools/workloads/test.mlir

# complex version
python3 ./tools/main.py --type=bench --bench_kind=py --driver=load_mlir --path=./tools/workloads/test.mlir --warm_up=200 --repeat=200 --print_ir --entry=main_entry
```

```
# result example
===========bench result===========
{
    "args": {
        "type": "bench",
        "driver": "load_mlir",
        "path": "./tools/workloads/test.mlir",
        "entry": "main_entry",
        "bench_kind": "py",
        "print_ir": false,
        "warm_up": 20,
        "repeat": 100
    },
    "compile_cost(ms)": 25.58841183781624,
    "execute_cost(ms)": 1.7501823976635933
}
```

### Common Options
* `--driver`: the pattern to bench; currently supports `mlp` and `load_mlir`
* `--bench_kind`: `py` or `wrapper`; selects the benchmark measurement implementation
* `--warm_up`: number of warm-up runs before measurement
* `--repeat`: number of measured runs
* `--print_ir`: print the IR before execution
* `--disable_results_to_params`: do not use this with the default pipeline (`gc-cpu-pipeline`)
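As a rough illustration, the options above could be wired up with `argparse` along these lines; the flag names follow the list above, but the defaults and parser structure are our assumptions, not the tool's actual code:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a CLI mirroring the common bench options (assumed defaults)."""
    parser = argparse.ArgumentParser(description="benchmark driver (sketch)")
    parser.add_argument("--type", default="bench")
    parser.add_argument("--driver", choices=["mlp", "load_mlir"], required=True)
    parser.add_argument("--bench_kind", choices=["py", "wrapper"], default="py")
    parser.add_argument("--warm_up", type=int, default=100)
    parser.add_argument("--repeat", type=int, default=100)
    parser.add_argument("--print_ir", action="store_true")
    parser.add_argument("--disable_results_to_params", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--driver", "load_mlir", "--warm_up", "200", "--print_ir"]
)
print(args.driver, args.warm_up, args.print_ir)
# → load_mlir 200 True
```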

### Driver Specific Options
* load_mlir
* `--path`: the mlir file path
* `--entry`: the name of entry func
```
python3 ./tools/main.py --driver=load_mlir --path=./tools/workloads/test.mlir
```


* mlp
* `--batch_size`: the batch size of the input
* `--hidden_size_list`: hidden sizes of the mlp, example: 32x16x64
* `--has_bias`: whether each matmul op has a bias, example: 1x0
* `--act_type`: choices=["noop", "relu", "sigmoid"]
* `--dtype`: choices=["bf16", "f32"]
```
python3 ./tools/main.py --driver=mlp --batch_size=32 --hidden_size_list=32x16x64 --has_bias=0x0 --act_type=noop --dtype=f32

===========bench func name: main_entry ===========
module {
  func.func @main_entry(%arg0: tensor<32x32xf32>, %arg1: tensor<32x16xf32>, %arg2: tensor<16x64xf32>) -> tensor<32x64xf32> attributes {llvm.emit_c_interface} {
    %0 = tensor.empty() : tensor<32x16xf32>
    %1 = linalg.matmul {cast = #linalg.type_fn<cast_signed>} ins(%arg0, %arg1 : tensor<32x32xf32>, tensor<32x16xf32>) outs(%0 : tensor<32x16xf32>) -> tensor<32x16xf32>
    %2 = tensor.empty() : tensor<32x64xf32>
    %3 = linalg.matmul {cast = #linalg.type_fn<cast_signed>} ins(%1, %arg2 : tensor<32x16xf32>, tensor<16x64xf32>) outs(%2 : tensor<32x64xf32>) -> tensor<32x64xf32>
    return %3 : tensor<32x64xf32>
  }
}
```
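The `--hidden_size_list` and `--has_bias` encodings can be unpacked mechanically: consecutive sizes in `32x16x64` form the two matmul shapes 32x16 and 16x64, with one bias flag per matmul. A small sketch of that parsing (our own helper, not the tool's code):

```python
def parse_mlp_spec(batch_size: int, hidden_size_list: str, has_bias: str):
    """Turn '32x16x64' and '0x0' into per-layer (in, out, bias) descriptors."""
    sizes = [int(s) for s in hidden_size_list.split("x")]
    biases = [b == "1" for b in has_bias.split("x")]
    assert len(biases) == len(sizes) - 1, "one bias flag per matmul layer"
    return [(sizes[i], sizes[i + 1], biases[i]) for i in range(len(sizes) - 1)]


layers = parse_mlp_spec(32, "32x16x64", "0x0")
print(layers)  # → [(32, 16, False), (16, 64, False)]
```

Each descriptor corresponds to one `linalg.matmul` in the generated `main_entry` above, with `batch_size` as the leading dimension of the input tensor.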