Commit ae8476e

update the sparse example doc/ transpose mode code (#247)
* update the readME
* fix the bug of layernorm fusion
* modify the layernorm fusion
* cleancode
* add the example file
* update README
* add the export tranpose ir
* add calibrate method and sample size to trainer
* add the new export
* add the typo
* fix typo
* update the readme

Co-authored-by: Xu, Zhenzhong <[email protected]>
1 parent 5847101 commit ae8476e

File tree

10 files changed: +771 / -55 lines

examples/deployment/neural_engine/sparse/bert_mini/README.md

Lines changed: 12 additions & 12 deletions
@@ -64,10 +64,18 @@ python prepare_dataset.py --dataset_name=glue --task_name=sst2 --output_dir=./da
### 2.2 Get sparse model

Neural Engine can parse Sparse ONNX model and Neural Engine IR.
-You can train a Bert mini sst2 sparse model with distillation through Neural Compressor [example](https://github.com/intel-innersource/frameworks.ai.lpot.intel-lpot/blob/28e9b1e66c23f4443a2be8f2926fee1e919f5a14/examples/pytorch/nlp/huggingface_models/text-classification/pruning_while_distillation/group_lasso/eager/README.md). and transpose the weight and activation to get better performance.
-Neural Engine will automatically detect weight structured sparse ratio, as long as it beyond 70% (since normaly get performance gain when sparse ratio beyond 70%), Neural Engine will call [SparseLib](https://github.com/intel-innersource/frameworks.ai.nlp-toolkit.intel-nlp-toolkit/tree/develop/nlp_toolkit/backends/neural_engine/SparseLib) kernels and high performance layernorm op with transpose mode to improve inference performance.
+You can train a Bert mini SST-2 sparse model with distillation through the Neural Compressor [example](https://github.com/intel-innersource/frameworks.ai.lpot.intel-lpot/blob/28e9b1e66c23f4443a2be8f2926fee1e919f5a14/examples/pytorch/nlp/huggingface_models/text-classification/pruning_while_distillation/group_lasso/eager/README.md), or use the [sparse model](https://huggingface.co/Intel/bert-mini-sst2-distilled-sparse-90-1X4-block) we published on Hugging Face, which is Bert mini fine-tuned on SST-2 with a 90% sparse ratio in a 1X4 block pattern.
+You can get an INT8 ONNX sparse model from the optimization module by setting precision=int8, using the following command:
+```shell
+bash prepare_model.sh --input_model=Intel/bert-mini-sst2-distilled-sparse-90-1X4-block --task_name=sst2 --output_dir=./model_and_tokenizer --precision=int8
+```
+Then generate the transposed sparse model for better performance:
+```shell
+python export_tranpose_ir.py --input_model=./model_and_tokenizer/int8-model.onnx
+```

### Benchmark
+Neural Engine automatically detects the weight structured sparse ratio. As long as it is beyond 70% (performance gains are normally seen when the sparse ratio is beyond 70%), Neural Engine will call [SparseLib](https://github.com/intel-innersource/frameworks.ai.nlp-toolkit.intel-nlp-toolkit/tree/develop/nlp_toolkit/backends/neural_engine/SparseLib) kernels and a high-performance LayerNorm op with transpose mode to improve inference performance.

2.1 accuracy
run python
@@ -92,15 +100,7 @@ Neural Engine will automatically detect weight structured sparse ratio, as long
bash run_benchmark.sh --input_model=./sparse_int8_ir --mode=performance --batch_size=8 --seq_len=128
```

-or compile framwork model to IR using python API
-
-```
-from nlp_toolkit.backends.neural_engine.compile import compile
-graph = compile('./sparse_int8_ir')
-graph.save('./ir')
-```
-
-and run C++
+Or run C++
The warmup below is recommended to be 1/10 of iterations and no less than 3.

```
@@ -110,5 +110,5 @@ Neural Engine will automatically detect weight structured sparse ratio, as long
export UNIFIED_BUFFER=1
numactl -C 0-<cpu_cores-1> <NLP_Toolkit_folder>/nlp_toolkit/backends/neural_engine/bin/neural_engine
--batch_size=<batch_size> --iterations=<iterations> --w=<warmup>
---seq_len=128 --config=./ir/conf.yaml --weight=./ir/model.bin
+--seq_len=128 --config=./sparse_int8_ir/conf.yaml --weight=./sparse_int8_ir/model.bin
```
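The 70% figure in the README refers to the fraction of weights that are zero in the structured 1X4 block pattern. As a rough illustration only (this helper is not part of the toolkit or of this commit), the ratio for a 2-D weight matrix could be estimated like this:

```python
# Illustrative sketch only: estimate the 1x4 structured sparse ratio of a weight matrix.
import numpy as np

def structured_sparse_ratio(weight, block=(1, 4)):
    """Return the fraction of (1, 4) blocks that are entirely zero."""
    rows, cols = weight.shape
    br, bc = block
    # Trim so the matrix tiles evenly into blocks.
    weight = weight[: rows - rows % br, : cols - cols % bc]
    blocks = weight.reshape(weight.shape[0] // br, br, weight.shape[1] // bc, bc)
    return float(np.all(blocks == 0, axis=(1, 3)).mean())

# A synthetic weight with ~90% of its 1x4 blocks zeroed clears the 70% bar.
w = np.random.randn(256, 256)
keep = np.random.rand(256, 64) > 0.9          # keep roughly 10% of the 1x4 blocks
w *= np.repeat(keep, 4, axis=1)               # expand the block mask to full columns
print(structured_sparse_ratio(w) >= 0.70)     # expected: True
```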
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+from nlp_toolkit.backends.neural_engine.compile import compile
+from nlp_toolkit.backends.neural_engine.compile.graph import Graph
+import os
+import argparse
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--input_model', default="./model_and_tokenizer/int8-model.onnx",
+                        type=str, help="Input model path.")
+    parser.add_argument('--output_dir',
+                        help='directory to save data to',
+                        type=str, default='./sparse_int8_ir')
+    args = parser.parse_args()
+
+    graph = compile(args.input_model)
+    graph.save()
+    model = Graph()
+    model.graph_init('./ir/conf.yaml', './ir/model.bin')
+    model.transpose_mode_int8()
+    model.save(args.output_dir)
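The script above compiles the ONNX model to the default `./ir` location, reloads that IR, applies `transpose_mode_int8()`, and saves the transposed IR to `--output_dir`. Below is a minimal sketch (not part of this commit) of consuming the result; it assumes the engine's `Graph.inference` method accepts a list of numpy inputs, and the input names, shapes, and dtypes are illustrative for the bert_mini SST-2 example:

```python
# Sketch: load the transposed IR written by export_tranpose_ir.py and run it.
# Assumes Graph.inference takes a list of numpy arrays; shapes/dtypes are illustrative.
import numpy as np
from nlp_toolkit.backends.neural_engine.compile.graph import Graph

model = Graph()
model.graph_init('./sparse_int8_ir/conf.yaml', './sparse_int8_ir/model.bin')

batch_size, seq_len = 8, 128
input_ids = np.zeros((batch_size, seq_len), dtype=np.int32)
token_type_ids = np.zeros((batch_size, seq_len), dtype=np.int32)
attention_mask = np.ones((batch_size, seq_len), dtype=np.int32)

outputs = model.inference([input_ids, token_type_ids, attention_mask])
print(type(outputs))
```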
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+#!/bin/bash
+# set -x
+function main {
+    init_params "$@"
+    prepare_model
+}
+
+# init params
+function init_params {
+    for var in "$@"
+    do
+        case $var in
+            --input_model=*)
+                input_model=$(echo $var |cut -f2 -d=)
+            ;;
+            --task_name=*)
+                task_name=$(echo $var |cut -f2 -d=)
+            ;;
+            --cache_dir=*)
+                cache_dir=$(echo $var |cut -f2 -d=)
+            ;;
+            --output_dir=*)
+                output_dir=$(echo $var |cut -f2 -d=)
+            ;;
+            --precision=*)
+                precision=$(echo $var |cut -f2 -d=)
+            ;;
+        esac
+    done
+}
+
+function prepare_model {
+
+    mode_cmd=""
+    if [[ ${precision} = 'int8' ]]; then
+        mode_cmd=$mode_cmd" --tune --quantization_approach PostTrainingStatic"
+    fi
+    if [[ ${precision} = 'bf16' ]]; then
+        mode_cmd=$mode_cmd" --enable_bf16"
+    fi
+    echo ${mode_cmd}
+
+    cache="./tmp"
+    if [[ ${cache_dir} ]]; then
+        cache="$cache_dir"
+    fi
+    echo ${cache}
+
+    python run_glue.py \
+        --model_name_or_path ${input_model} \
+        --task_name ${task_name} \
+        --do_train \
+        --do_eval \
+        --cache_dir ${cache} \
+        --output_dir ${output_dir} \
+        --overwrite_output_dir \
+        --to_onnx \
+        ${mode_cmd}
+
+}
+
+main "$@"
+
