
Commit 5847101

SparseLib add doc for kernel dump and injector (#244)
* SparseLib: add doc for kernel dump; add doc for postop injector. Co-authored-by: ZheWang <[email protected]>
1 parent f56fd61 commit 5847101

File tree

5 files changed: +361 -8 lines changed


nlp_toolkit/backends/neural_engine/SparseLib/README.md

Lines changed: 4 additions & 0 deletions
@@ -26,6 +26,10 @@ make -j
./test_spmm_vnni_kernel
```

### Performance
We provide a benchmark tool to measure the performance out of the box; please refer to [benchmark](../test/SparseLib/benchmark/README.md) for more details.
For advanced users, please refer to the [profiling section](docs/profiling.md).

## API reference for users
### sparse_matmul kernel:
```cpp
Lines changed: 2 additions & 0 deletions
Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
# Introduction
Op-fusion is a widely used optimization approach in deep learning. Consider two ops, Conv and Relu: in the traditional way we apply the Conv op first, store the result to memory, then load the value again and apply Relu. There is obviously a useless pair of load & store operations; if we fuse Conv & Relu, the useless I/O disappears. This is the key idea behind op-fusion.

In SparseLib, we will provide a new class named injector for op-fusion. From the perspective of developers who want to apply the op-fusion optimization, they make the injector a member of their jit_kernel class and initialize it in the kernel class's constructor. When they want to apply a postop, they just call **injector->vector_compute** and tell the injector which registers have already been used via **injector->escape_regs**. Besides, the upper-level kernel should also call **injector->prepare_table** at the end of its xbyak kernel to prepare the LUT that the postops need.

The injector currently supports 8 operators: exp, tanh, gelu, relu, linear, quantize (fp32->u8/s8), dequantize (u8/s8->fp32), and look-up from a LUT (an experimental API for now). The injector also supports a postop-chain for applying multiple postops sequentially.
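
To make the saving concrete, here is a minimal standalone sketch (plain C++, not SparseLib code; `conv_out` is just a stand-in for whatever the first op computes per element) contrasting the unfused and fused pipelines:

```cpp
#include <algorithm>
#include <vector>

// Stand-in for the first op's per-element result (illustrative only).
static float conv_out(float x) { return 0.5f * x + 1.0f; }

// Unfused: the intermediate result is stored to memory, then loaded again.
void conv_then_relu(const std::vector<float>& src, std::vector<float>& dst) {
  std::vector<float> tmp(src.size());
  for (size_t i = 0; i < src.size(); ++i) tmp[i] = conv_out(src[i]);        // store
  for (size_t i = 0; i < src.size(); ++i) dst[i] = std::max(tmp[i], 0.0f);  // reload
}

// Fused: Relu is applied while the value is still in register;
// the intermediate store/load pair disappears.
void conv_relu_fused(const std::vector<float>& src, std::vector<float>& dst) {
  for (size_t i = 0; i < src.size(); ++i) dst[i] = std::max(conv_out(src[i]), 0.0f);
}
```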
# Proposal
## SparseLib developer's perspective
### Framework changes
#### param_types.hpp
Add some new fields. The most important one is `postop_attr`, which describes the postop attribute the developer wants to apply, including data_type (e.g. fp32/bf16), op_type (e.g. element-wise/binary-wise), algo_type (e.g. Gelu/Relu), and alpha (the zero point for quantization), beta and scale for some operators such as linear & quantize.
```cpp
enum class postop_alg : uint8_t { exp, gelu, tanh, relu, quantize, dequantize, linear, int8_lut };

enum class postop_type : uint8_t { eltwise };

// postop attribute for op-fusion
class postop_attr {
 public:
  data_type dt;
  postop_type op_type;
  postop_alg op_alg;
  float alpha = 0;
  float beta = 0;
  float scale = 0;

  postop_attr() {}

  postop_attr(const data_type& dt, const postop_type& op_type, const postop_alg& op_alg, float alpha = 0.0,
              float beta = 0.0, float scale = 0.0)
      : dt(dt), op_type(op_type), op_alg(op_alg), alpha(alpha), beta(beta), scale(scale) {}
};
```

##### alpha, beta, scale meaning
These three params are only used in quantize, dequantize, linear and relu.
Quantize's mathematical definition is int8 = saturate(round(fp32 / scale + zero_point)) and dequantize's mathematical definition is fp32 = (int8 - zero_point) * scale. In these two operators, alpha represents zero_point, scale represents scale, and beta is unused.
The mathematical definition of linear is y = αx + β; attr's alpha represents α, beta represents β, and scale is unused.
Relu's mathematical definition is as follows; attr's alpha represents alpha, while beta and scale are unused.
![](../imgs/relu_formula.svg)
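
The following scalar reference sketch shows how alpha/beta/scale are interpreted under the conventions above (plain C++ for illustration, not SparseLib code; the saturation rule and the leaky-relu reading of alpha are assumptions):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Assumed saturation rule: clamp the rounded value into the s8 range.
static int8_t saturate_to_s8(float v) {
  return static_cast<int8_t>(std::min(127.0f, std::max(-128.0f, v)));
}

// quantize (fp32 -> s8): alpha = zero point, scale = scale, beta unused.
int8_t ref_quantize(float x, float alpha, float scale) {
  return saturate_to_s8(std::round(x / scale + alpha));
}

// dequantize (s8 -> fp32): alpha = zero point, scale = scale, beta unused.
float ref_dequantize(int8_t x, float alpha, float scale) {
  return (static_cast<float>(x) - alpha) * scale;
}

// linear: y = alpha * x + beta, scale unused.
float ref_linear(float x, float alpha, float beta) { return alpha * x + beta; }

// relu (assuming alpha is a leaky negative slope): y = x for x > 0, alpha * x otherwise.
float ref_relu(float x, float alpha) { return x > 0.0f ? x : alpha * x; }
```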

#### operator_desc.hpp
Add a new member `apply_postops_list_` that stores the `postop_attr` list the user wants to apply.
```cpp
class operator_desc {
 public:
  operator_desc()
      : ker_kind_(jd::kernel_kind::undef),
        ker_prop_(jd::kernel_prop::undef),
        eng_kind_(jd::engine_kind::undef),
        impl_nthr_(0),
        ts_descs_({}),
        attrs_({}),
        apply_postops_list_({}) {}
  operator_desc(const jd::kernel_kind& ker_kind, const jd::kernel_prop& ker_prop, const jd::engine_kind& eng_kind,
                const std::vector<tensor_desc>& ts_descs, const std::unordered_map<std::string, std::string>& attrs,
                const std::vector<postop_attr>& apply_postops_list = {})
      : ker_kind_(ker_kind),
        ker_prop_(ker_prop),
        eng_kind_(eng_kind),
        impl_nthr_((omp_get_max_threads() == omp_get_num_procs()) ? 1 : omp_get_max_threads()),
        ts_descs_(ts_descs),
        attrs_(attrs),
        apply_postops_list_(apply_postops_list) {}

 private:
  std::vector<postop_attr> apply_postops_list_;
};
```
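
As a quick sketch of how the new field flows in, an operator_desc carrying one postop could be constructed roughly like this (assuming the SparseLib types above are in scope, e.g. via `using namespace jd;`; the empty tensor-desc and attr lists are placeholders):

```cpp
// Sketch: a descriptor that carries a single fp32 gelu postop.
postop_attr fp32_gelu_attr{data_type::fp32, postop_type::eltwise, postop_alg::gelu};

operator_desc desc(kernel_kind::eltwiseop, kernel_prop::forward_inference, engine_kind::cpu,
                   /*ts_descs=*/{}, /*attrs=*/{},
                   /*apply_postops_list=*/{fp32_gelu_attr});
```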
<a name="hZaPk"></a>
73+
#### jit_eltwise_injector.hpp
74+
I design a element-wise injector named eltwise_injector which can apply eltwise-postops.I maybe combine this injector into a new injector named postop-injector in the future,but at present we needn't because we only have element-wise postop now.Overdesign is harmful.<br />Here are the APIs which injector expose to the developer:<br />`eltwise_injector_init` used for injector initialization.<br />`vector_compute` used for execute the postop calculate, user can indicate the eltwiseop's idx to select the op which user want to apply, if the idx list is empty, the injector will apply all ops in postop-chian.<br />`escape_regs` used for tell injector which registers have been used in upper level kernel.All dst zmm registers should be registered.<br />
75+
`escape_erase` used for remove the specify type register ID from used_regs set,if reg_idx is not given,this function will erase all IDs by default.
76+
`prepare_table` used for insert the LUT which injected code need in the end of the upper level kernel.
77+
```cpp
class jit_eltwise_injector {
 public:
  explicit jit_eltwise_injector() {}
  virtual ~jit_eltwise_injector() {}

  void eltwise_injector_init(jit_generator* ptr, const std::vector<postop_attr>& postop_attrs);
  void vector_compute(const Xbyak::Zmm& zmm_src, const std::vector<postop_attr>& postop_attrs,
                      std::vector<int> postop_idxs = {});
  void escape_regs(reg_type type, int reg_idx);
  void escape_erase(reg_type type, int reg_idx = -1);
  void prepare_table();
};
```
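
For example, to apply only part of a registered postop-chain, a caller can pass explicit indices to `vector_compute` (a sketch based on the declaration above; `jit_mykernel_t`, the register choice and the chain contents are illustrative, not real SparseLib code):

```cpp
// Sketch: apply only the first postop of the chain to Zmm(0); an empty index
// list would apply the whole chain in order.
void jit_mykernel_t::generate() {
  this->preamble();
  // ... the surrounding kernel loads its data into Zmm(0) here ...
  eltwise_injector.vector_compute(Xbyak::Zmm(0), param_.postop_attrs, {0});
  // ... the surrounding kernel stores its results here ...
  this->postamble();
  eltwise_injector.prepare_table();  // emit the LUT after the kernel body
}
```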
<a name="AHBMr"></a>
91+
### How to use the injector
92+
let take the kernel `eltwiseop` as example.
93+
<a name="EibWR"></a>
94+
#### step0.Add a postop_attrs vector member in your params for pass the postop_attrs to the jit_kernel
95+
```cpp
struct eltwiseop_param_t {
  size_t element_num;
  data_type dt;
  std::vector<postop_attr> postop_attrs;
};
```
<a name="xqgcY"></a>
103+
#### step1.Make injector as a member of your jit_class and init it in your construct&param_init function.
104+
```cpp
class jit_eltwiseop_t : public jit_generator {
 public:
  explicit jit_eltwiseop_t(const ssd::eltwiseop_param_t& param) : jit_generator(), param_(param) {
    eltwise_injector.eltwise_injector_init(this, param_.postop_attrs);
    assign_regs();
  }

 private:
  ssd::eltwiseop_param_t param_;
  jit_eltwise_injector eltwise_injector;
};
```

```cpp
bool eltwiseop_kd_t::init() {
  auto op_attr = op_desc_.attrs();
  params_.postop_attrs = op_desc_.apply_postops_list();
  return true;
}
```
<a name="zcZPl"></a>
127+
#### step2.Tell the injector which registers have been used before you apply postops.
128+
```cpp
void jit_eltwiseop_t::assign_regs() {
  remain_task_mask = Xbyak::Opmask(6);
  scratch_ = Xbyak::Reg64(r10);
  reg_src = Zmm(6);
  addr_src = r15;
  addr_dst = r14;
  reg_param = rdi;
  remain_element_num = rsi;

  eltwise_injector.escape_regs(reg_type::mask, remain_task_mask.getIdx());
  eltwise_injector.escape_regs(reg_type::reg64, scratch_.getIdx());
  eltwise_injector.escape_regs(reg_type::zmm, reg_src.getIdx());
  eltwise_injector.escape_regs(reg_type::reg64, addr_src.getIdx());
  eltwise_injector.escape_regs(reg_type::reg64, addr_dst.getIdx());
}
```
**NOTE:** The injector avoids allocating special-usage registers such as `RCX, RDX, RSI, RDI, RSP`; the upper-level op does not need to tell the injector the usage information of these registers.
<a name="zfFIG"></a>
147+
#### step3.Apply the postops where you want and then prepare the LUT at the end of the kernel.
148+
```cpp
void jit_eltwiseop_t::generate() {
  this->preamble();
  load_params();

  // load data.
  vmovups(reg_src, ptr[addr_src]);
  eltwise_injector.vector_compute(reg_src, param_.postop_attrs);
  // store data.
  vmovups(ptr[addr_dst], reg_src);

  this->postamble();

  eltwise_injector.prepare_table();
}
```
**NOTE:** The postops are applied `in-place`; storing the result is the upper-level op's task.

## SparseLib user's perspective
This is a guide on how to set up op-fusion in a UT from the user's perspective.

#### step0. Prepare the postop_attr
```cpp
postop_attr fp32_gelu_attr{data_type::fp32, postop_type::eltwise, postop_alg::gelu};
postop_attr bf16_gelu_attr{data_type::bf16, postop_type::eltwise, postop_alg::gelu};
postop_attr fp32_exp_attr{data_type::fp32, postop_type::eltwise, postop_alg::exp};
postop_attr bf16_exp_attr{data_type::bf16, postop_type::eltwise, postop_alg::exp};
```
<a name="kyHEm"></a>
177+
#### step1.Gen_case.
178+
```cpp
cases.push_back(
    {gen_case(kernel_kind::eltwiseop, kernel_prop::forward_inference, engine_kind::cpu, {data0_desc, data0_desc},
              {{"postop_list", "fp32_gelu+fp32_exp"}, mask_mock1, reg64_mock1, zmm_mock1},
              {fp32_gelu_attr, fp32_exp_attr}),
     false});
cases.push_back(
    {gen_case(kernel_kind::eltwiseop, kernel_prop::forward_inference, engine_kind::cpu, {data1_desc, data1_desc},
              {{"postop_list", "bf16_gelu+bf16_exp"}, mask_mock1, reg64_mock1, zmm_mock1},
              {bf16_gelu_attr, bf16_exp_attr}),
     false});
```
**NOTE:** Please add a pair <"postop_list", dt1op1+dt2op2+...> to the `op_attrs` field for kernel hashing.

#### step2. Check result
```cpp
void get_true_data(const operator_desc& op_desc, const std::vector<const void*>& rf_data) {
  float* src = (float*)rf_data[0];
  float* dst = (float*)rf_data[1];
  auto attr = op_desc.apply_postops_list();

  for (int i = 0; i < num; i++) {
    float tmp = your_kernel_logic(src[i]);
    apply_postop_list(num, attr, tmp);
    dst[i] = tmp;
  }
}
```
**NOTE:** `apply_postop_list` is from the header file `unit_test_utils.hpp`.
