Encoding a matrix with uint1 weights does not work correctly: two different uint1 matrices produce the same result after transform_weight. Sample matrices are a row of eight 1s followed by zeros, and a row of nine 1s followed by zeros.
In both cases the result is [15, 15, 0, 0], even though in the second case it should probably be [31, 15, 0, 0].
Minimal example to reproduce / illustrate:
```python
import bitblas
import torch

M = 1
NK = 32
matmul_config = bitblas.MatmulConfig(
    M=M,                    # M dimension
    N=NK,                   # N dimension
    K=NK,                   # K dimension
    A_dtype="float16",      # activation A dtype
    W_dtype="uint1",        # weight W dtype
    accum_dtype="float32",  # accumulation dtype
    out_dtype="float16",    # output dtype
    layout="nt",            # "nt": A is non-transposed, W is transposed
    with_bias=False,        # bias
    # configs for weight-only quantization
    group_size=None,        # setting for grouped quantization
    with_scaling=False,     # setting for scaling factor
    with_zeros=False,       # setting for zeros
    zeros_mode=None,        # setting for how to calculate zeros
)
matmul = bitblas.Matmul(config=matmul_config)

t1 = torch.zeros((NK, NK), dtype=torch.uint8).cuda()
t1[0, 0:8] += 1          # eight 1s followed by zeros
out1 = matmul.transform_weight(t1)

t2 = t1.clone()
t2[0, 8] += 1            # nine 1s followed by zeros
out2 = matmul.transform_weight(t2)

print(torch.equal(out1, out2))  # This should not be True, but it is
print(torch.equal(t1, t2))      # False: the inputs do differ
```
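For reference, a plain little-endian bit-packing of the two first rows (a generic sketch only; `pack_bits` is a hypothetical helper, and BitBLAS's actual transform_weight layout may interleave bits differently) shows that the packed outputs must at least differ between the two inputs:

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, LSB-first within each byte."""
    out = []
    for i in range(0, len(bits), 8):
        byte = 0
        for j, b in enumerate(bits[i:i + 8]):
            byte |= (b & 1) << j
        out.append(byte)
    return out

row1 = [1] * 8 + [0] * 24   # eight 1s followed by zeros
row2 = [1] * 9 + [0] * 23   # nine 1s followed by zeros

print(pack_bits(row1))  # [255, 0, 0, 0]
print(pack_bits(row2))  # [255, 1, 0, 0] -- distinct from row1's packing
```

Whatever packing convention transform_weight uses, distinct uint1 inputs should map to distinct packed outputs; here they do not.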
Tested with bitblas==0.1.0.post1 and bitblas==0.1.0 from pip on:
- NVIDIA A100-SXM4-80GB, CUDA 12.7, Python 3.12.11
- NVIDIA GeForce RTX 2080, CUDA 12.4, Python 3.11.6