Labels: bug (Something isn't working), split (GGUF split model sharding)
Description
System:
Ubuntu Server 22.04 LTS
AMD Ryzen 9 5950X
64 GB RAM @ 3200 MHz
2× NVIDIA RTX 3090
Steps to reproduce:
First I converted the model to FP16 GGUF with:
./convert.py --outfile Karasu-Mixtral-8x22B-v0.1-fp16.gguf --outtype f16 lightblue_Karasu-Mixtral-8x22B-v0.1
That worked just fine and produced the expected FP16 GGUF file.
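As a quick sanity check that the converted file is readable, the gguf Python package that ships with llama.cpp can read the header back. A minimal sketch, assuming the package is installed (pip install gguf) and using the output file name from the command above:

```python
# Sanity check: read the converted file back with the gguf Python package.
# Assumes: pip install gguf, and the FP16 output file from the step above.
from gguf import GGUFReader

reader = GGUFReader("Karasu-Mixtral-8x22B-v0.1-fp16.gguf")
print(f"tensor count: {len(reader.tensors)}")
for t in reader.tensors[:5]:  # first few tensors: name, shape, dtype
    print(t.name, t.shape, t.tensor_type)
```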
Then to quantize it to Q5_K_M:
./quantize Karasu-Mixtral-8x22B-v0.1-fp16.gguf Karasu-Mixtral-8x22B-v0.1-Q5_K_M.gguf Q5_K_M
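To confirm the quantization actually shrank the tensor payload, the same reader can total the per-tensor byte sizes of both files. A sketch under the same assumptions (gguf package installed, file names from the commands above):

```python
# Sketch: compare total tensor bytes before and after quantization.
# File names match the convert/quantize commands above; gguf package assumed.
from gguf import GGUFReader

for path in ("Karasu-Mixtral-8x22B-v0.1-fp16.gguf",
             "Karasu-Mixtral-8x22B-v0.1-Q5_K_M.gguf"):
    total = sum(int(t.n_bytes) for t in GGUFReader(path).tensors)
    print(f"{path}: ~{total / 1e9:.1f} GB of tensor data")
```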
But when using gguf-split --split, even though I pass --split-max-tensors 128, the sizes of the resulting shards are inconsistent:
./gguf-split --split --split-max-tensors 128 /nfs/models/Karasu-Mixtral-8x22B-v0.1-Q5_K_M.gguf /nfs/models/
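One possible explanation: --split-max-tensors caps the number of tensors per shard, not the number of bytes, and tensors in this model vary widely in size, so 128-tensor shards can legitimately come out at different file sizes. A minimal sketch that mimics that grouping and prints the per-shard byte totals, again assuming the gguf package and the quantized file above:

```python
# Sketch: group tensors 128 at a time (what --split-max-tensors 128 does)
# and sum their byte sizes, to see why shard file sizes vary.
from gguf import GGUFReader

reader = GGUFReader("/nfs/models/Karasu-Mixtral-8x22B-v0.1-Q5_K_M.gguf")
for i in range(0, len(reader.tensors), 128):
    chunk = reader.tensors[i:i + 128]
    size_gb = sum(int(t.n_bytes) for t in chunk) / 1e9
    print(f"shard {i // 128:02d}: {len(chunk)} tensors, ~{size_gb:.1f} GB")
```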