Labels: bug (Something isn't working), split (GGUF split model sharding)
Description
System:
Ubuntu Server 22.04 LTS
AMD Ryzen 9 5950X
64 GB RAM @ 3200 MHz
2× NVIDIA RTX 3090
Steps to reproduce:
First I converted the model to FP16 GGUF with:
./convert.py --outfile Karasu-Mixtral-8x22B-v0.1-fp16.gguf --outtype f16 lightblue_Karasu-Mixtral-8x22B-v0.1
That worked just fine and produced the expected FP16 GGUF file.
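As a quick sanity check that the converted file is readable, the gguf Python package that ships with llama.cpp can read the header back. A minimal sketch, assuming the package is installed (pip install gguf) and using the output file name from the command above:

```python
# Sanity check: read the converted file back with the gguf Python package.
# Assumes: pip install gguf, and the FP16 output file from the step above.
from gguf import GGUFReader

reader = GGUFReader("Karasu-Mixtral-8x22B-v0.1-fp16.gguf")
print(f"tensor count: {len(reader.tensors)}")
for t in reader.tensors[:5]:  # first few tensors: name, shape, dtype
    print(t.name, t.shape, t.tensor_type)
```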
Then to quantize it to Q5_K_M:
./quantize Karasu-Mixtral-8x22B-v0.1-fp16.gguf Karasu-Mixtral-8x22B-v0.1-Q5_K_M.gguf Q5_K_M
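To confirm the quantization actually shrank the tensor payload, the same reader can total the per-tensor byte sizes of both files. A sketch under the same assumptions (gguf package installed, file names from the commands above):

```python
# Sketch: compare total tensor bytes before and after quantization.
# File names match the convert/quantize commands above; gguf package assumed.
from gguf import GGUFReader

for path in ("Karasu-Mixtral-8x22B-v0.1-fp16.gguf",
             "Karasu-Mixtral-8x22B-v0.1-Q5_K_M.gguf"):
    total = sum(int(t.n_bytes) for t in GGUFReader(path).tensors)
    print(f"{path}: ~{total / 1e9:.1f} GB of tensor data")
```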
But when using gguf-split --split, even though I pass --split-max-tensors 128, the sizes of the resulting shards are inconsistent:
./gguf-split --split --split-max-tensors 128 /nfs/models/Karasu-Mixtral-8x22B-v0.1-Q5_K_M.gguf /nfs/models/
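One possible explanation: --split-max-tensors caps the number of tensors per shard, not the number of bytes, and tensors in this model vary widely in size, so 128-tensor shards can legitimately come out at different file sizes. A minimal sketch that mimics that grouping and prints the per-shard byte totals, again assuming the gguf package and the quantized file above:

```python
# Sketch: group tensors 128 at a time (what --split-max-tensors 128 does)
# and sum their byte sizes, to see why shard file sizes vary.
from gguf import GGUFReader

reader = GGUFReader("/nfs/models/Karasu-Mixtral-8x22B-v0.1-Q5_K_M.gguf")
for i in range(0, len(reader.tensors), 128):
    chunk = reader.tensors[i:i + 128]
    size_gb = sum(int(t.n_bytes) for t in chunk) / 1e9
    print(f"shard {i // 128:02d}: {len(chunk)} tensors, ~{size_gb:.1f} GB")
```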