Eval bug: MiniCPM quantization fails with missing key minicpm.embedding_scale #16192

@Night1992

Description

Name and Version

./build/bin/llama-cli --version
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.020 sec
ggml_metal_device_init: GPU name: Apple M4
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 11453.25 MB
version: 6547 (138c87c)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0

Operating systems

Mac

GGML backends

Metal

Hardware

Apple M4

Models

Download URL: https://huggingface.co/runfuture/MiniCPM-2B-dpo-fp16-gguf/tree/main

Problem description & steps to reproduce

When I quantize a MiniCPM model, quantization fails with the following error:

llama_model_quantize: failed to quantize: key not found in model: minicpm.embedding_scale
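
A quick way to confirm which keys the file actually carries is gguf-py's dump script. This is a sketch, assuming a llama.cpp checkout with gguf-py's requirements installed; the script's location has moved between versions, so the path may differ:

python3 gguf-py/gguf/scripts/gguf_dump.py ./models/MiniCPM-2B-dpo-fp16-gguf.gguf | grep 'minicpm\.'

On this file the dump lists the eight minicpm.* keys shown in the log below, but no minicpm.embedding_scale, which is the key the quantizer aborts on.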

First Bad Commit

No response

Relevant log output

./build/bin/llama-quantize ./models/MiniCPM-2B-dpo-fp16-gguf.gguf ./models/Minicpm/ggml-model-Q4_K_M.gguf Q4_K_M          
main: build = 6547 (138c87ce)
main: built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
main: quantizing './models/MiniCPM-2B-dpo-fp16-gguf.gguf' to './models/Minicpm/ggml-model-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 21 key-value pairs and 362 tensors from ./models/MiniCPM-2B-dpo-fp16-gguf.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = minicpm
llama_model_loader: - kv   1:                               general.name str              = MiniCPM
llama_model_loader: - kv   2:                     minicpm.context_length u32              = 2048
llama_model_loader: - kv   3:                   minicpm.embedding_length u32              = 2304
llama_model_loader: - kv   4:                        minicpm.block_count u32              = 40
llama_model_loader: - kv   5:                minicpm.feed_forward_length u32              = 5760
llama_model_loader: - kv   6:               minicpm.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:               minicpm.attention.head_count u32              = 36
llama_model_loader: - kv   8:            minicpm.attention.head_count_kv u32              = 36
llama_model_loader: - kv   9:   minicpm.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 1
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,122753]  = ["<unk>", "<s>", "</s>", "<SEP>", "<C...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,122753]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,122753]  = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  19:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
llama_model_loader: - type  f32:   81 tensors
llama_model_loader: - type  f16:  281 tensors
llama_model_quantize: failed to quantize: key not found in model: minicpm.embedding_scale
main: failed to quantize model from './models/MiniCPM-2B-dpo-fp16-gguf.gguf'
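
A likely workaround, assuming the key is simply absent from this older conversion: re-convert the original HF checkpoint with the current convert_hf_to_gguf.py, which should emit the MiniCPM scale metadata, then quantize that output. Paths here are illustrative; /path/to/MiniCPM-2B-dpo-fp16 stands in for a local copy of the upstream HF model:

python3 convert_hf_to_gguf.py /path/to/MiniCPM-2B-dpo-fp16 --outtype f16 --outfile ./models/minicpm-2b-dpo-f16.gguf
./build/bin/llama-quantize ./models/minicpm-2b-dpo-f16.gguf ./models/Minicpm/ggml-model-Q4_K_M.gguf Q4_K_M

This is a sketch rather than a confirmed fix; if quantizing still fails on a freshly converted file, the loader's hard requirement on minicpm.embedding_scale would need a change in llama.cpp itself.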

Labels

bug (Something isn't working), good first issue (Good for newcomers)
