Description
Name and Version
./build/bin/llama-cli --version
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.020 sec
ggml_metal_device_init: GPU name: Apple M4
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 11453.25 MB
version: 6547 (138c87c)
built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
Operating systems
Mac
GGML backends
Metal
Hardware
Apple M4
Models
https://huggingface.co/runfuture/MiniCPM-2B-dpo-fp16-gguf/tree/main
Problem description & steps to reproduce
When I quantize a MiniCPM model, quantization fails with the following error:
llama_model_quantize: failed to quantize: key not found in model: minicpm.embedding_scale
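The error means the loader expects a `minicpm.embedding_scale` metadata key that this (older) GGUF file does not contain. As a quick way to confirm which metadata keys a file actually carries, here is a minimal key scanner, assuming the GGUF v3 on-disk layout documented in the llama.cpp repository (magic, version, tensor count, KV count, then length-prefixed keys and typed values); `gguf_keys` is a name chosen for illustration:

```python
# Sketch (assumption: GGUF v3 layout as documented in the llama.cpp repo):
# list a .gguf file's metadata keys to check whether keys such as
# minicpm.embedding_scale are present.
import struct

GGUF_MAGIC = b"GGUF"
# Fixed-size GGUF metadata value types -> byte width (used to skip values)
SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}

def _read_str(f):
    (n,) = struct.unpack("<Q", f.read(8))          # u64 length prefix
    return f.read(n).decode("utf-8")

def _skip_value(f, vtype):
    if vtype == 8:                                 # string
        _read_str(f)
    elif vtype == 9:                               # array: elem type, count, elems
        (etype,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        for _ in range(count):
            _skip_value(f, etype)
    else:
        f.read(SIZES[vtype])

def gguf_keys(path):
    with open(path, "rb") as f:
        assert f.read(4) == GGUF_MAGIC, "not a GGUF file"
        (version,) = struct.unpack("<I", f.read(4))
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
        keys = []
        for _ in range(n_kv):
            keys.append(_read_str(f))
            (vtype,) = struct.unpack("<I", f.read(4))
            _skip_value(f, vtype)
        return keys
```

Usage: `"minicpm.embedding_scale" in gguf_keys("./models/MiniCPM-2B-dpo-fp16-gguf.gguf")`. If the scale keys are absent, re-converting the original HF weights with a current `convert_hf_to_gguf.py` should write them, since the set of keys each architecture requires is defined by the llama.cpp build doing the quantization.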
First Bad Commit
No response
Relevant log output
./build/bin/llama-quantize ./models/MiniCPM-2B-dpo-fp16-gguf.gguf ./models/Minicpm/ggml-model-Q4_K_M.gguf Q4_K_M
main: build = 6547 (138c87ce)
main: built with Apple clang version 16.0.0 (clang-1600.0.26.6) for arm64-apple-darwin24.2.0
main: quantizing './models/MiniCPM-2B-dpo-fp16-gguf.gguf' to './models/Minicpm/ggml-model-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 21 key-value pairs and 362 tensors from ./models/MiniCPM-2B-dpo-fp16-gguf.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = minicpm
llama_model_loader: - kv 1: general.name str = MiniCPM
llama_model_loader: - kv 2: minicpm.context_length u32 = 2048
llama_model_loader: - kv 3: minicpm.embedding_length u32 = 2304
llama_model_loader: - kv 4: minicpm.block_count u32 = 40
llama_model_loader: - kv 5: minicpm.feed_forward_length u32 = 5760
llama_model_loader: - kv 6: minicpm.rope.dimension_count u32 = 64
llama_model_loader: - kv 7: minicpm.attention.head_count u32 = 36
llama_model_loader: - kv 8: minicpm.attention.head_count_kv u32 = 36
llama_model_loader: - kv 9: minicpm.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,122753] = ["<unk>", "<s>", "</s>", "<SEP>", "<C...
llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,122753] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,122753] = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 18: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 19: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 20: tokenizer.chat_template str = {% for message in messages %}{% if me...
llama_model_loader: - type f32: 81 tensors
llama_model_loader: - type f16: 281 tensors
llama_model_quantize: failed to quantize: key not found in model: minicpm.embedding_scale
main: failed to quantize model from './models/MiniCPM-2B-dpo-fp16-gguf.gguf'