minicpm: make embedding_scale residual_scale logit_scale optional with legacy defaults. #16273
Attachments:
baseline-perplexity-16192.txt
afterfix-perplexity-16192.txt
baseline-bench-16192.txt
afterfix-bench-16192.txt
ci.zip
Fixes: #16192
Summary
Older MiniCPM GGUFs do not include the scaling metadata keys. The loader previously treated these keys as required, so quantization failed with "key not found in model". This PR treats the keys as optional and supplies legacy default values, so older files quantize and load again.
Problem
Some MiniCPM GGUFs do not contain the minicpm.embedding_scale, minicpm.residual_scale, and minicpm.logit_scale metadata keys. The loader currently treats these as required, so quantization fails with:
key not found in model: minicpm.embedding_scale
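For reference, the failing path looks roughly like the sketch below; this is illustrative rather than the verbatim upstream code, and assumes llama.cpp's `ml.get_key` helper, whose `required` argument defaults to true:

```cpp
// Illustrative sketch of the current behaviour (not the exact upstream lines).
// With required defaulting to true, a missing key aborts load/quantization
// with "key not found in model: ...".
ml.get_key(LLM_KV_EMBEDDING_SCALE, hparams.f_embedding_scale);
ml.get_key(LLM_KV_RESIDUAL_SCALE,  hparams.f_residual_scale);
ml.get_key(LLM_KV_LOGIT_SCALE,     hparams.f_logit_scale);
```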
Solution
In the LLM_ARCH_MINICPM branch of the loader, initialize the scaling parameters with the legacy MiniCPM defaults, then read the three GGUF keys with required = false. When the GGUF provides the keys, their values override the defaults; otherwise the legacy defaults are used.
Newer GGUFs that already include these keys are unaffected.
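A minimal sketch of the new behaviour (illustrative, not the verbatim patch; the legacy_* constants below are placeholders for the actual legacy values, which are not reproduced here):

```cpp
// Pre-populate with the legacy MiniCPM defaults, then let GGUF keys override
// them when present. With required = false a missing key is simply skipped
// instead of raising a hard error.
hparams.f_embedding_scale = legacy_embedding_scale; // placeholder for the legacy value
hparams.f_residual_scale  = legacy_residual_scale;  // placeholder for the legacy value
hparams.f_logit_scale     = legacy_logit_scale;     // placeholder for the legacy value

ml.get_key(LLM_KV_EMBEDDING_SCALE, hparams.f_embedding_scale, /*required =*/ false);
ml.get_key(LLM_KV_RESIDUAL_SCALE,  hparams.f_residual_scale,  /*required =*/ false);
ml.get_key(LLM_KV_LOGIT_SCALE,     hparams.f_logit_scale,     /*required =*/ false);
```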
User impact
Older MiniCPM GGUFs that lack the keys can be quantized and loaded again without re-converting the model; GGUFs that already contain the keys behave exactly as before.
Validation
Functional (older + newer MiniCPM)
Perplexity (CPU-only)
Command used in both runs:
Results:
Conclusion: Perplexity is identical. Throughput difference is within normal CPU variance.
Raw logs (attached): baseline-perplexity-16192.txt, afterfix-perplexity-16192.txt
llama-bench (CPU-only)
Command used in both runs:
Results:
No regression observed.
Raw logs (attached): baseline-bench-16192.txt, afterfix-bench-16192.txt
Local CI (CPU-only)
Executed from repo root:
Outcome:
CI Log attached: ci.zip
Style
Formatted with clang-format 18.1.3; only the lines changed in this PR were formatted.
Environment
Build SHAs used
Baseline: c498fc8
After fix: 6337679