🚨 DeepSeek V3 weight_scale_inv Tensor Mapping Issue
Model: Kimi K2 (DeepSeek V3)
Architecture: DeepSeek V3 with FP8 quantization
Issue: Converter fails on weight_scale_inv tensors
Priority: HIGH - Blocking model integration
📋 Technical Details
Environment
- llama.cpp commit: 36c1532 (latest master)
- Python version: 3.13.5
- Platform: macOS (Apple Silicon M4)
- Model size: ~1TB (61 files)
Error Details
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv'
Root Cause: The llama.cpp converter doesn't support the weight_scale_inv
tensor mapping that DeepSeek V3 uses for FP8 weight dequantization.
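For context, each FP8 weight in the checkpoint is paired with a small weight_scale_inv tensor holding one scale per block of the weight matrix; dequantization expands those scales and multiplies them into the FP8 values. A minimal sketch of that relationship (the 128x128 block size is taken from the DeepSeek V3 reference code and should be treated as an assumption here):

import torch

def dequant_fp8_block(weight: torch.Tensor, scale_inv: torch.Tensor,
                      block: int = 128) -> torch.Tensor:
    """Block-wise FP8 dequantization sketch (block size is an assumption).

    weight:    (out, in) tensor stored as torch.float8_e4m3fn
    scale_inv: (ceil(out/block), ceil(in/block)) per-block scale factors
    """
    w = weight.to(torch.float32)
    # Expand each per-block scale over its block x block tile, trim to the
    # weight's shape, and multiply element-wise.
    s = scale_inv.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)
    return w * s[: w.shape[0], : w.shape[1]]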
Model Information
- Model: Kimi K2 (DeepSeek V3)
- Architecture: DeepSeek V3 with FP8 quantization
- Size: ~2TB (78 files)
- Target: Convert to GGUF format for Apple Silicon deployment
🔍 Investigation Results
✅ What We Verified
- llama.cpp is current - Pulled latest master branch
- Model architecture recognized - DeepseekV3ForCausalLM detected correctly
- Basic conversion starts - Model loading and initial tensor processing works
- Hardware ready - max2 with 128GB RAM and 4TB storage available
❌ What We Found Missing
- No DeepSeek V3 support - llama.cpp converter lacks weight_scale_inv handling
- No experimental branches - Checked dev and other development branches
- No community patches - No existing workarounds in GitHub issues/PRs
- No alternative converters - Standard tools don't support this architecture
🔍 Technical Investigation
- llama.cpp version: Latest master (commit 36c1532)
- DeepSeek V3 class: handled via the DeepseekV2Model converter class (lines 5901-5902)
- Tensor mapping: Inherits from TextModel without weight_scale_inv support (see the sketch after this list)
- FP8 tensors: Model uses torch.float8_e4m3fn format
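One possible direction, offered only as a sketch: swallow the *_scale_inv tensors during conversion and fold them into their FP8 weights before the normal tensor mapping runs. The helper below is hypothetical, written as a free-standing function over (name, tensor) pairs rather than a real patch, since the converter's internal hooks may differ between commits; it assumes the same 128x128 block layout as above.

import torch
from typing import Iterable, Iterator, Tuple

def fold_fp8_scales(tensors: Iterable[Tuple[str, torch.Tensor]],
                    block: int = 128) -> Iterator[Tuple[str, torch.Tensor]]:
    """Drop *.weight_scale_inv tensors and fold them into their FP8 weights.

    Hypothetical helper: assumes one scale per block x block tile of the
    matching .weight tensor, and tolerates either iteration order.
    """
    scales: dict[str, torch.Tensor] = {}   # weight name -> scale (scale seen first)
    weights: dict[str, torch.Tensor] = {}  # weight name -> weight (weight seen first)

    def dequant(w: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        s = s.repeat_interleave(block, 0).repeat_interleave(block, 1)
        return w.to(torch.float32) * s[: w.shape[0], : w.shape[1]]

    for name, t in tensors:
        if name.endswith(".weight_scale_inv"):
            wname = name.removesuffix("_scale_inv")
            if wname in weights:
                yield wname, dequant(weights.pop(wname), t)
            else:
                scales[wname] = t
        elif t.dtype == torch.float8_e4m3fn:
            if name in scales:
                yield name, dequant(t, scales.pop(name))
            else:
                weights[name] = t
        else:
            yield name, t
    # A real implementation would also flush or report any unmatched entries.

Inside convert_hf_to_gguf.py, the same logic could presumably live in the DeepSeek model class's per-tensor processing (e.g. modify_tensors), returning the dequantized weight and dropping the scale tensor.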
🛠️ Attempted Solutions
1. Standard Conversion
python3 convert_hf_to_gguf.py /Volumes/4TB/ai-models/kimi-k2-original \
--outfile /Volumes/4TB/ai-models/kimi-k2.gguf \
--outtype bf16
Result: Failed at weight_scale_inv tensor mapping
2. Alternative Output Types
- Tried --outtype auto, f16, and f32
- Result: Same failure point
3. Direct Model Loading
# Attempted with transformers + MPS backend
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/Volumes/4TB/ai-models/kimi-k2-original")
Result: Failed due to model size (~1.9TB buffer error)
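As a lighter-weight diagnostic, the FP8 weights and their scale shapes can be inspected one shard at a time with safetensors instead of materializing the full model; a minimal sketch (the shard filename below is hypothetical):

from safetensors import safe_open

# Hypothetical shard name; substitute one of the actual checkpoint files.
shard = "/Volumes/4TB/ai-models/kimi-k2-original/model-00001-of-00061.safetensors"

with safe_open(shard, framework="pt") as f:
    for name in f.keys():
        if name.endswith("weight_scale_inv"):
            t = f.get_tensor(name)
            print(name, tuple(t.shape), t.dtype)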
4. MLX Environment
- Checked for DeepSeek V3 support in MLX (a quick availability check is sketched below)
- Result: Model class not available
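A quick way to verify this (the module path is an assumption based on mlx_lm's models/ naming scheme):

import importlib.util

# If mlx-lm shipped a DeepSeek V3 implementation, this spec lookup would
# resolve; the module path itself is an assumption.
spec = importlib.util.find_spec("mlx_lm.models.deepseek_v3")
print("MLX DeepSeek V3 implementation found:", spec is not None)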
5. GitHub Investigation
- Searched llama.cpp issues and PRs for DeepSeek V3 support
- Checked experimental branches (dev, wip branches)
- Result: No existing support or workarounds found
🎯 Request for Assistance
We are seeking guidance from the llama.cpp community on how to proceed with DeepSeek V3 model conversion. Specifically:
- Is DeepSeek V3 support planned or in development?
- Are there any experimental branches with weight_scale_inv support?
- What would be the best approach to add this support?
- Are there alternative conversion methods we should consider (for example, an offline FP8 → BF16 pre-pass as sketched below)?
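One candidate alternative, sketched here under the same assumptions as above (128x128 block scales, safetensors shards, and a hypothetical output path), is an offline pre-pass that rewrites each FP8 shard to BF16 with the scales folded in, so the standard converter never sees a weight_scale_inv tensor:

import glob, os
import torch
from safetensors.torch import load_file, save_file

SRC = "/Volumes/4TB/ai-models/kimi-k2-original"
DST = "/Volumes/4TB/ai-models/kimi-k2-bf16"  # hypothetical output directory
os.makedirs(DST, exist_ok=True)

def dequant(w, s, block=128):
    # Expand per-block scales (block size is an assumption) and multiply.
    s = s.repeat_interleave(block, 0).repeat_interleave(block, 1)
    return w.to(torch.float32) * s[: w.shape[0], : w.shape[1]]

for path in sorted(glob.glob(os.path.join(SRC, "*.safetensors"))):
    tensors = load_file(path)  # assumes a weight and its scale share a shard
    out = {}
    for name, t in tensors.items():
        if name.endswith(".weight_scale_inv"):
            continue  # folded into its weight below
        if t.dtype == torch.float8_e4m3fn:
            t = dequant(t, tensors[name + "_scale_inv"])
        out[name] = t.to(torch.bfloat16) if t.is_floating_point() else t
    save_file(out, os.path.join(DST, os.path.basename(path)))

The surrounding files (index, config, tokenizer) would still need to be carried over and adjusted so they no longer reference the dropped scale tensors, and BF16 output roughly doubles the on-disk size relative to FP8.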
📊 Impact
This issue is blocking the integration of a significant AI model (Kimi K2) into our development lab. The model represents a substantial investment and would be a valuable addition to our AI team's toolkit.
🔗 Additional Context
- Model Location: Available for testing if needed
- Environment Details: macOS with Apple Silicon, 128GB RAM available
Status: BLOCKED - Awaiting community guidance
Priority: HIGH - Model integration critical for lab expansion
Thank you for your time and assistance!