[BUG] DeepSeek V3 weight_scale_inv tensor mapping not supported in converter #14781

@howeirdo

🚨 DeepSeek V3 weight_scale_inv Tensor Mapping Issue

Model: Kimi K2 (DeepSeek V3)
Architecture: DeepSeek V3 with FP8 quantization
Issue: Converter fails on weight_scale_inv tensors
Priority: HIGH - Blocking model integration

📋 Technical Details

Environment

  • llama.cpp commit: 36c1532 (latest master)
  • Python version: 3.13.5
  • Platform: macOS (Apple Silicon M4)
  • Model size: ~1TB (61 files)

Error Details

ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv'

Root Cause: The tensor-name mapping in convert_hf_to_gguf.py has no entry for the per-block weight_scale_inv scale tensors that DeepSeek V3's FP8 quantization stores alongside each quantized weight.
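
For reference, dequantizing these tensors is straightforward once the block layout is known. Below is a minimal sketch (not llama.cpp code), assuming the 128x128 block scheme used by DeepSeek's reference implementation, in which weight_scale_inv holds one scale per block that the FP8 weight is multiplied by:

import torch

def dequant_fp8_block(weight: torch.Tensor, scale_inv: torch.Tensor, block: int = 128) -> torch.Tensor:
    """Dequantize a block-quantized FP8 weight using its weight_scale_inv tensor.

    Assumes scale_inv has shape (ceil(out_features / block), ceil(in_features / block)).
    """
    w = weight.to(torch.float32)
    # Expand each per-block scale to cover its 128x128 tile, then trim to the
    # original weight shape (the last row/column of blocks may be partial).
    s = scale_inv.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)
    s = s[: w.shape[0], : w.shape[1]]
    return w * s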

Model Information

  • Model: Kimi K2 (DeepSeek V3)
  • Architecture: DeepSeek V3 with FP8 quantization
  • Size: ~2TB (78 files)
  • Target: Convert to GGUF format for Apple Silicon deployment

🔍 Investigation Results

✅ What We Verified

  1. llama.cpp is current - Pulled latest master branch
  2. Model architecture recognized - DeepseekV3ForCausalLM detected correctly
  3. Basic conversion starts - Model loading and initial tensor processing works
  4. Hardware ready - max2 with 128GB RAM and 4TB storage available

❌ What We Found Missing

  1. No DeepSeek V3 support - llama.cpp converter lacks weight_scale_inv handling
  2. No experimental branches - Checked dev and other development branches
  3. No community patches - No existing workarounds in GitHub issues/PRs
  4. No alternative converters - Standard tools don't support this architecture

🔍 Technical Investigation

  • llama.cpp version: Latest master (commit 36c1532)
  • DeepSeek V3 handling: routed to the DeepseekV2Model class (convert_hf_to_gguf.py, lines 5901-5902)
  • Tensor mapping: Inherits from TextModel without weight_scale_inv support
  • FP8 tensors: Model uses torch.float8_e4m3fn format
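
The FP8 dtype can be confirmed without loading any weights by reading the JSON header of a safetensors shard (an 8-byte little-endian length prefix followed by the header itself). A small sketch; the shard filename below is illustrative:

import json
import struct

def list_tensor_dtypes(path: str) -> dict[str, str]:
    """Return {tensor_name: dtype} from a .safetensors file header without loading data."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # first 8 bytes: header size
        header = json.loads(f.read(header_len))
    return {name: info["dtype"] for name, info in header.items() if name != "__metadata__"}

shard = "/Volumes/4TB/ai-models/kimi-k2-original/model-00001-of-00061.safetensors"  # illustrative shard name
dtypes = list_tensor_dtypes(shard)
print(dtypes.get("model.layers.0.mlp.down_proj.weight"))            # expected "F8_E4M3"
print(dtypes.get("model.layers.0.mlp.down_proj.weight_scale_inv"))  # the scale dtype (typically F32)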

🛠️ Attempted Solutions

1. Standard Conversion

python3 convert_hf_to_gguf.py /Volumes/4TB/ai-models/kimi-k2-original \
  --outfile /Volumes/4TB/ai-models/kimi-k2.gguf \
  --outtype bf16

Result: Failed at weight_scale_inv tensor mapping

2. Alternative Output Types

  • Tried --outtype auto, f16, and f32
  • Result: Same failure; the error is raised during tensor-name mapping, before any output type conversion is applied

3. Direct Model Loading

# Attempted with transformers + MPS backend
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/Volumes/4TB/ai-models/kimi-k2-original")

Result: Failed due to model size (~1.9TB buffer error)
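
If the goal at this step is only to inspect the architecture rather than run it, instantiating the model on the meta device sidesteps the memory limit. A sketch assuming accelerate is installed and the repository's custom modeling code is trusted:

from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("/Volumes/4TB/ai-models/kimi-k2-original", trust_remote_code=True)
with init_empty_weights():
    # Parameters are created on the "meta" device, so no weight memory is allocated.
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

print(model)  # prints the module tree, including the MoE expert layers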

4. MLX Environment

  • Checked for DeepSeek V3 support in MLX
  • Result: Model class not available

5. GitHub Investigation

  • Searched llama.cpp issues and PRs for DeepSeek V3 support
  • Checked experimental branches (dev, wip branches)
  • Result: No existing support or workarounds found

🎯 Request for Assistance

We are seeking guidance from the llama.cpp community on how to proceed with DeepSeek V3 model conversion. Specifically:

  1. Is DeepSeek V3 support planned or in development?
  2. Are there any experimental branches with weight_scale_inv support?
  3. What would be the best approach to add this support? (one possible direction is sketched after this list)
  4. Are there alternative conversion methods we should consider?
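
On question 3, one direction that looks plausible from the outside (hedged; not a working patch) is to teach the converter to pair each FP8 weight with its weight_scale_inv, dequantize to a higher-precision type during conversion, and drop the scale tensor from the output. A rough sketch of the pairing logic, reusing the dequantization helper sketched above and assuming each scale tensor is encountered before its weight:

import torch

_scale_cache: dict[str, torch.Tensor] = {}

def handle_tensor(name: str, data: torch.Tensor):
    """Yield (name, tensor) pairs with FP8 weights dequantized and scale tensors dropped."""
    if name.endswith(".weight_scale_inv"):
        # Cache the scale under the name of the weight it belongs to.
        _scale_cache[name.removesuffix("_scale_inv")] = data
        return  # consumed; emit nothing
    if data.dtype == torch.float8_e4m3fn:
        # Assumes the matching scale was already seen; a real patch would need
        # to buffer whichever of the pair arrives first.
        scale = _scale_cache.pop(name)
        yield name, dequant_fp8_block(data, scale)  # helper from the sketch above
        return
    yield name, data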

📊 Impact

This issue is blocking the integration of a significant model (Kimi K2) into our development lab. The model represents a substantial investment and would be a valuable addition to our lab's AI capabilities.

🔗 Additional Context

  • Model Location: Available for testing if needed
  • Environment Details: macOS with Apple Silicon, 128GB RAM available

Status: BLOCKED - Awaiting community guidance
Priority: HIGH - Model integration critical for lab expansion

Thank you for your time and assistance!
