🚨 DeepSeek V3 weight_scale_inv Tensor Mapping Issue
Model: Kimi K2 (DeepSeek V3)
Architecture: DeepSeek V3 with FP8 quantization
Issue: Converter fails on weight_scale_inv tensors
Priority: HIGH - Blocking model integration
📋 Technical Details
Environment
- llama.cpp commit: 36c1532 (latest master)
- Python version: 3.13.5
- Platform: macOS (Apple Silicon M4)
- Model size: ~1TB (61 files)
Error Details
ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight_scale_inv'
Root Cause: The llama.cpp converter doesn't support the weight_scale_inv
tensor mapping that DeepSeek V3 uses for FP8 weight dequantization.
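For context, each FP8 weight in the checkpoint is paired with a small weight_scale_inv tensor holding one scale per block of the weight matrix; dequantization expands those scales and multiplies them into the FP8 values. A minimal sketch of that relationship (the 128x128 block size is taken from the DeepSeek V3 reference code and should be treated as an assumption here):

import torch

def dequant_fp8_block(weight: torch.Tensor, scale_inv: torch.Tensor,
                      block: int = 128) -> torch.Tensor:
    """Block-wise FP8 dequantization sketch (block size is an assumption).

    weight:    (out, in) tensor stored as torch.float8_e4m3fn
    scale_inv: (ceil(out/block), ceil(in/block)) per-block scale factors
    """
    w = weight.to(torch.float32)
    # Expand each per-block scale over its block x block tile, trim to the
    # weight's shape, and multiply element-wise.
    s = scale_inv.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)
    return w * s[: w.shape[0], : w.shape[1]]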
Model Information
- Model: Kimi K2 (DeepSeek V3)
- Architecture: DeepSeek V3 with FP8 quantization
- Size: ~2TB (78 files)
- Target: Convert to GGUF format for Apple Silicon deployment
🔍 Investigation Results
✅ What We Verified
- llama.cpp is current - Pulled latest master branch
- Model architecture recognized - DeepseekV3ForCausalLM detected correctly
- Basic conversion starts - Model loading and initial tensor processing works
- Hardware ready - max2 with 128GB RAM and 4TB storage available
❌ What We Found Missing
- No DeepSeek V3 support - llama.cpp converter lacks weight_scale_inv handling
- No experimental branches - Checked dev and other development branches
- No community patches - No existing workarounds in GitHub issues/PRs
- No alternative converters - Standard tools don't support this architecture
🔍 Technical Investigation
- llama.cpp version: Latest master (commit 36c1532)
- DeepSeek V3 class: handled via the DeepseekV2Model converter class (lines 5901-5902)
- Tensor mapping: Inherits from TextModel without weight_scale_inv support (see the sketch after this list)
- FP8 tensors: Model uses torch.float8_e4m3fn format
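One possible direction, offered only as a sketch: swallow the *_scale_inv tensors during conversion and fold them into their FP8 weights before the normal tensor mapping runs. The helper below is hypothetical, written as a free-standing function over (name, tensor) pairs rather than a real patch, since the converter's internal hooks may differ between commits; it assumes the same 128x128 block layout as above.

import torch
from typing import Iterable, Iterator, Tuple

def fold_fp8_scales(tensors: Iterable[Tuple[str, torch.Tensor]],
                    block: int = 128) -> Iterator[Tuple[str, torch.Tensor]]:
    """Drop *.weight_scale_inv tensors and fold them into their FP8 weights.

    Hypothetical helper: assumes one scale per block x block tile of the
    matching .weight tensor, and tolerates either iteration order.
    """
    scales: dict[str, torch.Tensor] = {}   # weight name -> scale (scale seen first)
    weights: dict[str, torch.Tensor] = {}  # weight name -> weight (weight seen first)

    def dequant(w: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        s = s.repeat_interleave(block, 0).repeat_interleave(block, 1)
        return w.to(torch.float32) * s[: w.shape[0], : w.shape[1]]

    for name, t in tensors:
        if name.endswith(".weight_scale_inv"):
            wname = name.removesuffix("_scale_inv")
            if wname in weights:
                yield wname, dequant(weights.pop(wname), t)
            else:
                scales[wname] = t
        elif t.dtype == torch.float8_e4m3fn:
            if name in scales:
                yield name, dequant(t, scales.pop(name))
            else:
                weights[name] = t
        else:
            yield name, t
    # A real implementation would also flush or report any unmatched entries.

Inside convert_hf_to_gguf.py, the same logic could presumably live in the DeepSeek model class's per-tensor processing (e.g. modify_tensors), returning the dequantized weight and dropping the scale tensor.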
🛠️ Attempted Solutions
1. Standard Conversion
python3 convert_hf_to_gguf.py /Volumes/4TB/ai-models/kimi-k2-original \
--outfile /Volumes/4TB/ai-models/kimi-k2.gguf \
--outtype bf16
Result: Failed at weight_scale_inv tensor mapping
2. Alternative Output Types
- Tried --outtype auto, f16, and f32
- Result: Same failure point
3. Direct Model Loading
# Attempted with transformers + MPS backend
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/Volumes/4TB/ai-models/kimi-k2-original")
Result: Failed due to model size (~1.9TB buffer error)
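As a lighter-weight diagnostic, the FP8 weights and their scale shapes can be inspected one shard at a time with safetensors instead of materializing the full model; a minimal sketch (the shard filename below is hypothetical):

from safetensors import safe_open

# Hypothetical shard name; substitute one of the actual checkpoint files.
shard = "/Volumes/4TB/ai-models/kimi-k2-original/model-00001-of-00061.safetensors"

with safe_open(shard, framework="pt") as f:
    for name in f.keys():
        if name.endswith("weight_scale_inv"):
            t = f.get_tensor(name)
            print(name, tuple(t.shape), t.dtype)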
4. MLX Environment
- Checked for DeepSeek V3 support in MLX (a quick availability check is sketched below)
- Result: Model class not available
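A quick way to verify this (the module path is an assumption based on mlx_lm's models/ naming scheme):

import importlib.util

# If mlx-lm shipped a DeepSeek V3 implementation, this spec lookup would
# resolve; the module path itself is an assumption.
spec = importlib.util.find_spec("mlx_lm.models.deepseek_v3")
print("MLX DeepSeek V3 implementation found:", spec is not None)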
5. GitHub Investigation
- Searched llama.cpp issues and PRs for DeepSeek V3 support
- Checked experimental branches (dev, wip branches)
- Result: No existing support or workarounds found
🎯 Request for Assistance
We are seeking guidance from the llama.cpp community on how to proceed with DeepSeek V3 model conversion. Specifically:
- Is DeepSeek V3 support planned or in development?
- Are there any experimental branches with weight_scale_inv support?
- What would be the best approach to add this support?
- Are there alternative conversion methods we should consider (for example, an offline FP8 → BF16 pre-pass as sketched below)?
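One candidate alternative, sketched here under the same assumptions as above (128x128 block scales, safetensors shards, and a hypothetical output path), is an offline pre-pass that rewrites each FP8 shard to BF16 with the scales folded in, so the standard converter never sees a weight_scale_inv tensor:

import glob, os
import torch
from safetensors.torch import load_file, save_file

SRC = "/Volumes/4TB/ai-models/kimi-k2-original"
DST = "/Volumes/4TB/ai-models/kimi-k2-bf16"  # hypothetical output directory
os.makedirs(DST, exist_ok=True)

def dequant(w, s, block=128):
    # Expand per-block scales (block size is an assumption) and multiply.
    s = s.repeat_interleave(block, 0).repeat_interleave(block, 1)
    return w.to(torch.float32) * s[: w.shape[0], : w.shape[1]]

for path in sorted(glob.glob(os.path.join(SRC, "*.safetensors"))):
    tensors = load_file(path)  # assumes a weight and its scale share a shard
    out = {}
    for name, t in tensors.items():
        if name.endswith(".weight_scale_inv"):
            continue  # folded into its weight below
        if t.dtype == torch.float8_e4m3fn:
            t = dequant(t, tensors[name + "_scale_inv"])
        out[name] = t.to(torch.bfloat16) if t.is_floating_point() else t
    save_file(out, os.path.join(DST, os.path.basename(path)))

The surrounding files (index, config, tokenizer) would still need to be carried over and adjusted so they no longer reference the dropped scale tensors, and BF16 output roughly doubles the on-disk size relative to FP8.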
📊 Impact
This issue is blocking the integration of a significant AI model (Kimi K2) into our development lab. The model represents a substantial investment and would be a valuable addition to our AI team's toolkit.
🔗 Additional Context
- Model Location: Available for testing if needed
- Environment Details: macOS with Apple Silicon, 128GB RAM available
Status: BLOCKED - Awaiting community guidance
Priority: HIGH - Model integration critical for lab expansion
Thank you for your time and assistance!