Feature Request: convert_hf_to_gguf.py should support conversion of DeepSeek-R1-0528-FP4 #15415

@luke-8-pro

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

I'm trying to use convert_hf_to_gguf.py to convert the DeepSeek-R1-0528-FP4 safetensors files into GGUF format. I hope llama.cpp can be extended to fully support the conversion of DeepSeek-R1-0528-FP4 and other models quantized in the NVFP4 format.
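For reference, the conversion was attempted with the script's standard CLI, e.g. `python convert_hf_to_gguf.py /path/to/DeepSeek-R1-0528-FP4 --outfile deepseek-r1-0528-fp4.gguf --outtype f16` (the model path is illustrative; --outfile and --outtype are the script's existing flags).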

Motivation

Adding support for converting the DeepSeek-R1-0528-FP4 model into the GGUF format via convert_hf_to_gguf.py is a critical enhancement for the llama.cpp ecosystem. DeepSeek-R1-0528-FP4 is an NVFP4-quantized release of DeepSeek's recently released, high-performance DeepSeek-R1-0528 model, which features strong reasoning capabilities and is optimized for efficient inference. As it gains attention within the open-source community, more and more users want to run it locally on consumer-grade hardware, precisely the use case that llama.cpp is designed to enable.

However, llama.cpp only supports models in the GGUF format, which is specifically designed for memory efficiency, flexible quantization, and fast CPU/GPU inference. Without a reliable conversion pipeline from Hugging Face (safetensors) to GGUF, users cannot fully leverage the potential of this model within the llama.cpp framework.

Currently, attempts to convert DeepSeek-R1-0528-FP4 using the latest version of convert_hf_to_gguf.py result in runtime errors. These issues likely stem from architectural differences or metadata handling that the script does not yet fully support, such as custom layer configurations, tensor naming conventions, or FP4-specific quantization logic.
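To make the FP4-specific part concrete, here is a minimal dequantization sketch, assuming NVFP4's published layout: E2M1 (4-bit float) elements packed two per byte in blocks of 16, one FP8 (E4M3) scale per block, and one per-tensor scale. The nibble order and the helper itself are my assumptions, not a confirmed description of the checkpoint:

```python
import numpy as np

# Magnitudes of the 8 non-negative E2M1 (FP4) codes; bit 3 of each code is the sign.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def dequantize_nvfp4(packed: np.ndarray, block_scales: np.ndarray,
                     tensor_scale: float, block_size: int = 16) -> np.ndarray:
    """Dequantize NVFP4-packed data to float32.

    packed       -- flat uint8 array, two FP4 codes per byte (low nibble first
                    is assumed here)
    block_scales -- one scale per block of `block_size` elements (stored as FP8
                    E4M3 in NVFP4; assumed already upcast to float32)
    tensor_scale -- per-tensor float32 scale
    """
    lo = packed & 0x0F
    hi = packed >> 4
    codes = np.empty(packed.size * 2, dtype=np.uint8)
    codes[0::2] = lo  # interleave the two nibbles back into element order
    codes[1::2] = hi
    signs = np.where(codes & 0x8, -1.0, 1.0).astype(np.float32)
    vals = signs * FP4_VALUES[codes & 0x7]
    vals = vals.reshape(-1, block_size) * block_scales.reshape(-1, 1)
    return (vals * np.float32(tensor_scale)).reshape(-1)
```

With something along these lines, the converter could dequantize NVFP4 tensors to F16/F32 on load and then requantize them to whatever GGUF type the user requests, rather than needing a native FP4 type in GGUF itself.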

Enabling this conversion would provide several key benefits:

  • Local Inference Accessibility: Users could run DeepSeek-R1-0528-FP4 directly on personal devices without relying on cloud APIs, ensuring data privacy, low latency, and offline usability.
  • Efficient Hardware Utilization: The advanced quantization options in GGUF allow users to run large models on systems with limited VRAM or even on CPUs, significantly lowering the hardware barrier to entry.
  • Enhanced Compatibility with Existing Tools: Once converted, the model can be seamlessly integrated into a wide range of llama.cpp frontends (e.g., LM Studio, Ollama, text-generation-webui), improving user experience and ecosystem interoperability.
  • Community Empowerment: Providing official or community-supported conversion enables broader adoption, fine-tuning, and the development of downstream applications built upon this powerful model.

In summary, supporting the conversion of DeepSeek-R1-0528-FP4 in convert_hf_to_gguf.py is not merely a technical improvement; it is a strategic step toward maintaining inclusivity, performance, and future-readiness within the local LLM community. Resolving the current conversion issues will empower users to harness one of the most promising open-source models of 2025 within the efficient and versatile llama.cpp runtime.

Possible Implementation

No response
