Add openpangu_moe model #42229

yzy1996 · 2025-11-16T15:58:07Z

What does this PR do?

This PR introduces inference support for the openpangu_moe model. By integrating the inference implementation directly into the Transformers library, we aim to ensure seamless compatibility and enable continuous updates alongside future open-source releases.

I welcome any feedback on potential improvements or additional changes that would enhance this implementation. Please feel free to share your suggestions!

Our open-source model is available at: https://ai.gitcode.com/ascend-tribe/openPangu-Ultra-MoE-718B-V1.1

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

github-actions · 2025-11-16T15:59:09Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, openpangu_moe

vasqu

Some initial review:

I think we can use modular to a higher extent - moe, attn (deepseek), rope (llama), ...
Would be nice if we could adopt this with more modular
Atm, we have a lot of old things used like _prepare_4d_causal_attention_mask or legacy caches which we should avoid

^ those are the biggest points IMO. Will this also be released on https://huggingface.co/ ?

vasqu · 2025-11-18T16:09:56Z

docs/source/en/model_doc/openpangu_moe.md

+The OpenPanguMoE model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
+<INSERT SHORT SUMMARY HERE>
+
+The abstract from the paper is the following:
+
+<INSERT PAPER ABSTRACT HERE>
+
+Tips:
+
+<INSERT TIPS ABOUT MODEL HERE>
+
+This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
+The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).
+
+## Usage examples
+
+<INSERT SOME NICE EXAMPLES HERE>


Don't forget the docs :D

vasqu · 2025-11-18T16:10:35Z

src/transformers/models/auto/modeling_auto.py

No associated tokenizer? Would go to tokenization_auto

vasqu · 2025-11-18T16:11:44Z

src/transformers/models/openpangu_moe/configuration_openpangu_moe.py

+# coding=utf-8
+# Copyright (c) 2025 Huawei Technologies Co., Ltd. All rights reserved.


Please use the full licence, elsewhere too

Suggested change

# coding=utf-8

# Copyright (c) 2025 Huawei Technologies Co., Ltd. All rights reserved.

# coding=utf-8

# Copyright (c) 2025 Huawei Technologies Co., Ltd. All rights reserved.

#

# Licensed under the Apache License, Version 2.0 (the "License");

# you may not use this file except in compliance with the License.

# You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

vasqu · 2025-11-18T16:12:27Z

src/transformers/models/openpangu_moe/configuration_openpangu_moe.py

+from ...configuration_utils import PreTrainedConfig
+
+class OpenPanguMoEConfig(PreTrainedConfig):
+


Docstrings! You can also use modular for this if you find models similar enough

vasqu · 2025-11-18T16:15:32Z

src/transformers/models/openpangu_moe/configuration_openpangu_moe.py

+        attention_kv_lora_dim=512,
+        attention_q_lora_dim=1536,
+        attention_qk_rope_dim=64,
+        attention_v_dim=128,
+        attention_qk_dim=128,


MLA from deepseek v3? Let's align names with things that exist elsewhere

vasqu · 2025-11-18T16:45:23Z