Add openpangu_moe model #42229
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, openpangu_moe
vasqu left a comment:
Some initial review:
- I think we can use modular to a higher extent: MoE, attention (DeepSeek), RoPE (Llama), etc. It would be nice if we could adopt more of this via modular (see the sketch below).
- At the moment we use a lot of old things like `_prepare_4d_causal_attention_mask` or legacy caches, which we should avoid.

^ Those are the biggest points IMO. Will this also be released on https://huggingface.co/?
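As a rough illustration of the modular route, a `modular_openpangu_moe.py` could stub the reusable pieces against existing models. This is only a sketch, assuming the DeepSeek-V3 MLA attention/MoE block and the Llama backbone are close enough to OpenPangu's; the imported class names come from the current `deepseek_v3`/`llama` modeling files:

```python
# modular_openpangu_moe.py -- minimal sketch, not the actual implementation.
# Assumes the existing DeepseekV3/Llama components match OpenPangu's architecture
# closely enough; the modular converter would regenerate the full modeling file.
from ..deepseek_v3.modeling_deepseek_v3 import DeepseekV3Attention, DeepseekV3MoE
from ..llama.modeling_llama import LlamaModel, LlamaRotaryEmbedding


class OpenPanguMoEAttention(DeepseekV3Attention):
    pass  # MLA-style attention reused, only the class prefix changes


class OpenPanguMoESparseBlock(DeepseekV3MoE):
    pass  # MoE routing/experts reused


class OpenPanguMoERotaryEmbedding(LlamaRotaryEmbedding):
    pass


class OpenPanguMoEModel(LlamaModel):
    pass  # standard decoder stack, inherited rather than re-implemented
```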
```md
The OpenPanguMoE model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
<INSERT SHORT SUMMARY HERE>

The abstract from the paper is the following:

<INSERT PAPER ABSTRACT HERE>

Tips:

<INSERT TIPS ABOUT MODEL HERE>

This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).

## Usage examples

<INSERT SOME NICE EXAMPLES HERE>
```
Don't forget the docs :D
No associated tokenizer? Would go to tokenization_auto
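For reference, if the checkpoint reuses an existing tokenizer class, the registration would just be an entry in the mapping in `src/transformers/models/auto/tokenization_auto.py`; which tokenizer actually backs the released checkpoint is an assumption here:

```python
# Sketch of a TOKENIZER_MAPPING_NAMES entry in tokenization_auto.py.
# The tokenizer classes below are placeholders, not confirmed for this model.
("openpangu_moe", ("LlamaTokenizer", "LlamaTokenizerFast")),
```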
```python
# coding=utf-8
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All rights reserved.
```
Please use the full licence, elsewhere too
Suggested change:
```python
# coding=utf-8
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```
```python
from ...configuration_utils import PreTrainedConfig


class OpenPanguMoEConfig(PreTrainedConfig):
```
Docstrings! You can also use modular for this if you find models similar enough
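A rough sketch of the expected docstring shape, with argument names and defaults taken from the diff quoted below; the descriptions are guesses at the MLA semantics, not confirmed:

```python
from ...configuration_utils import PreTrainedConfig  # as in the PR


class OpenPanguMoEConfig(PreTrainedConfig):
    r"""
    This is the configuration class to store the configuration of an [`OpenPanguMoEModel`].

    Args:
        attention_kv_lora_dim (`int`, *optional*, defaults to 512):
            Rank of the low-rank key/value projection (guessed description).
        attention_q_lora_dim (`int`, *optional*, defaults to 1536):
            Rank of the low-rank query projection (guessed description).
        attention_qk_rope_dim (`int`, *optional*, defaults to 64):
            Per-head dimension of the rotary part of the query/key (guessed description).
        ...
    """

    model_type = "openpangu_moe"
```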
```python
attention_kv_lora_dim=512,
attention_q_lora_dim=1536,
attention_qk_rope_dim=64,
attention_v_dim=128,
attention_qk_dim=128,
```
MLA from deepseek v3? Let's align names with things that exist elsewhere
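For reference, a possible mapping onto the existing `DeepseekV3Config` names; the right-hand names exist in DeepseekV3, but whether the semantics line up exactly is an assumption inferred from the defaults:

```python
# Hypothetical rename map from the PR's argument names to DeepseekV3Config names.
OPENPANGU_TO_DEEPSEEK_V3 = {
    "attention_kv_lora_dim": "kv_lora_rank",
    "attention_q_lora_dim": "q_lora_rank",
    "attention_qk_rope_dim": "qk_rope_head_dim",
    "attention_qk_dim": "qk_nope_head_dim",
    "attention_v_dim": "v_head_dim",
}
```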
```python
hidden_states = self.post_mlp_layernorm(hidden_states)
hidden_states = residual + hidden_states

return (hidden_states, present_key_value)
```
Suggested change:
```diff
- return (hidden_states, present_key_value)
+ return hidden_states
```
Cache is modified in place so passing it is not necessary
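A minimal, self-contained illustration of why returning the cache is redundant, assuming a recent transformers version with `DynamicCache`:

```python
import torch
from transformers.cache_utils import DynamicCache

cache = DynamicCache()
key = torch.randn(1, 8, 4, 64)    # (batch, num_kv_heads, seq_len, head_dim)
value = torch.randn(1, 8, 4, 64)

# Cache.update() mutates the cache in place and returns the accumulated key/value
# tensors, so any caller already holding `cache` sees the new entries without the
# decoder layer having to return it.
key, value = cache.update(key, value, layer_idx=0)
print(cache.get_seq_length(0))  # 4
```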
```python
attn_output = attn_output.transpose(1, 2).contiguous().view(bsz, q_len, -1)
attn_output = self.o_proj(attn_output)

return attn_output, past_key_value
```
Suggested change:
```diff
- return attn_output, past_key_value
+ return attn_output, attn_weights
```
Same as the comment before + we have a feature for outputting the weights
```python
if use_cache and use_legacy_cache:
    present_key_value = present_key_value.to_legacy_cache()
```
No more legacy cache
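The non-legacy pattern, as currently done in Llama-style models (a sketch, with `use_cache`/`past_key_values` standing in for the model's forward arguments):

```python
from transformers.cache_utils import DynamicCache

use_cache, past_key_values = True, None  # as received by OpenPanguMoEModel.forward

# Create a DynamicCache lazily and keep working with the Cache object itself;
# no to_legacy_cache()/from_legacy_cache() round trip on the way in or out.
if use_cache and past_key_values is None:
    past_key_values = DynamicCache()
```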
```python
if module.padding_idx is not None:
    module.weight.data[module.padding_idx].zero_()


class OpenPanguMoEModel(OpenPanguMoEPreTrainedModel):
```
This looks fairly standard, so we should be able to use other models here to inherit from, e.g. llama
```python
attention_mask = _prepare_4d_causal_attention_mask(
    attention_mask,
    (batch_size, seq_length),
    hidden_states,
    past_key_values_length,
)
```
We have a new API for this, create_causal_mask
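A sketch of how the new helper is used in current Llama-style models; the keyword names are taken from `modeling_llama.py` and may need adapting here:

```python
from transformers.masking_utils import create_causal_mask

# Inside OpenPanguMoEModel.forward (sketch): replaces the old
# _prepare_4d_causal_attention_mask call.
causal_mask = create_causal_mask(
    config=self.config,
    input_embeds=inputs_embeds,
    attention_mask=attention_mask,
    cache_position=cache_position,
    past_key_values=past_key_values,
    position_ids=position_ids,
)
```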
What does this PR do?
This PR introduces inference support for the openpangu_moe model. By integrating the inference implementation directly into the Transformers library, we aim to ensure seamless compatibility and enable continuous updates alongside future open-source releases.
I welcome any feedback on potential improvements or additional changes that would enhance this implementation. Please feel free to share your suggestions!
Our open-source model is available at: https://ai.gitcode.com/ascend-tribe/openPangu-Ultra-MoE-718B-V1.1
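For reviewers who want to try it out, a hedged usage sketch; the repo id below is a placeholder, since the published weights currently live at the GitCode link above rather than necessarily on the Hugging Face Hub:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint id -- replace with the actual location of the converted weights.
model_id = "<org>/openPangu-Ultra-MoE-718B-V1.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Hello, OpenPangu!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```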
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.