Fix for #2116 When all modules of a layer excluded there is an error in forward replay #2117
base: main
Conversation
The error is a state bug in logbar, so I need to fix it there; it is unrelated to gpt-qmodel. I will check and merge this PR soon!
@avtc There is an issue here. For example, if we have a normal moe layer:
So layer subsets (1) and optionally (2) are forced to execute serially when they could still execute under the faster data-parallelized path. Can you debug and print out the subsets and also the modules list when you now do the is_moe check?
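A minimal sketch of the kind of debug printout requested above. The names `subsets`, `modules`, and `is_moe` are assumptions modeled on the discussion, not the actual GPTQModel internals:

```python
def debug_subsets(subsets, modules, is_moe):
    """Print the layer's module subsets and full module list.

    Hypothetical helper for inspecting why the is_moe check forces
    subsets onto the serial path; not actual GPTQModel code.
    """
    print(f"is_moe={is_moe}, num_subsets={len(subsets)}")
    for i, subset in enumerate(subsets):
        print(f"  subset[{i}]: {sorted(subset)}")
    print(f"  modules: {sorted(modules)}")


# Example call with made-up module names:
debug_subsets(
    subsets=[{"self_attn.q_proj", "self_attn.k_proj"}, {"mlp.gate"}],
    modules={"self_attn.q_proj", "self_attn.k_proj", "mlp.gate"},
    is_moe=True,
)
```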
The current
I will partially revert and test. I do not see how to convert this to a draft; please assume it is not finalized.
I have removed the extra logic from before the loop and only fixed the issue with the undefined variable. P.S. Sometimes I have encountered the error even with 4 GPUs: maybe it is related to the current fix...
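For context, the "undefined variable" class of bug described here typically looks like a name that is only assigned inside a loop over a layer's module subsets: when every module of the layer is excluded (the case from #2116), the loop body never runs and a later use of the name raises `NameError`. A minimal sketch of the failure mode and the fix, with all names hypothetical rather than taken from the actual GPTQModel code:

```python
def replay_layer(subsets):
    """Simulate replaying a layer's forward over its module subsets.

    Bug pattern: `last_output` used to be assigned only inside the
    loop, so it was undefined when `subsets` was empty (all modules
    of the layer excluded), raising NameError on return.
    """
    last_output = None  # fix: initialize before the loop
    for subset in subsets:
        last_output = f"processed {len(subset)} modules"
    return last_output


# With every module excluded, the function now returns None
# instead of raising NameError:
print(replay_layer([]))
print(replay_layer([["q_proj", "k_proj"]]))
```

The fix is deliberately minimal: initialize the name before the loop rather than adding extra branching logic ahead of it, which matches the "removed the extra logic from before the loop" description above.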
This PR can be considered final.
@Qubitium please review the fix for #2116