What does this PR do?
The `device_map` computation is currently broken: #42043 switched to an integration of the `accelerate` functions to simplify them and to reuse the already available `all_tied_weights_keys` instead of computing them again and again. However, it does not compute the nested list of `tied_parameters` correctly, i.e. in the format the `accelerate` functions expect: groups of all parameters that are tied together.

Also, now that we don't tie weights before the `device_map` computation, we need to exclude them explicitly from the loop in `compute_model_sizes`, as they will be present in the iterated params.

This PR fixes it. This is needed in #42242, which is where I noticed it.
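To illustrate the expected format, here is a minimal sketch of turning a flat mapping of tied keys into groups of parameters tied together. It assumes `all_tied_weights_keys` behaves like a dict mapping each tied parameter name to the name of its source parameter; that shape, the helper name `group_tied_parameters`, and the example parameter names are all assumptions for illustration, not the code in this PR.

```python
from collections import defaultdict


def group_tied_parameters(tied_to_source):
    """Group parameter names that share the same underlying weight.

    `tied_to_source` is a hypothetical flat mapping {tied_name: source_name}.
    Chains (a source that is itself tied to another name) are not resolved here.
    """
    groups = defaultdict(set)
    for tied_name, source_name in tied_to_source.items():
        # Each group contains the source and every name tied to it.
        groups[source_name].update((source_name, tied_name))
    # Sort for a deterministic result: groups by source name, names within a group.
    return [sorted(names) for _, names in sorted(groups.items())]


example = {
    "lm_head.weight": "embed.weight",
    "decoder.embed.weight": "embed.weight",
    "b.bias": "a.bias",
}
tied_parameters = group_tied_parameters(example)
print(tied_parameters)
```

Each inner list is one group of names that must end up on the same device, which is why a flat list of keys is not enough for the device map inference.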
Also fixes a more long-standing issue, where `compute_model_sizes` was not taking non-persistent buffers into account, even though they do take up space!
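The two size-related fixes can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `compute_sizes_sketch` is a hypothetical helper, tensor sizes are given directly as byte counts instead of being read from a real model, and the input is assumed to already contain every parameter and every buffer (persistent or not).

```python
from collections import defaultdict


def compute_sizes_sketch(tensor_nbytes, tied_keys):
    """Accumulate byte sizes per module prefix.

    `tensor_nbytes` maps every parameter AND buffer name (including
    non-persistent buffers) to its size in bytes. `tied_keys` holds the
    names that only alias another weight and must not be counted again.
    """
    sizes = defaultdict(int)
    for name, nbytes in tensor_nbytes.items():
        if name in tied_keys:
            continue  # tied weights share storage with their source
        parts = name.split(".")
        # Add the size to every prefix: "" (whole model), "a", "a.b", ...
        for i in range(len(parts) + 1):
            sizes[".".join(parts[:i])] += nbytes
    return dict(sizes)


sizes = compute_sizes_sketch(
    {
        "embed.weight": 4000,
        "lm_head.weight": 4000,  # tied to embed.weight in this example
        "rope.inv_freq": 64,     # stands in for a non-persistent buffer
    },
    tied_keys={"lm_head.weight"},
)
print(sizes[""])
```

Here the total is 4064 bytes: the tied `lm_head.weight` is skipped, while the buffer still contributes its 64 bytes.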