Fix bnb for the weights refactor #42043
Conversation
@bot /style

Style bot fixed some files and pushed the changes.
ArthurZucker left a comment:
LGTM. Just tying twice is my nightmare, but good otherwise.
    if hf_quantizer is not None and hf_quantizer.param_needs_quantization(model, t):
        converter.quantization_operation = hf_quantizer.get_quantize_ops()
        _dtype = dtype
nice
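For readers following along, here is a minimal sketch of the pattern that hunk follows: during weight loading, the quantizer is asked per parameter whether it needs quantization, and only then is a quantization operation attached. The names below (`ToyQuantizer`, `load_weights`) are hypothetical stand-ins, not the transformers API.

```python
from typing import Optional

import torch


class ToyQuantizer:
    """Hypothetical stand-in for an HfQuantizer: decides, per parameter,
    whether it should be quantized (e.g. bnb quantizes 2D Linear weights)."""

    def param_needs_quantization(self, name: str, tensor: torch.Tensor) -> bool:
        return tensor.ndim == 2 and "lm_head" not in name

    def quantize(self, tensor: torch.Tensor) -> torch.Tensor:
        # Placeholder: real bnb kernels pack weights into int8/nf4 storage.
        return tensor.to(torch.int8)


def load_weights(
    state_dict: dict[str, torch.Tensor],
    quantizer: Optional[ToyQuantizer],
    dtype: torch.dtype,
) -> dict[str, torch.Tensor]:
    """Quantized params take the quantizer's path; everything else is cast to dtype."""
    loaded = {}
    for name, tensor in state_dict.items():
        if quantizer is not None and quantizer.param_needs_quantization(name, tensor):
            loaded[name] = quantizer.quantize(tensor)
        else:
            loaded[name] = tensor.to(dtype)
    return loaded
```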
run-slow: bnb

This comment contains models: []
I've upstreamed some code from accelerate to fix tied weights. It should be easier this way, and we can better tweak the device-map-related code in the future.
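For context, tied weights are parameters that share the same underlying tensor (typically the input embeddings and lm_head). Below is a toy sketch of what tied-weight detection amounts to, not the code that was moved over (accelerate's own helper for this is `find_tied_parameters`):

```python
import torch.nn as nn


def find_tied_groups(model: nn.Module) -> list[list[str]]:
    """Group parameter names that point at the same tensor object."""
    groups: dict[int, list[str]] = {}
    for name, param in model.named_parameters(remove_duplicate=False):
        groups.setdefault(id(param), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


class TiedLM(nn.Module):
    def __init__(self, vocab: int = 10, dim: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lm_head = nn.Linear(dim, vocab, bias=False)
        self.lm_head.weight = self.embed.weight  # tie the two parameters


print(find_tied_groups(TiedLM()))  # [['embed.weight', 'lm_head.weight']]
```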
@bot /style

Style fix is beginning... View the workflow run here.
@bot /style

Style bot fixed some files and pushed the changes.
CI Results: ✅ No failing tests specific to this PR 🎉!

run-slow: bnb

This comment contains models: []

CI Results: ✅ No failing tests specific to this PR 🎉!
    if tied_parameters is None and len(model.all_tied_weights_keys) > 0:
        # create a list of lists of tied params
        tied_parameters = [list(t) for t in model.all_tied_weights_keys.items()]
changed this
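To make the comprehension above concrete, here is a small illustration; the exact structure of `model.all_tied_weights_keys` is an assumption here (a mapping from a tied parameter name to the name it is tied to):

```python
# Assumed shape of model.all_tied_weights_keys (illustrative only).
all_tied_weights_keys = {"lm_head.weight": "model.embed_tokens.weight"}

# Each (key, value) item becomes one group of tied parameter names.
tied_parameters = [list(t) for t in all_tied_weights_keys.items()]
print(tied_parameters)  # [['lm_head.weight', 'model.embed_tokens.weight']]
```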
    def infer_auto_device_map(
        model: nn.Module,
        max_memory: Optional[dict[Union[int, str], Union[int, str]]] = None,
        no_split_module_classes: Optional[list[str]] = None,
        verbose: bool = False,
        clean_result: bool = True,
        offload_buffers: bool = False,
        tied_parameters: Optional[list[list[str]]] = None,
        hf_quantizer: "HfQuantizer | None" = None,
    ):
Removed dtype and special_dtype; we now rely on hf_quantizer instead when computing module sizes in compute_module_sizes.
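A hedged usage sketch of the signature above, assuming the function is importable in your context; the tied_parameters and hf_quantizer arguments come from this PR, while max_memory and no_split_module_classes match accelerate's existing infer_auto_device_map:

```python
from torch import nn

# Tiny model standing in for a real checkpoint.
model = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024))

device_map = infer_auto_device_map(
    model,
    max_memory={0: "6MB", "cpu": "1GB"},   # per-device budgets; GPU 0 fits roughly one layer
    no_split_module_classes=["Linear"],    # never split a Linear across devices
    tied_parameters=None,                  # derived from model.all_tied_weights_keys when None
    hf_quantizer=None,                     # replaces dtype/special_dtype when sizing modules
)
print(device_map)  # e.g. {'0': 0, '1': 'cpu'}
```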
[For maintainers] Suggested jobs to run (before merge): run-slow: bnb, finegrained_fp8
What does this PR do?
This PR fixes bitsandbytes (bnb) support, both 8-bit and 4-bit, in the new weight-loading logic.
Testing
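A smoke test along these lines exercises both quantization paths; the checkpoint and flags below are illustrative and are not the PR's test suite (the slow bnb tests are what run-slow: bnb triggers):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-125m"  # any small causal LM works for a quick check

# 4-bit config; swap load_in_4bit for load_in_8bit to hit the 8-bit path.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # goes through device-map inference with the quantizer
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```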