Skip to content

Conversation

@williamberman
Copy link
Contributor

re: #1246

training/fine tuning shouldn't be done with fp16 weights1, fp16 inputs are ok with amp + gradient scaling. fp16 weights throw an error when used with amp + gradient scaling. We should check the dtype of the loaded model and throw an informative error before training begins.

We add a guard checking the datatype of the unet. We also add a guard checking the datatype of the text encoder if we are training the text encoder.

Footnotes

  1. precision issues when adding small gradient updates to fp16 weights. Reason why training with amp recommends to keep weights as fp32 for gradient updates and makes a copy in half precision for forward and backward passes.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jan 4, 2023

The documentation is not available anymore as the PR was closed or merged.

@williamberman
Copy link
Contributor Author

@patil-suraj I added docs to the relevant cli flags. Are there any other additional locations we should update docs around loaded model precision?

Copy link
Contributor

@patrickvonplaten patrickvonplaten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

Copy link
Contributor

@patil-suraj patil-suraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@patil-suraj
Copy link
Contributor

@patil-suraj I added docs to the relevant cli flags. Are there any other additional locations we should update docs around loaded model precision?

No, I think this covers it :)

@patil-suraj patil-suraj merged commit 247b5fe into huggingface:main Jan 5, 2023
@franz101
Copy link

franz101 commented Jan 6, 2023

Am I missing something from the readme?
Getting the following errors.
UnfilteredStackTrace: TypeError: add_noise() takes 4 positional arguments but 5
or
TypeError: get_scheduler() got an unexpected keyword argument 'num_cycles'

@patil-suraj @patrickvonplaten

Or is it something related to the readme?

Will try out the solution of #1817

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants