README: Add a model customization guide #962
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/962
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 2 Unrelated Failures as of commit a340f0c with merge base 900b6d4.
NEW FAILURE: the following job has failed.
BROKEN TRUNK: the following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
digantdesai left a comment:
Love it. For compile: it is not really a model customization, so to be more complete it might be more comprehensive/useful to extend that category of this doc to encompass all four execution modes (et | aoti | eager | compile).
Also, it would be great to have a "support matrix" view of these four knobs for torchchat's flagship model, which I suppose is llama3-8b.
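A sketch of the shape such a support matrix could take (the execution modes are the four named in the comment above; the row labels and all cell values are placeholders, not verified support data):

```markdown
| Knob / Mode                | eager | compile | aoti | et |
|----------------------------|-------|---------|------|----|
| Device (cpu / cuda / mps)  |   ?   |    ?    |  ?   | ?  |
| dtype (fp32 / fp16 / bf16) |   ?   |    ?    |  ?   | ?  |
| Quantization               |   ?   |    ?    |  ?   | ?  |
```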
docs/model_customization.md (Outdated)

```diff
@@ -0,0 +1,60 @@
+# Model Customization
+
+By default, torchchat (and PyTorch) default to unquantized [eager execution](https://pytorch.org/blog/optimizing-production-pytorch-performance-with-graph-transformations/).
```
Would it be helpful to be specific about the unquantized dtype, i.e. fp32, fp16, or bf16?
It's the checkpoint dtype, so it'll vary.
```diff
+This page goes over the different options torchchat provides for customizing the model execution for inference.
+- Device
+- Compilation
```
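For context, the knobs listed in the doc surface as CLI flags on torchchat's `generate` command. A hedged sketch of how they might be combined (the flag names `--device` and `--compile` and the model alias `llama3` are assumptions based on the docs under review, not verified against this PR):

```shell
# Illustrative only: pick the execution device and opt into torch.compile
# at generation time. Run from a torchchat checkout.
python3 torchchat.py generate llama3 \
  --prompt "Hello, world" \
  --device cuda \
  --compile
```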
This doesn't fit very well, IMHO, in this otherwise awesome high-level categorization of almost orthogonal optimization knobs. It could be just me.
Agreed, it's not an optimization knob; it's a model customization knob.
Luckily, quantization/optimization gets its own page for that.
This adds an initial index of model customization options, accessible from the README.
It introduces a `model_customization.md` that will be iterated on and expanded over time.
Note that some of the content is extracted from, or inspired by, quantization.md and the ADVANCED_USERS docs, which are marked as outdated/unstable.

READMEs:
https://github.com/pytorch/torchchat/blob/readme-model-customization-guide/README.md
https://github.com/pytorch/torchchat/blob/readme-model-customization-guide/docs/model_customization.md