Closed
Description
Hi! In your code you cite the PaLM paper (https://arxiv.org/abs/2204.02311) for the MFU computation. That paper uses the total number of parameters, including the embeddings.
Line of code:
Line 225 in 7afe902
get_num_params(whole_model, exclude_embedding=True),
You instead compute it from the number of parameters with the embeddings excluded (the older OpenAI style). This is wrong and leads to larger than usual numbers.
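To make the difference concrete, here is a minimal sketch that compares the two parameter-counting conventions. It assumes the per-token estimate `6*N + 12*L*H*Q*T` as I read it in the PaLM appendix. The helper names `flops_per_token` and `mfu` and all model dimensions below are hypothetical, chosen only for illustration; they are not taken from this repository.

```python
def flops_per_token(num_params, n_layers, n_heads, head_dim, seq_len):
    """Estimate training FLOPs per token as 6*N + 12*L*H*Q*T.

    6*N covers the dense matmuls (forward + backward);
    12*L*H*Q*T covers self-attention over a sequence of length T.
    """
    return 6 * num_params + 12 * n_layers * n_heads * head_dim * seq_len


def mfu(tokens_per_second, flops_tok, peak_flops_per_second):
    """Model FLOPs utilization: achieved model FLOPs/s over hardware peak."""
    return tokens_per_second * flops_tok / peak_flops_per_second


# Hypothetical model dimensions (illustrative only, not measured).
vocab_size, dim = 32_000, 4_096
n_layers, n_heads, head_dim, seq_len = 32, 32, 128, 2_048

embedding_params = vocab_size * dim       # token embedding table only
non_embedding_params = 6_500_000_000      # assumed count for everything else
total_params = non_embedding_params + embedding_params

# The same formula, fed with the two different values of N.
flops_total = flops_per_token(total_params, n_layers, n_heads, head_dim, seq_len)
flops_no_emb = flops_per_token(non_embedding_params, n_layers, n_heads, head_dim, seq_len)

print("FLOPs/token, embeddings included:", flops_total)
print("FLOPs/token, embeddings excluded:", flops_no_emb)
print("difference:", flops_total - flops_no_emb)
```

The two results differ by exactly `6 * embedding_params` per token, so the choice of `exclude_embedding` changes the reported MFU by a fixed fraction that depends on vocabulary size and model width.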