Conversation

@Pzzzzz5142 (Contributor) commented Jun 4, 2024

For NMT models, if the model uses a pre-norm architecture, it will have a final layer norm. (Ref: fairseq encoder init, forward and fairseq decoder init, forward.)

We don't need to modify the rest of the script, since fairseq still applies the original final_layer_norm before the FFN inside each layer, even though it is also named final_layer_norm. (Ref: https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/transformer_layer.py#L212)
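To illustrate the point above, here is a minimal sketch (not fairseq's actual code; the helper names and the callable stand-ins for transformer blocks are hypothetical) of how a pre-norm encoder applies one extra LayerNorm after the layer stack, while a post-norm encoder does not:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last dimension (the feature dimension),
    # as nn.LayerNorm does with default settings.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def encoder_forward(x, layers, normalize_before):
    # Each "layer" here is just a callable standing in for a transformer block.
    for layer in layers:
        x = layer(x)
    # Mirrors the fairseq behavior referenced above: pre-norm
    # (normalize_before=True) encoders apply one final LayerNorm at the end.
    if normalize_before:
        x = layer_norm(x)
    return x
```

With `normalize_before=True` the output is normalized per position (zero mean over the feature dimension); with `normalize_before=False` the stacked layers' output is returned unchanged, which is why a post-norm checkpoint has no final-layer-norm weights to convert.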

@nv-guomingz (Collaborator) commented:

Hi @Pzzzzz5142, thanks for your contribution; we'll evaluate your changes internally first.

@nv-guomingz added the triaged (Issue has been triaged by maintainers) and Investigating labels Jun 5, 2024
@nv-guomingz (Collaborator) commented:

Hi @Pzzzzz5142, thanks for your contribution. This PR has been merged and will be upstreamed to the main branch next week.

@nv-guomingz nv-guomingz closed this Jun 6, 2024
@Pzzzzz5142 Pzzzzz5142 deleted the dev-pzzzzz-fix-pre-norm branch June 6, 2024 11:52
@kaiyux kaiyux mentioned this pull request Jun 11, 2024