tianyu-l commented Mar 2, 2024

Stack from ghstack (oldest at bottom):

In the 2D case (FSDP + SP), the loss metric should be computed with an all-reduce over the DP submesh only. Previously the all-reduce ran over the world mesh; this PR fixes that.
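
To illustrate the difference between the two reductions, here is a minimal sketch in plain Python that only simulates which ranks take part in the averaging. It does not use `torch.distributed`. The 2x2 mesh layout, the helper names `dp_group` and `all_reduce_mean`, and the per-rank loss values are all made up for illustration and are not taken from this PR.

```python
from statistics import mean

# Hypothetical 2D layout: 2 data-parallel groups x 2 sequence-parallel ranks.
# mesh[i][j] is the global rank at DP index i and SP index j.
mesh = [[0, 1],
        [2, 3]]


def dp_group(mesh, rank):
    """Ranks that share `rank`'s SP index, i.e. its peers along the DP dimension."""
    for row in mesh:
        if rank in row:
            sp_index = row.index(rank)
            return [r[sp_index] for r in mesh]
    raise ValueError(f"rank {rank} is not in the mesh")


def all_reduce_mean(values, group):
    """Simulate an averaging all-reduce: the result every rank in `group` would hold."""
    return mean(values[r] for r in group)


# Made-up per-rank loss values, for illustration only.
local_loss = {0: 2.0, 1: 4.0, 2: 6.0, 3: 8.0}

world = [r for row in mesh for r in row]
dp_mean_rank0 = all_reduce_mean(local_loss, dp_group(mesh, 0))  # reduces over ranks 0 and 2
world_mean = all_reduce_mean(local_loss, world)                 # reduces over ranks 0 to 3
```

With these values, the DP-only reduction seen by rank 0 involves just ranks 0 and 2, while the world reduction mixes in the SP peers as well.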

tianyu-l added a commit that referenced this pull request Mar 2, 2024
facebook-github-bot added the CLA Signed label Mar 2, 2024
tianyu-l added a commit that referenced this pull request Mar 2, 2024
tianyu-l requested a review from gnadathur March 2, 2024 01:29
tianyu-l merged commit b05fad1 into gh/tianyu-l/2/base Mar 2, 2024
tianyu-l added a commit that referenced this pull request Mar 2, 2024
tianyu-l deleted the gh/tianyu-l/2/head branch March 2, 2024 01:32
- dp_degree = world_mesh.size(0)
- dp_rank = world_mesh.get_local_rank(0)
+ dp_mesh = world_mesh["dp"]
+ dp_degree = dp_mesh.size()
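
For readers unfamiliar with why a lookup by name is safer than index `0`, the following is a rough stand-in in plain Python, not the `DeviceMesh` API. It assumes ranks are laid out row-major over the mesh shape; `mesh_coordinate` and `dp_info` are hypothetical helpers introduced only for this sketch.

```python
def mesh_coordinate(rank, mesh_shape):
    """Return the row-major coordinates of global `rank` in a mesh of `mesh_shape`."""
    coords = []
    for size in reversed(mesh_shape):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def dp_info(rank, mesh_shape, dim_names, dp_name="dp"):
    """Look up (dp_degree, dp_rank) by dimension name instead of assuming index 0."""
    dp_dim = dim_names.index(dp_name)
    return mesh_shape[dp_dim], mesh_coordinate(rank, mesh_shape)[dp_dim]


# DP happens to be dimension 0 here, so a positional lookup would also work.
dp_first = dp_info(5, (2, 4), ("dp", "sp"))

# DP is dimension 1 here; reading dimension 0 would return the SP size and rank.
dp_second = dp_info(5, (4, 2), ("sp", "dp"))
```

In the second call, a positional lookup on dimension 0 would report a degree of 4 and a rank of 2, which belong to the SP dimension, whereas the name-based lookup still returns the DP values.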

There is also a config called data_parallel_degree. Should we use that?

lessw2020 pushed a commit that referenced this pull request Apr 18, 2024
philippguevorguian pushed a commit to YerevaNN/YNNtitan that referenced this pull request Aug 17, 2024