@jemitche1
Collaborator

Fixes #ISSUE_NUMBER

This unit test assumes the reference and compiled outputs will differ, because CUDA doesn't implicitly synchronize collectives.
On XPU, the XCCL backend does synchronize implicitly during certain tensor ops, so the mismatch doesn't happen.

As a sanity check, I printed the output sums on XPU:

[RANK 0] out_ref sum: {4096000000.0}
[RANK 1] out_ref sum: {4096000000.0}
[RANK 0] out_compiled sum: {4096000000.0}
[RANK 1] out_compiled sum: {4096000000.0}
[RANK 0] Equal? {True}

To fix this, check whether XPU is available and skip the assertion if it is.
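
A minimal sketch of the kind of guard described above, not the actual diff. The helper names `xpu_available` and `check_outputs_differ` are hypothetical, and the real test may compare tensors rather than sums. The only PyTorch call used is `torch.xpu.is_available()`, looked up defensively in case the build has no `torch.xpu` module.

```python
import importlib.util


def xpu_available() -> bool:
    """Return True only if PyTorch is installed and reports an XPU device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    # Older or CPU/CUDA-only builds may not expose torch.xpu at all.
    xpu = getattr(torch, "xpu", None)
    return bool(xpu is not None and xpu.is_available())


def check_outputs_differ(out_ref_sum: float, out_compiled_sum: float, on_xpu: bool) -> str:
    """Hypothetical stand-in for the test's mismatch assertion.

    On XPU the outputs match (see the sums above), so the assertion
    is skipped there; everywhere else the mismatch is still asserted.
    """
    if on_xpu:
        return "skipped"
    assert out_ref_sum != out_compiled_sum, "expected outputs to differ"
    return "checked"


if __name__ == "__main__":
    print(check_outputs_differ(4096000000.0, 4096000000.0, on_xpu=xpu_available()) if xpu_available()
          else "no XPU detected; mismatch assertion would run")
```

Passing `on_xpu` as an argument instead of calling `xpu_available()` inside the helper keeps the skip logic testable on machines without an XPU.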
