Fix assertion error in UT #16

jemitche1 · 2025-09-29T18:35:10Z

Fixes #ISSUE_NUMBER

This unit test assumes the reference and compiled outputs will differ, because CUDA doesn't implicitly synchronize collectives.
On XPU, the XCCL backend does synchronize implicitly during certain tensor ops, so the mismatch doesn't happen and the assertion fails.

As a sanity check, I compared the outputs on both ranks:

[RANK 0] out_ref sum: {4096000000.0}
[RANK 1] out_ref sum: {4096000000.0}
[RANK 0] out_compiled sum: {4096000000.0}
[RANK 1] out_compiled sum: {4096000000.0}
[RANK 0] Equal? {True}

To fix, check whether XPU is available and skip the assertion if so.
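A minimal sketch of what that guard could look like, not the actual diff. The helper names `xpu_is_available` and `check_mismatch` are hypothetical and only illustrate the idea; the sketch assumes a PyTorch build that exposes `torch.xpu.is_available()` and falls back to `False` when torch or the `xpu` module is missing.

```python
def xpu_is_available() -> bool:
    """Return True if PyTorch reports an XPU device (False if torch is absent)."""
    try:
        import torch
    except ImportError:
        return False
    xpu = getattr(torch, "xpu", None)  # older builds may not have torch.xpu
    return bool(xpu is not None and xpu.is_available())


def check_mismatch(out_ref_sum: float, out_compiled_sum: float, on_xpu: bool) -> bool:
    """Hypothetical stand-in for the test's final check.

    Returns True if the mismatch assertion ran, False if it was skipped.
    """
    if on_xpu:
        # XCCL synchronizes implicitly, so the outputs match; skip the assertion.
        return False
    # CUDA expectation from the original test: the outputs differ.
    assert out_ref_sum != out_compiled_sum, "expected mismatched outputs"
    return True
```

In the test itself this would be called as `check_mismatch(ref_sum, compiled_sum, on_xpu=xpu_is_available())`, so the CUDA path keeps its original expectation.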

sdp added 4 commits September 19, 2025 12:35

Fix ProcessGroupXCCL::gather error

64864a1

fix assertion error

3546223

fix assertion error

13d98ba

fix assertion error

0a46986
