Segmentation - Incorrect IoU being printed on master process #7915

@siddharth9820

Description

reduce_across_processes(self.mat)

This line is expected to all-reduce self.mat in place. However, reduce_across_processes creates a copy of the tensor (
t = torch.tensor(val, device="cuda")
) and runs the all-reduce on that copy. As a result, self.mat is never updated, and the master process prints incorrect IoUs based only on its own local counts.

A potential fix is to assign the returned tensor back to self.mat:

self.mat = reduce_across_processes(self.mat).to(torch.int64)

PR #7916 attempts to fix this.
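
To make the failure mode concrete, here is a minimal torch-free sketch of the behavior described above. It is not the torchvision code: plain Python lists stand in for tensors, and reduce_across_processes_sketch, ConfusionMatrixSketch, and the counts are all hypothetical names and values made up for illustration. The only assumption carried over from the issue is that the helper reduces into a copy and returns it.

```python
# Torch-free stand-in (assumption): lists play the role of tensors.


def reduce_across_processes_sketch(val, other_ranks):
    """Hypothetical helper: sums `val` with other ranks' values into a NEW list,
    mimicking an all-reduce that runs on a copy of the input."""
    t = list(val)  # copy, analogous to t = torch.tensor(val, device="cuda")
    for other in other_ranks:
        for i, x in enumerate(other):
            t[i] += x  # the "all-reduce" only touches the copy
    return t


class ConfusionMatrixSketch:
    """Hypothetical stand-in for the class that owns self.mat."""

    def __init__(self, mat):
        self.mat = mat

    def reduce_buggy(self, other_ranks):
        # Return value is discarded, so self.mat keeps the local counts.
        reduce_across_processes_sketch(self.mat, other_ranks)

    def reduce_fixed(self, other_ranks):
        # Assign the reduced copy back, as the proposed fix does.
        self.mat = reduce_across_processes_sketch(self.mat, other_ranks)


# Flattened 2x2 counts from a second rank (made-up numbers).
other_ranks = [[1, 0, 0, 3]]

buggy = ConfusionMatrixSketch([2, 1, 0, 4])
buggy.reduce_buggy(other_ranks)
buggy_result = buggy.mat  # still the local counts

fixed = ConfusionMatrixSketch([2, 1, 0, 4])
fixed.reduce_fixed(other_ranks)
fixed_result = fixed.mat  # element-wise sum across ranks
```

In this sketch the buggy path leaves the local matrix untouched, while the fixed path holds the summed counts, which is the difference the issue reports on the master process.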
