Commit e736fd2

liob authored and lexierule committed
Fix Checkpoint issue when using Horovod distributed backend (PyTorchLightning#6947) (#6958)
Co-authored-by: Adrian Wälchli <[email protected]>
(cherry picked from commit b37b58a)
1 parent 455ed43 commit e736fd2

1 file changed: +3 -1 lines changed

pytorch_lightning/plugins/training_type/horovod.py

Lines changed: 3 additions & 1 deletion
@@ -136,7 +136,9 @@ def reduce(self, output, group: Optional[Any] = None, reduce_op: Optional[Union[
                 "Unset `group`."
             )
 
-        if reduce_op is None or reduce_op == "sum":
+        if reduce_op in (None, "avg", "mean"):
+            reduce_op = hvd.Average
+        elif reduce_op in ("sum", ReduceOp.SUM):
             reduce_op = hvd.Sum
         elif isinstance(reduce_op, str) and reduce_op in ("avg", "mean"):
             reduce_op = hvd.Average
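
The effect of the change can be sketched in isolation. The snippet below is a minimal illustration, not the plugin's code: `hvd` and `ReduceOp` are small stand-ins for `horovod.torch` and the `ReduceOp` referenced in the diff, `resolve_before` and `resolve_after` are hypothetical helper names, and the `ValueError` fallback is an assumption, since the hunk does not show what follows the last branch.

```python
from enum import Enum


class ReduceOp(Enum):
    """Stand-in for the ReduceOp used in the diff (assumption, illustration only)."""
    SUM = "sum"


class hvd:
    """Stand-in for horovod.torch; only the two constants the diff touches."""
    Average = "hvd.Average"
    Sum = "hvd.Sum"


def resolve_before(reduce_op):
    # Logic on the removed line: None is treated the same as "sum".
    if reduce_op is None or reduce_op == "sum":
        return hvd.Sum
    elif isinstance(reduce_op, str) and reduce_op in ("avg", "mean"):
        return hvd.Average
    # Fallback is an assumption; the hunk does not show this part.
    raise ValueError(f"unrecognized reduce_op: {reduce_op}")


def resolve_after(reduce_op):
    # Logic on the added lines: None now defaults to averaging,
    # and ReduceOp.SUM is accepted alongside the string "sum".
    if reduce_op in (None, "avg", "mean"):
        return hvd.Average
    elif reduce_op in ("sum", ReduceOp.SUM):
        return hvd.Sum
    # Fallback is an assumption; the hunk does not show this part.
    raise ValueError(f"unrecognized reduce_op: {reduce_op}")


if __name__ == "__main__":
    for op in (None, "sum", "mean", ReduceOp.SUM):
        print(repr(op), "->", resolve_after(op))
```

In this sketch the only input whose result differs between the two helpers without raising is `None`: it resolved to `hvd.Sum` before the change and resolves to `hvd.Average` after it. The trailing `elif` kept as context in the diff is unreachable for `"avg"` and `"mean"` after the change, because the first branch already handles them.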