@@ -210,19 +210,19 @@ to obtain the sum of all tensors at all processes, we can use the
210210 """ Simple point-to-point communication. """
211211 group = dist.new_group([0 , 1 ])
212212 tensor = torch.ones(1 )
213- dist.all_reduce(tensor, op = dist.reduce_op .SUM , group = group)
213+ dist.all_reduce(tensor, op = dist.ReduceOp .SUM , group = group)
214214 print (' Rank ' , rank, ' has data ' , tensor[0 ])
215215
216216 Since we want the sum of all tensors in the group, we use
217- ``dist.reduce_op .SUM `` as the reduce operator. Generally speaking, any
217+ ``dist.ReduceOp .SUM `` as the reduce operator. Generally speaking, any
218218commutative mathematical operation can be used as an operator.
219219Out-of-the-box, PyTorch comes with 4 such operators, all working at the
220220element-wise level:
221221
222- - ``dist.reduce_op .SUM ``,
223- - ``dist.reduce_op .PRODUCT ``,
224- - ``dist.reduce_op .MAX ``,
225- - ``dist.reduce_op .MIN ``.
222+ - ``dist.ReduceOp .SUM ``,
223+ - ``dist.ReduceOp .PRODUCT ``,
224+ - ``dist.ReduceOp .MAX ``,
225+ - ``dist.ReduceOp .MIN ``.
226226
227227In addition to ``dist.all_reduce(tensor, op, group) ``, there are a total
228228of 6 collectives currently implemented in PyTorch.
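
For reference, here is a minimal runnable sketch of the all-reduce snippet as it reads after this rename. Only the body of `run` comes from the hunk above; the `init_process` bootstrapping, the `gloo` backend, and the localhost rendezvous address are assumptions borrowed from the surrounding tutorial, not part of this diff.

```python
""" Minimal sketch of the updated all-reduce example.
Only run() mirrors the diff above; the process-group setup
(gloo backend, localhost rendezvous) is an assumed scaffold. """
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def run(rank, size):
    """ Each rank contributes a tensor of ones; after the all-reduce
    every rank holds the element-wise sum, i.e. the value `size`. """
    group = dist.new_group([0, 1])
    tensor = torch.ones(1)
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM, group=group)
    print('Rank ', rank, ' has data ', tensor[0])


def init_process(rank, size, fn, backend='gloo'):
    """ Assumed setup: initialize the default process group, then run. """
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)


if __name__ == "__main__":
    size = 2
    processes = []
    mp.set_start_method("spawn")
    for rank in range(size):
        p = mp.Process(target=init_process, args=(rank, size, run))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```

Launched as-is with two local processes, each rank should print a value of 2, the element-wise sum of the two one-valued tensors.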
@@ -376,7 +376,7 @@ world.
     def average_gradients(model):
         size = float(dist.get_world_size())
         for param in model.parameters():
-            dist.all_reduce(param.grad.data, op=dist.reduce_op.SUM)
+            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
             param.grad.data /= size
 
 *Et voilà*! We successfully implemented distributed synchronous SGD and
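
And a short sketch of how the renamed `average_gradients` helper is typically called from a training step; the `train_step` wrapper and its `model`/`optimizer`/`loss_fn` arguments are illustrative assumptions, not part of the tutorial's code.

```python
""" Sketch of average_gradients inside a training step. Only
average_gradients itself comes from the diff; train_step and its
arguments are hypothetical scaffolding for illustration. """
import torch
import torch.distributed as dist


def average_gradients(model):
    """ Sum each parameter's gradient across all ranks, then divide
    by the world size so every rank holds the average gradient. """
    size = float(dist.get_world_size())
    for param in model.parameters():
        dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
        param.grad.data /= size


def train_step(model, optimizer, loss_fn, data, target):
    # Ordinary local forward/backward pass on this rank's data shard.
    optimizer.zero_grad()
    loss = loss_fn(model(data), target)
    loss.backward()
    # Synchronize gradients across ranks before the optimizer update,
    # so all replicas apply the same averaged step.
    average_gradients(model)
    optimizer.step()
    return loss.item()
```

Dividing by `dist.get_world_size()` after the SUM all-reduce is what turns the summed gradients into an average, so the effective step size does not depend on the number of replicas.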