
Use ``python -m paddle.distributed.launch`` to start a distributed training job.

+ Launch is the module that runs on every node and is responsible for distributed coordination and local process management. Starting distributed training with launch simplifies argument configuration, sets up the distributed process group stably and reliably, and provides improved debugging and log collection. Advanced distributed features such as fault tolerance and elasticity also depend on launch.
+
Usage
:::::::::
.. code-block:: bash

- ``--rank``: node rank, which can be assigned by the master node. Default ``--rank=-1``.

- - ``--log_level``: log level, one of CRITICAL/ERROR/WARNING/INFO/DEBUG/NOTSET, case-insensitive. Logs of node 0 are not written to stdout by default; enable debug mode to print them. Default ``--log_level=INFO``.
+ - ``--log_level``: log level, one of CRITICAL/ERROR/WARNING/INFO/DEBUG/NOTSET, case-insensitive. Default ``--log_level=INFO``.

- ``--nnodes``: number of nodes; a range such as ``--nnodes=2:3`` enables elastic mode. Default ``--nnodes=1``.

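The flags described above compose into a single launch invocation. A minimal sketch, assuming a 2-node job; the master address, the explicit rank value, and the ``train.py`` script are purely illustrative:

.. code-block:: bash

    # illustrative 2-node job: rank assigned explicitly, verbose logging
    python -m paddle.distributed.launch --master=10.0.0.1:38714 --nnodes=2 --rank=0 --log_level=DEBUG train.py
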
@@ -91,84 +93,82 @@ Elastic parameters
.. code-block:: bash
:name: code-block-example-bash0

- # For training on multi node, run the following command in one of the nodes
+ # Run the following command on one of the nodes to launch a 2-node job

python -m paddle.distributed.launch --nnodes 2 train.py

- # Then the following info will be print
+ # At this point, the log prints the following message:

# Copy the following command to other nodes to run.
# --------------------------------------------------------------------------------
# python -m paddle.distributed.launch --master 10.0.0.1:38714 --nnodes 2 train.py
# --------------------------------------------------------------------------------

- # Follow the instruction above and paste the command in other nodes can launch a multi nodes training job.
+ # Follow the prompt and run the copied command on the other nodes to start distributed training.

- # There are two ways to launch a job with the same command for multi nodes training
- # 1) using the following command in every nodes, make sure the ip is one of the training node and the port is available on that node
+ # There are two ways to launch distributed training with the same command on every node:
+ # 1) Use a preconfigured master, where the master ip is one of the training nodes and the port is an available port on that node
# python -m paddle.distributed.launch --master 10.0.0.1:38714 --nnodes 2 train.py
- # 2) using the following command in every nodes with a independent etcd service
+ # 2) Use a separately deployed etcd service as the master
# python -m paddle.distributed.launch --master etcd://10.0.0.1:2379 --nnodes 2 train.py

- # This functionality works will for both collective and ps mode and even with other arguments.
+ # The features described above can also be combined with other arguments.


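For instance, a hedged sketch of running the same command on every node with an etcd master while also passing ps-mode arguments; this particular combination of flags is illustrative, and the canonical forms appear in the examples below:

.. code-block:: bash

    # same command on every node; etcd coordinates the nodes,
    # --server_num/--trainer_num select ps mode (illustrative values)
    python -m paddle.distributed.launch --master etcd://10.0.0.1:2379 --nnodes 2 --server_num=1 --trainer_num=2 train.py --lr=0.01
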
Code example 1 (collective, single node)
:::::::::
.. code-block:: bash
:name: code-block-example-bash1

- # For training on single node using 4 gpus.
+ # Launch a single-node job with 4 GPUs

- python -m paddle.distributed.launch --gpus=0,1,2,3 train.py --lr=0.01
+ python -m paddle.distributed.launch --devices=0,1,2,3 train.py --lr=0.01

Code example 2 (collective, multi node)
:::::::::
.. code-block:: bash
:name: code-block-example-bash2

- # The parameters of --gpus and --ips must be consistent in each node.
-
- # For training on multiple nodes, e.g., 192.168.0.16, 192.168.0.17
+ # Launch a 2-node job on machines with ip 192.168.0.16 and 192.168.0.17

# On 192.168.0.16:

- python -m paddle.distributed.launch --gpus=0,1,2,3 --ips=192.168.0.16,192.168.0.17 train.py --lr=0.01
+ python -m paddle.distributed.launch --devices=0,1,2,3 --master=192.168.0.16:8090 --nnodes=2 train.py --lr=0.01

# On 192.168.0.17:

- python -m paddle.distributed.launch --gpus=0,1,2,3 --ips=192.168.0.16,192.168.0.17 train.py --lr=0.01
+ python -m paddle.distributed.launch --devices=0,1,2,3 --master=192.168.0.16:8090 --nnodes=2 train.py --lr=0.01

Code example 3 (ps, cpu, single node)
:::::::::
.. code-block:: bash
:name: code-block-example-bash3

- # To simulate distributed environment using single node, e.g., 2 servers and 4 workers.
+ # Launch multiple servers and trainers on a single node

- python -m paddle.distributed.launch --server_num=2 --worker_num=4 train.py --lr=0.01
+ python -m paddle.distributed.launch --server_num=2 --trainer_num=4 train.py --lr=0.01

Code example 4 (ps, cpu, multi node)
:::::::::
.. code-block:: bash
:name: code-block-example-bash4

- # For training on multiple nodes, e.g., 192.168.0.16, 192.168.0.17 where each node with 1 server and 2 workers.
+ # Launch on multiple nodes, e.g., start 1 server and 2 trainers on each of 192.168.0.16 and 192.168.0.17

# On 192.168.0.16:

- python -m paddle.distributed.launch --servers="192.168.0.16:6170,192.168.0.17:6170" --workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172" train.py --lr=0.01
+ python -m paddle.distributed.launch --master=192.168.0.16:8090 --nnodes=2 --server_num=1 --trainer_num=2 train.py --lr=0.01

# On 192.168.0.17:

- python -m paddle.distributed.launch --servers="192.168.0.16:6170,192.168.0.17:6170" --workers="192.168.0.16:6171,192.168.0.16:6172,192.168.0.17:6171,192.168.0.17:6172" train.py --lr=0.01
+ python -m paddle.distributed.launch --master=192.168.0.16:8090 --nnodes=2 --server_num=1 --trainer_num=2 train.py --lr=0.01

Code example 5 (ps, gpu, single node)
:::::::::
.. code-block:: bash
:name: code-block-example-bash5

- # To simulate distributed environment using single node, e.g., 2 servers and 4 workers, each worker use single gpu.
+ # When launching a gpu ps job, specify which gpus to use:

export CUDA_VISIBLE_DEVICES=0,1,2,3
python -m paddle.distributed.launch --server_num=2 --worker_num=4 train.py --lr=0.01
@@ -178,7 +178,7 @@ Elastic parameters
.. code-block:: bash
:name: code-block-example-bash6

- # For training on multiple nodes, e.g., 192.168.0.16, 192.168.0.17 where each node with 1 server and 2 workers.
+ # Use the following command to launch a multi-node gpu ps job

# On 192.168.0.16:

@@ -195,7 +195,7 @@ Elastic parameters
.. code-block:: bash
:name: code-block-example-bash7

- # To simulate distributed environment using single node, e.g., 2 servers and 4 workers, two workers use gpu, two workers use cpu.
+ # Use the following command to launch a single-node heter ps job

export CUDA_VISIBLE_DEVICES=0,1
python -m paddle.distributed.launch --server_num=2 --worker_num=2 --heter_worker_num=2 train.py --lr=0.01
@@ -205,7 +205,7 @@ Elastic parameters
.. code-block:: bash
:name: code-block-example-bash8

- # For training on multiple nodes, e.g., 192.168.0.16, 192.168.0.17 where each node with 1 server, 1 gpu worker, 1 cpu worker.
+ # Use the following command to launch a multi-node heter ps job

# On 192.168.0.16:

@@ -222,8 +222,8 @@ Elastic parameters
.. code-block:: bash
:name: code-block-example-bash9

- # With the following command, the job will begin to run immediately if 4 nodes are ready,
- # or it will run after elastic_timeout if only 2 or 3 nodes ready
+ # Use the following command to launch elastic training
+ # When 4 nodes are ready, training starts immediately; when only 2 or 3 nodes are ready, it starts after the timeout elapses
python -m paddle.distributed.launch --master etcd://10.0.0.1:2379 --nnodes 2:4 train.py

- # once the number of nodes changes between 2:4 during training, the strategy holds
+ # If the number of nodes changes during training, the same logic still applies.