@@ -90,8 +90,8 @@ The key to further improving CPU distributed training speed lies in choosing a suitable distributed
import paddle.fluid.incubate.fleet.base.role_maker as role_maker
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler.distributed_strategy_factory import DistributedStrategyFactory

-Next, specify the training strategy for CPU distributed training. Four configurations are currently available: synchronous training (Sync), asynchronous training (Async), half-asynchronous training (Half-Async), and GEO training. For details of the different strategies, see the design document:
-https://github.com/PaddlePaddle/Fleet/blob/develop/markdown_doc/transpiler/transpiler_cpu.md
+Next, specify the training strategy for CPU distributed training. Four configurations are currently available: synchronous training (Sync), asynchronous training (Async), half-asynchronous training (Half-Async), and GEO training.


The default configuration of the above strategies is introduced by the following code and used to run CPU distributed training:
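
A minimal sketch of that setup (the ``create_*_strategy`` helper names and the fleet calls below are assumptions based on the incubate Fleet API; ``loss`` stands for the loss variable of the user-defined network):

.. code-block:: python

    import paddle.fluid as fluid
    import paddle.fluid.incubate.fleet.base.role_maker as role_maker
    from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler import fleet
    from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler.distributed_strategy_factory import DistributedStrategyFactory

    # Pick one of the four strategies; the factory returns its default configuration.
    strategy = DistributedStrategyFactory.create_async_strategy()
    # Alternatives: create_sync_strategy(), create_half_async_strategy(),
    # create_geo_strategy(update_frequency=400)

    role = role_maker.PaddleCloudRoleMaker()   # reads trainer/pserver info from environment variables
    fleet.init(role)

    optimizer = fluid.optimizer.SGD(learning_rate=0.01)
    optimizer = fleet.distributed_optimizer(optimizer, strategy)
    optimizer.minimize(loss)

    if fleet.is_server():
        fleet.init_server()
        fleet.run_server()
    elif fleet.is_worker():
        fleet.init_worker()
        # ... run the training loop here ...
        fleet.stop_worker()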

@@ -94,8 +94,8 @@ First, we need to introduce relevant libraries into the code:
import paddle.fluid.incubate.fleet.base.role_maker as role_maker
from paddle.fluid.incubate.fleet.parameter_server.distribute_transpiler.distributed_strategy_factory import DistributedStrategyFactory

-At present, there are four kinds of training strategies: synchronous training (Sync), asynchronous training (Async), half-asynchronous training (Half-Async), and GEO training. For details of the different strategies, see the design document:
-https://github.com/PaddlePaddle/Fleet/blob/develop/markdown_doc/transpiler/transpiler_cpu.md
+At present, there are four kinds of training strategies: synchronous training (Sync), asynchronous training (Async), half-asynchronous training (Half-Async), and GEO training.


The default configuration of the above strategies is introduced by the following code:

@@ -39,7 +39,7 @@ PaddlePaddle Fluid supports high-performance distributed training on modern GPU [#]_ server clusters
data_loader.reset()


-In addition, the DALI library can be used to improve data-processing performance. DALI is a data loading library developed by NVIDIA; for more details, see the `official documentation <https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html>`_ . For how to use DALI with PaddlePaddle, see this `usage example <https://github.com/PaddlePaddle/Fleet/tree/develop/benchmark/collective/resnet>`_ .
+In addition, the DALI library can be used to improve data-processing performance. DALI is a data loading library developed by NVIDIA; for more details, see the `official documentation <https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html>`_ . For how to use DALI with PaddlePaddle, see this `usage example <https://github.com/PaddlePaddle/FleetX/tree/old_develop/deprecated/benchmark/collective/resnet>`_ .

2. Training strategy configuration
==================================
@@ -115,7 +115,7 @@ In synchronous multi-node multi-GPU training there is a slow-trainer phenomenon: in each step, the training
- The Local SGD warmup steps :code:`local_sgd_is_warm_steps` affect the generalization of the final model. Local SGD training should generally start only after the model parameters have stabilized; empirically, the epoch at which the learning rate first decays can be used as the warmup step count, after which Local SGD training begins.
- The Local SGD steps :code:`local_sgd_steps`: in general, the larger this value, the fewer the communication rounds and the faster the training, but at the cost of lower model accuracy. Empirically, set it to 2 or 4.

-For concrete Local SGD training code, see: https://github.com/PaddlePaddle/Fleet/tree/develop/examples/local_sgd/resnet
+For concrete Local SGD training code, see: https://github.com/PaddlePaddle/FleetX/tree/old_develop/deprecated/examples/local_sgd/resnet
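
As an illustration of the two heuristics above, a small sketch (the learning-rate schedule value is a placeholder, not taken from the original example):

.. code-block:: python

    # Rule of thumb from above: start Local SGD after the epoch of the first LR decay.
    first_lr_decay_epoch = 30            # placeholder, depends on the learning-rate schedule
    local_sgd_is_warm_steps = first_lr_decay_epoch

    # Larger values mean fewer communications and faster training but lower accuracy;
    # empirically 2 or 4 works well.
    local_sgd_steps = 2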


2. Using mixed-precision training
@@ -14,7 +14,7 @@ optimizer = fluid.optimizer.DGCMomentumOptimizer(
learning_rate=0.001, momentum=0.9, rampup_begin_step=0)
optimizer.minimize(cost)
```
-In fleet we provide a [DGC example](https://github.com/PaddlePaddle/Fleet/tree/develop/examples/dgc_example). The example takes handwritten digit recognition, ports the program to a distributed version (note: DGC also supports single-node multi-GPU training), and then adds the DGC optimizer. You can follow this example to migrate a single-node single-GPU program to DGC. During such a migration, it is generally necessary to first align the accuracy of multi-node Momentum, and then align the accuracy of DGC.
+In fleet we provide a [DGC example](https://github.com/PaddlePaddle/FleetX/tree/old_develop/deprecated/examples/dgc_example). The example takes handwritten digit recognition, ports the program to a distributed version (note: DGC also supports single-node multi-GPU training), and then adds the DGC optimizer. You can follow this example to migrate a single-node single-GPU program to DGC. During such a migration, it is generally necessary to first align the accuracy of multi-node Momentum, and then align the accuracy of DGC.
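A sketch of the two-stage accuracy alignment mentioned above (an assumed workflow; the hyper-parameters are placeholders):

```python
import paddle.fluid as fluid

# Stage 1: run the distributed program with plain Momentum and align its accuracy
# with the single-node single-GPU baseline.
optimizer = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9)

# Stage 2: once Momentum accuracy matches, switch to DGC and align accuracy again.
optimizer = fluid.optimizer.DGCMomentumOptimizer(
    learning_rate=0.001, momentum=0.9, rampup_begin_step=0)
optimizer.minimize(cost)  # `cost` is the loss variable of the network
```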

## 3. Hyper-parameter tuning & applicable scenarios
### 3.1 Warmup tuning
@@ -101,7 +101,7 @@ In principle, Recompute works with all Optimizers.

**2. Using Recompute in the Fleet API**

-`Fleet API <https://github.com/PaddlePaddle/Fleet>`_
+`Fleet API <https://github.com/PaddlePaddle/FleetX>`_
is a high-level API for distributed computation built on Fluid. Adding RecomputeOptimizer
in the Fleet API takes only two steps:
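
A rough sketch of enabling Recompute through a Fleet distributed strategy (the ``forward_recompute`` and ``recompute_checkpoints`` fields and the collective import below are assumptions; actual names may differ between versions):

.. code-block:: python

    import paddle.fluid as fluid
    from paddle.fluid.incubate.fleet.collective import fleet, DistributedStrategy

    dist_strategy = DistributedStrategy()
    dist_strategy.forward_recompute = True
    # Tell Fleet which intermediate variables to keep; everything else is recomputed.
    dist_strategy.recompute_checkpoints = checkpoint_vars   # user-chosen list of variables

    optimizer = fluid.optimizer.Adam(learning_rate=1e-4)
    optimizer = fleet.distributed_optimizer(optimizer, strategy=dist_strategy)
    optimizer.minimize(loss)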

@@ -121,7 +121,7 @@
To help you quickly use Recompute with the Fleet API, we provide several examples,
together with their training speed, accuracy, and GPU memory savings:

-- Bert fine-tuning with Recompute: `source <https://github.com/PaddlePaddle/Fleet/tree/develop/examples/recompute/bert>`_
+- Bert fine-tuning with Recompute: `source <https://github.com/PaddlePaddle/FleetX/tree/old_develop/deprecated/examples/recompute/bert>`_

- Object detection with Recompute: under development.

Expand All @@ -136,7 +136,7 @@ Q&A
- **Are there more official Recompute examples?**

More Recompute examples will be added under the `examples <https://github.com/PaddlePaddle/examples/tree/master/community_examples/recompute>`_
-and `Fleet <https://github.com/PaddlePaddle/Fleet>`_ repositories; stay tuned.
+and `Fleet <https://github.com/PaddlePaddle/FleetX>`_ repositories; stay tuned.

- **Is there any advice on adding checkpoints?**

@@ -132,7 +132,7 @@ In principle, recompute works with all kinds of optimizers in Paddle.

**2. Using Recompute in Fleet API**

-`Fleet API <https://github.com/PaddlePaddle/Fleet>`_
+`Fleet API <https://github.com/PaddlePaddle/FleetX>`_
is a high-level API for distributed training in Fluid. Adding
RecomputeOptimizer via the Fleet API takes two steps:

Expand All @@ -154,7 +154,7 @@ We also post corresponding training speed,
test results and memory usages of these examples for reference.


-- Fine-tuning Bert Large model with recomputing: `source <https://github.com/PaddlePaddle/Fleet/tree/develop/examples/recompute/bert>`_
+- Fine-tuning Bert Large model with recomputing: `source <https://github.com/PaddlePaddle/FleetX/tree/old_develop/deprecated/examples/recompute/bert>`_

- Training object detection models with recomputing: under development.

Expand All @@ -171,7 +171,7 @@ first-computation and recomputation consistent.
- **Are there more official examples of Recompute?**

More examples will be updated at `examples <https://github.com/PaddlePaddle/examples/tree/master/community_examples/recompute>`_
-and `Fleet <https://github.com/PaddlePaddle/Fleet>`_ . Feel free to
+and `Fleet <https://github.com/PaddlePaddle/FleetX>`_ . Feel free to
raise issues if you run into any problems with these examples.

- **How should I set checkpoints?**
@@ -21,9 +21,9 @@

This section uses wide_and_deep, a classic model in the recommendation domain, as an example to show how to complete a parameter-server training task with PaddlePaddle distributed training.

-Parameter-server training is based on the PaddlePaddle static graph. To aid understanding, we provide a single-machine static-graph example of the wide_and_deep model: `single-machine static-graph example <https://github.com/PaddlePaddle/FleetX/tree/develop/eval/rec/wide_and_deep_single_static>`_.
+Parameter-server training is based on the PaddlePaddle static graph. To aid understanding, we provide a single-machine static-graph example of the wide_and_deep model: `single-machine static-graph example <https://github.com/PaddlePaddle/FleetX/tree/old_develop/eval/rec/wide_and_deep_single_static>`_.

-Starting from the single-machine static-graph example and following the steps in Section 1.2, it can be modified into a parameter-server training example. The complete example code for this quick start is available at: `complete parameter-server example <https://github.com/PaddlePaddle/FleetX/tree/develop/examples/wide_and_deep_dataset>`_.
+Starting from the single-machine static-graph example and following the steps in Section 1.2, it can be modified into a parameter-server training example. The complete example code for this quick start is available at: `complete parameter-server example <https://github.com/PaddlePaddle/FleetX/tree/old_develop/examples/wide_and_deep_dataset>`_.
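
A condensed sketch of that conversion, based on the ``paddle.distributed.fleet`` API (network and data-reading code are omitted; details may differ from the linked full example):

.. code-block:: python

    import paddle
    import paddle.distributed.fleet as fleet

    paddle.enable_static()
    fleet.init(is_collective=False)        # parameter-server mode

    # ... build the wide_and_deep network exactly as in the single-machine example ...

    strategy = fleet.DistributedStrategy()
    strategy.a_sync = True                 # asynchronous parameter-server training

    optimizer = paddle.optimizer.SGD(learning_rate=0.0001)
    optimizer = fleet.distributed_optimizer(optimizer, strategy)
    optimizer.minimize(loss)

    if fleet.is_server():
        fleet.init_server()
        fleet.run_server()
    else:
        fleet.init_worker()
        # ... run the training loop on each worker ...
        fleet.stop_worker()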

We have also created a parameter-server quick-start project on AIStudio: `parameter-server quick start <https://aistudio.baidu.com/aistudio/projectdetail/4189047?channelType=0&channel=0>`_, where you can run the parameter-server training code directly.

@@ -206,4 +206,4 @@
Time with AMP mode enabled:
total time = 1.222 sec

-The above example is located at: `example/amp/amp_dygraph.py <https://github.com/PaddlePaddle/FleetX/blob/develop/examples/amp/amp_dygraph.py>`_ .
+The above example is located at: `example/amp/amp_dygraph.py <https://github.com/PaddlePaddle/FleetX/blob/old_develop/examples/amp/amp_dygraph.py>`_ .
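
For reference, a minimal dygraph AMP sketch using ``paddle.amp.auto_cast`` and ``paddle.amp.GradScaler`` (a simplified stand-in, not the exact code from the linked file):

.. code-block:: python

    import paddle

    model = paddle.nn.Linear(10, 10)
    optimizer = paddle.optimizer.Adam(parameters=model.parameters())
    scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

    data = paddle.rand([4, 10])
    with paddle.amp.auto_cast():        # forward pass runs in mixed precision
        loss = model(data).mean()

    scaled = scaler.scale(loss)         # scale the loss to avoid fp16 underflow
    scaled.backward()
    scaler.minimize(optimizer, scaled)  # unscale gradients and apply the update
    optimizer.clear_grad()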
@@ -171,7 +171,7 @@ batch size = seq * seq_max_len

python recompute_dygraph.py

-Recompute dynamic-graph code: `code example <https://github.com/PaddlePaddle/FleetX/tree/develop/examples/recompute>`__.
+Recompute dynamic-graph code: `code example <https://github.com/PaddlePaddle/FleetX/tree/old_develop/examples/recompute>`__.

Output:

2 changes: 1 addition & 1 deletion docs/guides/06_distributed_training/model_parallel_cn.rst
@@ -309,7 +309,7 @@
optimizer.clear_grad()
print("loss", loss.numpy())

-Model-parallel dynamic-graph code: `example/model_parallelism/mp_dygraph.py <https://github.com/PaddlePaddle/FleetX/tree/develop/examples/model_parallelism>`_.
+Model-parallel dynamic-graph code: `example/model_parallelism/mp_dygraph.py <https://github.com/PaddlePaddle/FleetX/tree/old_develop/examples/model_parallelism>`_.


How to run (the current machine must have two GPUs):
@@ -261,7 +261,7 @@ model.train_batch(...): this step mainly executes the 1F1B pipeline-parallel scheme
export CUDA_VISIBLE_DEVICES=0,1
python -m paddle.distributed.launch alexnet_dygraph_pipeline.py # alexnet_dygraph_pipeline.py is the user's Python script that runs the dynamic-graph pipeline

-Complete AlexNet-based pipeline-parallel dynamic-graph code: `alex <https://github.com/PaddlePaddle/FleetX/tree/develop/examples/pipeline>`_.
+Complete AlexNet-based pipeline-parallel dynamic-graph code: `alex <https://github.com/PaddlePaddle/FleetX/tree/old_develop/examples/pipeline>`_.

The console output is as follows:
