From 307a66c39c2903e091fe2a6e06ee84fa9818c310 Mon Sep 17 00:00:00 2001 From: zy0531 Date: Sat, 16 Mar 2019 12:23:19 +0800 Subject: [PATCH 1/7] add api_guides low_level backward parameter program_en --- doc/fluid/api_guides/index.rst | 3 + doc/fluid/api_guides/index_en.rst | 3 + .../api_guides/low_level/backward_en.rst | 23 +++ doc/fluid/api_guides/low_level/parameter.rst | 2 +- .../api_guides/low_level/parameter_en.rst | 175 ++++++++++++++++++ doc/fluid/api_guides/low_level/program_en.rst | 78 ++++++++ 6 files changed, 283 insertions(+), 1 deletion(-) create mode 100644 doc/fluid/api_guides/low_level/backward_en.rst create mode 100644 doc/fluid/api_guides/low_level/parameter_en.rst create mode 100644 doc/fluid/api_guides/low_level/program_en.rst diff --git a/doc/fluid/api_guides/index.rst b/doc/fluid/api_guides/index.rst index 465a316cd5f..f8ce32af906 100755 --- a/doc/fluid/api_guides/index.rst +++ b/doc/fluid/api_guides/index.rst @@ -18,3 +18,6 @@ API使用指南分功能向您介绍PaddlePaddle Fluid的API体系和用法, low_level/memory_optimize.rst low_level/nets.rst low_level/parallel_executor.rst + low_level/backward.rst + low_level/parameter.rst + low_level/program.rst diff --git a/doc/fluid/api_guides/index_en.rst b/doc/fluid/api_guides/index_en.rst index 4e33a54a75a..7251206d491 100755 --- a/doc/fluid/api_guides/index_en.rst +++ b/doc/fluid/api_guides/index_en.rst @@ -18,3 +18,6 @@ This section introduces the Fluid API structure and usage, to help you quickly g low_level/memory_optimize_en.rst low_level/nets_en.rst low_level/parallel_executor_en.rst + low_level/backward_en.rst + low_level/parameter_en.rst + low_level/program_en.rst diff --git a/doc/fluid/api_guides/low_level/backward_en.rst b/doc/fluid/api_guides/low_level/backward_en.rst new file mode 100644 index 00000000000..25708c33473 --- /dev/null +++ b/doc/fluid/api_guides/low_level/backward_en.rst @@ -0,0 +1,23 @@ +.. 
_api_guide_backward_en: + + +######## +Back Propagation +######## + +The ability of neural network to define model depends on optimization algorithm. Optimization is a process of calculating gradient continuously and adjusting learnable parameters. You can refer to :ref:`api_guide_optimizer_en` to learn more about optimization algorithm in Fluid. + +In the training process of a network, gradient calculation is divided into two steps: forward computing and `back propagation `_ . + +Forward computing transfers the state of the input unit to the output unit according to the network structure you build. + +Back propagation calculates the derivatives of two or more compound functions by means of `chain rule `_ . The gradient of the output unit is propagated back to the input unit. According to the calculated gradient, the learnable parameters of the network are adjusted. + + +You could refer to `back propagation algorithm `_ for the detailed implementation process. + +We do not recommend directly calling backpropagation-related APIs in :code:`fluid` , as these are very low-level APIs. Consider using the relevant APIs in :ref:`api_guide_optimizer_en` instead. When you use optimizer APIs, Fluid automatically calculates the complex back-propagation for you. + +If you want to implement it yourself, you can also use :code:`callback` in :ref:`api_fluid_backward_append_backward` to define the customized gradient form of Operator.
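The chain rule described above can be sketched numerically in plain Python. This is a hypothetical scalar model for illustration only, not the Fluid API:

```python
# Hypothetical scalar example: forward pass y = w * x + b, loss = (y - t) ** 2.
# Back propagation applies the chain rule to get d(loss)/dw and d(loss)/db.

def forward(w, b, x):
    return w * x + b

w, b, x, t = 2.0, 1.0, 3.0, 10.0
y = forward(w, b, x)        # forward computing: y = 7.0
loss = (y - t) ** 2         # loss = 9.0

# chain rule: dloss/dy = 2 * (y - t); dy/dw = x; dy/db = 1
dloss_dy = 2.0 * (y - t)    # -6.0
grad_w = dloss_dy * x       # dloss/dw = -18.0
grad_b = dloss_dy * 1.0     # dloss/db = -6.0

# a gradient step adjusts the learnable parameters (what an optimizer does)
lr = 0.01
w -= lr * grad_w
b -= lr * grad_b
```

The optimizer APIs perform exactly this kind of update for every learnable parameter, with the gradients produced by the automatically appended backward pass.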
+For more information, please refer to: :ref:`api_fluid_backward_append_backward` + \ No newline at end of file diff --git a/doc/fluid/api_guides/low_level/parameter.rst b/doc/fluid/api_guides/low_level/parameter.rst index e28215422ae..af7b63f1fd6 100644 --- a/doc/fluid/api_guides/low_level/parameter.rst +++ b/doc/fluid/api_guides/low_level/parameter.rst @@ -4,7 +4,7 @@ 模型参数 ######### -模型参数为模型中的weight和bias统称,在fluid中对应fluid.Parameter类,继承自fluid.Variable,是一种可持久化的variable。模型的训练就是不断学习更新模型参数的过程。模型参数相关的属性可以通过 :ref:`cn_api_fluid_param_attr_ParamAttr` 来配置,可配置内容有: +模型参数为模型中的weight和bias统称,在fluid中对应fluid.Parameter类,继承自fluid.Variable,是一种可持久化的variable。模型的训练就是不断学习更新模型参数的过程。模型参数相关的属性可以通过 :ref:`cn_api_fluid_ParamAttr` 来配置,可配置内容有: - 初始化方式 - 正则化 diff --git a/doc/fluid/api_guides/low_level/parameter_en.rst b/doc/fluid/api_guides/low_level/parameter_en.rst new file mode 100644 index 00000000000..f372c43fa87 --- /dev/null +++ b/doc/fluid/api_guides/low_level/parameter_en.rst @@ -0,0 +1,175 @@ +.. _api_guide_parameter_en: + +######### +Model Parameters +######### + +Model parameters are the weights and biases in a model. In fluid, we use the fluid.Parameter class, which is inherited from fluid.Variable, a persistable variable, to define our model parameters. Model training is the process of learning and updating model parameters. The attributes related to model parameters can be configured by :ref:`api_fluid_ParamAttr` . The configurable contents are as follows: + + +- Initialization method + +- Regularization + +- Gradient clipping + +- Model Average + + + +Initialization method +================= + +Fluid initializes a single parameter by setting attributes of :code:`initializer` in :code:`ParamAttr` . + +Examples: + + .. code-block:: python + + param_attrs = fluid.ParamAttr(name="fc_weight", + initializer=fluid.initializer.ConstantInitializer(1.0)) + y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs) + + + +The following initialization methods are supported by fluid: + +1.
BilinearInitializer +----------------------- + +Linear initialization. The deconvolution operation initialized by this method can be used as a linear interpolation operation. + +Alias:Bilinear + +API reference: :ref:`api_fluid_initializer_BilinearInitializer` + +2. ConstantInitializer +---------------------- + +Constant initialization. Initialize the parameter to the specified value. + +Alias:Constant + +API reference: :ref:`api_fluid_initializer_ConstantInitializer` + +3. MSRAInitializer +------------------ + +Please refer to https://arxiv.org/abs/1502.01852 for initialization. + +Alias:MSRA + +API reference: :ref:`api_fluid_initializer_MSRAInitializer` + +4. NormalInitializer +--------------------- + +Initialization method of random Gaussian distribution. + +Alias:Normal + +API reference: :ref:`api_fluid_initializer_NormalInitializer` + +5. TruncatedNormalInitializer +----------------------------- + +Initialization method of stochastic truncated Gauss distribution. + +Alias:TruncatedNormal + +API reference: :ref:`api_fluid_initializer_TruncatedNormalInitializer` + +6. UniformInitializer +-------------------- + +Initialization method of random uniform distribution. + +Alias:Uniform + +API reference: :ref:`api_fluid_initializer_UniformInitializer` + +7. XavierInitializer +-------------------- + +Please refer to http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf for initialization. + +Alias:Xavier + +API reference: :ref:`api_fluid_initializer_XavierInitializer` + +Regularization +============= + +Fluid regularizes a single parameter by setting attributes of :code:`regularizer` in :code:`ParamAttr` . + + .. 
code-block:: python + + param_attrs = fluid.ParamAttr(name="fc_weight", + regularizer=fluid.regularizer.L1DecayRegularizer(0.1)) + y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs) + +The following regularization approaches are supported by fluid: + +- :ref:`api_fluid_regularizer_L1DecayRegularizer` (Alias:L1Decay) +- :ref:`api_fluid_regularizer_L2DecayRegularizer` (Alias:L2Decay) + +Clipping +========== + +Fluid sets clipping method for a single parameter by setting attributes of :code:`gradient_clip` in :code:`ParamAttr` . + + .. code-block:: python + + param_attrs = fluid.ParamAttr(name="fc_weight", + gradient_clip=fluid.clip.GradientClipByValue(min=-1.0, max=1.0)) + y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs) + + + +The following clipping methods are supported by fluid: + +1. ErrorClipByValue +------------------- + +Used to clip the value of a tensor to a specified range. + +API reference: :ref:`api_fluid_clip_ErrorClipByValue` + +2. GradientClipByGlobalNorm +--------------------------- + +Used to limit the global-norm of multiple Tensors to :code:`clip_norm`. + +API reference: :ref:`api_fluid_clip_GradientClipByGlobalNorm` + +3. GradientClipByNorm +--------------------- +Limit the L2-norm of Tensor to :code:`max_norm` . If Tensor's L2-norm exceeds :code:`max_norm` , +it will calculate a :code:`scale` . And then all values of the Tensor multiply the :code:`scale` . + +API reference: :ref:`api_fluid_clip_GradientClipByNorm` + +4. GradientClipByValue +---------------------- + +Limit the value of gradient corresponding to a parameter to [min, max]. + +API reference: :ref:`api_fluid_clip_GradientClipByValue` + +Model Averaging +======== + +Fluid determines whether to average a single parameter by setting attributes of :code:`do_model_average` in :code:`ParamAttr` . +Examples: + + ..
code-block:: python + + param_attrs = fluid.ParamAttr(name="fc_weight", + do_model_average=True) + y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs) + +In the miniBatch training process, parameters will be updated once after each batch, and model averaging takes the average of the parameters generated by the latest K updates. + +The averaged parameters are only used for testing and prediction, and they do not get involved in the actual training process. + +API reference: :ref:`api_fluid_optimizer_ModelAverage` diff --git a/doc/fluid/api_guides/low_level/program_en.rst b/doc/fluid/api_guides/low_level/program_en.rst new file mode 100644 index 00000000000..ee433da91ae --- /dev/null +++ b/doc/fluid/api_guides/low_level/program_en.rst @@ -0,0 +1,78 @@ +.. _api_guide_Program_en: + +############################### +Program/Block/Operator/Variable +############################### + +================== +Program +================== + +:code:`Fluid` describes the user's neural network configuration in the form of an abstract syntax tree, similar to a programming language, and the user's description of computation will be written into a Program. Program in Fluid replaces the concept of a model in traditional frameworks. It can describe any complex model by supporting three execution structures: sequential execution, conditional selection and loop execution. Writing a :code:`Program` is very close to writing a general program. If you have some programming experience, you will naturally transfer your knowledge. + +In brief: + +* A model is a Fluid :code:`Program` and can contain more than one :code:`Program` ; + +* :code:`Program` consists of nested :code:`Block` , and the concept of :code:`Block` can be analogized to a pair of braces in C++ or Java, or an indentation block in Python. + + +* Computing in :code:`Block` is composed of three ways: sequential execution, conditional selection or loop execution, which together constitute complex computational logic.
+ + +* :code:`Block` contains descriptions of computation and computational objects. The description of computation is called Operator; the object of computation (or the input and output of Operator) is unified as Tensor. In Fluid, Tensor is represented by 0-leveled `LoD-Tensor `_ . + + +========= +Block +========= + +:code:`Block` is the concept of variable scope in advanced languages. In programming languages, Block is a pair of braces, which contains local variable definitions and a series of instructions or operators. Control flow structures :code:`if-else` and :code:`for` in programming languages can be equivalent to in deep learning: + ++----------------------+-------------------------+ +| programming languages| Fluid | ++======================+=========================+ +| for, while loop | RNN,WhileOP | ++----------------------+-------------------------+ +| if-else, switch | IfElseOp, SwitchOp | ++----------------------+-------------------------+ +| execute sequentially | a series of layers | ++----------------------+-------------------------+ + +As mentioned above, :code:`Block` in Fluid describes a set of Operators that include sequential execution, conditional selection or loop execution, and the operating object of Operator: Tensor. + + + +============= +Operator +============= + +In Fluid, all operations of data are represented by :code:`Operator` . In Python, :code:`Operator` in Fluid is encapsulated into modules such as: :code:`paddle.fluid.layers` , :code:`paddle.fluid.nets` , and so on. + +This is because some common operations on Tensor may consist of more basic operations. In order to improve the convenience of use, some encapsulation of the basic Operator is carried out inside the framework, including the creation of Operator dependent on learnable parameters, the initialization details of learnable parameters, and so on, so as to reduce the cost of repeating development by users. + + + +More information can be found in
`Fluid Design Idea <../../advanced_usage/design_idea/fluid_design_idea.html>`_ + + +========= +Variable +========= + +In Fluid, :code:`Variable` can contain any type of value -- in most cases a LoD-Tensor. + +All the learnable parameters in the model are kept in the memory space in form of :code:`Variable` . In most cases, you do not need to create the learnable parameters in the network by yourself. Fluid provides encapsulation for almost common basic computing modules of the neural network. Taking the simplest full connection model as an example, calling :code:`fluid.layers.fc` directly creates two learnable parameters for the full connection layer, namely, connection weight (W) and bias, without explicitly calling :code:`variable` related interfaces to create learnable parameters. + +========= +Related API +========= + + +* A single neural network for user configuration is called :ref:`api_fluid_Program` . It is noteworthy that when training neural networks, users often need to configure and operate multiple :code:`Program` . For example, :code:`Program` for parameter initialization, :code:`Program` for training, :code:`Program` for testing, etc. + + +* Users can also use :ref:`api_fluid_program_guard` with :code:`with` to modify the configured :ref:`api_fluid_default_startup_program` and :ref:`api_fluid_default_main_program` . + + +* In Fluid,Block internal execution order is determined by control flow,such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` . 
For more information, please reference to: :ref:`api_guide_control_flow_en` From c9f1c1f7341fb05c443bab951b682f20fcbc62d6 Mon Sep 17 00:00:00 2001 From: Hao Wang <31058429+haowang101779990@users.noreply.github.com> Date: Sat, 16 Mar 2019 13:22:40 +0800 Subject: [PATCH 2/7] Apply suggestions from code review Co-Authored-By: zy0531 <48094155+zy0531@users.noreply.github.com> --- doc/fluid/api_guides/low_level/backward_en.rst | 4 ++-- doc/fluid/api_guides/low_level/parameter_en.rst | 6 +++--- doc/fluid/api_guides/low_level/program_en.rst | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/doc/fluid/api_guides/low_level/backward_en.rst b/doc/fluid/api_guides/low_level/backward_en.rst index 25708c33473..e839d8feac3 100644 --- a/doc/fluid/api_guides/low_level/backward_en.rst +++ b/doc/fluid/api_guides/low_level/backward_en.rst @@ -18,6 +18,6 @@ You could refer to `back propagation algorithm Date: Mon, 18 Mar 2019 15:05:15 +0800 Subject: [PATCH 3/7] Apply suggestions from code review Co-Authored-By: zy0531 <48094155+zy0531@users.noreply.github.com> --- doc/fluid/api_guides/low_level/program_en.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/fluid/api_guides/low_level/program_en.rst b/doc/fluid/api_guides/low_level/program_en.rst index 170b5d4d47a..25c0d9bf750 100644 --- a/doc/fluid/api_guides/low_level/program_en.rst +++ b/doc/fluid/api_guides/low_level/program_en.rst @@ -47,9 +47,9 @@ Operator ============= -In Fluid, all operations of data are represented by :code:`Operator` . In Python, :code:`Operator` in Fluid is encapsulated into modules such as: :code:`paddle.fluid.layers` , :code:`paddle.fluid.nets` , and so on. +In Fluid, all operations of data are represented by :code:`Operator` . In Python, :code:`Operator` in Fluid is encapsulated into modules like :code:`paddle.fluid.layers` , :code:`paddle.fluid.nets` .
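To make the layering concrete, a high-level layer call can be pictured as creating the parameters it depends on and appending basic operators to the current block. The following is a toy sketch in plain Python; the names :code:`Block`, :code:`append_op` and :code:`fc` here are illustrative stand-ins, not the real Fluid classes:

```python
# Toy sketch of how a high-level layer expands into basic operators.
# `Block`, `append_op`, and `fc` are illustrative stand-ins, not Fluid API.

class Block:
    def __init__(self):
        self.ops = []      # operator descriptions held by this block
        self.params = []   # learnable parameters created inside it

    def append_op(self, op_type, inputs, outputs):
        self.ops.append({"type": op_type, "in": inputs, "out": outputs})

def fc(block, x, size):
    # a "layer" creates the learnable parameters it depends on ...
    block.params += ["w", "b"]
    # ... and appends the basic operators that implement it
    block.append_op("mul", [x, "w"], ["tmp"])
    block.append_op("elementwise_add", ["tmp", "b"], ["out"])
    return "out"

main_block = Block()
out = fc(main_block, "x", size=10)
print([op["type"] for op in main_block.ops])  # ['mul', 'elementwise_add']
```

A single :code:`fc` call thus ends up as two basic operators plus two parameters in the block, which is why calling the encapsulated modules is more convenient than assembling basic Operators by hand.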
-This is because some common operations on Tensor may consist of more basic operations. In order to improve the convenience of use, some encapsulation of the basic Operator is carried out inside the framework, including the creation of Operator dependent on learnable parameters, the initialization details of learnable parameters, and so on, so as to reduce the cost of repeating development by users. +This is because some common operations on Tensor may consist of more basic operations. For simplicity, some encapsulation of the basic Operator is carried out inside the framework, including the creation of learnable parameters relied on by an Operator, the initialization details of learnable parameters, and so on, so as to reduce the cost of further development. @@ -62,17 +62,17 @@ Variable In Fluid, :code:`Variable` can contain any type of value -- in most cases a LoD-Tensor. -All the learnable parameters in the model are kept in the memory space in form of :code:`Variable` . In most cases, you do not need to create the learnable parameters in the network by yourself. Fluid provides encapsulation for almost common basic computing modules of the neural network. Taking the simplest full connection model as an example, calling :code:`fluid.layers.fc` directly creates two learnable parameters for the full connection layer, namely, connection weight (W) and bias, without explicitly calling :code:`variable` related interfaces to create learnable parameters. +All the learnable parameters in the model are kept in the memory space in form of :code:`Variable` . In most cases, you do not need to create the learnable parameters in the network by yourself. Fluid provides encapsulation for almost all common basic computing modules of the neural network.
Taking the simplest full connection model as an example, calling :code:`fluid.layers.fc` directly creates two learnable parameters for the full connection layer, namely, connection weight (W) and bias, without explicitly calling :code:`Variable` related interfaces to create learnable parameters. ========= Related API ========= -* A single neural network for user configuration is called :ref:`api_fluid_Program` . It is noteworthy that when training neural networks, users often need to configure and operate multiple :code:`Program` . For example, :code:`Program` for parameter initialization, :code:`Program` for training, :code:`Program` for testing, etc. +* A single neural network configured by the user is called :ref:`api_fluid_Program` . It is noteworthy that when training neural networks, users often need to configure and operate multiple :code:`Program` . For example, :code:`Program` for parameter initialization, :code:`Program` for training, :code:`Program` for testing, etc. * Users can also use :ref:`api_fluid_program_guard` with :code:`with` to modify the configured :ref:`api_fluid_default_startup_program` and :ref:`api_fluid_default_main_program` . -* In Fluid,Block internal execution order is determined by control flow,such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` . For more information, please reference to: :ref:`api_guide_control_flow_en` +* In Fluid, the execution order in a Block is determined by control flow, such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` .
For more information, please refer to: :ref:`api_guide_control_flow_en` From cf1febc21141970377bf2f056185d393298f97c7 Mon Sep 17 00:00:00 2001 From: zy0531 <48094155+zy0531@users.noreply.github.com> Date: Tue, 19 Mar 2019 14:23:35 +0800 Subject: [PATCH 4/7] Update backward_en.rst --- doc/fluid/api_guides/low_level/backward_en.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/fluid/api_guides/low_level/backward_en.rst b/doc/fluid/api_guides/low_level/backward_en.rst index e839d8feac3..022b4900f07 100644 --- a/doc/fluid/api_guides/low_level/backward_en.rst +++ b/doc/fluid/api_guides/low_level/backward_en.rst @@ -1,9 +1,9 @@ .. _api_guide_backward_en: -######## +################ Back Propagation -######## +################ The ability of neural network to define model depends on optimization algorithm. Optimization is a process of calculating gradient continuously and adjusting learnable parameters. You can refer to :ref:`api_guide_optimizer_en` to learn more about optimization algorithm in Fluid. From c389cab32b0f397c19190315207e28f18a4e8775 Mon Sep 17 00:00:00 2001 From: zy0531 <48094155+zy0531@users.noreply.github.com> Date: Tue, 19 Mar 2019 14:24:44 +0800 Subject: [PATCH 5/7] Update parameter_en.rst --- .../api_guides/low_level/parameter_en.rst | 30 +++++++++---------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/doc/fluid/api_guides/low_level/parameter_en.rst b/doc/fluid/api_guides/low_level/parameter_en.rst index c1e58440195..fe46687ce8c 100644 --- a/doc/fluid/api_guides/low_level/parameter_en.rst +++ b/doc/fluid/api_guides/low_level/parameter_en.rst @@ -1,8 +1,8 @@ .. _api_guide_parameter_en: -######### +################## Model Parameters -######### +################## Model parameters are weights and biases in a model. In fluid, they are instances of ``fluid.Parameter`` class which is inherited from fluid, and they are all persistable variables. 
Model training is a process of learning and updating model parameters. The attributes related to model parameters can be configured by :ref:`api_fluid_ParamAttr` . The configurable contents are as follows: @@ -18,7 +18,7 @@ Model parameters are weights and biases in a model. In fluid, they are instances Initialization method -================= +======================== Fluid initializes a single parameter by setting attributes of :code:`initializer` in :code:`ParamAttr` . @@ -44,7 +44,7 @@ Alias:Bilinear API reference: :ref:`api_fluid_initializer_BilinearInitializer` 2. ConstantInitializer ----------------------- +-------------------------- Constant initialization. Initialize the parameter to the specified value. @@ -53,7 +53,7 @@ Alias:Constant API reference: :ref:`api_fluid_initializer_ConstantInitializer` 3. MSRAInitializer ------------------- +---------------------- Please refer to https://arxiv.org/abs/1502.01852 for initialization. @@ -62,7 +62,7 @@ Alias:MSRA API reference: :ref:`api_fluid_initializer_MSRAInitializer` 4. NormalInitializer ---------------------- +------------------------- Initialization method of random Gaussian distribution. @@ -71,7 +71,7 @@ Alias:Normal API reference: :ref:`api_fluid_initializer_NormalInitializer` 5. TruncatedNormalInitializer ------------------------------ +--------------------------------- Initialization method of stochastic truncated Gauss distribution. @@ -80,7 +80,7 @@ Alias:TruncatedNormal API reference: :ref:`api_fluid_initializer_TruncatedNormalInitializer` 6. UniformInitializer --------------------- +------------------------ Initialization method of random uniform distribution. @@ -89,7 +89,7 @@ Alias:Uniform API reference: :ref:`api_fluid_initializer_UniformInitializer` 7. XavierInitializer --------------------- +------------------------ Please refer to http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf for initialization. 
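For intuition, the Xavier scheme in the paper referenced above draws weights uniformly from a range determined by the fan-in and fan-out of the parameter. A plain-Python sketch of that bound (an illustration of the formula only, not the Fluid implementation):

```python
import math
import random

def xavier_uniform_bound(fan_in, fan_out):
    # Glorot & Bengio (2010): sample from U(-limit, limit)
    # with limit = sqrt(6 / (fan_in + fan_out))
    return math.sqrt(6.0 / (fan_in + fan_out))

# e.g. a 784 x 10 fully connected weight matrix
limit = xavier_uniform_bound(fan_in=784, fan_out=10)
weight = [random.uniform(-limit, limit) for _ in range(784 * 10)]
```

Keeping the sampling range tied to fan-in and fan-out is what keeps the variance of activations roughly constant across layers, which is the motivation given in the paper.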
@@ -98,7 +98,7 @@ Alias:Xavier API reference: :ref:`api_fluid_initializer_XavierInitializer` Regularization -============= +================= Fluid regularizes a single parameter by setting attributes of :code:`regularizer` in :code:`ParamAttr` . @@ -129,35 +129,35 @@ Fluid sets clipping method for a single parameter by setting attributes of :code The following clipping methods are supported by fluid: 1. ErrorClipByValue -------------------- +---------------------- Used to clip the value of a tensor to a specified range. API reference: :ref:`api_fluid_clip_ErrorClipByValue` 2. GradientClipByGlobalNorm ---------------------------- +------------------------------ Used to limit the global-norm of multiple Tensors to :code:`clip_norm`. API reference: :ref:`api_fluid_clip_GradientClipByGlobalNorm` 3. GradientClipByNorm ---------------------- +------------------------ Limit the L2-norm of Tensor to :code:`max_norm` . If Tensor's L2-norm exceeds :code:`max_norm` , it will calculate a :code:`scale` . And then all values of the Tensor multiply the :code:`scale` . API reference: :ref:`api_fluid_clip_GradientClipByNorm` 4. GradientClipByValue ----------------------- +------------------------- Limit the value of the gradient on a parameter to [min, max]. API reference: :ref:`api_fluid_clip_GradientClipByValue` Model Averaging -======== +================ Fluid determines whether to average a single parameter by setting attributes of :code:`do_model_average` in :code:`ParamAttr` .
Examples: From e0a5465b76ed962c1b44d40d7122ed2883d5c444 Mon Sep 17 00:00:00 2001 From: zy0531 <48094155+zy0531@users.noreply.github.com> Date: Tue, 19 Mar 2019 14:25:38 +0800 Subject: [PATCH 6/7] Update program_en.rst --- doc/fluid/api_guides/low_level/program_en.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/fluid/api_guides/low_level/program_en.rst b/doc/fluid/api_guides/low_level/program_en.rst index 25c0d9bf750..122ac5561cf 100644 --- a/doc/fluid/api_guides/low_level/program_en.rst +++ b/doc/fluid/api_guides/low_level/program_en.rst @@ -64,9 +64,9 @@ In Fluid, :code:`Variable` can contain any type of value -- in most cases a Lo All the learnable parameters in the model are kept in the memory space in form of :code:`Variable` . In most cases, you do not need to create the learnable parameters in the network by yourself. Fluid provides encapsulation for almost all common basic computing modules of the neural network. Taking the simplest full connection model as an example, calling :code:`fluid.layers.fc` directly creates two learnable parameters for the full connection layer, namely, connection weight (W) and bias, without explicitly calling :code:`Variable` related interfaces to create learnable parameters. -========= +================== Related API -========= +================== * A single neural network configured by the user is called :ref:`api_fluid_Program` . It is noteworthy that when training neural networks, users often need to configure and operate multiple :code:`Program` . For example, :code:`Program` for parameter initialization, :code:`Program` for training, :code:`Program` for testing, etc.
From ac3955e5ae2fe4b3fab6904e6970ee277c3fa98c Mon Sep 17 00:00:00 2001 From: zy0531 <48094155+zy0531@users.noreply.github.com> Date: Tue, 19 Mar 2019 17:58:58 +0800 Subject: [PATCH 7/7] Update doc/fluid/api_guides/low_level/program_en.rst --- doc/fluid/api_guides/low_level/program_en.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/fluid/api_guides/low_level/program_en.rst b/doc/fluid/api_guides/low_level/program_en.rst index 122ac5561cf..f4627a9cc83 100644 --- a/doc/fluid/api_guides/low_level/program_en.rst +++ b/doc/fluid/api_guides/low_level/program_en.rst @@ -27,7 +27,7 @@ Block ========= -:code:`Block` is the concept of variable scope in advanced languages. In programming languages, Block is a pair of braces, which contains local variable definitions and a series of instructions or operators. Control flow structures :code:`if-else` and :code:`for` in programming languages can be equivalent to in deep learning: +:code:`Block` is the concept of variable scope in high-level languages. In programming languages, Block is a pair of braces, which contains local variable definitions and a series of instructions or operators. Control flow structures :code:`if-else` and :code:`for` in programming languages can be equivalent to the following counterparts in deep learning: +----------------------+-------------------------+ | programming languages| Fluid |