3 changes: 3 additions & 0 deletions doc/fluid/api_guides/index.rst
@@ -18,3 +18,6 @@ The API guides introduce the API system and usage of PaddlePaddle Fluid by function,
low_level/memory_optimize.rst
low_level/nets.rst
low_level/parallel_executor.rst
low_level/backward.rst
low_level/parameter.rst
low_level/program.rst
3 changes: 3 additions & 0 deletions doc/fluid/api_guides/index_en.rst
@@ -18,3 +18,6 @@ This section introduces the Fluid API structure and usage, to help you quickly g
low_level/memory_optimize_en.rst
low_level/nets_en.rst
low_level/parallel_executor_en.rst
low_level/backward_en.rst
low_level/parameter_en.rst
low_level/program_en.rst
23 changes: 23 additions & 0 deletions doc/fluid/api_guides/low_level/backward_en.rst
@@ -0,0 +1,23 @@
.. _api_guide_backward_en:


################
Back Propagation
################

The ability of a neural network to fit a model depends on its optimization algorithm. Optimization is the process of continuously computing gradients and adjusting the learnable parameters accordingly. You can refer to :ref:`api_guide_optimizer_en` to learn more about the optimization algorithms in Fluid.

During network training, gradient computation consists of two steps: forward computation and `back propagation <https://en.wikipedia.org/wiki/Backpropagation>`_ .

Forward computation propagates the state of the input units to the output units according to the network structure you build.

Back propagation computes the derivatives of composite functions by means of the `chain rule <https://en.wikipedia.org/wiki/Chain_rule>`_ : the gradients of the output units are propagated back to the input units, and the learnable parameters of the network are adjusted according to the computed gradients.

Review comment (Contributor): Normally, a single blank line between paragraphs is sufficient.


You can refer to the `back propagation algorithm <http://deeplearning.stanford.edu/wiki/index.php/%E5%8F%8D%E5%90%91%E4%BC%A0%E5%AF%BC%E7%AE%97%E6%B3%95>`_ for the detailed implementation process.

We do not recommend directly calling backpropagation-related APIs in :code:`fluid` , as these are very low-level APIs. Consider using the relevant APIs in :ref:`api_guide_optimizer_en` instead. When you use optimizer APIs, Fluid automatically calculates the complex back-propagation for you.

If you want to implement it yourself, you can also use the :code:`callback` argument of :ref:`api_fluid_backward_append_backward` to define a customized gradient form for an Operator.
For more information, please refer to :ref:`api_fluid_backward_append_backward` .
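
As a minimal, illustrative sketch (the network, the variable names :code:`x` and :code:`y` , and the loss below are assumptions for demonstration, not part of this guide), the low-level call and the recommended optimizer route compare as follows:

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1)
    loss = fluid.layers.mean(
        fluid.layers.square_error_cost(input=y_predict, label=y))

    # Low-level route: explicitly append the gradient operators for `loss`.
    param_grads = fluid.backward.append_backward(loss)

    # Recommended route: an optimizer calls append_backward internally.
    # fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)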

2 changes: 1 addition & 1 deletion doc/fluid/api_guides/low_level/parameter.rst
@@ -4,7 +4,7 @@
Model Parameters
################

Model parameters are the collective term for the weights and biases in a model. In Fluid they correspond to the fluid.Parameter class, which inherits from fluid.Variable and is a persistable variable. Model training is the process of continually learning and updating the model parameters. Attributes related to model parameters can be configured through :ref:`cn_api_fluid_param_attr_ParamAttr` ; the configurable items are:
Model parameters are the collective term for the weights and biases in a model. In Fluid they correspond to the fluid.Parameter class, which inherits from fluid.Variable and is a persistable variable. Model training is the process of continually learning and updating the model parameters. Attributes related to model parameters can be configured through :ref:`cn_api_fluid_ParamAttr` ; the configurable items are:

- Initialization method
- Regularization
175 changes: 175 additions & 0 deletions doc/fluid/api_guides/low_level/parameter_en.rst
@@ -0,0 +1,175 @@
.. _api_guide_parameter_en:

##################
Model Parameters
##################

Model parameters are the weights and biases in a model. In Fluid, they are instances of the :code:`fluid.Parameter` class, which inherits from :code:`fluid.Variable` , and they are all persistable variables. Model training is the process of learning and updating the model parameters. The attributes related to model parameters can be configured through :ref:`api_fluid_ParamAttr` . The configurable contents are as follows:


- Initialization method

- Regularization

- Gradient clipping

- Model Average



Initialization method
========================

Fluid initializes a single parameter by setting the :code:`initializer` attribute in :code:`ParamAttr` .

Example:

.. code-block:: python

    param_attrs = fluid.ParamAttr(name="fc_weight",
                                  initializer=fluid.initializer.ConstantInitializer(1.0))
    y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs)



The following initialization methods are supported by Fluid:

1. BilinearInitializer
-----------------------

Bilinear initialization. A deconvolution operation initialized by this method can act as a linear interpolation operation.

Alias: Bilinear

API reference: :ref:`api_fluid_initializer_BilinearInitializer`
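
A common use, sketched below under assumed shapes and names (the upsampling factor, channel count, and input shape are illustrative), is fixed bilinear upsampling with :code:`conv2d_transpose` ; the filter size, padding, and stride all follow from the upsampling factor:

.. code-block:: python

    import math
    import paddle.fluid as fluid

    factor = 2  # assumed upsampling factor
    C = 3       # assumed number of channels

    data = fluid.layers.data(name="data", shape=[C, 32, 32], dtype="float32")
    w_attr = fluid.ParamAttr(
        learning_rate=0.0,  # keep the interpolation weights fixed
        initializer=fluid.initializer.BilinearInitializer())
    # With these settings the deconvolution performs bilinear upsampling.
    upsampled = fluid.layers.conv2d_transpose(
        input=data,
        num_filters=C,
        filter_size=2 * factor - factor % 2,
        padding=int(math.ceil((factor - 1) / 2.0)),
        stride=factor,
        groups=C,
        param_attr=w_attr,
        bias_attr=False)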

2. ConstantInitializer
--------------------------

Constant initialization. Initializes the parameter to a specified value.

Alias: Constant

API reference: :ref:`api_fluid_initializer_ConstantInitializer`

3. MSRAInitializer
----------------------

MSRA initialization. Please refer to https://arxiv.org/abs/1502.01852 for details.

Alias: MSRA

API reference: :ref:`api_fluid_initializer_MSRAInitializer`

4. NormalInitializer
-------------------------

Initialization with a random Gaussian distribution.

Alias: Normal

API reference: :ref:`api_fluid_initializer_NormalInitializer`

5. TruncatedNormalInitializer
---------------------------------

Initialization with a random truncated Gaussian distribution.

Alias: TruncatedNormal

API reference: :ref:`api_fluid_initializer_TruncatedNormalInitializer`

6. UniformInitializer
------------------------

Initialization with a random uniform distribution.

Alias: Uniform

API reference: :ref:`api_fluid_initializer_UniformInitializer`

7. XavierInitializer
------------------------

Xavier initialization. Please refer to http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf for details.

Alias: Xavier

API reference: :ref:`api_fluid_initializer_XavierInitializer`

Regularization
=================

Fluid applies regularization to a single parameter by setting the :code:`regularizer` attribute in :code:`ParamAttr` .

.. code-block:: python

    param_attrs = fluid.ParamAttr(name="fc_weight",
                                  regularizer=fluid.regularizer.L1DecayRegularizer(0.1))
    y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs)

The following regularization approaches are supported by Fluid:

- :ref:`api_fluid_regularizer_L1DecayRegularizer` (Alias: L1Decay)
- :ref:`api_fluid_regularizer_L2DecayRegularizer` (Alias: L2Decay)

Clipping
==========

Fluid sets the clipping method for a single parameter by setting the :code:`gradient_clip` attribute in :code:`ParamAttr` .

.. code-block:: python

    # The clipping bounds below are illustrative values.
    param_attrs = fluid.ParamAttr(name="fc_weight",
                                  gradient_clip=fluid.clip.GradientClipByValue(min=-1.0, max=1.0))
    y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs)



The following clipping methods are supported by Fluid:

1. ErrorClipByValue
----------------------

Clips the values of a Tensor to a specified range.

API reference: :ref:`api_fluid_clip_ErrorClipByValue`

2. GradientClipByGlobalNorm
------------------------------

Limits the global norm of multiple Tensors to :code:`clip_norm` .

API reference: :ref:`api_fluid_clip_GradientClipByGlobalNorm`
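
A minimal sketch of applying a global-norm clip to all gradients (the tiny network and the :code:`clip_norm` value are illustrative assumptions; the clip must be set before building the backward pass):

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    loss = fluid.layers.mean(fluid.layers.fc(input=x, size=1))

    # Set the clipping strategy before minimize() appends the backward pass.
    fluid.clip.set_gradient_clip(
        clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=1.0))
    fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)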

3. GradientClipByNorm
------------------------

Limits the L2-norm of a Tensor to :code:`max_norm` . If the Tensor's L2-norm exceeds :code:`max_norm` , a :code:`scale` factor is computed, and all values of the Tensor are then multiplied by :code:`scale` .

API reference: :ref:`api_fluid_clip_GradientClipByNorm`

4. GradientClipByValue
-------------------------

Limits the values of a parameter's gradient to the range [min, max].

API reference: :ref:`api_fluid_clip_GradientClipByValue`

Model Averaging
================

Fluid determines whether to average a single parameter by setting the :code:`do_model_average` attribute in :code:`ParamAttr` .

Example:

.. code-block:: python

    param_attrs = fluid.ParamAttr(name="fc_weight",
                                  do_model_average=True)
    y_predict = fluid.layers.fc(input=x, size=10, param_attr=param_attrs)

In mini-batch training, parameters are updated once after each batch; model averaging then averages the parameters produced by the most recent K updates.

The averaged parameters are used only for testing and prediction; they do not take part in the actual training process.

API reference: :ref:`api_fluid_optimizer_ModelAverage`
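
A minimal sketch of the typical flow (the network, optimizer settings, and window sizes are illustrative assumptions):

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    loss = fluid.layers.mean(fluid.layers.fc(input=x, size=1))

    optimizer = fluid.optimizer.Momentum(learning_rate=0.1, momentum=0.9)
    optimizer.minimize(loss)
    model_average = fluid.optimizer.ModelAverage(
        average_window_rate=0.15,
        min_average_window=10000,
        max_average_window=20000)

    exe = fluid.Executor(fluid.CPUPlace())
    exe.run(fluid.default_startup_program())
    # ... run the training loop here ...

    # Temporarily swap in the averaged parameters for testing/prediction;
    # the original parameters are restored on exiting the `with` block.
    with model_average.apply(exe):
        pass  # run the test program here
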
78 changes: 78 additions & 0 deletions doc/fluid/api_guides/low_level/program_en.rst
@@ -0,0 +1,78 @@
.. _api_guide_Program_en:

###############################
Program/Block/Operator/Variable
###############################

==================
Program
==================

:code:`Fluid` describes a neural network configuration in the form of an abstract syntax tree, similar to that of a programming language, and the user's description of computation is written into a Program. The Program in Fluid replaces the concept of a model in traditional frameworks: it can describe arbitrarily complex models through three execution structures: sequential execution, conditional selection, and loop execution. Writing a :code:`Program` is very close to writing an ordinary program; if you have programming experience, you will naturally apply it here.

In brief:

* A model is described by a Fluid :code:`Program` , and one model may contain more than one :code:`Program` ;

* A :code:`Program` consists of nested :code:`Block` s, and the concept of a :code:`Block` can be analogized to a pair of braces in C++ or Java, or an indented block in Python.


* Computation in a :code:`Block` is organized in three ways: sequential execution, conditional selection, or loop execution, which together compose complex computational logic.


* A :code:`Block` contains descriptions of computation and the objects of computation. A description of computation is called an Operator; an object of computation (that is, the input and output of an Operator) is unified as a Tensor. In Fluid, a Tensor is represented by a `LoD-Tensor <http://paddlepaddle.org/documentation/docs/zh/1.2/user_guides/howto/prepare_data/lod_tensor.html#permalink-4-lod-tensor>`_ with a LoD level of 0.
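
A minimal sketch of these concepts (the two-layer network and all names are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    main_program = fluid.Program()
    startup_program = fluid.Program()

    # Operators appended inside the guard go into `main_program`;
    # parameter-initialization operators go into `startup_program`.
    with fluid.program_guard(main_program, startup_program):
        x = fluid.layers.data(name='x', shape=[13], dtype='float32')
        hidden = fluid.layers.fc(input=x, size=64, act='relu')
        prediction = fluid.layers.fc(input=hidden, size=1)

    print(main_program.num_blocks)  # 1: only the global Block so far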


=========
Block
=========

:code:`Block` corresponds to the concept of variable scope in high-level languages. In programming languages, a Block is a pair of braces that contains local variable definitions and a series of instructions or operators. The control flow structures :code:`if-else` and :code:`for` in programming languages have the following counterparts in deep learning:

+-----------------------+-------------------------+
| programming languages | Fluid                   |
+=======================+=========================+
| for, while loop       | RNN, WhileOp            |
+-----------------------+-------------------------+
| if-else, switch       | IfElseOp, SwitchOp      |
+-----------------------+-------------------------+
| execute sequentially  | a series of layers      |
+-----------------------+-------------------------+

As mentioned above, a :code:`Block` in Fluid describes a set of Operators organized by sequential execution, conditional selection, or loop execution, together with the objects the Operators work on: Tensors.
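
The sketch below builds a counting loop; the body of the :code:`While` op lives in its own sub- :code:`Block` (the counter and bound are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    i = fluid.layers.fill_constant(shape=[1], dtype='int64', value=0)
    limit = fluid.layers.fill_constant(shape=[1], dtype='int64', value=10)
    cond = fluid.layers.less_than(x=i, y=limit)

    while_op = fluid.layers.While(cond=cond)
    with while_op.block():  # operators added here go into a sub-Block
        i = fluid.layers.increment(x=i, value=1, in_place=True)
        fluid.layers.less_than(x=i, y=limit, cond=cond)  # refresh the condition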



=============
Operator
=============

In Fluid, all operations on data are represented by :code:`Operator` s. In Python, the :code:`Operator` s in Fluid are encapsulated into modules such as :code:`paddle.fluid.layers` and :code:`paddle.fluid.nets` .

This is because common operations on Tensors may consist of several more basic operations. For simplicity, the framework encapsulates these basic Operators internally, including the creation of the learnable parameters an Operator depends on, the initialization details of those parameters, and so on, so as to reduce development cost.
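
For instance (a sketch; the exact operator decomposition may differ across versions), a single :code:`fc` call appends several basic Operators to the current :code:`Block` :

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.fc(input=x, size=10, act='relu')

    # Inspect the basic Operators that the one `fc` call produced.
    for op in fluid.default_main_program().global_block().ops:
        print(op.type)  # e.g. mul, elementwise_add, relu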



For more information, refer to `Fluid Design Idea <../../advanced_usage/design_idea/fluid_design_idea.html>`_ .


=========
Variable
=========

In Fluid, :code:`Variable` can contain any type of value -- in most cases a LoD-Tensor.

All learnable parameters in a model are kept in memory as :code:`Variable` s. In most cases, you do not need to create the learnable parameters in the network yourself: Fluid provides encapsulations for almost all common basic computing modules of neural networks. Taking the simplest fully connected model as an example, calling :code:`fluid.layers.fc` directly creates the two learnable parameters of the fully connected layer, namely the connection weight (W) and the bias, without explicitly calling any :code:`Variable` -related interfaces to create them.
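
A short sketch of inspecting the parameter :code:`Variable` s that :code:`fluid.layers.fc` creates (parameter names such as :code:`fc_0.w_0` are generated by Fluid and shown here only as an illustration):

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=10)

    # The connection weight W and the bias are persistable Variables.
    for param in fluid.default_main_program().global_block().all_parameters():
        print(param.name, param.shape)  # e.g. fc_0.w_0 / fc_0.b_0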

==================
Related API
==================


* A single neural network configured by the user is a :ref:`api_fluid_Program` . Note that when training a neural network, users often need to configure and operate on multiple :code:`Program` s: for example, a :code:`Program` for parameter initialization, a :code:`Program` for training, and a :code:`Program` for testing.


* Users can also use :ref:`api_fluid_program_guard` together with a :code:`with` statement to modify the configured :ref:`api_fluid_default_startup_program` and :ref:`api_fluid_default_main_program` .


* In Fluid, the execution order in a Block is determined by control flow operators such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` . For more information, please refer to :ref:`api_guide_control_flow_en` .