diff --git a/examples/layers_normalizations.ipynb b/examples/layers_normalizations.ipynb
new file mode 100644
index 0000000000..1127138320
--- /dev/null
+++ b/examples/layers_normalizations.ipynb
@@ -0,0 +1,422 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "Normalizations.ipynb",
+ "version": "0.3.2",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "wFPyjGqMQ82Q",
+ "colab_type": "text"
+ },
+ "source": [
+ "##### Copyright 2019 The TensorFlow Authors.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "aNZ7aEDyQIYU",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License.\n",
+ "\n"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uMOmzhPEQh7b",
+ "colab_type": "text"
+ },
+ "source": [
+ "# Normalizations\n",
+ "\n",
+ "
\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cthm5dovQMJl",
+ "colab_type": "text"
+ },
+ "source": [
+ "\n",
+ "## Overview\n",
+ "This notebook gives a brief introduction into the [normalization layers](https://github.com/tensorflow/addons/blob/master/tensorflow_addons/layers/normalizations.py) of TensorFlow. Currently supported layers are:\n",
+ "* **Group Normalization** (TensorFlow Addons)\n",
+ "* **Instance Normalization** (TensorFlow Addons)\n",
+ "* **Layer Normalization** (TensorFlow Core)\n",
+ "\n",
+ "The basic idea behind these layers is to normalize the output of an activation layer to improve the convergence during training. In contrast to [batch normalization](https://keras.io/layers/normalization/) these normalizations do not work on batches, instead they normalize the activations of a single sample, making them suitable for recurrent neual networks as well. \n",
+ "\n",
+ "Typically the normalization is performed by calculating the mean and the standard deviation of a subgroup in your input tensor. It is also possible to apply a scale and an offset factor to this as well.\n",
+ "\n",
+ "\n",
+ "$y_{i} = \\frac{\\gamma ( x_{i} - \\mu )}{\\sigma }+ \\beta$\n",
+ "\n",
+ "$ y$ : Output\n",
+ "\n",
+ "$x$ : Input\n",
+ "\n",
+ "$\\gamma$ : Scale factor\n",
+ "\n",
+ "$\\mu$: mean\n",
+ "\n",
+ "$\\sigma$: standard deviation\n",
+ "\n",
+ "$\\beta$: Offset factor\n",
+ "\n",
+ "\n",
+ "The following image demonstrates the difference between these techniques. Each subplot shows an input tensor, with N as the batch axis, C as the channel axis, and (H, W)\n",
+ "as the spatial axes (Height and Width of a picture for example). The pixels in blue are normalized by the same mean and variance, computed by aggregating the values of these pixels.\n",
+ "\n",
+ "\n",
+ "\n",
+ "Source: (https://arxiv.org/pdf/1803.08494.pdf)\n",
+ "\n",
+ "The weights gamma and beta are trainable in all normalization layers to compensate for the possible lost of representational ability. You can activate these factors by setting the `center` or the `scale` flag to `True`. Of course you can use `initializers`, `constraints` and `regularizer` for `beta` and `gamma` to tune these values during the training process. \n",
+ "\n",
+ "##Setup"
+ ]
+ },
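+      {
+        "cell_type": "markdown",
+        "metadata": {},
+        "source": [
+          "As a quick sanity check, the formula above can be reproduced with plain NumPy. Here is a minimal sketch; the input vector and the factors below are made up for illustration:"
+        ]
+      },
+      {
+        "cell_type": "code",
+        "metadata": {},
+        "source": [
+          "import numpy as np\n",
+          "\n",
+          "x = np.array([1.0, 2.0, 3.0, 4.0])\n",
+          "gamma, beta = 1.0, 0.0  # scale and offset factors\n",
+          "\n",
+          "# y_i = gamma * (x_i - mean) / std + beta\n",
+          "y = gamma * (x - x.mean()) / x.std() + beta\n",
+          "print(y)"
+        ],
+        "execution_count": 0,
+        "outputs": []
+      },
+      {
+        "cell_type": "markdown",
+        "metadata": {},
+        "source": [
+          "## Setup"
+        ]
+      },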
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kTlbneoEUKrD",
+ "colab_type": "text"
+ },
+ "source": [
+ "### Install Tensorflow 2.0 and Tensorflow-Addons"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "AOExuXLZSZNE",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 666
+ },
+ "outputId": "6e55e2de-663b-4ce4-fbe7-4e004594516e"
+ },
+ "source": [
+ "!pip install tensorflow==2.0.0-beta1 \n",
+ "!pip install tensorflow-addons\n",
+ "from __future__ import absolute_import, division, print_function\n",
+ "import tensorflow as tf\n",
+ "import tensorflow_addons as tfa"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "u82Gz_gOUPDZ",
+ "colab_type": "text"
+ },
+ "source": [
+ "###Preparing Dataset"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "3wso9oidUZZQ",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 51
+ },
+ "outputId": "1d547d5f-b6c0-4a8e-9806-a29718dd3f85"
+ },
+ "source": [
+ "mnist = tf.keras.datasets.mnist\n",
+ "\n",
+ "(x_train, y_train),(x_test, y_test) = mnist.load_data()\n",
+ "x_train, x_test = x_train / 255.0, x_test / 255.0"
+ ],
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz\n",
+ "11493376/11490434 [==============================] - 0s 0us/step\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "UTQH56j89POZ",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Group Normalization Tutorial \n",
+ "\n",
+ "### Introduction\n",
+ "Group Normalization(GN) divides the channels of your inputs into smaller sub groups and normalizes these values based on their mean and variance. Since GN works on a single example this technique is batchsize independent. \n",
+ "\n",
+ "GN experimentally scored closed to batch normalization in image classification tasks. It can be beneficial to use GN instead of Batch Normalization in case your overall batch_size is low, which would lead to bad performance of batch normalization \n",
+ "\n",
+ "###Example\n",
+ "Splitting 10 channels after a Conv2D layer into 5 subgroups in a standard \"channels last\" setting:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "aIGjLwYWAm0v",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 156
+ },
+ "outputId": "6b023506-8f21-4fd6-94f9-9d0bfc605d4c"
+ },
+ "source": [
+ "model = tf.keras.models.Sequential([\n",
+ " # Reshape into \"channels last\" setup.\n",
+ " tf.keras.layers.Reshape((28,28,1), input_shape=(28,28)),\n",
+ " tf.keras.layers.Conv2D(filters=10, kernel_size=(3,3),data_format=\"channels_last\"),\n",
+ " # Groupnorm Layer\n",
+ " tfa.layers.normalizations.GroupNormalization(groups=5, axis=3),\n",
+ " tf.keras.layers.Flatten(),\n",
+ " tf.keras.layers.Dense(128, activation='relu'),\n",
+ " tf.keras.layers.Dropout(0.2),\n",
+ " tf.keras.layers.Dense(10, activation='softmax')\n",
+ "])\n",
+ "\n",
+ "model.compile(optimizer='adam',\n",
+ " loss='sparse_categorical_crossentropy',\n",
+ " metrics=['accuracy'])\n",
+ "model.fit(x_test, y_test)"
+ ],
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Train on 10000 samples\n",
+ "10000/10000 [==============================] - 9s 856us/sample - loss: 0.4905 - accuracy: 0.8524\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 3
+ }
+ ]
+ },
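+      {
+        "cell_type": "markdown",
+        "metadata": {},
+        "source": [
+          "To see what the layer is doing, here is a minimal sketch (using a random input tensor, so the numbers are made up) that applies the same GroupNormalization layer directly and checks that every group of channels comes out with roughly zero mean and unit variance per sample:"
+        ]
+      },
+      {
+        "cell_type": "code",
+        "metadata": {},
+        "source": [
+          "x = tf.random.normal((2, 28, 28, 10))\n",
+          "gn = tfa.layers.normalizations.GroupNormalization(groups=5, axis=3)\n",
+          "y = gn(x)\n",
+          "\n",
+          "# Split the 10 channels into the 5 groups of 2 channels each and\n",
+          "# aggregate each group's statistics over its spatial and channel axes.\n",
+          "grouped = tf.reshape(y, (2, 28, 28, 5, 2))\n",
+          "print(tf.math.reduce_mean(grouped, axis=[1, 2, 4]))  # close to 0\n",
+          "print(tf.math.reduce_std(grouped, axis=[1, 2, 4]))   # close to 1"
+        ],
+        "execution_count": 0,
+        "outputs": []
+      },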
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QMwUfJUib3ka",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Instance Normalization Tutorial\n",
+ "### Introduction\n",
+ "Instance Normalization is special case of group normalization where the group size is the same size as the channel size (or the axis size).\n",
+ "\n",
+ "Experimental results show that instance normalization performs well on style transfer when replacing batch normalization. Recently, instance normalization has also been used as a replacement for batch normalization in GANs.\n",
+ "\n",
+ "### Example\n",
+ "Applying InstanceNormalization after a Conv2D Layer and using a uniformed initialized scale and offset factor."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6sLVv-C8f6Kf",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 68
+ },
+ "outputId": "4001c67c-315d-4cdc-924e-10da07b2ccb7"
+ },
+ "source": [
+ "model = tf.keras.models.Sequential([\n",
+ " # Reshape into \"channels last\" setup.\n",
+ " tf.keras.layers.Reshape((28,28,1), input_shape=(28,28)),\n",
+ " tf.keras.layers.Conv2D(filters=10, kernel_size=(3,3),data_format=\"channels_last\"),\n",
+ " # LayerNorm Layer\n",
+ " tfa.layers.normalizations.InstanceNormalization(axis=3, \n",
+ " center=True, \n",
+ " scale=True,\n",
+ " beta_initializer=\"random_uniform\",\n",
+ " gamma_initializer=\"random_uniform\"),\n",
+ " tf.keras.layers.Flatten(),\n",
+ " tf.keras.layers.Dense(128, activation='relu'),\n",
+ " tf.keras.layers.Dropout(0.2),\n",
+ " tf.keras.layers.Dense(10, activation='softmax')\n",
+ "])\n",
+ "\n",
+ "model.compile(optimizer='adam',\n",
+ " loss='sparse_categorical_crossentropy',\n",
+ " metrics=['accuracy'])\n",
+ "model.fit(x_test, y_test)"
+ ],
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Train on 10000 samples\n",
+ "10000/10000 [==============================] - 7s 658us/sample - loss: 0.5463 - accuracy: 0.8327\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 5
+ }
+ ]
+ },
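+      {
+        "cell_type": "markdown",
+        "metadata": {},
+        "source": [
+          "Since instance normalization is a special case of group normalization, an equivalent result can be obtained with GroupNormalization by setting the number of groups to the number of channels. A minimal sketch, using a random input tensor and freshly initialized, untrained layers:"
+        ]
+      },
+      {
+        "cell_type": "code",
+        "metadata": {},
+        "source": [
+          "x = tf.random.normal((2, 28, 28, 10))\n",
+          "instance_norm = tfa.layers.normalizations.InstanceNormalization(axis=3)\n",
+          "group_norm = tfa.layers.normalizations.GroupNormalization(groups=10, axis=3)\n",
+          "\n",
+          "# Both layers normalize each channel of each sample on its own, so their\n",
+          "# outputs should agree up to numerical noise.\n",
+          "print(tf.reduce_max(tf.abs(instance_norm(x) - group_norm(x))))"
+        ],
+        "execution_count": 0,
+        "outputs": []
+      },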
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qYdnEocRUCll",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Layer Normalization Tutorial\n",
+ "### Introduction\n",
+ "Layer Normalization is special case of group normalization where the group size is 1. The mean and standard deviation is calculated from all activations of a single sample.\n",
+ "\n",
+ "Experimental results show that Layer normalization is well suited for Recurrent Neural Networks, since it works batchsize independt.\n",
+ "\n",
+ "### Example\n",
+ "\n",
+ "Applying Layernormalization after a Conv2D Layer and using a scale and offset factor. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Fh-Pp_e5UB54",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 68
+ },
+ "outputId": "d92f184d-ad64-4e83-bc02-eea3b4d9a004"
+ },
+ "source": [
+ "model = tf.keras.models.Sequential([\n",
+ " # Reshape into \"channels last\" setup.\n",
+ " tf.keras.layers.Reshape((28,28,1), input_shape=(28,28)),\n",
+ " tf.keras.layers.Conv2D(filters=10, kernel_size=(3,3),data_format=\"channels_last\"),\n",
+ " # LayerNorm Layer\n",
+ " tf.keras.layers.LayerNormalization(axis=1 , center=True , scale=True),\n",
+ " tf.keras.layers.Flatten(),\n",
+ " tf.keras.layers.Dense(128, activation='relu'),\n",
+ " tf.keras.layers.Dropout(0.2),\n",
+ " tf.keras.layers.Dense(10, activation='softmax')\n",
+ "])\n",
+ "\n",
+ "model.compile(optimizer='adam',\n",
+ " loss='sparse_categorical_crossentropy',\n",
+ " metrics=['accuracy'])\n",
+ "model.fit(x_test, y_test)"
+ ],
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Train on 10000 samples\n",
+ "10000/10000 [==============================] - 8s 769us/sample - loss: 0.4453 - accuracy: 0.8728\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 4
+ }
+ ]
+ },
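+      {
+        "cell_type": "markdown",
+        "metadata": {},
+        "source": [
+          "The same computation can be written out by hand with the formula from the overview. A minimal sketch, assuming the layer's default epsilon of 1e-3 and a freshly initialized, untrained layer:"
+        ]
+      },
+      {
+        "cell_type": "code",
+        "metadata": {},
+        "source": [
+          "x = tf.random.normal((4, 10))\n",
+          "layer_norm = tf.keras.layers.LayerNormalization(axis=-1)\n",
+          "\n",
+          "# Mean and variance are computed per sample over the last axis.\n",
+          "mean = tf.reduce_mean(x, axis=-1, keepdims=True)\n",
+          "variance = tf.math.reduce_variance(x, axis=-1, keepdims=True)\n",
+          "manual = (x - mean) / tf.sqrt(variance + 1e-3)\n",
+          "\n",
+          "# Should be close to zero up to numerical noise.\n",
+          "print(tf.reduce_max(tf.abs(layer_norm(x) - manual)))"
+        ],
+        "execution_count": 0,
+        "outputs": []
+      },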
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "shvGfnB0WpQQ",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Literature\n",
+ "[Layer norm](https://arxiv.org/pdf/1607.06450.pdf)\n",
+ "\n",
+ "[Instance norm](https://arxiv.org/pdf/1607.08022.pdf)\n",
+ "\n",
+ "[Group Norm](https://arxiv.org/pdf/1803.08494.pdf)\n",
+ "\n",
+ "[Complete Normalizations Overview](http://mlexplained.com/2018/11/30/an-overview-of-normalization-methods-in-deep-learning/)"
+ ]
+ }
+ ]
+}