diff --git a/examples/layers_normalizations.ipynb b/examples/layers_normalizations.ipynb
new file mode 100644
index 0000000000..1127138320
--- /dev/null
+++ b/examples/layers_normalizations.ipynb
@@ -0,0 +1,422 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "Normalizations.ipynb",
+ "version": "0.3.2",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "wFPyjGqMQ82Q",
+ "colab_type": "text"
+ },
+ "source": [
+ "##### Copyright 2019 The TensorFlow Authors.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "aNZ7aEDyQIYU",
+ "colab_type": "code",
+ "colab": {}
+ },
+ "source": [
+ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License.\n",
+ "\n"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uMOmzhPEQh7b",
+ "colab_type": "text"
+ },
+ "source": [
+ "# Normalizations\n",
+ "\n",
+ "
\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cthm5dovQMJl",
+ "colab_type": "text"
+ },
+ "source": [
+ "\n",
+ "## Overview\n",
+ "This notebook gives a brief introduction into the [normalization layers](https://github.com/tensorflow/addons/blob/master/tensorflow_addons/layers/normalizations.py) of TensorFlow. Currently supported layers are:\n",
+ "* **Group Normalization** (TensorFlow Addons)\n",
+ "* **Instance Normalization** (TensorFlow Addons)\n",
+ "* **Layer Normalization** (TensorFlow Core)\n",
+ "\n",
+ "The basic idea behind these layers is to normalize the output of an activation layer to improve the convergence during training. In contrast to [batch normalization](https://keras.io/layers/normalization/) these normalizations do not work on batches, instead they normalize the activations of a single sample, making them suitable for recurrent neual networks as well. \n",
+ "\n",
+ "Typically the normalization is performed by calculating the mean and the standard deviation of a subgroup in your input tensor. It is also possible to apply a scale and an offset factor to this as well.\n",
+ "\n",
+ "\n",
+ "$y_{i} = \\frac{\\gamma ( x_{i} - \\mu )}{\\sigma }+ \\beta$\n",
+ "\n",
+ "$ y$ : Output\n",
+ "\n",
+ "$x$ : Input\n",
+ "\n",
+ "$\\gamma$ : Scale factor\n",
+ "\n",
+ "$\\mu$: mean\n",
+ "\n",
+ "$\\sigma$: standard deviation\n",
+ "\n",
+ "$\\beta$: Offset factor\n",
+ "\n",
+ "\n",
+ "The following image demonstrates the difference between these techniques. Each subplot shows an input tensor, with N as the batch axis, C as the channel axis, and (H, W)\n",
+ "as the spatial axes (Height and Width of a picture for example). The pixels in blue are normalized by the same mean and variance, computed by aggregating the values of these pixels.\n",
+ "\n",
+ "\n",
+ "\n",
+ "Source: (https://arxiv.org/pdf/1803.08494.pdf)\n",
+ "\n",
+ "The weights gamma and beta are trainable in all normalization layers to compensate for the possible lost of representational ability. You can activate these factors by setting the `center` or the `scale` flag to `True`. Of course you can use `initializers`, `constraints` and `regularizer` for `beta` and `gamma` to tune these values during the training process. \n",
+ "\n",
+ "##Setup"
+ ]
+ },
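+      {
+        "cell_type": "markdown",
+        "metadata": {},
+        "source": [
+          "As a quick sanity check, the formula above can be reproduced with plain NumPy. Here is a minimal sketch; the input vector and the factors below are made up for illustration:"
+        ]
+      },
+      {
+        "cell_type": "code",
+        "metadata": {},
+        "source": [
+          "import numpy as np\n",
+          "\n",
+          "x = np.array([1.0, 2.0, 3.0, 4.0])\n",
+          "gamma, beta = 1.0, 0.0  # scale and offset factors\n",
+          "\n",
+          "# y_i = gamma * (x_i - mean) / std + beta\n",
+          "y = gamma * (x - x.mean()) / x.std() + beta\n",
+          "print(y)"
+        ],
+        "execution_count": 0,
+        "outputs": []
+      },
+      {
+        "cell_type": "markdown",
+        "metadata": {},
+        "source": [
+          "## Setup"
+        ]
+      },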
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kTlbneoEUKrD",
+ "colab_type": "text"
+ },
+ "source": [
+ "### Install Tensorflow 2.0 and Tensorflow-Addons"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "AOExuXLZSZNE",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 666
+ },
+ "outputId": "6e55e2de-663b-4ce4-fbe7-4e004594516e"
+ },
+ "source": [
+ "!pip install tensorflow==2.0.0-beta1 \n",
+ "!pip install tensorflow-addons\n",
+ "from __future__ import absolute_import, division, print_function\n",
+ "import tensorflow as tf\n",
+ "import tensorflow_addons as tfa"
+ ],
+ "execution_count": 0,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "u82Gz_gOUPDZ",
+ "colab_type": "text"
+ },
+ "source": [
+ "###Preparing Dataset"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "3wso9oidUZZQ",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 51
+ },
+ "outputId": "1d547d5f-b6c0-4a8e-9806-a29718dd3f85"
+ },
+ "source": [
+ "mnist = tf.keras.datasets.mnist\n",
+ "\n",
+ "(x_train, y_train),(x_test, y_test) = mnist.load_data()\n",
+ "x_train, x_test = x_train / 255.0, x_test / 255.0"
+ ],
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz\n",
+ "11493376/11490434 [==============================] - 0s 0us/step\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "UTQH56j89POZ",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Group Normalization Tutorial \n",
+ "\n",
+ "### Introduction\n",
+ "Group Normalization(GN) divides the channels of your inputs into smaller sub groups and normalizes these values based on their mean and variance. Since GN works on a single example this technique is batchsize independent. \n",
+ "\n",
+ "GN experimentally scored closed to batch normalization in image classification tasks. It can be beneficial to use GN instead of Batch Normalization in case your overall batch_size is low, which would lead to bad performance of batch normalization \n",
+ "\n",
+ "###Example\n",
+ "Splitting 10 channels after a Conv2D layer into 5 subgroups in a standard \"channels last\" setting:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "aIGjLwYWAm0v",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 156
+ },
+ "outputId": "6b023506-8f21-4fd6-94f9-9d0bfc605d4c"
+ },
+ "source": [
+ "model = tf.keras.models.Sequential([\n",
+ " # Reshape into \"channels last\" setup.\n",
+ " tf.keras.layers.Reshape((28,28,1), input_shape=(28,28)),\n",
+ " tf.keras.layers.Conv2D(filters=10, kernel_size=(3,3),data_format=\"channels_last\"),\n",
+ " # Groupnorm Layer\n",
+ " tfa.layers.normalizations.GroupNormalization(groups=5, axis=3),\n",
+ " tf.keras.layers.Flatten(),\n",
+ " tf.keras.layers.Dense(128, activation='relu'),\n",
+ " tf.keras.layers.Dropout(0.2),\n",
+ " tf.keras.layers.Dense(10, activation='softmax')\n",
+ "])\n",
+ "\n",
+ "model.compile(optimizer='adam',\n",
+ " loss='sparse_categorical_crossentropy',\n",
+ " metrics=['accuracy'])\n",
+ "model.fit(x_test, y_test)"
+ ],
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Train on 10000 samples\n",
+ "10000/10000 [==============================] - 9s 856us/sample - loss: 0.4905 - accuracy: 0.8524\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 3
+ }
+ ]
+ },
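+      {
+        "cell_type": "markdown",
+        "metadata": {},
+        "source": [
+          "To see what the layer is doing, here is a minimal sketch (using a random input tensor, so the numbers are made up) that applies the same GroupNormalization layer directly and checks that every group of channels comes out with roughly zero mean and unit variance per sample:"
+        ]
+      },
+      {
+        "cell_type": "code",
+        "metadata": {},
+        "source": [
+          "x = tf.random.normal((2, 28, 28, 10))\n",
+          "gn = tfa.layers.normalizations.GroupNormalization(groups=5, axis=3)\n",
+          "y = gn(x)\n",
+          "\n",
+          "# Split the 10 channels into the 5 groups of 2 channels each and\n",
+          "# aggregate each group's statistics over its spatial and channel axes.\n",
+          "grouped = tf.reshape(y, (2, 28, 28, 5, 2))\n",
+          "print(tf.math.reduce_mean(grouped, axis=[1, 2, 4]))  # close to 0\n",
+          "print(tf.math.reduce_std(grouped, axis=[1, 2, 4]))   # close to 1"
+        ],
+        "execution_count": 0,
+        "outputs": []
+      },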
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QMwUfJUib3ka",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Instance Normalization Tutorial\n",
+ "### Introduction\n",
+ "Instance Normalization is special case of group normalization where the group size is the same size as the channel size (or the axis size).\n",
+ "\n",
+ "Experimental results show that instance normalization performs well on style transfer when replacing batch normalization. Recently, instance normalization has also been used as a replacement for batch normalization in GANs.\n",
+ "\n",
+ "### Example\n",
+ "Applying InstanceNormalization after a Conv2D Layer and using a uniformed initialized scale and offset factor."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6sLVv-C8f6Kf",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 68
+ },
+ "outputId": "4001c67c-315d-4cdc-924e-10da07b2ccb7"
+ },
+ "source": [
+ "model = tf.keras.models.Sequential([\n",
+ " # Reshape into \"channels last\" setup.\n",
+ " tf.keras.layers.Reshape((28,28,1), input_shape=(28,28)),\n",
+ " tf.keras.layers.Conv2D(filters=10, kernel_size=(3,3),data_format=\"channels_last\"),\n",
+ " # LayerNorm Layer\n",
+ " tfa.layers.normalizations.InstanceNormalization(axis=3, \n",
+ " center=True, \n",
+ " scale=True,\n",
+ " beta_initializer=\"random_uniform\",\n",
+ " gamma_initializer=\"random_uniform\"),\n",
+ " tf.keras.layers.Flatten(),\n",
+ " tf.keras.layers.Dense(128, activation='relu'),\n",
+ " tf.keras.layers.Dropout(0.2),\n",
+ " tf.keras.layers.Dense(10, activation='softmax')\n",
+ "])\n",
+ "\n",
+ "model.compile(optimizer='adam',\n",
+ " loss='sparse_categorical_crossentropy',\n",
+ " metrics=['accuracy'])\n",
+ "model.fit(x_test, y_test)"
+ ],
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Train on 10000 samples\n",
+ "10000/10000 [==============================] - 7s 658us/sample - loss: 0.5463 - accuracy: 0.8327\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 5
+ }
+ ]
+ },
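+      {
+        "cell_type": "markdown",
+        "metadata": {},
+        "source": [
+          "Since instance normalization is a special case of group normalization, an equivalent result can be obtained with GroupNormalization by setting the number of groups to the number of channels. A minimal sketch, using a random input tensor and freshly initialized, untrained layers:"
+        ]
+      },
+      {
+        "cell_type": "code",
+        "metadata": {},
+        "source": [
+          "x = tf.random.normal((2, 28, 28, 10))\n",
+          "instance_norm = tfa.layers.normalizations.InstanceNormalization(axis=3)\n",
+          "group_norm = tfa.layers.normalizations.GroupNormalization(groups=10, axis=3)\n",
+          "\n",
+          "# Both layers normalize each channel of each sample on its own, so their\n",
+          "# outputs should agree up to numerical noise.\n",
+          "print(tf.reduce_max(tf.abs(instance_norm(x) - group_norm(x))))"
+        ],
+        "execution_count": 0,
+        "outputs": []
+      },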
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qYdnEocRUCll",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Layer Normalization Tutorial\n",
+ "### Introduction\n",
+ "Layer Normalization is special case of group normalization where the group size is 1. The mean and standard deviation is calculated from all activations of a single sample.\n",
+ "\n",
+ "Experimental results show that Layer normalization is well suited for Recurrent Neural Networks, since it works batchsize independt.\n",
+ "\n",
+ "### Example\n",
+ "\n",
+ "Applying Layernormalization after a Conv2D Layer and using a scale and offset factor. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Fh-Pp_e5UB54",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 68
+ },
+ "outputId": "d92f184d-ad64-4e83-bc02-eea3b4d9a004"
+ },
+ "source": [
+ "model = tf.keras.models.Sequential([\n",
+ " # Reshape into \"channels last\" setup.\n",
+ " tf.keras.layers.Reshape((28,28,1), input_shape=(28,28)),\n",
+ " tf.keras.layers.Conv2D(filters=10, kernel_size=(3,3),data_format=\"channels_last\"),\n",
+ " # LayerNorm Layer\n",
+ " tf.keras.layers.LayerNormalization(axis=1 , center=True , scale=True),\n",
+ " tf.keras.layers.Flatten(),\n",
+ " tf.keras.layers.Dense(128, activation='relu'),\n",
+ " tf.keras.layers.Dropout(0.2),\n",
+ " tf.keras.layers.Dense(10, activation='softmax')\n",
+ "])\n",
+ "\n",
+ "model.compile(optimizer='adam',\n",
+ " loss='sparse_categorical_crossentropy',\n",
+ " metrics=['accuracy'])\n",
+ "model.fit(x_test, y_test)"
+ ],
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Train on 10000 samples\n",
+ "10000/10000 [==============================] - 8s 769us/sample - loss: 0.4453 - accuracy: 0.8728\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 4
+ }
+ ]
+ },
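+      {
+        "cell_type": "markdown",
+        "metadata": {},
+        "source": [
+          "The same computation can be written out by hand with the formula from the overview. A minimal sketch, assuming the layer's default epsilon of 1e-3 and a freshly initialized, untrained layer:"
+        ]
+      },
+      {
+        "cell_type": "code",
+        "metadata": {},
+        "source": [
+          "x = tf.random.normal((4, 10))\n",
+          "layer_norm = tf.keras.layers.LayerNormalization(axis=-1)\n",
+          "\n",
+          "# Mean and variance are computed per sample over the last axis.\n",
+          "mean = tf.reduce_mean(x, axis=-1, keepdims=True)\n",
+          "variance = tf.math.reduce_variance(x, axis=-1, keepdims=True)\n",
+          "manual = (x - mean) / tf.sqrt(variance + 1e-3)\n",
+          "\n",
+          "# Should be close to zero up to numerical noise.\n",
+          "print(tf.reduce_max(tf.abs(layer_norm(x) - manual)))"
+        ],
+        "execution_count": 0,
+        "outputs": []
+      },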
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "shvGfnB0WpQQ",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Literature\n",
+ "[Layer norm](https://arxiv.org/pdf/1607.06450.pdf)\n",
+ "\n",
+ "[Instance norm](https://arxiv.org/pdf/1607.08022.pdf)\n",
+ "\n",
+ "[Group Norm](https://arxiv.org/pdf/1803.08494.pdf)\n",
+ "\n",
+ "[Complete Normalizations Overview](http://mlexplained.com/2018/11/30/an-overview-of-normalization-methods-in-deep-learning/)"
+ ]
+ }
+ ]
+}