diff --git a/examples/layers_normalizations.ipynb b/examples/layers_normalizations.ipynb new file mode 100644 index 0000000000..1127138320 --- /dev/null +++ b/examples/layers_normalizations.ipynb @@ -0,0 +1,422 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Normalizations.ipynb", + "version": "0.3.2", + "provenance": [], + "collapsed_sections": [], + "toc_visible": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "wFPyjGqMQ82Q", + "colab_type": "text" + }, + "source": [ + "##### Copyright 2019 The TensorFlow Authors.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "aNZ7aEDyQIYU", + "colab_type": "code", + "colab": {} + }, + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License.\n", + "\n" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uMOmzhPEQh7b", + "colab_type": "text" + }, + "source": [ + "# Normalizations\n", + "\n", + "\n", + "\n", + " \n", + " \n", + "
\n", + " Run in Google Colab\n", + " \n", + " View source on GitHub\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cthm5dovQMJl", + "colab_type": "text" + }, + "source": [ + "\n", + "## Overview\n", + "This notebook gives a brief introduction into the [normalization layers](https://github.com/tensorflow/addons/blob/master/tensorflow_addons/layers/normalizations.py) of TensorFlow. Currently supported layers are:\n", + "* **Group Normalization** (TensorFlow Addons)\n", + "* **Instance Normalization** (TensorFlow Addons)\n", + "* **Layer Normalization** (TensorFlow Core)\n", + "\n", + "The basic idea behind these layers is to normalize the output of an activation layer to improve the convergence during training. In contrast to [batch normalization](https://keras.io/layers/normalization/) these normalizations do not work on batches, instead they normalize the activations of a single sample, making them suitable for recurrent neual networks as well. \n", + "\n", + "Typically the normalization is performed by calculating the mean and the standard deviation of a subgroup in your input tensor. It is also possible to apply a scale and an offset factor to this as well.\n", + "\n", + "\n", + "$y_{i} = \\frac{\\gamma ( x_{i} - \\mu )}{\\sigma }+ \\beta$\n", + "\n", + "$ y$ : Output\n", + "\n", + "$x$ : Input\n", + "\n", + "$\\gamma$ : Scale factor\n", + "\n", + "$\\mu$: mean\n", + "\n", + "$\\sigma$: standard deviation\n", + "\n", + "$\\beta$: Offset factor\n", + "\n", + "\n", + "The following image demonstrates the difference between these techniques. Each subplot shows an input tensor, with N as the batch axis, C as the channel axis, and (H, W)\n", + "as the spatial axes (Height and Width of a picture for example). The pixels in blue are normalized by the same mean and variance, computed by aggregating the values of these pixels.\n", + "\n", + "![](https://github.com/shaohua0116/Group-Normalization-Tensorflow/raw/master/figure/gn.png)\n", + "\n", + "Source: (https://arxiv.org/pdf/1803.08494.pdf)\n", + "\n", + "The weights gamma and beta are trainable in all normalization layers to compensate for the possible lost of representational ability. You can activate these factors by setting the `center` or the `scale` flag to `True`. Of course you can use `initializers`, `constraints` and `regularizer` for `beta` and `gamma` to tune these values during the training process. 
\n", + "\n", + "##Setup" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kTlbneoEUKrD", + "colab_type": "text" + }, + "source": [ + "### Install Tensorflow 2.0 and Tensorflow-Addons" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "AOExuXLZSZNE", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 666 + }, + "outputId": "6e55e2de-663b-4ce4-fbe7-4e004594516e" + }, + "source": [ + "!pip install tensorflow==2.0.0-beta1 \n", + "!pip install tensorflow-addons\n", + "from __future__ import absolute_import, division, print_function\n", + "import tensorflow as tf\n", + "import tensorflow_addons as tfa" + ], + "execution_count": 0, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u82Gz_gOUPDZ", + "colab_type": "text" + }, + "source": [ + "###Preparing Dataset" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3wso9oidUZZQ", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 51 + }, + "outputId": "1d547d5f-b6c0-4a8e-9806-a29718dd3f85" + }, + "source": [ + "mnist = tf.keras.datasets.mnist\n", + "\n", + "(x_train, y_train),(x_test, y_test) = mnist.load_data()\n", + "x_train, x_test = x_train / 255.0, x_test / 255.0" + ], + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz\n", + "11493376/11490434 [==============================] - 0s 0us/step\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UTQH56j89POZ", + "colab_type": "text" + }, + "source": [ + "## Group Normalization Tutorial \n", + "\n", + "### Introduction\n", + "Group Normalization(GN) divides the channels of your inputs into smaller sub groups and normalizes these values based on their mean and variance. Since GN works on a single example this technique is batchsize independent. \n", + "\n", + "GN experimentally scored closed to batch normalization in image classification tasks. 
+ "\n",
+ "### Example\n",
+ "Splitting 10 channels after a Conv2D layer into 5 subgroups in a standard \"channels last\" setting:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "aIGjLwYWAm0v",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 156
+ },
+ "outputId": "6b023506-8f21-4fd6-94f9-9d0bfc605d4c"
+ },
+ "source": [
+ "model = tf.keras.models.Sequential([\n",
+ "  # Reshape into a \"channels last\" setup.\n",
+ "  tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),\n",
+ "  tf.keras.layers.Conv2D(filters=10, kernel_size=(3, 3), data_format=\"channels_last\"),\n",
+ "  # GroupNorm layer: 5 groups over the 10 channels (axis=3 is the channel axis).\n",
+ "  tfa.layers.normalizations.GroupNormalization(groups=5, axis=3),\n",
+ "  tf.keras.layers.Flatten(),\n",
+ "  tf.keras.layers.Dense(128, activation='relu'),\n",
+ "  tf.keras.layers.Dropout(0.2),\n",
+ "  tf.keras.layers.Dense(10, activation='softmax')\n",
+ "])\n",
+ "\n",
+ "model.compile(optimizer='adam',\n",
+ "              loss='sparse_categorical_crossentropy',\n",
+ "              metrics=['accuracy'])\n",
+ "# Note: this demo fits on the 10,000-sample test split.\n",
+ "model.fit(x_test, y_test)"
+ ],
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Train on 10000 samples\n",
+ "10000/10000 [==============================] - 9s 856us/sample - loss: 0.4905 - accuracy: 0.8524\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 3
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QMwUfJUib3ka",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Instance Normalization Tutorial\n",
+ "### Introduction\n",
+ "Instance Normalization is a special case of group normalization where each group contains a single channel, i.e. the number of groups equals the number of channels.\n",
+ "\n",
+ "Experimental results show that instance normalization performs well on style transfer when replacing batch normalization. Recently, instance normalization has also been used as a replacement for batch normalization in GANs.\n",
+ "\n",
+ "### Example\n",
+ "Applying InstanceNormalization after a Conv2D layer, using uniformly initialized scale and offset factors.\n",
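+ "\n",
+ "As a side note: given the relationship above, the layer used here is expected to behave like `GroupNormalization` with one group per channel (an equivalence stated in the Group Normalization paper, not verified in this notebook):\n",
+ "\n",
+ "```python\n",
+ "# The Conv2D below produces 10 channels, so these two layers\n",
+ "# should compute the same normalization statistics.\n",
+ "tfa.layers.normalizations.InstanceNormalization(axis=3)\n",
+ "tfa.layers.normalizations.GroupNormalization(groups=10, axis=3)\n",
+ "```"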
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6sLVv-C8f6Kf",
+ "colab_type": "code",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 68
+ },
+ "outputId": "4001c67c-315d-4cdc-924e-10da07b2ccb7"
+ },
+ "source": [
+ "model = tf.keras.models.Sequential([\n",
+ "  # Reshape into a \"channels last\" setup.\n",
+ "  tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),\n",
+ "  tf.keras.layers.Conv2D(filters=10, kernel_size=(3, 3), data_format=\"channels_last\"),\n",
+ "  # InstanceNorm layer with uniformly initialized scale and offset.\n",
+ "  tfa.layers.normalizations.InstanceNormalization(axis=3,\n",
+ "                                                  center=True,\n",
+ "                                                  scale=True,\n",
+ "                                                  beta_initializer=\"random_uniform\",\n",
+ "                                                  gamma_initializer=\"random_uniform\"),\n",
+ "  tf.keras.layers.Flatten(),\n",
+ "  tf.keras.layers.Dense(128, activation='relu'),\n",
+ "  tf.keras.layers.Dropout(0.2),\n",
+ "  tf.keras.layers.Dense(10, activation='softmax')\n",
+ "])\n",
+ "\n",
+ "model.compile(optimizer='adam',\n",
+ "              loss='sparse_categorical_crossentropy',\n",
+ "              metrics=['accuracy'])\n",
+ "model.fit(x_test, y_test)"
+ ],
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Train on 10000 samples\n",
+ "10000/10000 [==============================] - 7s 658us/sample - loss: 0.5463 - accuracy: 0.8327\n"
+ ],
+ "name": "stdout"
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qYdnEocRUCll",
+ "colab_type": "text"
+ },
+ "source": [
+ "## Layer Normalization Tutorial\n",
+ "### Introduction\n",
+ "Layer Normalization is a special case of group normalization where the number of groups is 1. The mean and standard deviation are computed from all activations of a single sample.\n",
+ "\n",
+ "Experimental results show that layer normalization is well suited for recurrent neural networks, since it works independently of the batch size.\n",
+ "\n",
+ "### Example\n",
+ "\n",
+ "Applying LayerNormalization after a Conv2D layer, using a scale and offset factor.\n",
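+ "\n",
+ "For intuition, a minimal NumPy sketch of \"one mean and standard deviation per sample\" (illustrative shapes; note that the Keras layer computes its statistics only over the axes passed via `axis`, as in the cell below):\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "x = np.random.rand(2, 26, 26, 10)  # (batch, H, W, C)\n",
+ "\n",
+ "# One mean/std per sample, aggregated over all non-batch axes.\n",
+ "mean = x.mean(axis=(1, 2, 3), keepdims=True)\n",
+ "std = x.std(axis=(1, 2, 3), keepdims=True)\n",
+ "y = (x - mean) / (std + 1e-5)\n",
+ "```"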
" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Fh-Pp_e5UB54", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 68 + }, + "outputId": "d92f184d-ad64-4e83-bc02-eea3b4d9a004" + }, + "source": [ + "model = tf.keras.models.Sequential([\n", + " # Reshape into \"channels last\" setup.\n", + " tf.keras.layers.Reshape((28,28,1), input_shape=(28,28)),\n", + " tf.keras.layers.Conv2D(filters=10, kernel_size=(3,3),data_format=\"channels_last\"),\n", + " # LayerNorm Layer\n", + " tf.keras.layers.LayerNormalization(axis=1 , center=True , scale=True),\n", + " tf.keras.layers.Flatten(),\n", + " tf.keras.layers.Dense(128, activation='relu'),\n", + " tf.keras.layers.Dropout(0.2),\n", + " tf.keras.layers.Dense(10, activation='softmax')\n", + "])\n", + "\n", + "model.compile(optimizer='adam',\n", + " loss='sparse_categorical_crossentropy',\n", + " metrics=['accuracy'])\n", + "model.fit(x_test, y_test)" + ], + "execution_count": 4, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Train on 10000 samples\n", + "10000/10000 [==============================] - 8s 769us/sample - loss: 0.4453 - accuracy: 0.8728\n" + ], + "name": "stdout" + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 4 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "shvGfnB0WpQQ", + "colab_type": "text" + }, + "source": [ + "## Literature\n", + "[Layer norm](https://arxiv.org/pdf/1607.06450.pdf)\n", + "\n", + "[Instance norm](https://arxiv.org/pdf/1607.08022.pdf)\n", + "\n", + "[Group Norm](https://arxiv.org/pdf/1803.08494.pdf)\n", + "\n", + "[Complete Normalizations Overview](http://mlexplained.com/2018/11/30/an-overview-of-normalization-methods-in-deep-learning/)" + ] + } + ] +}