Commit e9e5776

Authored by Jessica Lin
Merge pull request #934 from CamiWilliams/basics-recipe-zerogradients
Zeroing out gradients recipe
2 parents c2d76d2 + 7b3d5c0 commit e9e5776

1 file changed: 100 additions, 57 deletions
@@ -1,48 +1,66 @@
-# -*- coding: utf-8 -*-
-"""Zeroing out gradients in PyTorch.ipynb
-
-Automatically generated by Colaboratory.
-
-Original file is located at
-https://colab.research.google.com/drive/1K2m6BkzNRB2rAN1sELFOP3yRZquaUAAg
-
+"""
 Zeroing out gradients in PyTorch
-=======================
-It is beneficial to zero out gradients when building a neural network. This is because by default, gradients are accumulated in buffers (i.e, not overwritten) whenever ``.backward()`` is called.
+================================
+It is beneficial to zero out gradients when building a neural network.
+This is because, by default, gradients are accumulated in buffers (i.e.,
+not overwritten) whenever ``.backward()`` is called.
 
 Introduction
----
-When training your neural network, models are able to increase their accuracy through gradient decent. In short, gradient descent is the process of minimizing our loss (or error) by tweaking the weights and biases in our model.
+------------
+When training your neural network, models are able to increase their
+accuracy through gradient descent. In short, gradient descent is the
+process of minimizing our loss (or error) by tweaking the weights and
+biases in our model.
+
+``torch.Tensor`` is the central class of PyTorch. When you create a
+tensor, if you set its attribute ``.requires_grad`` as ``True``, the
+package tracks all operations on it. This happens on subsequent backward
+passes. The gradient for this tensor will be accumulated into the
+``.grad`` attribute. The accumulation (or sum) of all the gradients is
+calculated when ``.backward()`` is called on the loss tensor.
+
+There are cases where it may be necessary to zero out the gradients of a
+tensor. For example, when you start your training loop, you should zero
+out the gradients so that you can perform this tracking correctly.
+In this recipe, we will learn how to zero out gradients using the
+PyTorch library. We will demonstrate how to do this by training a neural
+network on the ``CIFAR10`` dataset built into PyTorch.
 
-``torch.Tensor`` is the central class of PyTorch. When you create a tensor,
-if you set its attribute ``.requires_grad`` as ``True``, the package tracks all operations on it. This happens on subsequent backward passes. The gradient for this tensor will be accumulated into ``.grad`` attribute. The accumulation (or sum) of all the gradients is calculated when .backward() is called on the loss tensor.
+Setup
+-----
+Since we will be training a model in this recipe, if you are in a runnable
+notebook, it is best to switch the runtime to GPU or TPU.
+Before we begin, we need to install ``torch`` and ``torchvision`` if
+they aren’t already available.
 
-There are cases where it may be necessary to zero-out the gradients of a tensor. For example: when you start your training loop, you should zero out the gradients so that you can perform this tracking correctly.
+::
 
-In this recipe, we will learn how to zero out gradients using the PyTorch library. We will demonstrate how to do this by training a neural network on the ``CIFAR10`` dataset built into PyTorch.
+   pip install torchvision
 
-Setup
----
-Since we will be training data in this recipe, if you are in a runable notebook, it is best to switch the runtime to GPU or TPU.
 
-Before we begin, we need to install ``torch`` and ``torchvision`` if they aren't already available.
 """
 
-pip install torchvision
-
-"""Steps
------------------
-Steps 1 through 4 set up our data and neural network for training. The process of zeroing out the gradients happens in step 5. If you already have your data and neural network built, skip to 5.
 
-1. Import all necessary libraries for loading our data
-2. Load and normalize the dataset
-3. Build the neural network
-4. Define the loss function
-5. Zero the gradients while training the network
-
-### **1) Import necessary libraries for loading our data**
-For this recipe, we will just be using ``torch`` and ``torchvision`` to access the dataset.
-"""
+######################################################################
+# Steps
+# -----
+#
+# Steps 1 through 4 set up our data and neural network for training. The
+# process of zeroing out the gradients happens in step 5. If you already
+# have your data and neural network built, skip to 5.
+#
+# 1. Import all necessary libraries for loading our data
+# 2. Load and normalize the dataset
+# 3. Build the neural network
+# 4. Define the loss function
+# 5. Zero the gradients while training the network
+#
+# 1. Import necessary libraries for loading our data
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# For this recipe, we will just be using ``torch`` and ``torchvision`` to
+# access the dataset.
+#
 
 import torch
 
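To make the accumulation behavior the new introduction describes concrete, here is a minimal sketch (not part of this commit) of how `.grad` grows across repeated `.backward()` calls and how zeroing resets it:

```python
import torch

# A toy tensor with gradient tracking enabled.
x = torch.ones(3, requires_grad=True)

# First backward pass: d((2x).sum())/dx is 2 for every element.
(2 * x).sum().backward()
print(x.grad)  # tensor([2., 2., 2.])

# A second backward pass accumulates into .grad rather than overwriting it.
(2 * x).sum().backward()
print(x.grad)  # tensor([4., 4., 4.])

# Zeroing the buffer restores a clean slate for the next pass.
x.grad.zero_()
(2 * x).sum().backward()
print(x.grad)  # tensor([2., 2., 2.])
```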
@@ -54,9 +72,14 @@
 import torchvision
 import torchvision.transforms as transforms
 
-"""### **2) Load and normalize the dataset**
-PyTorch features various built-in datasets (see the Loading Data recipe for more information).
-"""
+
+######################################################################
+# 2. Load and normalize the dataset
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# PyTorch features various built-in datasets (see the Loading Data recipe
+# for more information).
+#
 
 transform = transforms.Compose(
     [transforms.ToTensor(),
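The hunk above ends mid-expression; the rest of the data-loading code is unchanged context that the diff elides. For orientation, a hedged sketch of how such a CIFAR10 pipeline is typically completed (the normalization constants and batch size are illustrative assumptions, not read from this commit):

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Convert images to tensors and scale each channel to [-1, 1];
# the 0.5 mean/std values are illustrative assumptions.
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Download the built-in CIFAR10 training set and wrap it in a loader.
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
```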
@@ -75,9 +98,14 @@
 classes = ('plane', 'car', 'bird', 'cat',
            'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
 
-"""### **3) Build the neural network**
-We will use a convolutional neural network. To learn more see the Defining a Neural Network recipe.
-"""
+
+######################################################################
+# 3. Build the neural network
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# We will use a convolutional neural network. To learn more see the
+# Defining a Neural Network recipe.
+#
 
 class Net(nn.Module):
     def __init__(self):
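The body of `Net` is likewise elided by the diff. The surviving context lines (`self.fc3`, the tail of `forward`) are consistent with the standard CIFAR10 tutorial architecture; a sketch under that assumption:

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Two conv layers with max pooling, then three fully connected layers.
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)  # 10 output classes for CIFAR10

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```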
@@ -98,19 +126,30 @@ def forward(self, x):
         x = self.fc3(x)
         return x
 
-"""### **4) Define a Loss function and optimizer**
-Let’s use a Classification Cross-Entropy loss and SGD with momentum.
-"""
+
+######################################################################
+# 4. Define a Loss function and optimizer
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# Let’s use a Classification Cross-Entropy loss and SGD with momentum.
+#
 
 net = Net()
 criterion = nn.CrossEntropyLoss()
 optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
 
-"""### **5) Zero the gradients while training the network**
-This is when things start to get interesting. We simply have to loop over our data iterator, and feed the inputs to the network and optimize.
 
-Notice that for each entity of data, we zero out the gradients. This is to ensure that we aren't tracking any unnecessary information when we train our neural network.
-"""
+######################################################################
+# 5. Zero the gradients while training the network
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# This is when things start to get interesting. We simply have to loop
+# over our data iterator, feed the inputs to the network, and optimize.
+#
+# Notice that for each batch of data, we zero out the gradients. This is
+# to ensure that we aren’t tracking any unnecessary information when we
+# train our neural network.
+#
 
 for epoch in range(2):  # loop over the dataset multiple times
 
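The training-loop body falls between hunks, so the diff shows only its first line. A hedged sketch of the pattern step 5 describes, with `optimizer.zero_grad()` at the top of every iteration (names assume the `trainloader`, `net`, `criterion`, and `optimizer` defined earlier):

```python
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        # Zero the gradient buffers so this batch's gradients are not
        # mixed with those accumulated from the previous batch.
        optimizer.zero_grad()

        # Forward pass, loss, backward pass, and a parameter update.
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
```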
@@ -137,14 +176,18 @@ def forward(self, x):
 
 print('Finished Training')
 
-"""You can also use ``model.zero_grad()``. This is the same as using ``optimizer.zero_grad()`` as long as all your model parameters are in that optimizer. Use your best judgement to decide which one to use.
-
-Congratulations! You have successfully zeroed out gradients PyTorch.
-
-Learn More
-----------------------------
-Take a look at these other recipes to continue your learning:
 
-* TBD
-* TBD
-"""
+######################################################################
+# You can also use ``model.zero_grad()``. This is the same as using
+# ``optimizer.zero_grad()`` as long as all your model parameters are in
+# that optimizer. Use your best judgement to decide which one to use.
+#
+# Congratulations! You have successfully zeroed out gradients in PyTorch.
+#
+# Learn More
+# ----------
+#
+# Take a look at these other recipes to continue your learning:
+#
+# - TBD
+# - TBD
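As a footnote to the closing remark: the two zeroing calls it compares look like this in practice (assuming the recipe's `net` and `optimizer`), and they are interchangeable as long as the optimizer was built over all of the model's parameters.

```python
optimizer.zero_grad()  # zeros .grad on every parameter the optimizer tracks
net.zero_grad()        # zeros .grad on every parameter of the module net
```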
