From 1d374bd2bf7401d22e643155bf664829a694d0a7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adam=20Wr=C3=B3bel?= <adam.wrobel@deepsense.ai>
Date: Fri, 1 Oct 2021 12:02:41 +0200
Subject: [PATCH 1/5] Issue #64 - PyTorch Tutorial 3 - Jupyter Notebook and
 Markdown

---
 .../pytorch/tut3_mixed_precision/README.md    | 391 ++++++--
 .../tut3_mixed_precision/walkthrough.ipynb    | 916 ++++++++++++++++++
 .../tut3_mixed_precision/walkthrough.py       | 446 +++++++--
 .../walkthrough_code_only.py                  | 150 +++
 4 files changed, 1747 insertions(+), 156 deletions(-)
 create mode 100644 tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
 create mode 100644 tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py

diff --git a/tutorials/pytorch/tut3_mixed_precision/README.md b/tutorials/pytorch/tut3_mixed_precision/README.md
index 6096846..68503aa 100644
--- a/tutorials/pytorch/tut3_mixed_precision/README.md
+++ b/tutorials/pytorch/tut3_mixed_precision/README.md
@@ -1,91 +1,108 @@
-Half and mixed precision in PopTorch
-====================================
-
-This tutorial shows how to use half and mixed precision in PopTorch with the example task of training a simple CNN model on a single Graphcore IPU (Mk1 or Mk2).
-
-If you are not familiar with PopTorch, you may need to go through this [introduction to PopTorch tutorial](../tut1_basics) first.
+# Half and mixed precision in PopTorch
+This tutorial shows how to use half and mixed precision in PopTorch with the
+example task of training a simple CNN model on a single Graphcore IPU (Mk1 or 
+Mk2).
 
 Requirements:
-   - an installed Poplar SDK. See the Getting Started guide for your IPU system for details of how to install the SDK;
-   - Other Python modules: `pip install -r requirements.txt`
-
-Table of Contents
-=================
-* [General](#general)
-    + [Motives for half precision](#motives-for-half-precision)
-    + [Numerical stability](#numerical-stability)
-      - [Loss scaling](#loss-scaling)
-      - [Stochastic rounding](#stochastic-rounding)
-* [Train a model in half precision](#train-a-model-in-half-precision)
-    + [Import the packages](#import-the-packages)
-    + [Build the model](#build-the-model)
-      - [Casting a model's parameters](#casting-a-model-s-parameters)
-      - [Casting a single layer's parameters](#casting-a-single-layer-s-parameters)
-    + [Prepare the data](#prepare-the-data)
-    + [Optimizers and loss scaling](#optimizers-and-loss-scaling)
-    + [Set PopTorch's options](#set-poptorch-s-options)
-      - [Stochastic rounding](#stochastic-rounding)
-      - [Partials data type](#partials-data-type)
-    + [Train the model](#train-the-model)
-    + [Evaluate the model](#evaluate-the-model)
-* [Visualise the memory footprint](#visualise-the-memory-footprint)
-* [Debug floating-point exceptions](#debug-floating-point-exceptions)
-* [PopTorch tracing](#poptorch-tracing)
-* [Summary](#summary)
+- An installed Poplar SDK. See the Getting Started guide for your IPU hardware 
+for details of how to install the SDK.
+- Other Python modules: `pip install -r requirements.txt`
 
 # General
 
 ## Motives for half precision
 
-Data is stored in memory, and some formats to store that data require less memory than others. In a device's memory, when it comes to numerical data, we use either integers or real numbers. Real numbers are represented by one of several floating point formats, which vary in how many bits they use to represent each number. Using more bits allows for greater precision and a wider range of representable numbers, whereas using fewer bits allows for faster calculations and reduces memory and power usage. In deep learning applications, where less precise calculations are acceptable and throughput is critical, using a lower precision format can provide substantial gains in performance.
+Data is stored in memory, and some formats to store that data require less 
+memory than others. In a device's memory, when it comes to numerical data, 
+we use either integers or real numbers. Real numbers are represented by one 
+of several floating-point formats, which vary in how many bits they use to 
+represent each number. Using more bits allows for greater precision and a 
+wider range of representable numbers, whereas using fewer bits allows for 
+faster calculations and reduced memory and power usage. 
+
+In deep learning applications, where less precise calculations are acceptable 
+and throughput is critical, using a lower precision format can provide 
+substantial gains in performance.
 
 The Graphcore IPU provides native support for two floating-point formats:
 
 - IEEE single-precision, which uses 32 bits for each number (FP32)
 - IEEE half-precision, which uses 16 bits for each number (FP16)
 
-Some applications which use FP16 do all calculations in FP16, whereas others use a mix of FP16 and FP32. The latter approach is known as *mixed precision*.
+Some applications which use FP16 do all calculations in FP16, whereas others 
+use a mix of FP16 and FP32. The latter approach is known as *mixed precision*.
 
-In this tutorial, we are going to talk about real numbers represented in FP32 and FP16, and how to use these data types (dtypes) in PopTorch in order to reduce the memory requirements of a model.
+In this tutorial, we are going to talk about real numbers represented 
+in FP32 and FP16, and how to use these data types (dtypes) in PopTorch in 
+order to reduce the memory requirements of a model.
 
 ## Numerical stability
 
-Numeric stability refers to how a model's performance is affected by the use of a lower-precision dtype. We say an operation is "numerically unstable" in FP16 if running it in this dtype causes the model to have worse accuracy compared to running the operation in FP32. Two techniques that can be used to increase the numerical stability of a model are loss scaling and stochastic rounding.
+Numerical stability refers to how a model's performance is affected by the use 
+of a lower-precision dtype. We say an operation is "numerically unstable" in 
+FP16 if running it in this dtype causes the model to have worse accuracy 
+compared to running the operation in FP32. Two techniques that can be used to 
+increase the numerical stability of a model are loss scaling and stochastic 
+rounding.
 
 ### Loss scaling
 
-A numerical issue that can occur when training a model in half-precision is that the gradients can underflow. This can be difficult to debug because the model will simply appear to not be training, and can be especially damaging because any gradients which underflow will propagate a value of 0 backwards to other gradient calculations.
+A numerical issue that can occur when training a model in half precision is 
+that the gradients can underflow. This can be difficult to debug because the 
+model will simply appear to not be training, and can be especially damaging 
+because any gradients which underflow will propagate a value of 0 backwards 
+to other gradient calculations.
 
-The standard solution to this is known as *loss scaling*, which consists of scaling up the loss value right before the start of backpropagation to prevent numerical underflow of the gradients. Instructions on how to use loss scaling will be discussed later in this tutorial.
+The standard solution to this is known as *loss scaling*: the loss value is 
+scaled up right before the start of backpropagation to prevent numerical 
+underflow of the gradients, which are scaled back down by the same factor 
+before the weights are updated. We show how to use it later in this tutorial.
 
 ### Stochastic rounding
 
-When training in half or mixed precision, numbers multiplied by each other will need to be rounded in order to fit into the floating point format used. Stochastic rounding is the process of using a probabilistic equation for the rounding. Instead of always rounding to the nearest representable number, we round up or down with a probability such that the expected value after rounding is equal to the value before rounding. Since the expected value of an addition after rounding is equal to the exact result of the addition, the expected value of a sum is also its exact value.
+When training in half or mixed precision, the results of arithmetic operations 
+need to be rounded to fit into the floating-point format used. Stochastic 
+rounding makes this rounding probabilistic: instead of always rounding to the 
+nearest representable number, we round up or down with probabilities chosen 
+so that the expected value after rounding equals the value before rounding. 
+For example, a value a quarter of the way from one representable number to 
+the next is rounded up with probability 1/4. Since each rounded addition is 
+unbiased in this way, the expected value of a sum is also its exact value.
 
-This means that on average, the values of the parameters of a network will be close to the values they would have had if a higher-precision format had been used. The added bonus of using stochastic rounding is that the parameters can be stored in FP16, which means the parameters can be stored using half as much memory. This can be especially helpful when training with small batch sizes, where the memory used to store the parameters is proportionally greater than the memory used to store parameters when training with large batch sizes.
+This means that, on average, the values of the parameters of a network will be 
+close to the values they would have had if a higher-precision format had been 
+used. An added bonus of using stochastic rounding is that the parameters can 
+be stored in FP16, which means they need only half as much memory as in FP32. 
+This can be especially helpful when training with small batch sizes, where 
+the parameters account for a larger share of the total memory than they do 
+when training with large batch sizes.
 
-It is highly recommended that you enable this feature when training neural networks with FP16 weights. The instructions to enable it in PopTorch are presented later in this tutorial.
+It is highly recommended that you enable this feature when training neural 
+networks with FP16 weights. The instructions to enable it in PopTorch are 
+presented later in this tutorial.
 
-# Train a model in half precision
+## Import the packages
 
-## Import the packages
-
-Among the packages we will use, there is `torchvision` from which we will download a dataset and construct a simple model, and `tqdm` which is a simple package to create progress bars so that we can visually monitor the progress of our training job.
 
 ```python
 import torch
-import poptorch
+import torch.nn as nn
 import torchvision
-from torchvision import transforms
-from tqdm import tqdm
+import torchvision.transforms as transforms
+import poptorch
+from tqdm.auto import tqdm
 ```
 
 ## Build the model
 
-We use the same model as in [the previous tutorials on PopTorch](../). Just like in the [previous tutorial](../tut2_efficient_data_loading), we are using larger images (128x128) to simulate a heavier data load. This will make the difference in memory between FP32 and FP16 meaningful enough to showcase in this tutorial.
+We use the same model as in [the previous tutorials on PopTorch](../). 
+Just like in the [previous tutorial](../tut2_efficient_data_loading), we are 
+using larger images (128x128) to simulate a heavier data load. This will make 
+the difference in memory between FP32 and FP16 meaningful enough to showcase 
+in this tutorial.
+
 
 ```python
-# Build the model
 class CustomModel(nn.Module):
     def __init__(self):
         super().__init__()
@@ -110,44 +127,94 @@ class CustomModel(nn.Module):
         if self.training:
             return x, self.loss(x, labels)
         return x
-
-model = CustomModel()
 ```
 
->**NOTE:** The model inherits `self.training` from `torch.nn.Module` which initialises its value to True. Use `model.eval()` to set it to False and `model.train()` to switch it back to True.
+>**NOTE:** The model inherits `self.training` from `torch.nn.Module` which 
+>initialises its value to True. Use `model.eval()` to set it to False and 
+>`model.train()` to switch it back to True.
+
+Choose the precision settings used in the rest of this tutorial.
+
+>**NOTE**: If you modify these parameters to experiment, make sure you re-run 
+>the cell below and every cell that follows it, so that the changes take 
+>effect:
+
+
+```python
+# Cast the model parameters to FP16
+model_half = True
+
+# Cast the data to FP16
+data_half = True
+
+# Use FP16 for the optimiser's accumulation type (this also enables loss scaling)
+optimizer_half = True
+
+# Use stochastic rounding
+stochastic_rounding = True
+
+# Set partials data type to FP16
+partials_half = True
+```
 
 ### Casting a model's parameters
 
-The default data type of the parameters of a PyTorch module is FP32 (`torch.float32`). To convert all the parameters of a model to be represented in FP16 (`torch.float16`), an operation we will call _downcasting_, we simply do:
+The default data type of the parameters of a PyTorch module is FP32 
+(`torch.float32`). To convert all the parameters of a model to be represented 
+in FP16 (`torch.float16`), an operation we will call _downcasting_, we simply 
+do:
+
 
 ```python
-model = model.half()
+model = CustomModel()
+
+if model_half:
+    model = model.half()
 ```
 
 For this tutorial, we will cast all the model's parameters to FP16.
 
 ### Casting a single layer's parameters
 
-For bigger or more complex models, downcasting all the layers may generate numerical instabilities and cause underflows. While the PopTorch and the IPU offer features to alleviate those issues, it is still sensible for those models to cast only the parameters of certain layers and observe how it affects the overall training job. To downcast the parameters of a single layer, we select the layer by its _name_ and use `half()`:
+For bigger or more complex models, downcasting all the layers may generate 
+numerical instabilities and cause underflows. While PopTorch and the IPU 
+offer features to alleviate those issues, it is still sensible for those 
+models to cast only the parameters of certain layers and observe how it 
+affects the overall training job. To downcast the parameters of a single 
+layer, we select the layer by its _name_ and use `half()`:
+
 
 ```python
 model.conv1 = model.conv1.half()
 ```
 
 If you would like to upcast a layer instead, you can use `model.conv1.float()`.
-
->**NOTE**: One can print out a list of the components of a PyTorch model, with their names, by doing `print(model)`.
+>**NOTE**: One can print out a list of the components of a PyTorch model, 
+>with their names, by doing `print(model)`.
 
 ## Prepare the data
 
-We will use the FashionMNIST dataset that we download from `torchvision`. The last stage of the pipeline will have to convert the data type of the tensors representing the images to `torch.half` (equivalent to `torch.float16`) so that our input data is also in FP16. This has the advantage of reducing the bandwidth needed between the host and the IPU.
+We will use the FashionMNIST dataset that we download from `torchvision`. When 
+`data_half` is set, the last stage of the pipeline converts the data type of 
+the tensors representing the images to `torch.half` (equivalent to 
+`torch.float16`) so that our input data is also in FP16. This has the 
+advantage of reducing the bandwidth needed between the host and the IPU.
+
 
 ```python
-transform = transforms.Compose([transforms.Resize(128),
-                                transforms.ToTensor(),
-                                transforms.Normalize((0.5,), (0.5,)),
-                                transforms.ConvertImageDtype(torch.half)])
+transform_list = [transforms.Resize(128),
+                  transforms.ToTensor(),
+                  transforms.Normalize((0.5,), (0.5,))]
+if data_half:
+    transform_list.append(transforms.ConvertImageDtype(torch.half))
+
+transform = transforms.Compose(transform_list)
+```
+
+Download the datasets if they are not already available locally:
+
 
+```python
 train_dataset = torchvision.datasets.FashionMNIST("./datasets/",
                                                   transform=transform,
                                                   download=True,
@@ -158,69 +225,117 @@ test_dataset = torchvision.datasets.FashionMNIST("./datasets/",
                                                  train=False)
 ```
 
-If the model has not been converted to half precision, but the input data has, then some layers of the model may be converted to use FP16. Conversely, if the input data has not been converted, but the model has, then the input tensors will be converted to FP16 on the IPU. This behaviour is the opposite of PyTorch's default behaviour.
+If the model has not been converted to half precision, but the input data has, 
+then some layers of the model may be converted to use FP16. Conversely, if the 
+input data has not been converted, but the model has, then the input tensors 
+will be converted to FP16 on the IPU. This behaviour is the opposite of 
+PyTorch's default behaviour.
 
->**NOTE**: To stop PopTorch automatically downcasting tensors and parameters, so that it preserves PyTorch's default behaviour (upcasting), use the option `opts.Precision.halfFloatCasting(poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)`.
+>**NOTE**: To stop PopTorch automatically downcasting tensors and parameters, 
+>so that it preserves PyTorch's default behaviour (upcasting), use the option:
+>`opts.Precision.halfFloatCasting(poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)`.
 
 ## Optimizers and loss scaling
 
-The value of the loss scaling factor can be passed as a parameter to the optimisers in `poptorch.optim`. In this tutorial, we will set it to 1024 for an AdamW optimizer. For all optimisers (except `poptorch.optim.SGD`), using a model in FP16 requires the argument `accum_type` to be set to `torch.float16` as well:
+The value of the loss scaling factor can be passed as a parameter to the 
+optimisers in `poptorch.optim`. In this tutorial, we set it to 1024 for an 
+AdamW optimiser when `optimizer_half` is enabled. For all optimisers (except 
+`poptorch.optim.SGD`), using a model in FP16 requires the argument 
+`accum_type` to be set to `torch.float16` as well:
+
 
 ```python
-optimizer = poptorch.optim.AdamW(model.parameters(),
+accum, loss_scaling = \
+    (torch.float16, 1024) if optimizer_half else (torch.float32, None)
+
+optimizer = poptorch.optim.AdamW(params=model.parameters(),
                                  lr=0.001,
-                                 loss_scaling=1024,
-                                 accum_type=torch.float16)
+                                 accum_type=accum,
+                                 loss_scaling=loss_scaling)
 ```
 
-While higher values of `loss_scaling` minimize underflows, values that are too high can also generate overflows as well as hurt convergence of the loss. The optimal value depends on the model and the training job. This is therefore a hyperparameter for you to tune.
+While higher values of `loss_scaling` minimize underflows, values that are 
+too high can cause overflows and hurt convergence of the loss. The optimal 
+value depends on the model and the training job, so this is a hyperparameter 
+for you to tune.
 
 ## Set PopTorch's options
 
-To configure some features of the IPU and to be able to use PopTorch's classes in the next sections, we will need to create an instance of `poptorch.Options` which stores the options we will be using. We covered some of the available options in the [introductory tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
+To configure some features of the IPU and to be able to use PopTorch's classes 
+in the next sections, we will need to create an instance of `poptorch.Options` 
+which stores the options we will be using. We covered some of the available 
+options in the [introductory tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
+
+Let's initialise our options object before we talk about the options 
+we will use:
 
-Let's initialise our options object before we talk about the options we will use:
 
 ```python
 opts = poptorch.Options()
 ```
 
->**NOTE**: This tutorial has been designed to be run on a single IPU. If you do not have access to an IPU, you can use the option [`useIpuModel`](https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/overview.html#poptorch.Options.useIpuModel) to run a simulation on CPU instead. You can read more on the IPU Model and its limitations [here](https://docs.graphcore.ai/projects/poplar-user-guide/en/latest/poplar_programs.html#programming-with-poplar).
+>**NOTE**: This tutorial has been designed to be run on a single IPU. 
+>If you do not have access to an IPU, you can use the option
+>[`useIpuModel`](https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/overview.html#poptorch.Options.useIpuModel)
+>to run a simulation on CPU instead. You can read more on the IPU Model 
+>and its limitations [here](https://docs.graphcore.ai/projects/poplar-user-guide/en/latest/poplar_programs.html#programming-with-poplar).
 
 ### Stochastic rounding
 
-With the IPU, stochastic rounding is implemented directly in the hardware and only requires you to enable it. To do so, there is the option `enableStochasticRounding` in the `Precision` namespace of `poptorch.Options`. This namespace holds other options for using mixed precision that we will talk about. To enable stochastic rounding, we do:
+With the IPU, stochastic rounding is implemented directly in the hardware and 
+only requires you to enable it. To do so, there is the option 
+`enableStochasticRounding` in the `Precision` namespace of `poptorch.Options`. 
+This namespace holds other options for using mixed precision that we will talk 
+about. To enable stochastic rounding, we do:
+
 
 ```python
-opts.Precision.enableStochasticRounding(True)
+if stochastic_rounding:
+    opts.Precision.enableStochasticRounding(True)
 ```
 
-With the IPU Model, this option won't change anything since stochastic rounding is implemented on the IPU.
+With the IPU Model, this option won't change anything, since stochastic 
+rounding is implemented in the IPU hardware.
 
 ### Partials data type
 
-Matrix multiplications and convolutions have intermediate states we call _partials_. Those partials can be stored in FP32 or FP16. There is a memory benefit to using FP16 partials but the main benefit is that it can increase the throughput for some models without affecting accuracy. However there is a risk of increasing numerical instability if the values being multiplied are small, due to underflows. The default data type of partials is the input's data type(FP16). For this tutorial, we set partials to FP32 just to showcase how it can be done. We use the option `setPartialsType` to do it:
+Matrix multiplications and convolutions have intermediate states we 
+call _partials_. Those partials can be stored in FP32 or FP16. There is 
+a memory benefit to using FP16 partials, but the main benefit is that they 
+can increase the throughput for some models without affecting accuracy. 
+However, there is a risk of increasing numerical instability if the values 
+being multiplied are small, due to underflows. The default data type of 
+partials is the input's data type (FP16). We set it explicitly with the option 
+`setPartialsType`, using FP16 if `partials_half` is enabled and FP32 otherwise:
+
 
 ```python
-opts.Precision.setPartialsType(torch.float)
+if partials_half:
+    opts.Precision.setPartialsType(torch.half)
+else:
+    opts.Precision.setPartialsType(torch.float)
 ```
 
 ## Train the model
 
-We can now train the model. After we have set all our options, we reuse our `poptorch.Options` instance for the training `poptorch.DataLoader` that we will be using:
+We can now train the model. After we have set all our options, we reuse 
+our `poptorch.Options` instance for the training `poptorch.DataLoader` 
+that we will be using:
+
 
 ```python
 train_dataloader = poptorch.DataLoader(opts,
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=40)
+                                       num_workers=4)
 ```
 
-We first make sure our model is in training mode, and then wrap it with `poptorch.trainingModel`.
+Our model is already in training mode, since `self.training` defaults to 
+True, so we can wrap it directly with `poptorch.trainingModel`.
+
 
 ```python
-model.train()
 poptorch_model = poptorch.trainingModel(model,
                                         options=opts,
                                         optimizer=optimizer)
@@ -228,6 +343,7 @@ poptorch_model = poptorch.trainingModel(model,
 
 Let's run the training loop for 10 epochs.
 
+
 ```python
 epochs = 10
 for epoch in tqdm(range(epochs), desc="epochs"):
@@ -237,44 +353,84 @@ for epoch in tqdm(range(epochs), desc="epochs"):
         total_loss += loss
 ```
 
+Release IPU resources:
+
+
+```python
+poptorch_model.detachFromDevice()
+```
+
 Our new model is now trained and we can start its evaluation.
 
 ## Evaluate the model
 
-Some PyTorch's operations, such as CNNs, are not supported in FP16 on the CPU, so we will evaluate our fine-tuned model in mixed precision on an IPU using `poptorch.inferenceModel`.
+Some PyTorch operations, such as convolutions, are not supported in FP16 on 
+the CPU, so we will evaluate our trained model in mixed precision on an IPU 
+using `poptorch.inferenceModel`.
+
 
 ```python
 model.eval()
 poptorch_model_inf = poptorch.inferenceModel(model, options=opts)
-
 test_dataloader = poptorch.DataLoader(opts,
                                       test_dataset,
                                       batch_size=32,
-                                      num_workers=40)
+                                      num_workers=4)
+```
+
+Run inference on the labelled test data:
 
+
+```python
 predictions, labels = [], []
 for data, label in test_dataloader:
-    predictions += poptorch_model_inf(data).data.max(dim=1).indices
+    predictions += poptorch_model_inf(data).data.float().max(dim=1).indices
     labels += label
+```
+
+Release IPU resources:
+
 
-print(f"Eval accuracy on IPU: {100 * (1 - torch.count_nonzero(torch.sub(torch.tensor(labels), torch.tensor(predictions))) / len(labels)):.2f}%")
+```python
+poptorch_model_inf.detachFromDevice()
 ```
 
 We obtained an accuracy of approximately 84% on the test dataset.
 
+
+```python
+# Accuracy is the percentage of predictions that match the labels
+accuracy = 100 * (torch.tensor(labels) == torch.tensor(predictions)).float().mean()
+print(f"Eval accuracy on IPU: {accuracy:.2f}%")
+```
+
+    Eval accuracy on IPU: 85.38%
+
+
 # Visualise the memory footprint
 
-We can visually compare the memory footprint on the IPU of the model trained in FP16 and FP32, thanks to Graphcore's [PopVision Graph Analyser](https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html).
+We can visually compare the memory footprint on the IPU of the model trained 
+in FP16 and FP32, thanks to Graphcore's [PopVision Graph Analyser](https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html).
 
-We generated memory reports of the same training session as covered in this tutorial for both cases: with and without downcasting the model with `model.half()`. Here is the figure of both memory footprints, where "source" and "target" represent the model trained in FP16 and FP32 respectively:
+We generated memory reports of the same training session as covered in this 
+tutorial for both cases: with and without downcasting the model with 
+`model.half()`. Here is the figure of both memory footprints, where "source" 
+and "target" represent the model trained in FP16 and FP32 respectively:
 
 ![Comparison of memory footprints](static/MemoryDiffReport.png)
 
-We observed a ~26% reduction in memory usage with the settings of this tutorial, including from peak to peak. The impact on the accuracy was also small, with less than 1% lost!
+We observed a reduction of about 26% in memory usage with the settings of this 
+tutorial, including in peak memory usage. The impact on accuracy was also 
+small, with less than 1% lost.
 
 # Debug floating-point exceptions
 
-Floating-point issues can be difficult to debug because the model will simply appear to not be training without specific information about what went wrong. For more detailed information on the issue we set `debug.floatPointOpException` to true in the environment variable `POPLAR_ENGINE_OPTIONS`. To set this, you can add the folowing before the command you use to run your model:
+Floating-point issues can be difficult to debug because the model will simply 
+appear not to be training, without specific information about what went wrong. 
+For more detailed information about the issue, set 
+`debug.floatPointOpException` to true in the environment variable 
+`POPLAR_ENGINE_OPTIONS`. To set this, you can add the following before 
+the command you use to run your model:
 
 ```python
 POPLAR_ENGINE_OPTIONS='{"debug.floatPointOpException": "true"}'
@@ -282,9 +438,22 @@ POPLAR_ENGINE_OPTIONS='{"debug.floatPointOpException": "true"}'
 
 # PopTorch tracing and casting
 
-Because PopTorch relies on the `torch.jit.trace` API, it is limited to tracing operations which run on the CPU. Many of these operations do not support FP16 inputs due to numerical stability issues. To allow the full range of operations, PopTorch converts all FP16 inputs to FP32 before tracing and then restores them to FP16. This is because the model must always be traced with FP16 inputs converted to FP32.
+Because PopTorch relies on the `torch.jit.trace` API, it is limited to tracing 
+operations which run on the CPU. Many of these operations do not support FP16 
+inputs due to numerical stability issues. To allow the full range of 
+operations, PopTorch converts all FP16 inputs to FP32 before tracing and then 
+restores them to FP16, so the model is always traced with FP32 inputs even 
+when it will run on FP16 data.
+
+PopTorch’s default casting behaviour is to output FP16 if any input of the 
+operation is FP16. This is the opposite of PyTorch, which outputs FP32 if any 
+input of the operation is FP32. To get PyTorch's behaviour in PopTorch, use 
+the option: 
+`opts.Precision.halfFloatCasting(poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)`.
+
+Below you can see the difference between native PyTorch and 
+PopTorch (with and without the option mentioned above):
 
-PopTorch’s default casting functionality is to output in FP16 if any input of the operation is FP16. This is opposite to PyTorch, which outputs in FP32 if any input of the operations is in FP32. To achieve the same behaviour in PopTorch, one can use `opts.Precision.halfFloatCasting(poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)`. Below you can see the difference between native PyTorch and PopTorch (with and without the option mentioned above):
 
 ```python
 class Model(torch.nn.Module):
@@ -295,24 +464,48 @@ native_model = Model()
 
 float16_tensor = torch.tensor([1.0], dtype=torch.float16)
 float32_tensor = torch.tensor([1.0], dtype=torch.float32)
+```
 
-# Native PyTorch results in a FP32 tensor
+Native PyTorch results in an FP32 tensor:
+
+
+```python
 assert native_model(float32_tensor, float16_tensor).dtype == torch.float32
+```
+
+Let's instantiate default PopTorch `Options` for IPUs:
 
+
+```python
 opts = poptorch.Options()
+```
+
+PopTorch results in an FP16 tensor:
+
 
-# PopTorch results in a FP16 tensor
+```python
 poptorch_model = poptorch.inferenceModel(native_model, opts)
 assert poptorch_model(float32_tensor, float16_tensor).dtype == torch.float16
+```
+
+This option makes the same PopTorch example result in an FP32 tensor:
 
+
+```python
 opts.Precision.halfFloatCasting(
     poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)
 
-# The option above makes the same PopTorch example result in an FP32 tensor
 poptorch_model = poptorch.inferenceModel(native_model, opts)
 assert poptorch_model(float32_tensor, float16_tensor).dtype == torch.float32
 ```
 
+Release IPU resources:
+
+
+```python
+poptorch_model.detachFromDevice()
+```
+
 # Summary
 - Use half and mixed precision when you need to save memory on the IPU.
 - You can cast a PyTorch model or a specific layer to FP16 using:
@@ -322,8 +515,10 @@ assert poptorch_model(float32_tensor, float16_tensor).dtype == torch.float32
     # Layer
     model.layer.half()
     ```
-- Several features are available in PopTorch to improve the numerical stability of a model in FP16:
+- Several features are available in PopTorch to improve the numerical 
+stability of a model in FP16:
     - Loss scaling: `poptorch.optim.SGD(..., loss_scaling=1000)`
     - Stochastic rounding: `opts.Precision.enableStochasticRounding(True)`
     - Upcast partials data types: `opts.Precision.setPartialsType(torch.float)`
-- The [PopVision Graph Analyser](https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html) can be used to inspect the memory usage of a model and to help debug issues.
+- The [PopVision Graph Analyser](https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html) 
+can be used to inspect the memory usage of a model and to help debug issues.
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb b/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
new file mode 100644
index 0000000..86ba49f
--- /dev/null
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
@@ -0,0 +1,916 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "c763a15b",
+   "metadata": {},
+   "source": [
+    "Copyright (c) 2021 Graphcore Ltd. All rights reserved."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "413c7082",
+   "metadata": {},
+   "source": [
+    "# Half and mixed precision in PopTorch\n",
+    "This tutorial shows how to use half and mixed precision in PopTorch with the\n",
+    "example task of training a simple CNN model on a single Graphcore IPU (Mk1 or \n",
+    "Mk2)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "103fb5d6",
+   "metadata": {},
+   "source": [
+    "Requirements:\n",
+    "- an installed Poplar SDK. See the Getting Started guide for your IPU hardware \n",
+    "for details of how to install the SDK;\n",
+    "- Other Python modules: `pip install -r requirements.txt`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b526aa7e",
+   "metadata": {},
+   "source": [
+    "# General\n",
+    "\n",
+    "## Motives for half precision\n",
+    "\n",
+    "Data is stored in memory, and some formats to store that data require less \n",
+    "memory than others. In a device's memory, when it comes to numerical data, \n",
+    "we use either integers or real numbers. Real numbers are represented by one \n",
+    "of several floating point formats, which vary in how many bits they use to \n",
+    "represent each number. Using more bits allows for greater precision and a \n",
+    "wider range of representable numbers, whereas using fewer bits allows for \n",
+    "faster calculations and reduces memory and power usage. \n",
+    "\n",
+    "In deep learning applications, where less precise calculations are acceptable \n",
+    "and throughput is critical, using a lower precision format can provide \n",
+    "substantial gains in performance.\n",
+    "\n",
+    "The Graphcore IPU provides native support for two floating-point formats:\n",
+    "\n",
+    "- IEEE single-precision, which uses 32 bits for each number (FP32)\n",
+    "- IEEE half-precision, which uses 16 bits for each number (FP16)\n",
+    "\n",
+    "Some applications which use FP16 do all calculations in FP16, whereas others \n",
+    "use a mix of FP16 and FP32. The latter approach is known as *mixed precision*.\n",
+    "\n",
+    "In this tutorial, we are going to talk about real numbers represented \n",
+    "in FP32 and FP16, and how to use these data types (dtypes) in PopTorch in \n",
+    "order to reduce the memory requirements of a model.\n",
+    "\n",
+    "## Numerical stability\n",
+    "\n",
+    "Numerical stability refers to how a model's performance is affected by the use \n",
+    "of a lower-precision dtype. We say an operation is \"numerically unstable\" in \n",
+    "FP16 if running it in this dtype causes the model to have worse accuracy \n",
+    "compared to running the operation in FP32. Two techniques that can be used to \n",
+    "increase the numerical stability of a model are loss scaling and stochastic \n",
+    "rounding.\n",
+    "\n",
+    "### Loss scaling\n",
+    "\n",
+    "A numerical issue that can occur when training a model in half-precision is \n",
+    "that the gradients can underflow. This can be difficult to debug because the \n",
+    "model will simply appear to not be training, and can be especially damaging \n",
+    "because any gradients which underflow will propagate a value of 0 backwards \n",
+    "to other gradient calculations.\n",
+    "\n",
+    "The standard solution to this is known as *loss scaling*, which consists of \n",
+    "scaling up the loss value right before the start of backpropagation to prevent \n",
+    "numerical underflow of the gradients. Instructions on how to use loss scaling \n",
+    "will be discussed later in this tutorial.\n",
+    "\n",
+    "### Stochastic rounding\n",
+    "\n",
+    "When training in half or mixed precision, the results of operations such as \n",
+    "multiplications and additions need to be rounded to fit into the floating-point format used. \n",
+    "Stochastic rounding is the process of using a probabilistic equation for the \n",
+    "rounding. Instead of always rounding to the nearest representable number, we \n",
+    "round up or down with a probability such that the expected value after \n",
+    "rounding is equal to the value before rounding. Since the expected value of \n",
+    "an addition after rounding is equal to the exact result of the addition, the \n",
+    "expected value of a sum is also its exact value.\n",
+    "\n",
+    "This means that on average, the values of the parameters of a network will be \n",
+    "close to the values they would have had if a higher-precision format had been \n",
+    "used. An added bonus of using stochastic rounding is that the parameters can \n",
+    "be stored in FP16, which means they take up half as much memory as in FP32. \n",
+    "This can be especially helpful when training with small batch sizes, where \n",
+    "the parameters account for a larger proportion of the total memory used \n",
+    "than they do when training with large batch sizes.\n",
+    "\n",
+    "It is highly recommended that you enable this feature when training neural \n",
+    "networks with FP16 weights. The instructions to enable it in PopTorch are \n",
+    "presented later in this tutorial."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3e733e5a",
+   "metadata": {},
+   "source": [
+    "Import the packages"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7d2c6071",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import torch.nn as nn\n",
+    "import torchvision\n",
+    "import torchvision.transforms as transforms\n",
+    "import poptorch\n",
+    "from tqdm.auto import tqdm"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c3f05fbf",
+   "metadata": {},
+   "source": [
+    "## Build the model\n",
+    "\n",
+    "We use the same model as in [the previous tutorials on PopTorch](../). \n",
+    "Just like in the [previous tutorial](../tut2_efficient_data_loading), we are \n",
+    "using larger images (128x128) to simulate a heavier data load. This will make \n",
+    "the difference in memory between FP32 and FP16 meaningful enough to showcase \n",
+    "in this tutorial."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d7299b55",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class CustomModel(nn.Module):\n",
+    "    def __init__(self):\n",
+    "        super().__init__()\n",
+    "        self.conv1 = nn.Conv2d(1, 5, 3)\n",
+    "        self.pool = nn.MaxPool2d(2, 2)\n",
+    "        self.conv2 = nn.Conv2d(5, 12, 5)\n",
+    "        self.norm = nn.GroupNorm(3, 12)\n",
+    "        self.fc1 = nn.Linear(41772, 100)\n",
+    "        self.relu = nn.ReLU()\n",
+    "        self.fc2 = nn.Linear(100, 10)\n",
+    "        self.log_softmax = nn.LogSoftmax(dim=1)\n",
+    "        self.loss = nn.NLLLoss()\n",
+    "\n",
+    "    def forward(self, x, labels=None):\n",
+    "        x = self.pool(self.relu(self.conv1(x)))\n",
+    "        x = self.norm(self.relu(self.conv2(x)))\n",
+    "        x = torch.flatten(x, start_dim=1)\n",
+    "        x = self.relu(self.fc1(x))\n",
+    "        x = self.log_softmax(self.fc2(x))\n",
+    "        # The model is responsible for the calculation\n",
+    "        # of the loss when using an IPU. We do it this way:\n",
+    "        if self.training:\n",
+    "            return x, self.loss(x, labels)\n",
+    "        return x"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4ead1d52",
+   "metadata": {},
+   "source": [
+    ">**NOTE:** The model inherits `self.training` from `torch.nn.Module` which \n",
+    ">initialises its value to True. Use `model.eval()` to set it to False and \n",
+    ">`model.train()` to switch it back to True."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "924cd4fd",
+   "metadata": {},
+   "source": [
+    "Choose parameters. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b20441df",
+   "metadata": {},
+   "source": [
+    ">**NOTE**: If you wish to modify these parameters for educational purposes, \n",
+    ">make sure you re-run the next cell after editing it, and then re-run all the\n",
+    ">cells that follow it:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f2d1af74",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Cast the model parameters to FP16\n",
+    "model_half = True\n",
+    "\n",
+    "# Cast the data to FP16\n",
+    "data_half = True\n",
+    "\n",
+    "# Set the optimiser's accumulation type to FP16 and enable loss scaling\n",
+    "optimizer_half = True\n",
+    "\n",
+    "# Use stochastic rounding\n",
+    "stochastic_rounding = True\n",
+    "\n",
+    "# Set partials data type to FP16\n",
+    "partials_half = True"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c132f5f",
+   "metadata": {},
+   "source": [
+    "### Casting a model's parameters\n",
+    "\n",
+    "The default data type of the parameters of a PyTorch module is FP32 \n",
+    "(`torch.float32`). To convert all the parameters of a model to be represented \n",
+    "in FP16 (`torch.float16`), an operation we will call _downcasting_, we simply \n",
+    "do:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8e78f833",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model = CustomModel()\n",
+    "\n",
+    "if model_half:\n",
+    "    model = model.half()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b320d37f",
+   "metadata": {},
+   "source": [
+    "For this tutorial, we will cast all the model's parameters to FP16."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3edd9232",
+   "metadata": {},
+   "source": [
+    "### Casting a single layer's parameters\n",
+    "\n",
+    "For bigger or more complex models, downcasting all the layers may generate \n",
+    "numerical instabilities and cause underflows. While PopTorch and the IPU \n",
+    "offer features to alleviate those issues, it is still sensible for those \n",
+    "models to cast only the parameters of certain layers and observe how it \n",
+    "affects the overall training job. To downcast the parameters of a single \n",
+    "layer, we select the layer by its _name_ and use `half()`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3b76567b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model.conv1 = model.conv1.half()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5d3c1b33",
+   "metadata": {},
+   "source": [
+    "If you would like to upcast a layer instead, you can use `model.conv1.float()`.\n",
+    ">**NOTE**: One can print out a list of the components of a PyTorch model, \n",
+    ">with their names, by doing `print(model)`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad340e11",
+   "metadata": {},
+   "source": [
+    "## Prepare the data\n",
+    "\n",
+    "We will use the FashionMNIST dataset that we download from `torchvision`. The \n",
+    "last stage of the pipeline will have to convert the data type of the tensors \n",
+    "representing the images to `torch.half` (equivalent to `torch.float16`) so that \n",
+    "our input data is also in FP16. This has the advantage of reducing the \n",
+    "bandwidth needed between the host and the IPU."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c09a9bb8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "transform_list = [transforms.Resize(128),\n",
+    "                  transforms.ToTensor(),\n",
+    "                  transforms.Normalize((0.5,), (0.5,))]\n",
+    "if data_half:\n",
+    "    transform_list.append(transforms.ConvertImageDtype(torch.half))\n",
+    "\n",
+    "transform = transforms.Compose(transform_list)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a55f8f87",
+   "metadata": {},
+   "source": [
+    "Pull the datasets if they are not available locally:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e9ed38b5",
+   "metadata": {
+    "tags": [
+     "sst_hide_output"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "train_dataset = torchvision.datasets.FashionMNIST(\"./datasets/\",\n",
+    "                                                  transform=transform,\n",
+    "                                                  download=True,\n",
+    "                                                  train=True)\n",
+    "test_dataset = torchvision.datasets.FashionMNIST(\"./datasets/\",\n",
+    "                                                 transform=transform,\n",
+    "                                                 download=True,\n",
+    "                                                 train=False)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c06a482e",
+   "metadata": {},
+   "source": [
+    "If the model has not been converted to half precision, but the input data has, \n",
+    "then some layers of the model may be converted to use FP16. Conversely, if the \n",
+    "input data has not been converted, but the model has, then the input tensors \n",
+    "will be converted to FP16 on the IPU. This behaviour is the opposite of \n",
+    "PyTorch's default behaviour.\n",
+    "\n",
+    ">**NOTE**: To stop PopTorch automatically downcasting tensors and parameters, \n",
+    ">so that it preserves PyTorch's default behaviour (upcasting), use the option:\n",
+    ">`opts.Precision.halfFloatCasting(poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b3f8c6d",
+   "metadata": {},
+   "source": [
+    "## Optimizers and loss scaling\n",
+    "\n",
+    "The value of the loss scaling factor can be passed as a parameter to the \n",
+    "optimisers in `poptorch.optim`. In this tutorial, we will set it to 1024 for \n",
+    "an AdamW optimizer. For all optimisers (except `poptorch.optim.SGD`), \n",
+    "using a model in FP16 requires the argument `accum_type` to be set to \n",
+    "`torch.float16` as well:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "058dc529",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "accum, loss_scaling = \\\n",
+    "    (torch.float16, 1024) if optimizer_half else (torch.float32, None)\n",
+    "\n",
+    "optimizer = poptorch.optim.AdamW(params=model.parameters(),\n",
+    "                                 lr=0.001,\n",
+    "                                 accum_type=accum,\n",
+    "                                 loss_scaling=loss_scaling)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "344ad251",
+   "metadata": {},
+   "source": [
+    "While higher values of `loss_scaling` minimize underflows, values that are \n",
+    "too high can also generate overflows as well as hurt convergence of the loss. \n",
+    "The optimal value depends on the model and the training job. This is therefore \n",
+    "a hyperparameter for you to tune."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b047f12e",
+   "metadata": {},
+   "source": [
+    "## Set PopTorch's options\n",
+    "\n",
+    "To configure some features of the IPU and to be able to use PopTorch's classes \n",
+    "in the next sections, we will need to create an instance of `poptorch.Options` \n",
+    "which stores the options we will be using. We covered some of the available \n",
+    "options in the [introductory tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).\n",
+    "\n",
+    "Let's initialise our options object before we talk about the options \n",
+    "we will use:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a1429abe",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "opts = poptorch.Options()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0e3eed4e",
+   "metadata": {},
+   "source": [
+    ">**NOTE**: This tutorial has been designed to be run on a single IPU. \n",
+    ">If you do not have access to an IPU, you can use the option \n",
+    ">[`useIpuModel`](https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/overview.html#poptorch.Options.useIpuModel)\n",
+    ">to run a simulation on CPU instead. You can read more on the IPU Model \n",
+    ">and its limitations [here](https://docs.graphcore.ai/projects/poplar-user-guide/en/latest/poplar_programs.html#programming-with-poplar)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e9d133d6",
+   "metadata": {},
+   "source": [
+    "### Stochastic rounding\n",
+    "\n",
+    "With the IPU, stochastic rounding is implemented directly in the hardware and \n",
+    "only requires you to enable it. To do so, there is the option \n",
+    "`enableStochasticRounding` in the `Precision` namespace of `poptorch.Options`. \n",
+    "This namespace holds other options for using mixed precision that we will talk \n",
+    "about. To enable stochastic rounding, we do:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3547f7c0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if stochastic_rounding:\n",
+    "    opts.Precision.enableStochasticRounding(True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "44ad503a",
+   "metadata": {},
+   "source": [
+    "With the IPU Model, this option has no effect, since stochastic \n",
+    "rounding is implemented in the IPU hardware."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f30b68b",
+   "metadata": {},
+   "source": [
+    "### Partials data type\n",
+    "\n",
+    "Matrix multiplications and convolutions have intermediate states we \n",
+    "call _partials_. Those partials can be stored in FP32 or FP16. There is \n",
+    "a memory benefit to using FP16 partials, but the main benefit is that it can \n",
+    "increase the throughput for some models without affecting accuracy. However, \n",
+    "there is a risk of increasing numerical instability if the values being \n",
+    "multiplied are small, due to underflows. The default data type of partials is \n",
+    "the input's data type (FP16). For this tutorial, we set the partials type \n",
+    "explicitly with the option `setPartialsType`, to FP16 or FP32 depending on `partials_half`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9e685d84",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if partials_half:\n",
+    "    opts.Precision.setPartialsType(torch.half)\n",
+    "else:\n",
+    "    opts.Precision.setPartialsType(torch.float)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "18075058",
+   "metadata": {},
+   "source": [
+    "## Train the model\n",
+    "\n",
+    "We can now train the model. After we have set all our options, we reuse \n",
+    "our `poptorch.Options` instance for the training `poptorch.DataLoader` \n",
+    "that we will be using:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3dc15aa0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "train_dataloader = poptorch.DataLoader(opts,\n",
+    "                                       train_dataset,\n",
+    "                                       batch_size=12,\n",
+    "                                       shuffle=True,\n",
+    "                                       num_workers=4)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "15afcbba",
+   "metadata": {},
+   "source": [
+    "Our model is still in training mode (the default for `nn.Module`), so we can \n",
+    "wrap it directly with `poptorch.trainingModel`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f991a738",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "poptorch_model = poptorch.trainingModel(model,\n",
+    "                                        options=opts,\n",
+    "                                        optimizer=optimizer)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3fd256a9",
+   "metadata": {},
+   "source": [
+    "Let's run the training loop for 10 epochs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ddeee51c",
+   "metadata": {
+    "tags": [
+     "sst_hide_output"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "epochs = 10\n",
+    "for epoch in tqdm(range(epochs), desc=\"epochs\"):\n",
+    "    total_loss = 0.0\n",
+    "    for data, labels in tqdm(train_dataloader, desc=\"batches\", leave=False):\n",
+    "        output, loss = poptorch_model(data, labels)\n",
+    "        total_loss += loss"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "14615613",
+   "metadata": {},
+   "source": [
+    "Release IPU resources:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "df050b9b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "poptorch_model.detachFromDevice()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9862f6c1",
+   "metadata": {},
+   "source": [
+    "Our new model is now trained and we can start its evaluation."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f22358c6",
+   "metadata": {},
+   "source": [
+    "## Evaluate the model\n",
+    "\n",
+    "Some PyTorch operations, such as convolutions, are not supported in FP16 on the CPU, \n",
+    "so we will evaluate our trained model in mixed precision on an IPU \n",
+    "using `poptorch.inferenceModel`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6f8d81de",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model.eval()\n",
+    "poptorch_model_inf = poptorch.inferenceModel(model, options=opts)\n",
+    "test_dataloader = poptorch.DataLoader(opts,\n",
+    "                                      test_dataset,\n",
+    "                                      batch_size=32,\n",
+    "                                      num_workers=4)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "717a8245",
+   "metadata": {},
+   "source": [
+    "Run inference on the labelled data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bdb56600",
+   "metadata": {
+    "tags": [
+     "sst_hide_output"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "predictions, labels = [], []\n",
+    "for data, label in test_dataloader:\n",
+    "    predictions += poptorch_model_inf(data).data.float().max(dim=1).indices\n",
+    "    labels += label"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8eb50688",
+   "metadata": {},
+   "source": [
+    "Release IPU resources:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fdae998f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "poptorch_model_inf.detachFromDevice()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b66560fb",
+   "metadata": {},
+   "source": [
+    "We obtained an accuracy of approximately 84% on the test dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f057eb38",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(f\"\"\"Eval accuracy on IPU: {100 *\n",
+    "                (1 - torch.count_nonzero(torch.sub(torch.tensor(labels),\n",
+    "                torch.tensor(predictions))) / len(labels)):.2f}%\"\"\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df567031",
+   "metadata": {},
+   "source": [
+    "# Visualise the memory footprint\n",
+    "\n",
+    "We can visually compare the memory footprint on the IPU of the model trained \n",
+    "in FP16 and FP32, thanks to Graphcore's [PopVision Graph Analyser](https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html).\n",
+    "\n",
+    "We generated memory reports of the same training session as covered in this \n",
+    "tutorial for both cases: with and without downcasting the model with \n",
+    "`model.half()`. Here is the figure of both memory footprints, where \"source\" \n",
+    "and \"target\" represent the model trained in FP16 and FP32 respectively:\n",
+    "\n",
+    "![Comparison of memory footprints](static/MemoryDiffReport.png)\n",
+    "\n",
+    "We observed a ~26% reduction in memory usage with the settings of this \n",
+    "tutorial, including when comparing peak memory usage. The impact on accuracy \n",
+    "was small, with less than 1% lost.\n",
+    "\n",
+    "# Debug floating-point exceptions\n",
+    "\n",
+    "Floating-point issues can be difficult to debug because the model will simply \n",
+    "appear to not be training without specific information about what went wrong. \n",
+    "To get more detailed information about the issue, set \n",
+    "`debug.floatPointOpException` to true in the environment variable \n",
+    "`POPLAR_ENGINE_OPTIONS`. To do this, add the following before \n",
+    "the command you use to run your model:\n",
+    "\n",
+    "```bash\n",
+    "POPLAR_ENGINE_OPTIONS='{\"debug.floatPointOpException\": \"true\"}'\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f011bb8e",
+   "metadata": {},
+   "source": [
+    "# PopTorch tracing and casting\n",
+    "\n",
+    "Because PopTorch relies on the `torch.jit.trace` API, it is limited to tracing \n",
+    "operations which run on the CPU. Many of these operations do not support FP16 \n",
+    "inputs. To allow the full range of operations, PopTorch converts all FP16 \n",
+    "inputs to FP32 before tracing and then restores them to FP16. As a result, \n",
+    "the model is always traced with FP32 inputs, even when the inputs you pass \n",
+    "to it are FP16.\n",
+    "\n",
+    "PopTorch’s default casting functionality is to output in FP16 if any input \n",
+    "of the operation is FP16. This is the opposite of PyTorch, which outputs in FP32 \n",
+    "if any input of the operation is in FP32. To achieve the same behaviour \n",
+    "in PopTorch, one can use: \n",
+    "`opts.Precision.halfFloatCasting(poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)`.\n",
+    "\n",
+    "Below you can see the difference between native PyTorch and \n",
+    "PopTorch (with and without the option mentioned above):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bdfa5241",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class Model(torch.nn.Module):\n",
+    "    def forward(self, x, y):\n",
+    "        return x + y\n",
+    "\n",
+    "native_model = Model()\n",
+    "\n",
+    "float16_tensor = torch.tensor([1.0], dtype=torch.float16)\n",
+    "float32_tensor = torch.tensor([1.0], dtype=torch.float32)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e384a946",
+   "metadata": {},
+   "source": [
+    "Native PyTorch results in an FP32 tensor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "88178f85",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "assert native_model(float32_tensor, float16_tensor).dtype == torch.float32"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c0aafef2",
+   "metadata": {},
+   "source": [
+    "Let's instantiate default PopTorch `Options` for IPUs:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "95c60c61",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "opts = poptorch.Options()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "39f92f2d",
+   "metadata": {},
+   "source": [
+    "By default, PopTorch results in an FP16 tensor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "83732984",
+   "metadata": {
+    "tags": [
+     "sst_hide_output"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "poptorch_model = poptorch.inferenceModel(native_model, opts)\n",
+    "assert poptorch_model(float32_tensor, float16_tensor).dtype == torch.float16"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32979eef",
+   "metadata": {},
+   "source": [
+    "Setting the `HalfUpcastToFloat` option makes the same PopTorch example result in an FP32 tensor:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a73b50ce",
+   "metadata": {
+    "tags": [
+     "sst_hide_output"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "opts.Precision.halfFloatCasting(\n",
+    "    poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)\n",
+    "\n",
+    "poptorch_model = poptorch.inferenceModel(native_model, opts)\n",
+    "assert poptorch_model(float32_tensor, float16_tensor).dtype == torch.float32"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4dd68ce1",
+   "metadata": {},
+   "source": [
+    "Release IPU resources:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a76ddb20",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "poptorch_model.detachFromDevice()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1eaf3c08",
+   "metadata": {},
+   "source": [
+    "# Summary\n",
+    "- Use half and mixed precision when you need to save memory on the IPU.\n",
+    "- You can cast a PyTorch model or a specific layer to FP16 using:\n",
+    "    ```python\n",
+    "    # Model\n",
+    "    model.half()\n",
+    "    # Layer\n",
+    "    model.layer.half()\n",
+    "    ```\n",
+    "- Several features are available in PopTorch to improve the numerical \n",
+    "stability of a model in FP16:\n",
+    "    - Loss scaling: `poptorch.optim.SGD(..., loss_scaling=1000)`\n",
+    "    - Stochastic rounding: `opts.Precision.enableStochasticRounding(True)`\n",
+    "    - Upcast partials data types: `opts.Precision.setPartialsType(torch.float)`\n",
+    "- The [PopVision Graph Analyser](https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html) \n",
+    "can be used to inspect the memory usage of a model and to help debug issues."
+   ]
+  }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough.py b/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
index f8b8c64..5078704 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
@@ -1,30 +1,111 @@
 #!/usr/bin/env python3
-# Copyright (c) 2021 Graphcore Ltd. All rights reserved.
-
+"""
+Copyright (c) 2021 Graphcore Ltd. All rights reserved.
+"""
+"""
 # Half and mixed precision in PopTorch
+This tutorial shows how to use half and mixed precision in PopTorch with the
+example task of training a simple CNN model on a single Graphcore IPU (Mk1 or 
+Mk2).
+"""
+"""
+Requirements:
+- an installed Poplar SDK. See the Getting Started guide for your IPU hardware 
+for details of how to install the SDK;
+- Other Python modules: `pip install -r requirements.txt`
+"""
+"""
+# General
+
+## Motives for half precision
+
+Data is stored in memory, and some formats to store that data require less 
+memory than others. In a device's memory, when it comes to numerical data, 
+we use either integers or real numbers. Real numbers are represented by one 
+of several floating point formats, which vary in how many bits they use to 
+represent each number. Using more bits allows for greater precision and a 
+wider range of representable numbers, whereas using fewer bits allows for 
+faster calculations and reduces memory and power usage. 
+
+In deep learning applications, where less precise calculations are acceptable 
+and throughput is critical, using a lower precision format can provide 
+substantial gains in performance.
+
+The Graphcore IPU provides native support for two floating-point formats:
+
+- IEEE single-precision, which uses 32 bits for each number (FP32)
+- IEEE half-precision, which uses 16 bits for each number (FP16)
+
+Some applications which use FP16 do all calculations in FP16, whereas others 
+use a mix of FP16 and FP32. The latter approach is known as *mixed precision*.
+
+In this tutorial, we are going to talk about real numbers represented 
+in FP32 and FP16, and how to use these data types (dtypes) in PopTorch in 
+order to reduce the memory requirements of a model.
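+
+As a quick check of these trade-offs, the following sketch queries PyTorch's
+`torch.finfo` for both formats and measures the bytes used per element. It
+runs on the CPU with plain PyTorch and does not use PopTorch or an IPU. The
+helper `describe` is hypothetical and only used for illustration.
+
+```python
+import torch
+
+
+def describe(dtype):
+    # Hypothetical helper: summarise a floating-point dtype.
+    info = torch.finfo(dtype)
+    return {
+        "bits": info.bits,  # storage size of one number
+        "max": info.max,    # largest finite value
+        "eps": info.eps,    # gap between 1.0 and the next representable value
+        "bytes": torch.empty(1, dtype=dtype).element_size(),
+    }
+
+
+fp32 = describe(torch.float32)
+fp16 = describe(torch.float16)
+print("FP32:", fp32)
+print("FP16:", fp16)
+```
+
+FP16 halves the memory per number. The cost is a much smaller range (its
+largest finite value is 65504) and a coarser spacing between values.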
+
+## Numerical stability
+
+Numerical stability refers to how a model's performance is affected by the use 
+of a lower-precision dtype. We say an operation is "numerically unstable" in 
+FP16 if running it in this dtype causes the model to have worse accuracy 
+compared to running the operation in FP32. Two techniques that can be used to 
+increase the numerical stability of a model are loss scaling and stochastic 
+rounding.
+
+### Loss scaling
+
+A numerical issue that can occur when training a model in half-precision is 
+that the gradients can underflow. This can be difficult to debug because the 
+model will simply appear to not be training, and can be especially damaging 
+because any gradients which underflow will propagate a value of 0 backwards 
+to other gradient calculations.
 
-# This tutorial shows how to use half and mixed precision in PopTorch with the
-# example task of training a simple CNN model on a single
-# Graphcore IPU (Mk1 or Mk2).
+The standard solution to this is known as *loss scaling*, which consists of 
+scaling up the loss value right before the start of backpropagation to prevent 
+numerical underflow of the gradients. The gradients are then scaled back down 
+by the same factor before the weights are updated. Instructions on how to use 
+loss scaling will be discussed later in this tutorial.
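+
+The effect can be illustrated with plain PyTorch on the CPU. This is a
+simplified sketch and not how PopTorch implements loss scaling. It scales a
+single hypothetical gradient value directly, whereas real loss scaling
+multiplies the loss so that every gradient is scaled through backpropagation.
+
+```python
+import torch
+
+grad = 1e-8           # hypothetical small gradient, computed in full precision
+loss_scale = 1024.0   # same factor as used later in this tutorial
+
+# Without scaling, the value underflows to zero when stored in FP16.
+unscaled = torch.tensor(grad, dtype=torch.float16)
+
+# With scaling, the value is representable in FP16. Dividing by the scale in
+# FP32 before the weight update approximately recovers the original value.
+scaled = torch.tensor(grad * loss_scale, dtype=torch.float16)
+recovered = scaled.float() / loss_scale
+
+print("FP16 without loss scaling:", unscaled.item())
+print("FP16 with loss scaling, unscaled in FP32:", recovered.item())
+```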
 
-# Requirements:
-#   - an installed Poplar SDK. See the Getting Started guide for your IPU
-#     hardware for details of how to install the SDK;
-#   - Other Python modules: `pip install -r requirements.txt`
+### Stochastic rounding
 
-# Import the packages
+When training in half or mixed precision, the results of arithmetic operations 
+need to be rounded in order to fit into the floating-point format used. 
+Stochastic rounding makes this rounding probabilistic. Instead of always 
+rounding to the nearest representable number, we round up or down with a 
+probability such that the expected value after rounding is equal to the value 
+before rounding. Since the expected value of an addition after rounding is 
+equal to the exact result of the addition, the expected value of a sum is also 
+its exact value.
+
+This means that on average, the values of the parameters of a network will be 
+close to the values they would have had if a higher-precision format had been 
+used. The added bonus of using stochastic rounding is that the parameters can 
+be stored in FP16, which means the parameters can be stored using half as much 
+memory. This can be especially helpful when training with small batch sizes, 
+where the parameters account for a larger proportion of the total memory used 
+than they do when training with large batch sizes.
+
+It is highly recommended that you enable this feature when training neural 
+networks with FP16 weights. The instructions to enable it in PopTorch are 
+presented later in this tutorial.
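+
+The following sketch shows the idea in plain PyTorch on the CPU. It is only
+an illustration under a simplifying assumption and not the IPU's hardware
+implementation. It rounds values in the range [1, 2), where consecutive FP16
+numbers are exactly 2^-10 apart, onto that uniform grid. The helper
+`stochastic_round` is hypothetical.
+
+```python
+import torch
+
+
+def stochastic_round(x, ulp, generator=None):
+    # Round each element of x to a multiple of `ulp`, rounding up with a
+    # probability equal to its fractional distance past the lower grid point.
+    scaled = x / ulp
+    lower = torch.floor(scaled)
+    frac = scaled - lower  # in [0, 1)
+    rand = torch.rand(x.shape, generator=generator, dtype=x.dtype)
+    return (lower + (rand < frac).to(x.dtype)) * ulp
+
+
+ulp = 2.0 ** -10  # FP16 spacing between 1.0 and 2.0
+x = torch.full((100_000,), 1.0 + 0.3 * ulp, dtype=torch.float64)
+
+# Round-to-nearest always returns 1.0, so its mean is biased.
+nearest = x.half().double()
+
+# Stochastic rounding returns 1.0 or 1.0 + ulp; on average it matches x.
+gen = torch.Generator().manual_seed(0)
+stochastic = stochastic_round(x, ulp, generator=gen)
+
+print("exact mean:     ", x.mean().item())
+print("nearest mean:   ", nearest.mean().item())
+print("stochastic mean:", stochastic.mean().item())
+```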
+"""
+"""
+Import the packages
+"""
 import torch
 import torch.nn as nn
-
 import torchvision
 import torchvision.transforms as transforms
-
 import poptorch
-import argparse
-from tqdm import tqdm
-
+from tqdm.auto import tqdm
+"""
+## Build the model
 
-# Build the model
+We use the same model as in [the previous tutorials on PopTorch](../). 
+Just like in the [previous tutorial](../tut2_efficient_data_loading), we are 
+using larger images (128x128) to simulate a heavier data load. This will make 
+the difference in memory between FP32 and FP16 meaningful enough to showcase 
+in this tutorial.
+"""
 class CustomModel(nn.Module):
     def __init__(self):
         super().__init__()
@@ -49,32 +130,83 @@ def forward(self, x, labels=None):
         if self.training:
             return x, self.loss(x, labels)
         return x
+"""
+>**NOTE:** The model inherits `self.training` from `torch.nn.Module` which 
+>initialises its value to True. Use `model.eval()` to set it to False and 
+>`model.train()` to switch it back to True.
+"""
+"""
+Choose parameters. 
+"""
+"""
+>**NOTE** If you wish to modify these parameters for educational purposes, 
+>make sure you re-run this entire cell and all the cells below it:
+"""
+# Cast the model parameters to FP16
+model_half = True
 
-model = CustomModel()
+# Cast the data to FP16
+data_half = True
 
-parser = argparse.ArgumentParser()
-parser.add_argument('--model-half', dest='model_half', action='store_true', help='Cast the model parameters to FP16')
-parser.add_argument('--data-half', dest='data_half', action='store_true', help='Cast the data to FP16')
-parser.add_argument('--optimizer-half', dest='optimizer_half', action='store_true', help='Cast the accumulation type of the optimiser to FP16')
-parser.add_argument('--stochastic-rounding', dest='stochastic_rounding', action='store_true', help='Use stochasting rounding')
-parser.add_argument('--partials-half', dest='partials_half', action='store_true', help='Set partials data type to FP16')
-args = parser.parse_args()
+# Use FP16 accumulation and loss scaling in the optimiser
+optimizer_half = True
 
-# Casting a model's parameters
-if args.model_half:
+# Use stochastic rounding
+stochastic_rounding = True
+
+# Set partials data type to FP16 (False keeps them in FP32)
+partials_half = False
+"""
+### Casting a model's parameters
+
+The default data type of the parameters of a PyTorch module is FP32 
+(`torch.float32`). To convert all the parameters of a model to be represented 
+in FP16 (`torch.float16`), an operation we will call _downcasting_, we simply 
+do:
+"""
+model = CustomModel()
+
+if model_half:
     model = model.half()
+"""
+For this tutorial, we will cast all the model's parameters to FP16.
+"""
+"""
+### Casting a single layer's parameters
 
-# Prepare the data
-if args.data_half:
-    transform = transforms.Compose([transforms.Resize(128),
-                                    transforms.ToTensor(),
-                                    transforms.Normalize((0.5,), (0.5,)),
-                                    transforms.ConvertImageDtype(torch.half)])
-else:
-    transform = transforms.Compose([transforms.Resize(128),
-                                    transforms.ToTensor(),
-                                    transforms.Normalize((0.5,), (0.5,))])
+For bigger or more complex models, downcasting all the layers may generate 
+numerical instabilities and cause underflows. While PopTorch and the IPU 
+offer features to alleviate those issues, it is still sensible for those 
+models to cast only the parameters of certain layers and observe how it 
+affects the overall training job. To downcast the parameters of a single 
+layer, we select the layer by its _name_ and use `half()`:
+"""
+model.conv1 = model.conv1.half()
+"""
+If you would like to upcast a layer instead, you can use `model.conv1.float()`.
+>**NOTE**: One can print out a list of the components of a PyTorch model, 
+>with their names, by doing `print(model)`.
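+
+To check which parameters ended up in FP16 after casting a single layer, you
+can inspect the `dtype` of each named parameter. The sketch below uses a small
+hypothetical `nn.Sequential` stand-in rather than the tutorial's
+`CustomModel`, so the layer names are indices instead of `conv1`.
+
+```python
+import torch
+import torch.nn as nn
+
+# Hypothetical stand-in model; its layers are named "0", "1" and "2".
+model = nn.Sequential(nn.Conv2d(1, 5, 3), nn.ReLU(), nn.Conv2d(5, 12, 5))
+model[0] = model[0].half()  # downcast only the first layer
+
+dtypes = {name: param.dtype for name, param in model.named_parameters()}
+for name, dtype in dtypes.items():
+    print(f"{name:10s} {dtype}")
+```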
+"""
+"""
+## Prepare the data
+
+We will use the FashionMNIST dataset that we download from `torchvision`. The 
+last stage of the pipeline will have to convert the data type of the tensors 
+representing the images to `torch.half` (equivalent to `torch.float16`) so that 
+our input data is also in FP16. This has the advantage of reducing the 
+bandwidth needed between the host and the IPU.
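+
+As a rough illustration of the bandwidth saving, the sketch below compares
+the size of one training batch in each format. It uses random data as a
+stand-in for the normalised FashionMNIST images, together with the batch size
+of 12 used later in this tutorial. It only measures tensor sizes in host memory.
+
+```python
+import torch
+
+batch_size, channels, height, width = 12, 1, 128, 128
+images = torch.rand(batch_size, channels, height, width)  # FP32 by default
+
+
+def nbytes(t):
+    # Number of bytes occupied by the tensor's elements.
+    return t.numel() * t.element_size()
+
+
+fp32_bytes = nbytes(images)
+fp16_bytes = nbytes(images.half())
+print("FP32 batch:", fp32_bytes, "bytes")
+print("FP16 batch:", fp16_bytes, "bytes")
+```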
+"""
+transform_list = [transforms.Resize(128),
+                  transforms.ToTensor(),
+                  transforms.Normalize((0.5,), (0.5,))]
+if data_half:
+    transform_list.append(transforms.ConvertImageDtype(torch.half))
 
+transform = transforms.Compose(transform_list)
+"""
+Pull the datasets if they are not available locally:
+"""
 train_dataset = torchvision.datasets.FashionMNIST("./datasets/",
                                                   transform=transform,
                                                   download=True,
@@ -83,61 +215,259 @@ def forward(self, x, labels=None):
                                                  transform=transform,
                                                  download=True,
                                                  train=False)
+# sst_hide_output
+"""
+If the model has not been converted to half precision, but the input data has, 
+then some layers of the model may be converted to use FP16. Conversely, if the 
+input data has not been converted, but the model has, then the input tensors 
+will be converted to FP16 on the IPU. This behaviour is the opposite of 
+PyTorch's default behaviour.
 
-# Optimizer and loss scaling
-if args.optimizer_half:
-    optimizer = poptorch.optim.AdamW(model.parameters(),
-                                     lr=0.001,
-                                     loss_scaling=1024,
-                                     accum_type=torch.float16)
-else:
-    optimizer = poptorch.optim.AdamW(model.parameters(),
-                                     lr=0.001,
-                                     accum_type=torch.float32)
+>**NOTE**: To stop PopTorch automatically downcasting tensors and parameters, 
+>so that it preserves PyTorch's default behaviour (upcasting), use the option:
+>`opts.Precision.halfFloatCasting(poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)`.
+"""
+"""
+## Optimizers and loss scaling
+
+The value of the loss scaling factor can be passed as a parameter to the 
+optimisers in `poptorch.optim`. In this tutorial, we will set it to 1024 for 
+an AdamW optimizer. For all optimisers (except `poptorch.optim.SGD`), 
+using a model in FP16 requires the argument `accum_type` to be set to 
+`torch.float16` as well:
+"""
+accum, loss_scaling = \
+    (torch.float16, 1024) if optimizer_half else (torch.float32, None)
+
+optimizer = poptorch.optim.AdamW(params=model.parameters(),
+                                 lr=0.001,
+                                 accum_type=accum,
+                                 loss_scaling=loss_scaling)
+"""
+While higher values of `loss_scaling` minimize underflows, values that are 
+too high can cause overflows and hurt convergence of the loss. The optimal 
+value depends on the model and the training job, so it is a hyperparameter 
+for you to tune.
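+
+The overflow risk can be seen with plain PyTorch on the CPU. FP16 cannot
+represent values above 65504, so a large scaling factor applied to an already
+large (hypothetical) gradient produces `inf`. This simplified sketch scales
+one value directly rather than the loss.
+
+```python
+import torch
+
+grad = 10.0  # hypothetical, moderately large gradient value
+results = {}
+for loss_scale in (1024.0, 65536.0):
+    scaled = torch.tensor(grad * loss_scale, dtype=torch.float16)
+    results[loss_scale] = scaled.item()
+    finite = torch.isfinite(scaled).item()
+    print(f"loss_scale={loss_scale}: {scaled.item()} (finite: {finite})")
+```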
+"""
+"""
+## Set PopTorch's options
 
+To configure some features of the IPU and to be able to use PopTorch's classes 
+in the next sections, we will need to create an instance of `poptorch.Options` 
+which stores the options we will be using. Some of the available options are 
+covered in the [introductory tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
 
-# Set PopTorch's options
+Let's initialise our options object before we talk about the options 
+we will use:
+"""
 opts = poptorch.Options()
+"""
+>**NOTE**: This tutorial has been designed to be run on a single IPU. 
+>If you do not have access to an IPU, you can use the option:
+> - [`useIpuModel`](https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/overview.html#poptorch.Options.useIpuModel)
+>to run a simulation on CPU instead. You can read more on the IPU Model 
+>and its limitations [here](https://docs.graphcore.ai/projects/poplar-user-guide/en/latest/poplar_programs.html#programming-with-poplar).
+"""
+"""
+### Stochastic rounding
 
-# Stochastic rounding
-if args.stochastic_rounding:
+With the IPU, stochastic rounding is implemented directly in the hardware and 
+only requires you to enable it. To do so, there is the option 
+`enableStochasticRounding` in the `Precision` namespace of `poptorch.Options`. 
+This namespace holds other options for using mixed precision that we will talk 
+about. To enable stochastic rounding, we do:
+"""
+if stochastic_rounding:
     opts.Precision.enableStochasticRounding(True)
-# Partials data type
-if args.partials_half:
+"""
+With the IPU Model, this option has no effect, since stochastic rounding is 
+implemented in the IPU hardware.
+"""
+"""
+### Partials data type
+
+Matrix multiplications and convolutions have intermediate states we 
+call _partials_. Those partials can be stored in FP32 or FP16. There is 
+a memory benefit to using FP16 partials but the main benefit is that it can 
+increase the throughput for some models without affecting accuracy. However, 
+there is a risk of increasing numerical instability if the values being 
+multiplied are small, due to underflows. The default data type of partials is 
+the input's data type (FP16). For this tutorial, we set partials to FP32 just 
+to showcase how it can be done. We use the option `setPartialsType` to do it:
+"""
+if partials_half:
     opts.Precision.setPartialsType(torch.half)
 else:
     opts.Precision.setPartialsType(torch.float)
+"""
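+The risk of keeping intermediate results in FP16 can be illustrated on the
+CPU by accumulating many small values into a running total stored in each
+format. This is a simplified sketch of sequential accumulation in plain
+PyTorch, not a model of how the IPU computes partials. The helper
+`accumulate` is hypothetical.
+
+```python
+import torch
+
+values = torch.full((10_000,), 0.01)  # many small products to accumulate
+
+
+def accumulate(vals, dtype):
+    # Add the values one at a time, keeping the running total in `dtype`.
+    total = torch.zeros((), dtype=dtype)
+    for v in vals.to(dtype):
+        total = total + v
+    return total.item()
+
+
+fp32_total = accumulate(values, torch.float32)
+fp16_total = accumulate(values, torch.float16)
+print("FP32 running total:", fp32_total)  # close to the exact 100.0
+print("FP16 running total:", fp16_total)  # stalls well below 100.0
+```
+
+Once the FP16 total is large enough, adding 0.01 changes it by less than half
+the gap between neighbouring FP16 values, so the additions are rounded away.
+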
+## Train the model
 
-# Train the model
+We can now train the model. After we have set all our options, we reuse 
+our `poptorch.Options` instance for the training `poptorch.DataLoader` 
+that we will be using:
+"""
 train_dataloader = poptorch.DataLoader(opts,
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=40)
+                                       num_workers=4)
+"""
+We first make sure our model is in training mode, and then wrap it 
+with `poptorch.trainingModel`.
+"""
+model.train()
 poptorch_model = poptorch.trainingModel(model,
                                         options=opts,
                                         optimizer=optimizer)
-
+"""
+Let's run the training loop for 10 epochs.
+"""
 epochs = 10
 for epoch in tqdm(range(epochs), desc="epochs"):
     total_loss = 0.0
     for data, labels in tqdm(train_dataloader, desc="batches", leave=False):
         output, loss = poptorch_model(data, labels)
         total_loss += loss
+# sst_hide_output
+"""
+Release IPU resources:
+"""
+poptorch_model.detachFromDevice()
+"""
+Our new model is now trained and we can start its evaluation.
+"""
+"""
+## Evaluate the model
 
-# Evaluate the model
+Some PyTorch operations, such as convolutions, are not supported in FP16 on 
+the CPU, so we will evaluate our trained model in mixed precision on an IPU 
+using `poptorch.inferenceModel`.
+"""
 model.eval()
 poptorch_model_inf = poptorch.inferenceModel(model, options=opts)
 test_dataloader = poptorch.DataLoader(opts,
                                       test_dataset,
                                       batch_size=32,
-                                      num_workers=40)
-
+                                      num_workers=4)
+"""
+Run inference on the labelled data:
+"""
 predictions, labels = [], []
 for data, label in test_dataloader:
     predictions += poptorch_model_inf(data).data.float().max(dim=1).indices
     labels += label
-
+# sst_hide_output
+"""
+Release IPU resources:
+"""
+poptorch_model_inf.detachFromDevice()
+"""
+We obtained an accuracy of approximately 84% on the test dataset.
+"""
 print(f"""Eval accuracy on IPU: {100 *
                 (1 - torch.count_nonzero(torch.sub(torch.tensor(labels),
                 torch.tensor(predictions))) / len(labels)):.2f}%""")
+"""
+# Visualise the memory footprint
+
+We can visually compare the memory footprint on the IPU of the model trained 
+in FP16 and FP32, thanks to Graphcore's [PopVision Graph Analyser](https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html).
+
+We generated memory reports of the same training session as covered in this 
+tutorial for both cases: with and without downcasting the model with 
+`model.half()`. Here is the figure of both memory footprints, where "source" 
+and "target" represent the model trained in FP16 and FP32 respectively:
+
+![Comparison of memory footprints](static/MemoryDiffReport.png)
+
+We observed a ~26% reduction in memory usage with the settings of this 
+tutorial, including from peak to peak. The impact on the accuracy was also 
+small, with less than 1% lost!
+
+# Debug floating-point exceptions
+
+Floating-point issues can be difficult to debug because the model will simply 
+appear to not be training without specific information about what went wrong. 
+For more detailed information on the issue, set 
+`debug.floatPointOpException` to true in the environment variable 
+`POPLAR_ENGINE_OPTIONS`. To set this, you can add the following before 
+the command you use to run your model:
+
+```bash
+POPLAR_ENGINE_OPTIONS='{"debug.floatPointOpException": "true"}'
+```
+"""
+"""
+# PopTorch tracing and casting
+
+Because PopTorch relies on the `torch.jit.trace` API, it is limited to tracing 
+operations which run on the CPU. Many of these operations do not support FP16 
+inputs due to numerical stability issues. To allow the full range 
+of operations, PopTorch converts all FP16 inputs to FP32 before tracing and 
+then restores them to FP16.
+
+PopTorch’s default casting functionality is to output in FP16 if any input 
+of the operation is FP16. This is the opposite of PyTorch, which outputs in 
+FP32 if any input of the operation is in FP32. To achieve the same behaviour 
+in PopTorch, one can use: 
+`opts.Precision.halfFloatCasting(poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)`.
+
+Below you can see the difference between native PyTorch and 
+PopTorch (with and without the option mentioned above):
+
+"""
+class Model(torch.nn.Module):
+    def forward(self, x, y):
+        return x + y
+
+native_model = Model()
+
+float16_tensor = torch.tensor([1.0], dtype=torch.float16)
+float32_tensor = torch.tensor([1.0], dtype=torch.float32)
+
+"""
+Native PyTorch results in an FP32 tensor:
+"""
+assert native_model(float32_tensor, float16_tensor).dtype == torch.float32
+
+"""
+Let's instantiate default PopTorch `Options` for IPUs:
+"""
+opts = poptorch.Options()
+"""
+PopTorch results in an FP16 tensor:
+"""
+poptorch_model = poptorch.inferenceModel(native_model, opts)
+assert poptorch_model(float32_tensor, float16_tensor).dtype == torch.float16
+# sst_hide_output
+"""
+This option makes the same PopTorch example result in an FP32 tensor:
+"""
+opts.Precision.halfFloatCasting(
+    poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)
+
+poptorch_model = poptorch.inferenceModel(native_model, opts)
+assert poptorch_model(float32_tensor, float16_tensor).dtype == torch.float32
+# sst_hide_output
+"""
+Release IPU resources:
+"""
+poptorch_model.detachFromDevice()
+"""
+# Summary
+- Use half and mixed precision when you need to save memory on the IPU.
+- You can cast a PyTorch model or a specific layer to FP16 using:
+    ```python
+    # Model
+    model.half()
+    # Layer
+    model.layer.half()
+    ```
+- Several features are available in PopTorch to improve the numerical 
+stability of a model in FP16:
+    - Loss scaling: `poptorch.optim.SGD(..., loss_scaling=1000)`
+    - Stochastic rounding: `opts.Precision.enableStochasticRounding(True)`
+    - Upcast partials data types: `opts.Precision.setPartialsType(torch.float)`
+- The [PopVision Graph Analyser](https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html) 
+can be used to inspect the memory usage of a model and to help debug issues.
+"""
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py b/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
new file mode 100644
index 0000000..e21766c
--- /dev/null
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
@@ -0,0 +1,150 @@
+# Copyright (c) 2021 Graphcore Ltd. All rights reserved.
+import torch
+import torch.nn as nn
+import torchvision
+import torchvision.transforms as transforms
+import poptorch
+from tqdm.auto import tqdm
+
+class CustomModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(1, 5, 3)
+        self.pool = nn.MaxPool2d(2, 2)
+        self.conv2 = nn.Conv2d(5, 12, 5)
+        self.norm = nn.GroupNorm(3, 12)
+        self.fc1 = nn.Linear(41772, 100)
+        self.relu = nn.ReLU()
+        self.fc2 = nn.Linear(100, 10)
+        self.log_softmax = nn.LogSoftmax(dim=0)
+        self.loss = nn.NLLLoss()
+
+    def forward(self, x, labels=None):
+        x = self.pool(self.relu(self.conv1(x)))
+        x = self.norm(self.relu(self.conv2(x)))
+        x = torch.flatten(x, start_dim=1)
+        x = self.relu(self.fc1(x))
+        x = self.log_softmax(self.fc2(x))
+        # The model is responsible for the calculation
+        # of the loss when using an IPU. We do it this way:
+        if self.training:
+            return x, self.loss(x, labels)
+        return x
+
+# Cast the model parameters to FP16
+model_half = True
+
+# Cast the data to FP16
+data_half = True
+
+# Use FP16 accumulation and loss scaling in the optimiser
+optimizer_half = True
+
+# Use stochastic rounding
+stochastic_rounding = True
+
+# Set partials data type to FP16 (False keeps them in FP32)
+partials_half = False
+
+model = CustomModel()
+
+if model_half:
+    model = model.half()
+
+model.conv1 = model.conv1.half()
+
+transform_list = [transforms.Resize(128),
+                  transforms.ToTensor(),
+                  transforms.Normalize((0.5,), (0.5,))]
+if data_half:
+    transform_list.append(transforms.ConvertImageDtype(torch.half))
+
+transform = transforms.Compose(transform_list)
+
+train_dataset = torchvision.datasets.FashionMNIST("./datasets/",
+                                                  transform=transform,
+                                                  download=True,
+                                                  train=True)
+test_dataset = torchvision.datasets.FashionMNIST("./datasets/",
+                                                 transform=transform,
+                                                 download=True,
+                                                 train=False)
+
+accum, loss_scaling = \
+    (torch.float16, 1024) if optimizer_half else (torch.float32, None)
+
+optimizer = poptorch.optim.AdamW(params=model.parameters(),
+                                 lr=0.001,
+                                 accum_type=accum,
+                                 loss_scaling=loss_scaling)
+
+opts = poptorch.Options()
+
+if stochastic_rounding:
+    opts.Precision.enableStochasticRounding(True)
+
+if partials_half:
+    opts.Precision.setPartialsType(torch.half)
+else:
+    opts.Precision.setPartialsType(torch.float)
+
+train_dataloader = poptorch.DataLoader(opts,
+                                       train_dataset,
+                                       batch_size=12,
+                                       shuffle=True,
+                                       num_workers=4)
+
+model.train()
+poptorch_model = poptorch.trainingModel(model,
+                                        options=opts,
+                                        optimizer=optimizer)
+
+epochs = 10
+for epoch in tqdm(range(epochs), desc="epochs"):
+    total_loss = 0.0
+    for data, labels in tqdm(train_dataloader, desc="batches", leave=False):
+        output, loss = poptorch_model(data, labels)
+        total_loss += loss
+
+poptorch_model.detachFromDevice()
+
+model.eval()
+poptorch_model_inf = poptorch.inferenceModel(model, options=opts)
+test_dataloader = poptorch.DataLoader(opts,
+                                      test_dataset,
+                                      batch_size=32,
+                                      num_workers=4)
+
+predictions, labels = [], []
+for data, label in test_dataloader:
+    predictions += poptorch_model_inf(data).data.float().max(dim=1).indices
+    labels += label
+
+poptorch_model_inf.detachFromDevice()
+
+print(f"""Eval accuracy on IPU: {100 *
+                (1 - torch.count_nonzero(torch.sub(torch.tensor(labels),
+                torch.tensor(predictions))) / len(labels)):.2f}%""")
+
+class Model(torch.nn.Module):
+    def forward(self, x, y):
+        return x + y
+
+native_model = Model()
+
+float16_tensor = torch.tensor([1.0], dtype=torch.float16)
+float32_tensor = torch.tensor([1.0], dtype=torch.float32)
+
+assert native_model(float32_tensor, float16_tensor).dtype == torch.float32
+
+opts = poptorch.Options()
+
+poptorch_model = poptorch.inferenceModel(native_model, opts)
+assert poptorch_model(float32_tensor, float16_tensor).dtype == torch.float16
+
+opts.Precision.halfFloatCasting(
+    poptorch.HalfFloatCastingBehavior.HalfUpcastToFloat)
+
+poptorch_model = poptorch.inferenceModel(native_model, opts)
+assert poptorch_model(float32_tensor, float16_tensor).dtype == torch.float32
+
+poptorch_model.detachFromDevice()

From e8f61b42730a5f9e01361f7f6ada9606c8db515e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adam=20Wr=C3=B3bel?= <adam.wrobel@deepsense.ai>
Date: Mon, 11 Oct 2021 10:50:15 +0200
Subject: [PATCH 2/5] Pytorch Tutorial 3 - review fixes and resolving warnings

---
 .../pytorch/tut3_mixed_precision/README.md    |  28 ++-
 .../tut3_mixed_precision/walkthrough.ipynb    | 180 +++++++++---------
 .../tut3_mixed_precision/walkthrough.py       |  27 +--
 .../walkthrough_code_only.py                  |  11 +-
 4 files changed, 122 insertions(+), 124 deletions(-)

diff --git a/tutorials/pytorch/tut3_mixed_precision/README.md b/tutorials/pytorch/tut3_mixed_precision/README.md
index 68503aa..f1aa1db 100644
--- a/tutorials/pytorch/tut3_mixed_precision/README.md
+++ b/tutorials/pytorch/tut3_mixed_precision/README.md
@@ -4,7 +4,7 @@ example task of training a simple CNN model on a single Graphcore IPU (Mk1 or
 Mk2).
 
 Requirements:
-- an installed Poplar SDK. See the Getting Started guide for your IPU hardware 
+- an installed Poplar SDK. See the Getting Started guide for your IPU system 
 for details of how to install the SDK;
 - Other Python modules: `pip install -r requirements.txt`
 
@@ -81,7 +81,8 @@ It is highly recommended that you enable this feature when training neural
 networks with FP16 weights. The instructions to enable it in PopTorch are 
 presented later in this tutorial.
 
-Import the packages
+# Train a model in half precision
+## Import the packages
 
 
 ```python
@@ -133,7 +134,7 @@ class CustomModel(nn.Module):
 >initialises its value to True. Use `model.eval()` to set it to False and 
 >`model.train()` to switch it back to True.
 
-Choose parameters. 
+## Choose parameters
 
 >**NOTE** If you wish to modify these parameters for educational purposes, 
 >make sure you re-run all the cells below this one, including this entire cell
@@ -154,7 +155,7 @@ optimizer_half = True
 stochastic_rounding = True
 
 # Set partials data type to FP16
-partials_half = True
+partials_half = False
 ```
 
 ### Casting a model's parameters
@@ -182,12 +183,9 @@ offer features to alleviate those issues, it is still sensible for those
 models to cast only the parameters of certain layers and observe how it 
 affects the overall training job. To downcast the parameters of a single 
 layer, we select the layer by its _name_ and use `half()`:
-
-
 ```python
 model.conv1 = model.conv1.half()
 ```
-
 If you would like to upcast a layer instead, you can use `model.conv1.float()`.
 >**NOTE**: One can print out a list of the components of a PyTorch model, 
 >with their names, by doing `print(model)`.
@@ -264,7 +262,7 @@ a hyperparameter for you to tune.
 To configure some features of the IPU and to be able to use PopTorch's classes 
 in the next sections, we will need to create an instance of `poptorch.Options` 
 which stores the options we will be using. We covered some of the available 
-options in: [introductory tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
+options in the [Introductory Tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
 
 Let's initialise our options object before we talk about the options 
 we will use:
@@ -328,7 +326,7 @@ train_dataloader = poptorch.DataLoader(opts,
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=4)
+                                       num_workers=40)
 ```
 
 We first make sure our model is in training mode, and then wrap it 
@@ -336,6 +334,7 @@ with `poptorch.trainingModel`.
 
 
 ```python
+model.train()
 poptorch_model = poptorch.trainingModel(model,
                                         options=opts,
                                         optimizer=optimizer)
@@ -353,11 +352,12 @@ for epoch in tqdm(range(epochs), desc="epochs"):
         total_loss += loss
 ```
 
-Release IPU resources:
+Release resources:
 
 
 ```python
 poptorch_model.detachFromDevice()
+train_dataloader.terminate()
 ```
 
 Our new model is now trained and we can start its evaluation.
@@ -375,7 +375,7 @@ poptorch_model_inf = poptorch.inferenceModel(model, options=opts)
 test_dataloader = poptorch.DataLoader(opts,
                                       test_dataset,
                                       batch_size=32,
-                                      num_workers=4)
+                                      num_workers=40)
 ```
 
 Run inference on the labelled data
@@ -388,11 +388,12 @@ for data, label in test_dataloader:
     labels += label
 ```
 
-Release IPU resources:
+Release resources:
 
 
 ```python
 poptorch_model_inf.detachFromDevice()
+test_dataloader.terminate()
 ```
 
 We obtained an accuracy of approximately 84% on the test dataset.
@@ -404,9 +405,6 @@ print(f"""Eval accuracy on IPU: {100 *
                 torch.tensor(predictions))) / len(labels)):.2f}%""")
 ```
 
-    Eval accuracy on IPU: 85.38%
-
-
 # Visualise the memory footprint
 
 We can visually compare the memory footprint on the IPU of the model trained 
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb b/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
index 86ba49f..8a22a23 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "c763a15b",
+   "id": "361135e5",
    "metadata": {},
    "source": [
     "Copyright (c) 2021 Graphcore Ltd. All rights reserved."
@@ -10,7 +10,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "413c7082",
+   "id": "42598a76",
    "metadata": {},
    "source": [
     "# Half and mixed precision in PopTorch\n",
@@ -21,18 +21,18 @@
   },
   {
    "cell_type": "markdown",
-   "id": "103fb5d6",
+   "id": "b1c6e9c1",
    "metadata": {},
    "source": [
     "Requirements:\n",
-    "- an installed Poplar SDK. See the Getting Started guide for your IPU hardware \n",
+    "- an installed Poplar SDK. See the Getting Started guide for your IPU system \n",
     "for details of how to install the SDK;\n",
     "- Other Python modules: `pip install -r requirements.txt`"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "b526aa7e",
+   "id": "64c522f4",
    "metadata": {},
    "source": [
     "# General\n",
@@ -111,16 +111,17 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3e733e5a",
+   "id": "c013519b",
    "metadata": {},
    "source": [
-    "Import the packages"
+    "# Train a model in half precision\n",
+    "## Import the packages"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "7d2c6071",
+   "id": "7d1c9c43",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -134,7 +135,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c3f05fbf",
+   "id": "f0c7db76",
    "metadata": {},
    "source": [
     "## Build the model\n",
@@ -149,7 +150,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "d7299b55",
+   "id": "00e3d1da",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -181,7 +182,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4ead1d52",
+   "id": "09ed56b4",
    "metadata": {},
    "source": [
     ">**NOTE:** The model inherits `self.training` from `torch.nn.Module` which \n",
@@ -191,15 +192,15 @@
   },
   {
    "cell_type": "markdown",
-   "id": "924cd4fd",
+   "id": "024a4b6f",
    "metadata": {},
    "source": [
-    "Choose parameters. "
+    "## Choose parameters. "
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "b20441df",
+   "id": "edf340ec",
    "metadata": {},
    "source": [
     ">**NOTE** If you wish to modify these parameters for educational purposes, \n",
@@ -210,7 +211,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "f2d1af74",
+   "id": "e2054ea3",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -227,12 +228,12 @@
     "stochastic_rounding = True\n",
     "\n",
     "# Set partials data type to FP16\n",
-    "partials_half = True"
+    "partials_half = False"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "6c132f5f",
+   "id": "f089adaf",
    "metadata": {},
    "source": [
     "### Casting a model's parameters\n",
@@ -246,7 +247,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "8e78f833",
+   "id": "70859fe9",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -258,7 +259,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "b320d37f",
+   "id": "1a29ae87",
    "metadata": {},
    "source": [
     "For this tutorial, we will cast all the model's parameters to FP16."
@@ -266,7 +267,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3edd9232",
+   "id": "c5d22d42",
    "metadata": {},
    "source": [
     "### Casting a single layer's parameters\n",
@@ -276,24 +277,10 @@
     "offer features to alleviate those issues, it is still sensible for those \n",
     "models to cast only the parameters of certain layers and observe how it \n",
     "affects the overall training job. To downcast the parameters of a single \n",
-    "layer, we select the layer by its _name_ and use `half()`:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3b76567b",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "model.conv1 = model.conv1.half()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5d3c1b33",
-   "metadata": {},
-   "source": [
+    "layer, we select the layer by its _name_ and use `half()`:\n",
+    "```python\n",
+    "model.conv1 = model.conv1.half()\n",
+    "```\n",
     "If you would like to upcast a layer instead, you can use `model.conv1.float()`.\n",
     ">**NOTE**: One can print out a list of the components of a PyTorch model, \n",
     ">with their names, by doing `print(model)`."
@@ -301,7 +288,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "ad340e11",
+   "id": "e111b1c4",
    "metadata": {},
    "source": [
     "## Prepare the data\n",
@@ -316,7 +303,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c09a9bb8",
+   "id": "51193478",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -331,7 +318,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a55f8f87",
+   "id": "4d1f1c90",
    "metadata": {},
    "source": [
     "Pull the datasets if they are not available locally:"
@@ -340,7 +327,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e9ed38b5",
+   "id": "e9ba3b88",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -360,7 +347,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c06a482e",
+   "id": "11915c5d",
    "metadata": {},
    "source": [
     "If the model has not been converted to half precision, but the input data has, \n",
@@ -376,7 +363,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7b3f8c6d",
+   "id": "6a32cd2e",
    "metadata": {},
    "source": [
     "## Optimizers and loss scaling\n",
@@ -391,7 +378,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "058dc529",
+   "id": "0e8df0f3",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -406,7 +393,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "344ad251",
+   "id": "4963bcc5",
    "metadata": {},
    "source": [
     "While higher values of `loss_scaling` minimize underflows, values that are \n",
@@ -417,7 +404,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "b047f12e",
+   "id": "c6f83b39",
    "metadata": {},
    "source": [
     "## Set PopTorch's options\n",
@@ -425,7 +412,7 @@
     "To configure some features of the IPU and to be able to use PopTorch's classes \n",
     "in the next sections, we will need to create an instance of `poptorch.Options` \n",
     "which stores the options we will be using. We covered some of the available \n",
-    "options in: [introductory tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).\n",
+    "options in the [Introductory Tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).\n",
     "\n",
     "Let's initialise our options object before we talk about the options \n",
     "we will use:"
@@ -434,7 +421,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a1429abe",
+   "id": "5b56232a",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -443,7 +430,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "0e3eed4e",
+   "id": "ff7cb9fc",
    "metadata": {},
    "source": [
     ">**NOTE**: This tutorial has been designed to be run on a single IPU. \n",
@@ -455,7 +442,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e9d133d6",
+   "id": "1aeb900b",
    "metadata": {},
    "source": [
     "### Stochastic rounding\n",
@@ -470,7 +457,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "3547f7c0",
+   "id": "d7b64985",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -480,7 +467,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "44ad503a",
+   "id": "063f996e",
    "metadata": {},
    "source": [
     "With the IPU Model, this option won't change anything since stochastic \n",
@@ -489,7 +476,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4f30b68b",
+   "id": "b0943b35",
    "metadata": {},
    "source": [
     "### Partials data type\n",
@@ -507,7 +494,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9e685d84",
+   "id": "72430805",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -519,7 +506,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "18075058",
+   "id": "295ae9ae",
    "metadata": {},
    "source": [
     "## Train the model\n",
@@ -532,7 +519,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "3dc15aa0",
+   "id": "465ee569",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -540,12 +527,12 @@
     "                                       train_dataset,\n",
     "                                       batch_size=12,\n",
     "                                       shuffle=True,\n",
-    "                                       num_workers=4)"
+    "                                       num_workers=40)"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "15afcbba",
+   "id": "44a6982b",
    "metadata": {},
    "source": [
     "We first make sure our model is in training mode, and then wrap it \n",
@@ -555,10 +542,11 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "f991a738",
+   "id": "362be4be",
    "metadata": {},
    "outputs": [],
    "source": [
+    "model.train()\n",
     "poptorch_model = poptorch.trainingModel(model,\n",
     "                                        options=opts,\n",
     "                                        optimizer=optimizer)"
@@ -566,7 +554,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3fd256a9",
+   "id": "e1339ee2",
    "metadata": {},
    "source": [
     "Let's run the training loop for 10 epochs."
@@ -575,7 +563,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "ddeee51c",
+   "id": "d5be305f",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -593,25 +581,26 @@
   },
   {
    "cell_type": "markdown",
-   "id": "14615613",
+   "id": "98201fbb",
    "metadata": {},
    "source": [
-    "Release IPU resources:"
+    "Release resources:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "df050b9b",
+   "id": "73a07c90",
    "metadata": {},
    "outputs": [],
    "source": [
-    "poptorch_model.detachFromDevice()"
+    "poptorch_model.detachFromDevice()\n",
+    "train_dataloader.terminate()"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "9862f6c1",
+   "id": "66b01508",
    "metadata": {},
    "source": [
     "Our new model is now trained and we can start its evaluation."
@@ -619,7 +608,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f22358c6",
+   "id": "31cce506",
    "metadata": {},
    "source": [
     "## Evaluate the model\n",
@@ -632,7 +621,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6f8d81de",
+   "id": "f571e835",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -641,12 +630,12 @@
     "test_dataloader = poptorch.DataLoader(opts,\n",
     "                                      test_dataset,\n",
     "                                      batch_size=32,\n",
-    "                                      num_workers=4)"
+    "                                      num_workers=40)"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "717a8245",
+   "id": "ff4a3808",
    "metadata": {},
    "source": [
     "Run inference on the labelled data"
@@ -655,7 +644,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "bdb56600",
+   "id": "79a13a62",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -671,25 +660,26 @@
   },
   {
    "cell_type": "markdown",
-   "id": "8eb50688",
+   "id": "007c5124",
    "metadata": {},
    "source": [
-    "Release IPU resources:"
+    "Release resources:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "fdae998f",
+   "id": "890e2a4a",
    "metadata": {},
    "outputs": [],
    "source": [
-    "poptorch_model_inf.detachFromDevice()"
+    "poptorch_model_inf.detachFromDevice()\n",
+    "test_dataloader.terminate()"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "b66560fb",
+   "id": "e20eff4c",
    "metadata": {},
    "source": [
     "We obtained an accuracy of approximately 84% on the test dataset."
@@ -698,8 +688,12 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "f057eb38",
-   "metadata": {},
+   "id": "04a6c253",
+   "metadata": {
+    "tags": [
+     "sst_hide_output"
+    ]
+   },
    "outputs": [],
    "source": [
     "print(f\"\"\"Eval accuracy on IPU: {100 *\n",
@@ -709,7 +703,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "df567031",
+   "id": "caf99585",
    "metadata": {},
    "source": [
     "# Visualise the memory footprint\n",
@@ -744,7 +738,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f011bb8e",
+   "id": "989d5f60",
    "metadata": {},
    "source": [
     "# PopTorch tracing and casting\n",
@@ -769,7 +763,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "bdfa5241",
+   "id": "cfdab3d6",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -785,7 +779,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e384a946",
+   "id": "7d239c7d",
    "metadata": {},
    "source": [
     "Native PyTorch results in a FP32 tensor:"
@@ -794,7 +788,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "88178f85",
+   "id": "7de7a6c1",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -803,7 +797,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c0aafef2",
+   "id": "3f6cee96",
    "metadata": {},
    "source": [
     "Let's instantiate default PopTorch `Options` for IPUs:"
@@ -812,7 +806,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "95c60c61",
+   "id": "f35be7dc",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -821,7 +815,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "39f92f2d",
+   "id": "f6a2a1ba",
    "metadata": {},
    "source": [
     "PopTorch results in a FP16 tensor:"
@@ -830,7 +824,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "83732984",
+   "id": "8704a74b",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -844,7 +838,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "32979eef",
+   "id": "4ffdefb4",
    "metadata": {},
    "source": [
     "This option makes the same PopTorch example result in an FP32 tensor:"
@@ -853,7 +847,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a73b50ce",
+   "id": "dd060e76",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -870,7 +864,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4dd68ce1",
+   "id": "d899dc41",
    "metadata": {},
    "source": [
     "Release IPU resources:"
@@ -879,7 +873,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a76ddb20",
+   "id": "0d027f11",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -888,7 +882,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1eaf3c08",
+   "id": "17b89719",
    "metadata": {},
    "source": [
     "# Summary\n",
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough.py b/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
index 5078704..0252702 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
@@ -10,7 +10,7 @@
 """
 """
 Requirements:
-- an installed Poplar SDK. See the Getting Started guide for your IPU hardware 
+- an installed Poplar SDK. See the Getting Started guide for your IPU system 
 for details of how to install the SDK;
 - Other Python modules: `pip install -r requirements.txt`
 """
@@ -89,7 +89,8 @@
 presented later in this tutorial.
 """
 """
-Import the packages
+# Train a model in half precision
+## Import the packages
 """
 import torch
 import torch.nn as nn
@@ -136,7 +137,7 @@ def forward(self, x, labels=None):
 >`model.train()` to switch it back to True.
 """
 """
-Choose parameters. 
+## Choose parameters. 
 """
 """
 >**NOTE** If you wish to modify these parameters for educational purposes, 
@@ -156,7 +157,7 @@ def forward(self, x, labels=None):
 stochastic_rounding = True
 
 # Set partials data type to FP16
-partials_half = True
+partials_half = False
 """
 ### Casting a model's parameters
 
@@ -181,9 +182,9 @@ def forward(self, x, labels=None):
 models to cast only the parameters of certain layers and observe how it 
 affects the overall training job. To downcast the parameters of a single 
 layer, we select the layer by its _name_ and use `half()`:
-"""
+```python
 model.conv1 = model.conv1.half()
-"""
+```
 If you would like to upcast a layer instead, you can use `model.conv1.float()`.
 >**NOTE**: One can print out a list of the components of a PyTorch model, 
 >with their names, by doing `print(model)`.
@@ -255,7 +256,7 @@ def forward(self, x, labels=None):
 To configure some features of the IPU and to be able to use PopTorch's classes 
 in the next sections, we will need to create an instance of `poptorch.Options` 
 which stores the options we will be using. We covered some of the available 
-options in: [introductory tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
+options in the [Introductory Tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
 
 Let's initialise our options object before we talk about the options 
 we will use:
@@ -310,11 +311,12 @@ def forward(self, x, labels=None):
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=4)
+                                       num_workers=40)
 """
 We first make sure our model is in training mode, and then wrap it 
 with `poptorch.trainingModel`.
 """
+model.train()
 poptorch_model = poptorch.trainingModel(model,
                                         options=opts,
                                         optimizer=optimizer)
@@ -329,9 +331,10 @@ def forward(self, x, labels=None):
         total_loss += loss
 # sst_hide_output
 """
-Release IPU resources:
+Release resources:
 """
 poptorch_model.detachFromDevice()
+train_dataloader.terminate()
 """
 Our new model is now trained and we can start its evaluation.
 """
@@ -347,7 +350,7 @@ def forward(self, x, labels=None):
 test_dataloader = poptorch.DataLoader(opts,
                                       test_dataset,
                                       batch_size=32,
-                                      num_workers=4)
+                                      num_workers=40)
 """
 Run inference on the labelled data
 """
@@ -357,15 +360,17 @@ def forward(self, x, labels=None):
     labels += label
 # sst_hide_output
 """
-Release IPU resources:
+Release resources:
 """
 poptorch_model_inf.detachFromDevice()
+test_dataloader.terminate()
 """
 We obtained an accuracy of approximately 84% on the test dataset.
 """
 print(f"""Eval accuracy on IPU: {100 *
                 (1 - torch.count_nonzero(torch.sub(torch.tensor(labels),
                 torch.tensor(predictions))) / len(labels)):.2f}%""")
+# sst_hide_output
 """
 # Visualise the memory footprint
 
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py b/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
index e21766c..f74e169 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
@@ -44,15 +44,13 @@ def forward(self, x, labels=None):
 stochastic_rounding = True
 
 # Set partials data type to FP16
-partials_half = True
+partials_half = False
 
 model = CustomModel()
 
 if model_half:
     model = model.half()
 
-model.conv1 = model.conv1.half()
-
 transform_list = [transforms.Resize(128),
                   transforms.ToTensor(),
                   transforms.Normalize((0.5,), (0.5,))]
@@ -92,8 +90,9 @@ def forward(self, x, labels=None):
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=4)
+                                       num_workers=40)
 
+model.train()
 poptorch_model = poptorch.trainingModel(model,
                                         options=opts,
                                         optimizer=optimizer)
@@ -106,13 +105,14 @@ def forward(self, x, labels=None):
         total_loss += loss
 
 poptorch_model.detachFromDevice()
+train_dataloader.terminate()
 
 model.eval()
 poptorch_model_inf = poptorch.inferenceModel(model, options=opts)
 test_dataloader = poptorch.DataLoader(opts,
                                       test_dataset,
                                       batch_size=32,
-                                      num_workers=4)
+                                      num_workers=40)
 
 predictions, labels = [], []
 for data, label in test_dataloader:
@@ -120,6 +120,7 @@ def forward(self, x, labels=None):
     labels += label
 
 poptorch_model_inf.detachFromDevice()
+test_dataloader.terminate()
 
 print(f"""Eval accuracy on IPU: {100 *
                 (1 - torch.count_nonzero(torch.sub(torch.tensor(labels),

From d385b39359f6082691e21d307920056c54839a3a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adam=20Wr=C3=B3bel?= <adam.wrobel@deepsense.ai>
Date: Mon, 11 Oct 2021 17:36:01 +0200
Subject: [PATCH 3/5] Pytorch tutorial 3 - review - explanation for DataLoader
 terminate function

---
 .../pytorch/tut3_mixed_precision/README.md    |  12 +-
 .../tut3_mixed_precision/walkthrough.ipynb    | 138 +++++++++---------
 .../tut3_mixed_precision/walkthrough.py       |  12 +-
 .../walkthrough_code_only.py                  |   3 +-
 4 files changed, 95 insertions(+), 70 deletions(-)

diff --git a/tutorials/pytorch/tut3_mixed_precision/README.md b/tutorials/pytorch/tut3_mixed_precision/README.md
index f1aa1db..56780d0 100644
--- a/tutorials/pytorch/tut3_mixed_precision/README.md
+++ b/tutorials/pytorch/tut3_mixed_precision/README.md
@@ -326,7 +326,8 @@ train_dataloader = poptorch.DataLoader(opts,
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=40)
+                                       num_workers=40,
+                                       mode=poptorch.DataLoaderMode.Async)
 ```
 
 We first make sure our model is in training mode, and then wrap it 
@@ -352,7 +353,14 @@ for epoch in tqdm(range(epochs), desc="epochs"):
         total_loss += loss
 ```
 
-Release resources:
+Release resources: detach the model from the IPU and call the `terminate`
+method of the `DataLoader` instance to stop all of its worker threads.
+
+The workers must be terminated manually because we use the asynchronous data
+loader (`DataLoaderMode.Async`) and the number of samples in the dataset is not
+an exact multiple of the batch size multiplied by the device count. Some workers
+may therefore still be waiting for their turn when training finishes, before all
+the samples have been consumed.
 
 
 ```python
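The explanation above depends on how many samples remain when the dataset size is not a multiple of the combined batch. The following pure-Python sketch computes that remainder and does not call PopTorch. `leftover_samples` is a hypothetical helper. It assumes the combined batch is `batch_size * device_count`, which follows the wording of the patch. PopTorch's own definition of the combined batch may involve other options as well, such as device iterations, so check the `poptorch.DataLoader` documentation before relying on these numbers.

```python
def leftover_samples(num_samples: int, batch_size: int, device_count: int) -> int:
    """Return how many samples do not fill a complete combined batch.

    Hypothetical helper: assumes the combined batch size is simply
    batch_size * device_count, as described in the patch text.
    """
    combined_batch = batch_size * device_count
    if combined_batch <= 0:
        raise ValueError("batch_size and device_count must be positive")
    # Samples beyond the last full combined batch are never consumed,
    # so workers that prepared them can be left waiting.
    return num_samples % combined_batch


# Illustrative values only: 10000 samples, batch_size=32, one device.
print(leftover_samples(10000, 32, 1))  # 16
```

With these illustrative values, 16 samples do not fill a complete combined batch. A dataset that divides evenly, for example 60000 samples with `batch_size=12` on one device, leaves no remainder.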
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb b/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
index 8a22a23..e43db45 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "361135e5",
+   "id": "84ff54a1",
    "metadata": {},
    "source": [
     "Copyright (c) 2021 Graphcore Ltd. All rights reserved."
@@ -10,7 +10,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "42598a76",
+   "id": "1287678f",
    "metadata": {},
    "source": [
     "# Half and mixed precision in PopTorch\n",
@@ -21,7 +21,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "b1c6e9c1",
+   "id": "25cc0a0e",
    "metadata": {},
    "source": [
     "Requirements:\n",
@@ -32,7 +32,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "64c522f4",
+   "id": "a3f20c78",
    "metadata": {},
    "source": [
     "# General\n",
@@ -111,7 +111,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c013519b",
+   "id": "302803c1",
    "metadata": {},
    "source": [
     "# Train a model in half precision\n",
@@ -121,7 +121,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "7d1c9c43",
+   "id": "e0df151b",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -135,7 +135,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f0c7db76",
+   "id": "2d3b2ded",
    "metadata": {},
    "source": [
     "## Build the model\n",
@@ -150,7 +150,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "00e3d1da",
+   "id": "b3455a58",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -182,7 +182,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "09ed56b4",
+   "id": "867321dc",
    "metadata": {},
    "source": [
     ">**NOTE:** The model inherits `self.training` from `torch.nn.Module` which \n",
@@ -192,7 +192,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "024a4b6f",
+   "id": "dd1d3b40",
    "metadata": {},
    "source": [
     "## Choose parameters. "
@@ -200,7 +200,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "edf340ec",
+   "id": "dd8464c5",
    "metadata": {},
    "source": [
     ">**NOTE** If you wish to modify these parameters for educational purposes, \n",
@@ -211,7 +211,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e2054ea3",
+   "id": "c66a00ee",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -233,7 +233,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f089adaf",
+   "id": "4141b301",
    "metadata": {},
    "source": [
     "### Casting a model's parameters\n",
@@ -247,7 +247,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "70859fe9",
+   "id": "3d5783b2",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -259,7 +259,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1a29ae87",
+   "id": "224d2987",
    "metadata": {},
    "source": [
     "For this tutorial, we will cast all the model's parameters to FP16."
@@ -267,7 +267,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c5d22d42",
+   "id": "82e49951",
    "metadata": {},
    "source": [
     "### Casting a single layer's parameters\n",
@@ -288,7 +288,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e111b1c4",
+   "id": "9e7e1dfc",
    "metadata": {},
    "source": [
     "## Prepare the data\n",
@@ -303,7 +303,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "51193478",
+   "id": "6d881a6e",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -318,7 +318,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4d1f1c90",
+   "id": "c15cfd19",
    "metadata": {},
    "source": [
     "Pull the datasets if they are not available locally:"
@@ -327,7 +327,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e9ba3b88",
+   "id": "e7fade2b",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -347,7 +347,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "11915c5d",
+   "id": "0a2ef89f",
    "metadata": {},
    "source": [
     "If the model has not been converted to half precision, but the input data has, \n",
@@ -363,7 +363,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "6a32cd2e",
+   "id": "488dcd0c",
    "metadata": {},
    "source": [
     "## Optimizers and loss scaling\n",
@@ -378,7 +378,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "0e8df0f3",
+   "id": "784ce840",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -393,7 +393,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4963bcc5",
+   "id": "843d8f4d",
    "metadata": {},
    "source": [
     "While higher values of `loss_scaling` minimize underflows, values that are \n",
@@ -404,7 +404,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c6f83b39",
+   "id": "a7e2932c",
    "metadata": {},
    "source": [
     "## Set PopTorch's options\n",
@@ -421,7 +421,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5b56232a",
+   "id": "6fded633",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -430,7 +430,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "ff7cb9fc",
+   "id": "1c126026",
    "metadata": {},
    "source": [
     ">**NOTE**: This tutorial has been designed to be run on a single IPU. \n",
@@ -442,7 +442,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1aeb900b",
+   "id": "2bc652cb",
    "metadata": {},
    "source": [
     "### Stochastic rounding\n",
@@ -457,7 +457,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "d7b64985",
+   "id": "5545195d",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -467,7 +467,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "063f996e",
+   "id": "7783afa7",
    "metadata": {},
    "source": [
     "With the IPU Model, this option won't change anything since stochastic \n",
@@ -476,7 +476,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "b0943b35",
+   "id": "9c497a1c",
    "metadata": {},
    "source": [
     "### Partials data type\n",
@@ -494,7 +494,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "72430805",
+   "id": "09890024",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -506,7 +506,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "295ae9ae",
+   "id": "18a794a6",
    "metadata": {},
    "source": [
     "## Train the model\n",
@@ -519,7 +519,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "465ee569",
+   "id": "f87f28c8",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -527,12 +527,13 @@
     "                                       train_dataset,\n",
     "                                       batch_size=12,\n",
     "                                       shuffle=True,\n",
-    "                                       num_workers=40)"
+    "                                       num_workers=40,\n",
+    "                                       mode=poptorch.DataLoaderMode.Async)"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "44a6982b",
+   "id": "f22b42ab",
    "metadata": {},
    "source": [
     "We first make sure our model is in training mode, and then wrap it \n",
@@ -542,7 +543,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "362be4be",
+   "id": "cf1558e1",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -554,7 +555,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e1339ee2",
+   "id": "59de7d65",
    "metadata": {},
    "source": [
     "Let's run the training loop for 10 epochs."
@@ -563,7 +564,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "d5be305f",
+   "id": "7fc0d1d4",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -581,16 +582,23 @@
   },
   {
    "cell_type": "markdown",
-   "id": "98201fbb",
+   "id": "c88d59e5",
    "metadata": {},
    "source": [
-    "Release resources:"
+    "Release resources: detach the model from the IPU and call the `terminate`\n",
+    "method of the `DataLoader` instance to stop all of its worker threads.\n",
+    "\n",
+    "The workers must be terminated manually because we use the asynchronous data\n",
+    "loader (`DataLoaderMode.Async`) and the number of samples in the dataset is not\n",
+    "an exact multiple of the batch size multiplied by the device count. Some workers\n",
+    "may therefore still be waiting for their turn when training finishes, before all\n",
+    "the samples have been consumed."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "73a07c90",
+   "id": "ceb3727b",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -600,7 +608,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "66b01508",
+   "id": "da6e177d",
    "metadata": {},
    "source": [
     "Our new model is now trained and we can start its evaluation."
@@ -608,7 +616,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "31cce506",
+   "id": "18f0dda0",
    "metadata": {},
    "source": [
     "## Evaluate the model\n",
@@ -621,7 +629,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "f571e835",
+   "id": "5247fc5c",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -635,7 +643,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "ff4a3808",
+   "id": "68989a50",
    "metadata": {},
    "source": [
     "Run inference on the labelled data"
@@ -644,7 +652,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "79a13a62",
+   "id": "4194bc3d",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -660,7 +668,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "007c5124",
+   "id": "e915c824",
    "metadata": {},
    "source": [
     "Release resources:"
@@ -669,7 +677,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "890e2a4a",
+   "id": "39f3d4c3",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -679,7 +687,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e20eff4c",
+   "id": "92337977",
    "metadata": {},
    "source": [
     "We obtained an accuracy of approximately 84% on the test dataset."
@@ -688,7 +696,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "04a6c253",
+   "id": "2ae6b0aa",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -703,7 +711,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "caf99585",
+   "id": "aaa8fe93",
    "metadata": {},
    "source": [
     "# Visualise the memory footprint\n",
@@ -738,7 +746,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "989d5f60",
+   "id": "27a891c8",
    "metadata": {},
    "source": [
     "# PopTorch tracing and casting\n",
@@ -763,7 +771,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "cfdab3d6",
+   "id": "09f800b3",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -779,7 +787,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7d239c7d",
+   "id": "871f1968",
    "metadata": {},
    "source": [
     "Native PyTorch results in a FP32 tensor:"
@@ -788,7 +796,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "7de7a6c1",
+   "id": "803918c7",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -797,7 +805,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3f6cee96",
+   "id": "c52dc289",
    "metadata": {},
    "source": [
     "Let's instantiate default PopTorch `Options` for IPUs:"
@@ -806,7 +814,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "f35be7dc",
+   "id": "4f6e4f32",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -815,7 +823,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f6a2a1ba",
+   "id": "6b1ad38b",
    "metadata": {},
    "source": [
     "PopTorch results in a FP16 tensor:"
@@ -824,7 +832,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "8704a74b",
+   "id": "9410f77d",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -838,7 +846,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4ffdefb4",
+   "id": "67bcebe8",
    "metadata": {},
    "source": [
     "This option makes the same PopTorch example result in an FP32 tensor:"
@@ -847,7 +855,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "dd060e76",
+   "id": "4f05b8c9",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -864,7 +872,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d899dc41",
+   "id": "f24ed967",
    "metadata": {},
    "source": [
     "Release IPU resources:"
@@ -873,7 +881,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "0d027f11",
+   "id": "4b9ab787",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -882,7 +890,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "17b89719",
+   "id": "a8c5d056",
    "metadata": {},
    "source": [
     "# Summary\n",
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough.py b/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
index 0252702..dd5d218 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
@@ -311,7 +311,8 @@ def forward(self, x, labels=None):
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=40)
+                                       num_workers=40,
+                                       mode=poptorch.DataLoaderMode.Async)
 """
 We first make sure our model is in training mode, and then wrap it 
 with `poptorch.trainingModel`.
@@ -331,7 +332,14 @@ def forward(self, x, labels=None):
         total_loss += loss
 # sst_hide_output
 """
-Release resources:
+Release resources - detach IPU devices and also execute method `terminate` 
+of the `DataLoader` instance to fully terminate all worker threads.
+
+The need for terminating the workers manually arises from the fact that we use
+here the Asynchronous Data Loader `DataLoaderMode.Async` and that the data
+sample count is not exactly divisible by the resulting number of multiplied
+batch size and device count, leaving some workers waiting for their turn which
+might not happen due to training ending first before all samples are exhausted.
 """
 poptorch_model.detachFromDevice()
 train_dataloader.terminate()
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py b/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
index f74e169..60c66ce 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
@@ -90,7 +90,8 @@ def forward(self, x, labels=None):
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=40)
+                                       num_workers=40,
+                                       mode=poptorch.DataLoaderMode.Async)
 
 model.train()
 poptorch_model = poptorch.trainingModel(model,

From e5c4e13f01f50f80ce4e904555c1da90bf66602e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adam=20Wr=C3=B3bel?= <adam.wrobel@deepsense.ai>
Date: Tue, 12 Oct 2021 14:03:04 +0200
Subject: [PATCH 4/5] Review fixes - revert using PopTorch DataLoader in Async
 mode in this tutorial

---
 .../pytorch/tut3_mixed_precision/README.md    |  14 +-
 .../tut3_mixed_precision/walkthrough.ipynb    | 144 ++++++++----------
 .../tut3_mixed_precision/walkthrough.py       |  14 +-
 .../walkthrough_code_only.py                  |   5 +-
 4 files changed, 72 insertions(+), 105 deletions(-)

diff --git a/tutorials/pytorch/tut3_mixed_precision/README.md b/tutorials/pytorch/tut3_mixed_precision/README.md
index 56780d0..de6ab39 100644
--- a/tutorials/pytorch/tut3_mixed_precision/README.md
+++ b/tutorials/pytorch/tut3_mixed_precision/README.md
@@ -326,8 +326,7 @@ train_dataloader = poptorch.DataLoader(opts,
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=40,
-                                       mode=poptorch.DataLoaderMode.Async)
+                                       num_workers=40)
 ```
 
 We first make sure our model is in training mode, and then wrap it 
@@ -353,19 +352,11 @@ for epoch in tqdm(range(epochs), desc="epochs"):
         total_loss += loss
 ```
 
-Release resources - detach IPU devices and also execute method `terminate` 
-of the `DataLoader` instance to fully terminate all worker threads.
-
-The need for terminating the workers manually arises from the fact that we use
-here the Asynchronous Data Loader `DataLoaderMode.Async` and that the data
-sample count is not exactly divisible by the resulting number of multiplied
-batch size and device count, leaving some workers waiting for their turn which
-might not happen due to training ending first before all samples are exhausted.
+Release IPU resources.
 
 
 ```python
 poptorch_model.detachFromDevice()
-train_dataloader.terminate()
 ```
 
 Our new model is now trained and we can start its evaluation.
@@ -401,7 +392,6 @@ Release resources:
 
 ```python
 poptorch_model_inf.detachFromDevice()
-test_dataloader.terminate()
 ```
 
 We obtained an accuracy of approximately 84% on the test dataset.
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb b/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
index e43db45..81a319f 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "84ff54a1",
+   "id": "1be6a4f4",
    "metadata": {},
    "source": [
     "Copyright (c) 2021 Graphcore Ltd. All rights reserved."
@@ -10,7 +10,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1287678f",
+   "id": "7b050006",
    "metadata": {},
    "source": [
     "# Half and mixed precision in PopTorch\n",
@@ -21,7 +21,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "25cc0a0e",
+   "id": "d594f658",
    "metadata": {},
    "source": [
     "Requirements:\n",
@@ -32,7 +32,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a3f20c78",
+   "id": "c7519b27",
    "metadata": {},
    "source": [
     "# General\n",
@@ -111,7 +111,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "302803c1",
+   "id": "81d67c63",
    "metadata": {},
    "source": [
     "# Train a model in half precision\n",
@@ -121,7 +121,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e0df151b",
+   "id": "c431ae7a",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -135,7 +135,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2d3b2ded",
+   "id": "df1aca0b",
    "metadata": {},
    "source": [
     "## Build the model\n",
@@ -150,7 +150,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "b3455a58",
+   "id": "bbd9ab26",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -182,7 +182,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "867321dc",
+   "id": "d074e12d",
    "metadata": {},
    "source": [
     ">**NOTE:** The model inherits `self.training` from `torch.nn.Module` which \n",
@@ -192,7 +192,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "dd1d3b40",
+   "id": "c20a87ea",
    "metadata": {},
    "source": [
     "## Choose parameters. "
@@ -200,7 +200,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "dd8464c5",
+   "id": "70b8a28c",
    "metadata": {},
    "source": [
     ">**NOTE** If you wish to modify these parameters for educational purposes, \n",
@@ -211,7 +211,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c66a00ee",
+   "id": "455f5723",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -233,7 +233,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4141b301",
+   "id": "bbc16771",
    "metadata": {},
    "source": [
     "### Casting a model's parameters\n",
@@ -247,7 +247,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "3d5783b2",
+   "id": "8ec71654",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -259,7 +259,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "224d2987",
+   "id": "48ee2c76",
    "metadata": {},
    "source": [
     "For this tutorial, we will cast all the model's parameters to FP16."
@@ -267,7 +267,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "82e49951",
+   "id": "a0f55cdc",
    "metadata": {},
    "source": [
     "### Casting a single layer's parameters\n",
@@ -288,7 +288,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9e7e1dfc",
+   "id": "1f7ffece",
    "metadata": {},
    "source": [
     "## Prepare the data\n",
@@ -303,7 +303,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6d881a6e",
+   "id": "fa733db3",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -318,7 +318,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c15cfd19",
+   "id": "04b9e061",
    "metadata": {},
    "source": [
     "Pull the datasets if they are not available locally:"
@@ -327,7 +327,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e7fade2b",
+   "id": "6ce135e9",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -347,7 +347,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "0a2ef89f",
+   "id": "3dbc763d",
    "metadata": {},
    "source": [
     "If the model has not been converted to half precision, but the input data has, \n",
@@ -363,7 +363,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "488dcd0c",
+   "id": "15b8776b",
    "metadata": {},
    "source": [
     "## Optimizers and loss scaling\n",
@@ -378,7 +378,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "784ce840",
+   "id": "2dc9d123",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -393,7 +393,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "843d8f4d",
+   "id": "689887ea",
    "metadata": {},
    "source": [
     "While higher values of `loss_scaling` minimize underflows, values that are \n",
@@ -404,7 +404,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a7e2932c",
+   "id": "e92bdbcf",
    "metadata": {},
    "source": [
     "## Set PopTorch's options\n",
@@ -421,7 +421,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6fded633",
+   "id": "e578030c",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -430,7 +430,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1c126026",
+   "id": "5e7f2d4d",
    "metadata": {},
    "source": [
     ">**NOTE**: This tutorial has been designed to be run on a single IPU. \n",
@@ -442,7 +442,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2bc652cb",
+   "id": "ebb65211",
    "metadata": {},
    "source": [
     "### Stochastic rounding\n",
@@ -457,7 +457,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5545195d",
+   "id": "83f40051",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -467,7 +467,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7783afa7",
+   "id": "d720ee52",
    "metadata": {},
    "source": [
     "With the IPU Model, this option won't change anything since stochastic \n",
@@ -476,7 +476,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9c497a1c",
+   "id": "b6509ce3",
    "metadata": {},
    "source": [
     "### Partials data type\n",
@@ -494,7 +494,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "09890024",
+   "id": "01090dff",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -506,7 +506,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "18a794a6",
+   "id": "9be12405",
    "metadata": {},
    "source": [
     "## Train the model\n",
@@ -519,7 +519,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "f87f28c8",
+   "id": "ab7f5cd6",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -527,13 +527,12 @@
     "                                       train_dataset,\n",
     "                                       batch_size=12,\n",
     "                                       shuffle=True,\n",
-    "                                       num_workers=40,\n",
-    "                                       mode=poptorch.DataLoaderMode.Async)"
+    "                                       num_workers=40)"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "f22b42ab",
+   "id": "42e67559",
    "metadata": {},
    "source": [
     "We first make sure our model is in training mode, and then wrap it \n",
@@ -543,7 +542,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "cf1558e1",
+   "id": "ab093ceb",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -555,7 +554,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "59de7d65",
+   "id": "e42a8284",
    "metadata": {},
    "source": [
     "Let's run the training loop for 10 epochs."
@@ -564,7 +563,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "7fc0d1d4",
+   "id": "cceb4dfd",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -582,33 +581,25 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c88d59e5",
+   "id": "f7c9cfc9",
    "metadata": {},
    "source": [
-    "Release resources - detach IPU devices and also execute method `terminate` \n",
-    "of the `DataLoader` instance to fully terminate all worker threads.\n",
-    "\n",
-    "The need for terminating the workers manually arises from the fact that we use\n",
-    "here the Asynchronous Data Loader `DataLoaderMode.Async` and that the data\n",
-    "sample count is not exactly divisible by the resulting number of multiplied\n",
-    "batch size and device count, leaving some workers waiting for their turn which\n",
-    "might not happen due to training ending first before all samples are exhausted."
+    "Release IPU resources."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "ceb3727b",
+   "id": "41c89657",
    "metadata": {},
    "outputs": [],
    "source": [
-    "poptorch_model.detachFromDevice()\n",
-    "train_dataloader.terminate()"
+    "poptorch_model.detachFromDevice()"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "da6e177d",
+   "id": "7d3d3d4a",
    "metadata": {},
    "source": [
     "Our new model is now trained and we can start its evaluation."
@@ -616,7 +607,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "18f0dda0",
+   "id": "a02724ef",
    "metadata": {},
    "source": [
     "## Evaluate the model\n",
@@ -629,7 +620,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "5247fc5c",
+   "id": "320fa86a",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -643,7 +634,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "68989a50",
+   "id": "edbc8998",
    "metadata": {},
    "source": [
     "Run inference on the labelled data"
@@ -652,7 +643,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "4194bc3d",
+   "id": "d7057092",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -668,7 +659,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e915c824",
+   "id": "453ce938",
    "metadata": {},
    "source": [
     "Release resources:"
@@ -677,17 +668,16 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "39f3d4c3",
+   "id": "7b11bc76",
    "metadata": {},
    "outputs": [],
    "source": [
-    "poptorch_model_inf.detachFromDevice()\n",
-    "test_dataloader.terminate()"
+    "poptorch_model_inf.detachFromDevice()"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "92337977",
+   "id": "cb3ca5ba",
    "metadata": {},
    "source": [
     "We obtained an accuracy of approximately 84% on the test dataset."
@@ -696,7 +686,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "2ae6b0aa",
+   "id": "bfe27f31",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -711,7 +701,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "aaa8fe93",
+   "id": "64ac01ea",
    "metadata": {},
    "source": [
     "# Visualise the memory footprint\n",
@@ -746,7 +736,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "27a891c8",
+   "id": "1baa95a8",
    "metadata": {},
    "source": [
     "# PopTorch tracing and casting\n",
@@ -771,7 +761,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "09f800b3",
+   "id": "a62d2bc7",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -787,7 +777,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "871f1968",
+   "id": "8223a1a8",
    "metadata": {},
    "source": [
     "Native PyTorch results in a FP32 tensor:"
@@ -796,7 +786,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "803918c7",
+   "id": "1b1de1c6",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -805,7 +795,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c52dc289",
+   "id": "19813d1c",
    "metadata": {},
    "source": [
     "Let's instantiate default PopTorch `Options` for IPUs:"
@@ -814,7 +804,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "4f6e4f32",
+   "id": "0f599f91",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -823,7 +813,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "6b1ad38b",
+   "id": "ee5c396e",
    "metadata": {},
    "source": [
     "PopTorch results in a FP16 tensor:"
@@ -832,7 +822,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9410f77d",
+   "id": "c13784c3",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -846,7 +836,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "67bcebe8",
+   "id": "70271578",
    "metadata": {},
    "source": [
     "This option makes the same PopTorch example result in an FP32 tensor:"
@@ -855,7 +845,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "4f05b8c9",
+   "id": "cf06d331",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -872,7 +862,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f24ed967",
+   "id": "2b5a6c34",
    "metadata": {},
    "source": [
     "Release IPU resources:"
@@ -881,7 +871,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "4b9ab787",
+   "id": "962eb459",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -890,7 +880,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a8c5d056",
+   "id": "81dee4dd",
    "metadata": {},
    "source": [
     "# Summary\n",
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough.py b/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
index dd5d218..23b7b92 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
@@ -311,8 +311,7 @@ def forward(self, x, labels=None):
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=40,
-                                       mode=poptorch.DataLoaderMode.Async)
+                                       num_workers=40)
 """
 We first make sure our model is in training mode, and then wrap it 
 with `poptorch.trainingModel`.
@@ -332,17 +331,9 @@ def forward(self, x, labels=None):
         total_loss += loss
 # sst_hide_output
 """
-Release resources - detach IPU devices and also execute method `terminate` 
-of the `DataLoader` instance to fully terminate all worker threads.
-
-The need for terminating the workers manually arises from the fact that we use
-here the Asynchronous Data Loader `DataLoaderMode.Async` and that the data
-sample count is not exactly divisible by the resulting number of multiplied
-batch size and device count, leaving some workers waiting for their turn which
-might not happen due to training ending first before all samples are exhausted.
+Release IPU resources.
 """
 poptorch_model.detachFromDevice()
-train_dataloader.terminate()
 """
 Our new model is now trained and we can start its evaluation.
 """
@@ -371,7 +362,6 @@ def forward(self, x, labels=None):
 Release resources:
 """
 poptorch_model_inf.detachFromDevice()
-test_dataloader.terminate()
 """
 We obtained an accuracy of approximately 84% on the test dataset.
 """
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py b/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
index 60c66ce..71d461c 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough_code_only.py
@@ -90,8 +90,7 @@ def forward(self, x, labels=None):
                                        train_dataset,
                                        batch_size=12,
                                        shuffle=True,
-                                       num_workers=40,
-                                       mode=poptorch.DataLoaderMode.Async)
+                                       num_workers=40)
 
 model.train()
 poptorch_model = poptorch.trainingModel(model,
@@ -106,7 +105,6 @@ def forward(self, x, labels=None):
         total_loss += loss
 
 poptorch_model.detachFromDevice()
-train_dataloader.terminate()
 
 model.eval()
 poptorch_model_inf = poptorch.inferenceModel(model, options=opts)
@@ -121,7 +119,6 @@ def forward(self, x, labels=None):
     labels += label
 
 poptorch_model_inf.detachFromDevice()
-test_dataloader.terminate()
 
 print(f"""Eval accuracy on IPU: {100 *
                 (1 - torch.count_nonzero(torch.sub(torch.tensor(labels),

From b94565e30e511bd6040e62562819002a18a15f1a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adam=20Wr=C3=B3bel?= <adam.wrobel@deepsense.ai>
Date: Tue, 12 Oct 2021 18:33:43 +0200
Subject: [PATCH 5/5] PyTorch tutorial 3 - review - make link text lowercase

---
 .../pytorch/tut3_mixed_precision/README.md    |   2 +-
 .../tut3_mixed_precision/walkthrough.ipynb    | 128 +++++++++---------
 .../tut3_mixed_precision/walkthrough.py       |   2 +-
 3 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/tutorials/pytorch/tut3_mixed_precision/README.md b/tutorials/pytorch/tut3_mixed_precision/README.md
index de6ab39..6d41e9d 100644
--- a/tutorials/pytorch/tut3_mixed_precision/README.md
+++ b/tutorials/pytorch/tut3_mixed_precision/README.md
@@ -262,7 +262,7 @@ a hyperparameter for you to tune.
 To configure some features of the IPU and to be able to use PopTorch's classes 
 in the next sections, we will need to create an instance of `poptorch.Options` 
 which stores the options we will be using. We covered some of the available 
-options in the: [Introductory Tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
+options in the: [introductory tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
 
 Let's initialise our options object before we talk about the options 
 we will use:
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb b/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
index 81a319f..34b2e90 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "1be6a4f4",
+   "id": "bf58ee5e",
    "metadata": {},
    "source": [
     "Copyright (c) 2021 Graphcore Ltd. All rights reserved."
@@ -10,7 +10,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7b050006",
+   "id": "6c7af396",
    "metadata": {},
    "source": [
     "# Half and mixed precision in PopTorch\n",
@@ -21,7 +21,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d594f658",
+   "id": "71064d76",
    "metadata": {},
    "source": [
     "Requirements:\n",
@@ -32,7 +32,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c7519b27",
+   "id": "2ced041b",
    "metadata": {},
    "source": [
     "# General\n",
@@ -111,7 +111,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "81d67c63",
+   "id": "9a4f582d",
    "metadata": {},
    "source": [
     "# Train a model in half precision\n",
@@ -121,7 +121,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c431ae7a",
+   "id": "0f0ca498",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -135,7 +135,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "df1aca0b",
+   "id": "72478a7c",
    "metadata": {},
    "source": [
     "## Build the model\n",
@@ -150,7 +150,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "bbd9ab26",
+   "id": "cfa280d0",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -182,7 +182,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d074e12d",
+   "id": "1037c4b8",
    "metadata": {},
    "source": [
     ">**NOTE:** The model inherits `self.training` from `torch.nn.Module` which \n",
@@ -192,7 +192,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "c20a87ea",
+   "id": "ce24c491",
    "metadata": {},
    "source": [
     "## Choose parameters. "
@@ -200,7 +200,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "70b8a28c",
+   "id": "7e61d158",
    "metadata": {},
    "source": [
     ">**NOTE** If you wish to modify these parameters for educational purposes, \n",
@@ -211,7 +211,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "455f5723",
+   "id": "b2b9679f",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -233,7 +233,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "bbc16771",
+   "id": "5fb8ffbd",
    "metadata": {},
    "source": [
     "### Casting a model's parameters\n",
@@ -247,7 +247,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "8ec71654",
+   "id": "ac671db7",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -259,7 +259,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "48ee2c76",
+   "id": "4fe5d6cc",
    "metadata": {},
    "source": [
     "For this tutorial, we will cast all the model's parameters to FP16."
@@ -267,7 +267,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a0f55cdc",
+   "id": "6fa927f9",
    "metadata": {},
    "source": [
     "### Casting a single layer's parameters\n",
@@ -288,7 +288,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1f7ffece",
+   "id": "16bc9b6f",
    "metadata": {},
    "source": [
     "## Prepare the data\n",
@@ -303,7 +303,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "fa733db3",
+   "id": "d59c2aa5",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -318,7 +318,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "04b9e061",
+   "id": "bb0b69a1",
    "metadata": {},
    "source": [
     "Pull the datasets if they are not available locally:"
@@ -327,7 +327,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "6ce135e9",
+   "id": "49cf2df7",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -347,7 +347,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3dbc763d",
+   "id": "7376cb60",
    "metadata": {},
    "source": [
     "If the model has not been converted to half precision, but the input data has, \n",
@@ -363,7 +363,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "15b8776b",
+   "id": "4b274313",
    "metadata": {},
    "source": [
     "## Optimizers and loss scaling\n",
@@ -378,7 +378,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "2dc9d123",
+   "id": "69e17e54",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -393,7 +393,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "689887ea",
+   "id": "1e82e625",
    "metadata": {},
    "source": [
     "While higher values of `loss_scaling` minimize underflows, values that are \n",
@@ -404,7 +404,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e92bdbcf",
+   "id": "fb777bca",
    "metadata": {},
    "source": [
     "## Set PopTorch's options\n",
@@ -412,7 +412,7 @@
     "To configure some features of the IPU and to be able to use PopTorch's classes \n",
     "in the next sections, we will need to create an instance of `poptorch.Options` \n",
     "which stores the options we will be using. We covered some of the available \n",
-    "options in the: [Introductory Tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).\n",
+    "options in the [introductory tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).\n",
     "\n",
     "Let's initialise our options object before we talk about the options \n",
     "we will use:"
@@ -421,7 +421,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e578030c",
+   "id": "f3f03a4e",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -430,7 +430,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "5e7f2d4d",
+   "id": "faf7319a",
    "metadata": {},
    "source": [
     ">**NOTE**: This tutorial has been designed to be run on a single IPU. \n",
@@ -442,7 +442,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "ebb65211",
+   "id": "e499531b",
    "metadata": {},
    "source": [
     "### Stochastic rounding\n",
@@ -457,7 +457,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "83f40051",
+   "id": "c205011a",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -467,7 +467,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d720ee52",
+   "id": "199f4ea8",
    "metadata": {},
    "source": [
     "With the IPU Model, this option won't change anything since stochastic \n",
@@ -476,7 +476,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "b6509ce3",
+   "id": "dfb3c4cd",
    "metadata": {},
    "source": [
     "### Partials data type\n",
@@ -494,7 +494,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "01090dff",
+   "id": "a7473d4f",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -506,7 +506,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9be12405",
+   "id": "eedc74c3",
    "metadata": {},
    "source": [
     "## Train the model\n",
@@ -519,7 +519,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "ab7f5cd6",
+   "id": "37f946ea",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -532,7 +532,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "42e67559",
+   "id": "5bb2014e",
    "metadata": {},
    "source": [
     "We first make sure our model is in training mode, and then wrap it \n",
@@ -542,7 +542,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "ab093ceb",
+   "id": "d9cdd3df",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -554,7 +554,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "e42a8284",
+   "id": "1f3d47ea",
    "metadata": {},
    "source": [
     "Let's run the training loop for 10 epochs."
@@ -563,7 +563,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "cceb4dfd",
+   "id": "cf74abbb",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -581,7 +581,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "f7c9cfc9",
+   "id": "08566d88",
    "metadata": {},
    "source": [
     "Release IPU resources."
@@ -590,7 +590,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "41c89657",
+   "id": "dc89cc3f",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -599,7 +599,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "7d3d3d4a",
+   "id": "c8626043",
    "metadata": {},
    "source": [
     "Our new model is now trained and we can start its evaluation."
@@ -607,7 +607,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "a02724ef",
+   "id": "b08649ed",
    "metadata": {},
    "source": [
     "## Evaluate the model\n",
@@ -620,7 +620,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "320fa86a",
+   "id": "b8946b1c",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -634,7 +634,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "edbc8998",
+   "id": "c47c1c97",
    "metadata": {},
    "source": [
     "Run inference on the labelled data"
@@ -643,7 +643,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "d7057092",
+   "id": "8e2c3f08",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -659,7 +659,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "453ce938",
+   "id": "3fac5c2c",
    "metadata": {},
    "source": [
     "Release resources:"
@@ -668,7 +668,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "7b11bc76",
+   "id": "97e9f8ba",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -677,7 +677,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "cb3ca5ba",
+   "id": "a702544b",
    "metadata": {},
    "source": [
     "We obtained an accuracy of approximately 84% on the test dataset."
@@ -686,7 +686,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "bfe27f31",
+   "id": "994196c4",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -701,7 +701,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "64ac01ea",
+   "id": "af117a9c",
    "metadata": {},
    "source": [
     "# Visualise the memory footprint\n",
@@ -736,7 +736,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "1baa95a8",
+   "id": "0f2d530f",
    "metadata": {},
    "source": [
     "# PopTorch tracing and casting\n",
@@ -761,7 +761,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a62d2bc7",
+   "id": "65e44e27",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -777,7 +777,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "8223a1a8",
+   "id": "fd4d6479",
    "metadata": {},
    "source": [
     "Native PyTorch results in a FP32 tensor:"
@@ -786,7 +786,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "1b1de1c6",
+   "id": "dc887a5b",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -795,7 +795,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "19813d1c",
+   "id": "4dcdb13e",
    "metadata": {},
    "source": [
     "Let's instantiate default PopTorch `Options` for IPUs:"
@@ -804,7 +804,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "0f599f91",
+   "id": "0734b0ce",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -813,7 +813,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "ee5c396e",
+   "id": "c1811917",
    "metadata": {},
    "source": [
     "PopTorch results in a FP16 tensor:"
@@ -822,7 +822,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c13784c3",
+   "id": "be76b175",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -836,7 +836,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "70271578",
+   "id": "e77fafa2",
    "metadata": {},
    "source": [
     "This option makes the same PopTorch example result in an FP32 tensor:"
@@ -845,7 +845,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "cf06d331",
+   "id": "eb5a3efe",
    "metadata": {
     "tags": [
      "sst_hide_output"
@@ -862,7 +862,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2b5a6c34",
+   "id": "cd0b856c",
    "metadata": {},
    "source": [
     "Release IPU resources:"
@@ -871,7 +871,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "962eb459",
+   "id": "24c889ed",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -880,7 +880,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "81dee4dd",
+   "id": "341d1080",
    "metadata": {},
    "source": [
     "# Summary\n",
diff --git a/tutorials/pytorch/tut3_mixed_precision/walkthrough.py b/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
index 23b7b92..2ac09d2 100644
--- a/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
+++ b/tutorials/pytorch/tut3_mixed_precision/walkthrough.py
@@ -256,7 +256,7 @@ def forward(self, x, labels=None):
 To configure some features of the IPU and to be able to use PopTorch's classes 
 in the next sections, we will need to create an instance of `poptorch.Options` 
 which stores the options we will be using. We covered some of the available 
-options in the: [Introductory Tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
+options in the [introductory tutorial for PopTorch](https://github.com/graphcore/examples/tree/master/tutorials/pytorch/tut1_basics).
 
 Let's initialise our options object before we talk about the options 
 we will use: