From c9d7780716f575abdec748591ce259f60e03ce55 Mon Sep 17 00:00:00 2001 From: Arnaud Gelas Date: Wed, 6 Jan 2021 21:13:01 +0100 Subject: [PATCH 1/2] Fix pre-commit trailing-whitespace and end-of-file-fixer hooks. --- .github/BECOMING_A_CORE_CONTRIBUTOR.md | 18 ++-- .github/ISSUE_TEMPLATE/documentation.md | 2 +- .github/ISSUE_TEMPLATE/how-to-question.md | 8 +- .github/workflows/docs-checks.yml | 1 - MANIFEST.in | 1 - docs/.build_docs.sh | 2 +- docs/Makefile | 2 +- docs/source/_static/main.css | 2 +- docs/source/asr_nlp_tts.rst | 104 +++++++++++----------- docs/source/cloud_training.rst | 2 +- docs/source/datamodules.rst | 2 +- docs/source/introduction_guide.rst | 6 +- docs/source/loggers.rst | 4 +- docs/source/lr_finder.rst | 28 +++--- docs/source/new-project.rst | 18 ++-- docs/source/optimizers.rst | 2 +- docs/source/sequences.rst | 2 +- docs/source/slurm.rst | 2 +- docs/source/test_set.rst | 7 +- docs/source/training_tricks.rst | 2 +- docs/source/transfer_learning.rst | 2 +- docs/source/weights_loading.rst | 2 +- pl_examples/README.md | 6 +- pl_examples/basic_examples/README.md | 20 ++--- requirements/devel.txt | 2 +- requirements/docs.txt | 2 +- requirements/examples.txt | 2 +- requirements/loggers.txt | 2 +- tests/README.md | 4 +- 29 files changed, 126 insertions(+), 131 deletions(-) diff --git a/.github/BECOMING_A_CORE_CONTRIBUTOR.md b/.github/BECOMING_A_CORE_CONTRIBUTOR.md index 3fa357ef062ca..828f45aedbecc 100644 --- a/.github/BECOMING_A_CORE_CONTRIBUTOR.md +++ b/.github/BECOMING_A_CORE_CONTRIBUTOR.md @@ -1,14 +1,14 @@ # How to become a core contributor -Thanks for your interest in joining the Lightning team! We’re a rapidly growing project which is poised to become the go-to framework for DL researchers! -We're currently recruiting for a team of 5 core maintainers. +Thanks for your interest in joining the Lightning team! We’re a rapidly growing project which is poised to become the go-to framework for DL researchers! +We're currently recruiting for a team of 5 core maintainers. As a core maintainer you will have a strong say in the direction of the project. Big changes will require a majority of maintainers to agree. -### Code of conduct +### Code of conduct First and foremost, you'll be evaluated against [these core values](https://github.com/PyTorchLightning/pytorch-lightning/blob/master/.github/CONTRIBUTING.md). Any code we commit or feature we add needs to align with those core values. -### The bar for joining the team +### The bar for joining the team Lightning is being used to solve really hard problems at the top AI labs in the world. As such, the bar for adding team members is extremely high. Candidates must have solid engineering skills, have a good eye for user experience, and must be a power user of Lightning and PyTorch. With that said, the Lightning team will be diverse and a reflection of an inclusive AI community. You don't have to be an engineer to contribute! Scientists with great usability intuition and PyTorch ninja skills are welcomed! @@ -36,10 +36,10 @@ Pleasant/helpful tone. - Code is NOT overly engineered or hard to read - Ask yourself, could a non-engineer understand what’s happening here? - Make sure new tests are written -- Is this NECESSARY for Lightning? There are some PRs which are just purely about adding engineering complexity which have no place in Lightning. +- Is this NECESSARY for Lightning? There are some PRs which are just purely about adding engineering complexity which have no place in Lightning. 
Guidance - Some other PRs are for people who are wanting to get involved and add something unnecessary. We do want their help though! So don’t approve the PR, but direct them to a Github issue that they might be interested in helping with instead! -- To be considered for core contributor, please review 10 PRs and help the authors land it on master. Once you've finished the review, ping me +- To be considered for core contributor, please review 10 PRs and help the authors land it on master. Once you've finished the review, ping me for a sanity check. At the end of 10 PRs if your PR reviews are inline with expectations described above, then you can merge PRs on your own going forward, otherwise we'll do a few more until we're both comfortable :) @@ -47,15 +47,15 @@ otherwise we'll do a few more until we're both comfortable :) There are some big decisions which the project must make. For these I expect core contributors to have something meaningful to add if it’s their area of expertise. #### Diversity -Lightning should reflect the broader community it serves. As such we should have scientists/researchers from -different fields contributing! +Lightning should reflect the broader community it serves. As such we should have scientists/researchers from +different fields contributing! The first 5 core contributors will fit this profile. Thus if you overlap strongly with experiences and expertise as someone else on the team, you might have to wait until the next set of contributors are added. #### Summary: Requirements to apply The goal is to be inline with expectations for solving issues by the last one so you can do them on your own. If not, I might ask you to solve a few more specific ones. -- Solve 10+ Github issues. +- Solve 10+ Github issues. - Create 5+ meaningful PRs which solves some reported issue - bug, - Perform 10+ PR reviews from other contributors. diff --git a/.github/ISSUE_TEMPLATE/documentation.md b/.github/ISSUE_TEMPLATE/documentation.md index 2b249089657c8..e78df92a18bab 100644 --- a/.github/ISSUE_TEMPLATE/documentation.md +++ b/.github/ISSUE_TEMPLATE/documentation.md @@ -12,7 +12,7 @@ assignees: '' For typos and doc fixes, please go ahead and: 1. Create an issue. -2. Fix the typo. +2. Fix the typo. 3. Submit a PR. Thanks! diff --git a/.github/ISSUE_TEMPLATE/how-to-question.md b/.github/ISSUE_TEMPLATE/how-to-question.md index 2a307e18de5c7..786244d2f5e74 100644 --- a/.github/ISSUE_TEMPLATE/how-to-question.md +++ b/.github/ISSUE_TEMPLATE/how-to-question.md @@ -9,10 +9,10 @@ assignees: '' ## ❓ Questions and Help -### Before asking: +### Before asking: 1. Try to find answers to your questions in [the Lightning Forum!](https://forums.pytorchlightning.ai/) -2. Search for similar [issues](https://github.com/PyTorchLightning/pytorch-lightning/issues). -3. Search the [docs](https://pytorch-lightning.readthedocs.io/en/latest/). +2. Search for similar [issues](https://github.com/PyTorchLightning/pytorch-lightning/issues). +3. Search the [docs](https://pytorch-lightning.readthedocs.io/en/latest/). @@ -20,7 +20,7 @@ assignees: '' #### Code - + #### What have you tried? 
diff --git a/.github/workflows/docs-checks.yml b/.github/workflows/docs-checks.yml index 3f6b35ba7b7cb..247c5cf61f9c1 100644 --- a/.github/workflows/docs-checks.yml +++ b/.github/workflows/docs-checks.yml @@ -109,4 +109,3 @@ jobs: path: docs/build/html/ # Use always() to always run this step to publish test results when there are test failures if: success() - diff --git a/MANIFEST.in b/MANIFEST.in index 8db3912027d6d..450a9ec576d0b 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -69,4 +69,3 @@ prune temp* prune test* prune benchmark* prune dockers - diff --git a/docs/.build_docs.sh b/docs/.build_docs.sh index 2b57c47953675..6cf6eab2fd398 100644 --- a/docs/.build_docs.sh +++ b/docs/.build_docs.sh @@ -1,3 +1,3 @@ rm -rf source/generated make clean -make html --debug --jobs 2 SPHINXOPTS="-W" \ No newline at end of file +make html --debug --jobs 2 SPHINXOPTS="-W" diff --git a/docs/Makefile b/docs/Makefile index 69fe55ecfa9aa..ba501f6f5b1bf 100644 --- a/docs/Makefile +++ b/docs/Makefile @@ -16,4 +16,4 @@ help: # Catch-all target: route all unknown targets to Sphinx using the new # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). %: Makefile - @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) \ No newline at end of file + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/source/_static/main.css b/docs/source/_static/main.css index 7441b775a4be5..82aa8b338ad39 100644 --- a/docs/source/_static/main.css +++ b/docs/source/_static/main.css @@ -1,3 +1,3 @@ col { width: 50% !important; -} \ No newline at end of file +} diff --git a/docs/source/asr_nlp_tts.rst b/docs/source/asr_nlp_tts.rst index a5f1ac59bf696..49bed0a981a6e 100644 --- a/docs/source/asr_nlp_tts.rst +++ b/docs/source/asr_nlp_tts.rst @@ -10,16 +10,16 @@ These are amazing ecosystems to help with Automatic Speech Recognition (ASR), Na NeMo **** -`NVIDIA NeMo `_ is a toolkit for building new State-of-the-Art -Conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), -Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of -prebuilt modules that include everything needed to train on your data. -Every module can easily be customized, extended, and composed to create new Conversational AI +`NVIDIA NeMo `_ is a toolkit for building new State-of-the-Art +Conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), +Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of +prebuilt modules that include everything needed to train on your data. +Every module can easily be customized, extended, and composed to create new Conversational AI model architectures. -Conversational AI architectures are typically very large and require a lot of data and compute -for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node -mixed-precision training. +Conversational AI architectures are typically very large and require a lot of data and compute +for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node +mixed-precision training. .. note:: Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility. 
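As a minimal sketch of what that note means in practice, condensed from the speech-to-text example shown later in this patch (the config path and Trainer settings below are illustrative stand-ins for the real Hydra setup), a NeMo model can be handed straight to the PyTorch Lightning Trainer:

```python
import pytorch_lightning as pl
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf

# Hypothetical config path; the real example resolves this through Hydra.
cfg = OmegaConf.load("conf/speech_to_text.yaml")

trainer = pl.Trainer(gpus=1, max_epochs=5)

# EncDecCTCModel is a LightningModule, so Trainer.fit() accepts it directly.
asr_model = nemo_asr.models.EncDecCTCModel(cfg=cfg.model, trainer=trainer)
trainer.fit(asr_model)
```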
@@ -31,7 +31,7 @@ NeMo Models NeMo Models contain everything needed to train and reproduce state of the art Conversational AI research and applications, including: -- neural network architectures +- neural network architectures - datasets/data loaders - data preprocessing/postprocessing - data augmentors @@ -83,7 +83,7 @@ To install from a local clone of NeMo: ./reinstall.sh # from cloned NeMo's git root -For Docker users, the NeMo container is available on +For Docker users, the NeMo container is available on `NGC `_. .. code-block:: bash @@ -97,7 +97,7 @@ For Docker users, the NeMo container is available on Experiment Manager ------------------ -NeMo's Experiment Manager leverages PyTorch Lightning for model checkpointing, +NeMo's Experiment Manager leverages PyTorch Lightning for model checkpointing, TensorBoard Logging, and Weights and Biases logging. The Experiment Manager is included by default in all NeMo example scripts. @@ -126,11 +126,11 @@ Optionally launch Tensorboard to view training results in ./nemo_experiments (by Automatic Speech Recognition (ASR) ================================== -Everything needed to train Convolutional ASR models is included with NeMo. -NeMo supports multiple Speech Recognition architectures, including Jasper and QuartzNet. -`NeMo Speech Models `_ -can be trained from scratch on custom datasets or -fine-tuned using pre-trained checkpoints trained on thousands of hours of audio +Everything needed to train Convolutional ASR models is included with NeMo. +NeMo supports multiple Speech Recognition architectures, including Jasper and QuartzNet. +`NeMo Speech Models `_ +can be trained from scratch on custom datasets or +fine-tuned using pre-trained checkpoints trained on thousands of hours of audio that can be restored for immediate use. Some typical ASR tasks are included with NeMo: @@ -141,7 +141,7 @@ Some typical ASR tasks are included with NeMo: - `Voice Activity Detection `_ - `Speaker Recognition `_ -See this `asr notebook `_ +See this `asr notebook `_ for a full tutorial on doing ASR with NeMo, PyTorch Lightning, and Hydra. Specify ASR Model Configurations with YAML File @@ -149,7 +149,7 @@ Specify ASR Model Configurations with YAML File NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. -See this `asr config `_ +See this `asr config `_ for the entire speech to text .yaml file. .. code-block:: yaml @@ -198,7 +198,7 @@ Developing ASR Model From Scratch trainer.fit(asr_model) -Hydra makes every aspect of the NeMo model, +Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trainer, customizable from the command line. .. code-block:: bash @@ -259,7 +259,7 @@ with PyTorch Lightning since every NeMo model is a Lightning Module. log_probs = self.decoder(encoder_output=encoded) greedy_predictions = log_probs.argmax(dim=-1, keepdim=False) return log_probs, encoded_len, greedy_predictions - + # PTL-specific methods def training_step(self, batch, batch_nb): audio_signal, audio_signal_len, transcript, transcript_len = batch @@ -281,7 +281,7 @@ Neural Types in NeMo ASR ------------------------ NeMo Models and Neural Modules come with Neural Type checking. -Neural type checking is extremely useful when combining many different neural +Neural type checking is extremely useful when combining many different neural network architectures for a production-grade application. .. 
code-block:: python @@ -311,12 +311,12 @@ Natural Language Processing (NLP) ================================= Everything needed to finetune BERT-like language models for NLP tasks is included with NeMo. -`NeMo NLP Models `_ -include `HuggingFace Transformers `_ -and `NVIDIA Megatron-LM `_ BERT and Bio-Megatron models. +`NeMo NLP Models `_ +include `HuggingFace Transformers `_ +and `NVIDIA Megatron-LM `_ BERT and Bio-Megatron models. NeMo can also be used for pretraining BERT-based language models from HuggingFace. -Any of the HuggingFace encoders or Megatron-LM encoders can easily be used for the NLP tasks +Any of the HuggingFace encoders or Megatron-LM encoders can easily be used for the NLP tasks that are included with NeMo: - `Glue Benchmark (All tasks) `_ @@ -339,7 +339,7 @@ for a full tutorial on doing NER with NeMo, PyTorch Lightning, and Hydra. Specify NER Model Configurations with YAML File ----------------------------------------------- -.. note:: NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. +.. note:: NeMo Models and the PyTorch Lightning Trainer can be fully configured from .yaml files using Hydra. See this `token classification config `_ for the entire NER (token classification) .yaml file. @@ -368,7 +368,7 @@ for the entire NER (token classification) .yaml file. pretrained_model_name: bert-base-uncased lm_checkpoint: null ... - # the classifier for the downstream task + # the classifier for the downstream task head: num_fc_layers: 2 fc_dropout: 0.5 @@ -435,12 +435,12 @@ Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trai Tokenizers ---------- -Tokenization is the process of converting natural language text into integer arrays +Tokenization is the process of converting natural language text into integer arrays which can be used for machine learning. -For NLP tasks, tokenization is an essential part of data preprocessing. -NeMo supports all BERT-like model tokenizers from +For NLP tasks, tokenization is an essential part of data preprocessing. +NeMo supports all BERT-like model tokenizers from `HuggingFace's AutoTokenizer `_ -and also supports `Google's SentencePieceTokenizer `_ +and also supports `Google's SentencePieceTokenizer `_ which can be trained on custom data. To see the list of supported tokenizers: @@ -451,18 +451,18 @@ To see the list of supported tokenizers: nemo_nlp.modules.get_tokenizer_list() -See this `tokenizer notebook `_ +See this `tokenizer notebook `_ for a full tutorial on using tokenizers in NeMo. Language Models --------------- -Language models are used to extract information from (tokenized) text. +Language models are used to extract information from (tokenized) text. Much of the state-of-the-art in natural language processing is achieved -by fine-tuning pretrained language models on the downstream task. +by fine-tuning pretrained language models on the downstream task. -With NeMo, you can either `pretrain `_ -a BERT model on your data or use a pretrained language model from `HuggingFace Transformers `_ +With NeMo, you can either `pretrain `_ +a BERT model on your data or use a pretrained language model from `HuggingFace Transformers `_ or `NVIDIA Megatron-LM `_. To see the list of language models available in NeMo: @@ -483,11 +483,11 @@ for a full tutorial on using pretrained language models in NeMo. 
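The two steps described above, tokenizing text into integer ids and extracting features with a pretrained language model, can be sketched with the HuggingFace API that NeMo wraps (the model name and sample sentence are only illustrative):

```python
from transformers import AutoModel, AutoTokenizer

# bert-base-uncased matches the pretrained_model_name used in the NER config above.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

batch = tokenizer("NeMo converts text into integer arrays", return_tensors="pt")
hidden_states = model(**batch)[0]  # per-token features consumed by a task head
print(batch["input_ids"], hidden_states.shape)
```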
Using a Pre-trained NER Model ----------------------------- -NeMo has pre-trained NER models that can be used +NeMo has pre-trained NER models that can be used to get started with Token Classification right away. -Models are automatically downloaded from NGC, +Models are automatically downloaded from NGC, cached locally to disk, -and loaded into GPU memory using the `.from_pretrained` method. +and loaded into GPU memory using the `.from_pretrained` method. .. code-block:: python @@ -511,7 +511,7 @@ and loaded into GPU memory using the `.from_pretrained` method. NeMo NER Model Under the Hood ----------------------------- -Any aspect of NLP training or model architecture design can easily be customized with PyTorch Lightning +Any aspect of NLP training or model architecture design can easily be customized with PyTorch Lightning since every NeMo model is a Lightning Module. .. code-block:: python @@ -546,8 +546,8 @@ since every NeMo model is a Lightning Module. Neural Types in NeMo NLP ------------------------ -NeMo Models and Neural Modules come with Neural Type checking. -Neural type checking is extremely useful when combining many different neural network architectures +NeMo Models and Neural Modules come with Neural Type checking. +Neural type checking is extremely useful when combining many different neural network architectures for a production-grade application. .. code-block:: python @@ -565,11 +565,11 @@ for a production-grade application. Text-To-Speech (TTS) ==================== -Everything needed to train TTS models and generate audio is included with NeMo. -`NeMo TTS Models `_ +Everything needed to train TTS models and generate audio is included with NeMo. +`NeMo TTS Models `_ can be trained from scratch on your own data or pretrained models can be downloaded -automatically. NeMo currently supports a two step inference procedure. -First, a model is used to generate a mel spectrogram from text. +automatically. NeMo currently supports a two step inference procedure. +First, a model is used to generate a mel spectrogram from text. Second, a model is used to generate audio from a mel spectrogram. Mel Spectrogram Generators: @@ -647,10 +647,10 @@ Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trai Using State-Of-The-Art Pre-trained TTS Model -------------------------------------------- -Generate speech using models trained on `LJSpeech `, +Generate speech using models trained on `LJSpeech `, around 24 hours of single speaker data. -See this `TTS notebook `_ +See this `TTS notebook `_ for a full tutorial on generating speech with NeMo, PyTorch Lightning, and Hydra. .. code-block:: python @@ -673,7 +673,7 @@ for a full tutorial on generating speech with NeMo, PyTorch Lightning, and Hydra if isinstance(audio, torch.Tensor): audio = audio.to('cpu').numpy() return spectrogram, audio - + text_to_generate = input("Input what you want the model to say: ") spec, audio = infer(spec_gen, vocoder, text_to_generate) @@ -763,8 +763,8 @@ be customized with PyTorch Lightning since every NeMo model is a LightningModule Neural Types in NeMo TTS ------------------------ -NeMo Models and Neural Modules come with Neural Type checking. -Neural type checking is extremely useful when combining many different neural network architectures +NeMo Models and Neural Modules come with Neural Type checking. +Neural type checking is extremely useful when combining many different neural network architectures for a production-grade application. .. 
code-block:: python @@ -793,7 +793,7 @@ Learn More - Visit the `NVIDIA NeMo Developer Website `_ - Read the `NVIDIA NeMo PyTorch Blog `_ - Download pre-trained `ASR `_, `NLP `_, and `TTS `_ models on `NVIDIA NGC `_ to quickly get started with NeMo. -- Become an expert on Building Conversational AI applications with our `tutorials `_, and `example scripts `_, +- Become an expert on Building Conversational AI applications with our `tutorials `_, and `example scripts `_, - See our `developer guide `_ for more information on core NeMo concepts, ASR/NLP/TTS collections, and the NeMo API. .. note:: NeMo tutorial notebooks can be run on `Google Colab `_. diff --git a/docs/source/cloud_training.rst b/docs/source/cloud_training.rst index 9fef417da7442..127bee6478dfd 100644 --- a/docs/source/cloud_training.rst +++ b/docs/source/cloud_training.rst @@ -26,4 +26,4 @@ using over 20+ distributions, lists, etc. Of course, you can also configure all can be dynamically assembled at runtime. -.. hint:: Grid supports the search strategy of your choice! (and much more than just sweeps) \ No newline at end of file +.. hint:: Grid supports the search strategy of your choice! (and much more than just sweeps) diff --git a/docs/source/datamodules.rst b/docs/source/datamodules.rst index 2589ac605ee11..bc79d7dc3d6ea 100644 --- a/docs/source/datamodules.rst +++ b/docs/source/datamodules.rst @@ -129,7 +129,7 @@ Here's a more realistic, complex DataModule that shows how much more reusable th # self.dims is returned when you call dm.size() # Setting default dims here because we know them. - # Could optionally be assigned dynamically in dm.setup() + # Could optionally be assigned dynamically in dm.setup() self.dims = (1, 28, 28) def prepare_data(self): diff --git a/docs/source/introduction_guide.rst b/docs/source/introduction_guide.rst index d306c3b3bbb7a..b8ef55b0d6d5e 100644 --- a/docs/source/introduction_guide.rst +++ b/docs/source/introduction_guide.rst @@ -1051,7 +1051,7 @@ would be the particular system and how it's trained (ie: A GAN or VAE or GPT). out = decoder(features, x) loss = perceptual_loss(x1, x2, x) + CE(out, x) - + In Lightning, this code is organized into a :ref:`lightning_module`. Engineering code @@ -1071,7 +1071,7 @@ over GPUs, 16-bit precision, etc. This is normally code that is THE SAME across download_data() dist.barrier() - + In Lightning, this code is abstracted out by the :ref:`trainer`. Non-essential code @@ -1090,7 +1090,7 @@ This is code that helps the research but isn't relevant to the research code. So z = Q.rsample() generated = decoder(z) self.experiment.log('images', generated) - + In Lightning this code is organized into :ref:`callbacks`. Data code diff --git a/docs/source/loggers.rst b/docs/source/loggers.rst index b74fe292b251b..08b3b1e997555 100644 --- a/docs/source/loggers.rst +++ b/docs/source/loggers.rst @@ -9,7 +9,7 @@ Loggers ******* -Lightning supports the most popular logging frameworks (TensorBoard, Comet, etc...). TensorBoard is used by default, +Lightning supports the most popular logging frameworks (TensorBoard, Comet, etc...). TensorBoard is used by default, but you can pass to the :class:`~pytorch_lightning.trainer.trainer.Trainer` any combination of the following loggers. .. 
note:: @@ -247,7 +247,7 @@ Lightning supports the use of multiple loggers, just pass a list to the logger1 = TensorBoardLogger('tb_logs', name='my_model') logger2 = TestTubeLogger('tb_logs', name='my_model') trainer = Trainer(logger=[logger1, logger2]) - + The loggers are available as a list anywhere except ``__init__`` in your :class:`~pytorch_lightning.core.lightning.LightningModule`. diff --git a/docs/source/lr_finder.rst b/docs/source/lr_finder.rst index fbeb1f5fd959d..a5c3b312f30fc 100755 --- a/docs/source/lr_finder.rst +++ b/docs/source/lr_finder.rst @@ -2,7 +2,7 @@ from pytorch_lightning.trainer.trainer import Trainer from pytorch_lightning.core.lightning import LightningModule - + .. _lr_finder: Learning Rate Finder @@ -22,14 +22,14 @@ for both better performance and faster convergence. Even optimizers such as choices. To reduce the amount of guesswork concerning choosing a good initial learning -rate, a `learning rate finder` can be used. As described in this `paper `_ -a learning rate finder does a small run where the learning rate is increased -after each processed batch and the corresponding loss is logged. The result of +rate, a `learning rate finder` can be used. As described in this `paper `_ +a learning rate finder does a small run where the learning rate is increased +after each processed batch and the corresponding loss is logged. The result of this is a `lr` vs. `loss` plot that can be used as guidance for choosing a optimal -initial lr. +initial lr. -.. warning:: - For the moment, this feature only works with models having a single optimizer. +.. warning:: + For the moment, this feature only works with models having a single optimizer. LR Finder support for DDP is not implemented yet, it is coming soon. ---------- @@ -52,7 +52,7 @@ which can be accessed via ``self.learning_rate`` or ``self.lr``. def configure_optimizers(self): return Adam(self.parameters(), lr=(self.lr or self.learning_rate)) - + model = LitModel() # finds learning rate automatically @@ -81,26 +81,26 @@ method of the trainer. A typical example of this would look like model = MyModelClass(hparams) trainer = Trainer() - + # Run learning rate finder lr_finder = trainer.tuner.lr_find(model) - + # Results can be found in lr_finder.results - + # Plot with fig = lr_finder.plot(suggest=True) fig.show() - + # Pick point based on plot, or get suggestion new_lr = lr_finder.suggestion() - + # update hparams of the model model.hparams.lr = new_lr # Fit model trainer.fit(model) - + The figure produced by ``lr_finder.plot()`` should look something like the figure below. It is recommended to not pick the learning rate that achieves the lowest loss, but instead something in the middle of the sharpest downward slope (red point). diff --git a/docs/source/new-project.rst b/docs/source/new-project.rst index 30e06f76ae5bd..4c9c16e9faa0d 100644 --- a/docs/source/new-project.rst +++ b/docs/source/new-project.rst @@ -132,7 +132,7 @@ Examples of systems are: - `DQN `_ - `GAN `_ - `Image classifier `_ -- Seq2seq +- Seq2seq - `SimCLR `_ - `VAE `_ @@ -195,7 +195,7 @@ First, define the data however you want. Lightning just needs a :class:`~torch.u dataset = MNIST(os.getcwd(), download=True, transform=transforms.ToTensor()) train_loader = DataLoader(dataset) - + Next, init the :ref:`lightning_module` and the PyTorch Lightning :class:`~pytorch_lightning.trainer.Trainer`, then call fit with both the data and model. @@ -392,7 +392,7 @@ It's trivial to use CPUs, GPUs or TPUs in Lightning. 
There's **NO NEED** to chan # train on 1 GPU trainer = pl.Trainer(gpus=1) - + .. code-block:: python # train on multiple GPUs across nodes (32 gpus here) @@ -400,7 +400,7 @@ It's trivial to use CPUs, GPUs or TPUs in Lightning. There's **NO NEED** to chan gpus=4, num_nodes=8 ) - + .. code-block:: python # train on gpu 1, 3, 5 (3 gpus total) @@ -428,7 +428,7 @@ Without changing a SINGLE line of your code, you can now do the following with t limit_train_batches=0.5, val_check_interval=0.25 ) - + ----------- Checkpoints @@ -709,7 +709,7 @@ Lightning has many tools for debugging. Here is an example of just a few of them .. code-block:: python - # Automatically overfit the sane batch of your model for a sanity test + # Automatically overfit the sane batch of your model for a sanity test trainer = pl.Trainer(overfit_batches=1) .. code-block:: python @@ -719,7 +719,7 @@ Lightning has many tools for debugging. Here is an example of just a few of them trainer = pl.Trainer(fast_dev_run=True) .. code-block:: python - + # train only 20% of an epoch trainer = pl.Trainer(limit_train_batches=0.2) @@ -729,10 +729,10 @@ Lightning has many tools for debugging. Here is an example of just a few of them trainer = pl.Trainer(val_check_interval=0.25) .. code-block:: python - + # Profile your code to find speed/memory bottlenecks Trainer(profiler=True) - + --------------- ******************** diff --git a/docs/source/optimizers.rst b/docs/source/optimizers.rst index 2680c01e4c7ec..5e96b5da0da8c 100644 --- a/docs/source/optimizers.rst +++ b/docs/source/optimizers.rst @@ -247,7 +247,7 @@ The default ``optimizer_step`` is relying on the internal ``LightningOptimizer`` .. testcode:: from pytorch_lightning.core.optimizer import LightningOptimizer - + # function hook in LightningModule def optimizer_step(self, current_epoch, batch_nb, optimizer, optimizer_idx, closure, on_tpu=False, using_native_amp=False, using_lbfgs=False): if not isinstance(optimizer, LightningOptimizer): diff --git a/docs/source/sequences.rst b/docs/source/sequences.rst index 93fefad0d0e35..759a671cc42ef 100644 --- a/docs/source/sequences.rst +++ b/docs/source/sequences.rst @@ -2,7 +2,7 @@ from torch.utils.data import IterableDataset from pytorch_lightning.trainer.trainer import Trainer - + .. _sequences: Sequential Data diff --git a/docs/source/slurm.rst b/docs/source/slurm.rst index be40810c3f944..da6de596db5a2 100644 --- a/docs/source/slurm.rst +++ b/docs/source/slurm.rst @@ -1,7 +1,7 @@ .. testsetup:: * from pytorch_lightning.trainer.trainer import Trainer - + .. _slurm: Computing cluster (SLURM) diff --git a/docs/source/test_set.rst b/docs/source/test_set.rst index 9fe9640aa723b..d9e989a4182f3 100644 --- a/docs/source/test_set.rst +++ b/docs/source/test_set.rst @@ -41,7 +41,7 @@ You can run the test set on multiple models using the same trainer instance. model1 = LitModel() model2 = GANModel() - + trainer = Trainer() trainer.test(model1) trainer.test(model2) @@ -87,7 +87,7 @@ is not available at the time your model was declared. You can either pass in a single dataloader or a list of them. This optional named parameter can be used in conjunction with any of the above use cases. Additionally, -you can also pass in an :ref:`datamodules` that have overridden the +you can also pass in an :ref:`datamodules` that have overridden the :ref:`datamodule-test-dataloader-label` method. .. 
code-block:: python @@ -102,6 +102,3 @@ you can also pass in an :ref:`datamodules` that have overridden the # test (pass in datamodule) trainer.test(datamodule=dm) - - - diff --git a/docs/source/training_tricks.rst b/docs/source/training_tricks.rst index 10ee668a97fa8..d7230a1fd687a 100644 --- a/docs/source/training_tricks.rst +++ b/docs/source/training_tricks.rst @@ -130,4 +130,4 @@ Sequential Model Parallelism with Checkpointing PyTorch Lightning integration for Sequential Model Parallelism using `FairScale `_. Sequential Model Parallelism splits a sequential module onto multiple GPUs, reducing peak GPU memory requirements substantially. -For more information, refer to :ref:`sequential-parallelism`. \ No newline at end of file +For more information, refer to :ref:`sequential-parallelism`. diff --git a/docs/source/transfer_learning.rst b/docs/source/transfer_learning.rst index 5e885dbf3e376..157cb64a36bd7 100644 --- a/docs/source/transfer_learning.rst +++ b/docs/source/transfer_learning.rst @@ -1,7 +1,7 @@ .. testsetup:: * from pytorch_lightning.core.lightning import LightningModule - + Transfer Learning ----------------- diff --git a/docs/source/weights_loading.rst b/docs/source/weights_loading.rst index f22e355a09d17..1c8babd72ed18 100644 --- a/docs/source/weights_loading.rst +++ b/docs/source/weights_loading.rst @@ -92,7 +92,7 @@ You can also control more advanced options, like `save_top_k`, to save the best ) trainer = Trainer(callbacks=[checkpoint_callback]) - + You can retrieve the checkpoint after training by calling .. code-block:: python diff --git a/pl_examples/README.md b/pl_examples/README.md index 936f1cc3df0cf..a1cb856eb1e33 100644 --- a/pl_examples/README.md +++ b/pl_examples/README.md @@ -1,4 +1,4 @@ -# Examples +# Examples Our most robust examples showing all sorts of implementations can be found in our sister library [PyTorch-Lightning-Bolts](https://pytorch-lightning-bolts.readthedocs.io/en/latest/convolutional.html#gpt-2). @@ -14,6 +14,6 @@ In this folder we add 3 simple examples: --- ## Domain examples -This folder contains older examples. You should instead use the examples -in [PyTorch-Lightning-Bolts](https://pytorch-lightning-bolts.readthedocs.io/en/latest/convolutional.html#gpt-2) +This folder contains older examples. You should instead use the examples +in [PyTorch-Lightning-Bolts](https://pytorch-lightning-bolts.readthedocs.io/en/latest/convolutional.html#gpt-2) for advanced use cases. diff --git a/pl_examples/basic_examples/README.md b/pl_examples/basic_examples/README.md index 18ae204396290..199c453566c6f 100644 --- a/pl_examples/basic_examples/README.md +++ b/pl_examples/basic_examples/README.md @@ -1,5 +1,5 @@ -## Basic Examples -Use these examples to test how lightning works. +## Basic Examples +Use these examples to test how lightning works. #### MNIST Trains MNIST where the model is defined inside the LightningModule. @@ -36,7 +36,7 @@ python image_classifier.py --gpus 2 python image_classifier.py --gpus 2 --distributed_backend 'dp' ``` ---- +--- #### Autoencoder Showing the power of a system... arbitrarily complex training loops ```bash @@ -49,23 +49,23 @@ python autoencoder.py --gpus 2 # dataparallel python autoencoder.py --gpus 2 --distributed_backend 'dp' ``` ---- -# Multi-node example +--- +# Multi-node example This demo launches a job using 2 GPUs on 2 different nodes (4 GPUs total). To run this demo do the following: -1. Log into the jumphost node of your SLURM-managed cluster. -2. 
Create a conda environment with Lightning and a GPU PyTorch version. -3. Choose a script to submit +1. Log into the jumphost node of your SLURM-managed cluster. +2. Create a conda environment with Lightning and a GPU PyTorch version. +3. Choose a script to submit -#### DDP +#### DDP Submit this job to run with DistributedDataParallel (2 nodes, 2 gpus each) ```bash sbatch submit_ddp_job.sh YourEnv ``` -#### DDP2 +#### DDP2 Submit this job to run with a different implementation of DistributedDataParallel. In this version, each node acts like DataParallel but syncs across nodes like DDP. ```bash diff --git a/requirements/devel.txt b/requirements/devel.txt index a8c5293c8c7db..dcf66495ee46f 100644 --- a/requirements/devel.txt +++ b/requirements/devel.txt @@ -8,4 +8,4 @@ -r ./test.txt # install all extra dependencies for running examples --r ./examples.txt \ No newline at end of file +-r ./examples.txt diff --git a/requirements/docs.txt b/requirements/docs.txt index df596ed2bdda8..0f8f2005b88b1 100644 --- a/requirements/docs.txt +++ b/requirements/docs.txt @@ -11,4 +11,4 @@ https://github.com/PyTorchLightning/lightning_sphinx_theme/archive/master.zip#eg sphinx-autodoc-typehints sphinx-paramlinks<0.4.0 sphinx-togglebutton -sphinx-copybutton \ No newline at end of file +sphinx-copybutton diff --git a/requirements/examples.txt b/requirements/examples.txt index c87d10a39346f..6e48778cb222a 100644 --- a/requirements/examples.txt +++ b/requirements/examples.txt @@ -1,2 +1,2 @@ torchvision>=0.4.1 -gym>=0.17.0 \ No newline at end of file +gym>=0.17.0 diff --git a/requirements/loggers.txt b/requirements/loggers.txt index 3ec7b25db4643..001210855871d 100644 --- a/requirements/loggers.txt +++ b/requirements/loggers.txt @@ -3,4 +3,4 @@ neptune-client>=0.4.109 comet-ml>=3.1.12 mlflow>=1.0.0 test_tube>=0.7.5 -wandb>=0.8.21 \ No newline at end of file +wandb>=0.8.21 diff --git a/tests/README.md b/tests/README.md index 8ef006c4d879a..7b857a1901fd7 100644 --- a/tests/README.md +++ b/tests/README.md @@ -33,8 +33,8 @@ The GPU machine must have: 3. [Horovod with NCCL](https://horovod.readthedocs.io/en/stable/gpus_include.html) support: `HOROVOD_GPU_OPERATIONS=NCCL pip install horovod` -## Running Coverage -Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed. +## Running Coverage +Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed. ```bash cd pytorch-lightning From f917be803119470845bade07b64b8da8537d8cdd Mon Sep 17 00:00:00 2001 From: Arnaud Gelas Date: Thu, 7 Jan 2021 06:24:47 +0100 Subject: [PATCH 2/2] Fix pre-commit trailing-whitespace and end-of-file-fixer hooks. (#5387) (cherry picked from commit 4c6f36e6e14a5e3bace1fe32505ae0fe6f8bc682)
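For context, the two hooks named in both patch subjects come from the pre-commit-hooks project. A minimal `.pre-commit-config.yaml` sketch that enables them (the `rev` pin below is illustrative, not necessarily the pin this repository uses) looks like this:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.4.0  # illustrative pin
    hooks:
      - id: trailing-whitespace   # strips trailing spaces such as those removed above
      - id: end-of-file-fixer     # ensures each file ends with exactly one newline
```

Running `pre-commit run --all-files` applies both fixers across the whole checkout, which is how sweeping whitespace cleanups like the one in this patch are typically produced.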