@@ -324,13 +324,13 @@ that are included with NeMo:
 - `Language Modeling (BERT Pretraining) <https://github.com/NVIDIA/NeMo/blob/v1.0.0b1/tutorials/nlp/01_Pretrained_Language_Models_for_Downstream_Tasks.ipynb>`_
 - `Question Answering <https://github.com/NVIDIA/NeMo/blob/v1.0.0b1/tutorials/nlp/Question_Answering_Squad.ipynb>`_
 - `Text Classification <https://github.com/NVIDIA/NeMo/tree/v1.0.0b1/examples/nlp/text_classification>`_ (including Sentiment Analysis)
-- `Token Classifcation <https://github.com/NVIDIA/NeMo/tree/v1.0.0b1/examples/nlp/token_classification>`_ (including Named Entity Recognition)
+- `Token Classification <https://github.com/NVIDIA/NeMo/tree/v1.0.0b1/examples/nlp/token_classification>`_ (including Named Entity Recognition)
 - `Punctuation and Capitalization <https://github.com/NVIDIA/NeMo/blob/v1.0.0b1/tutorials/nlp/Punctuation_and_Capitalization.ipynb>`_

 Named Entity Recognition (NER)
 ------------------------------

-NER (or more generally token classifcation) is the NLP task of detecting and classifying key information (entities) in text.
+NER (or more generally token classification) is the NLP task of detecting and classifying key information (entities) in text.
 This task is very popular in Healthcare and Finance. In finance, for example, it can be important to identify
 geographical, geopolitical, organizational, persons, events, and natural phenomenon entities.
 See this `NER notebook <https://github.com/NVIDIA/NeMo/blob/v1.0.0b1/tutorials/nlp/Token_Classification_Named_Entity_Recognition.ipynb>`_
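
The NER tutorial linked above starts by restoring a pretrained NeMo model before tagging text. A minimal sketch of that first step, assuming NeMo's standard ``list_available_models()`` / ``from_pretrained()`` interface; the checkpoint name ``ner_en_bert`` is an illustrative guess, not confirmed for v1.0.0b1:

.. code-block:: python

    from nemo.collections.nlp.models import TokenClassificationModel

    # List the pretrained token classification (NER) checkpoints available for download.
    print(TokenClassificationModel.list_available_models())

    # Download and restore one of them. "ner_en_bert" is an assumed name;
    # substitute a real one from the listing above.
    model = TokenClassificationModel.from_pretrained("ner_en_bert")
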
@@ -435,7 +435,7 @@ Hydra makes every aspect of the NeMo model, including the PyTorch Lightning Trai
 Tokenizers
 ----------

-Tokenization is the process of converting natural langauge text into integer arrays
+Tokenization is the process of converting natural language text into integer arrays
 which can be used for machine learning.
 For NLP tasks, tokenization is an essential part of data preprocessing.
 NeMo supports all BERT-like model tokenizers from
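
To make the text-to-integer conversion concrete, here is a sketch using HuggingFace's ``AutoTokenizer`` directly; this is HuggingFace's API, which NeMo's BERT-like tokenizers wrap, not necessarily NeMo's own interface:

.. code-block:: python

    from transformers import AutoTokenizer

    # Load the WordPiece tokenizer shipped with bert-base-uncased.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    # Convert natural language text into an integer array of token IDs.
    ids = tokenizer("NeMo supports BERT-like tokenizers.")["input_ids"]
    print(ids)                                   # e.g. [101, ..., 102]
    print(tokenizer.convert_ids_to_tokens(ids))  # subword tokens plus [CLS]/[SEP]
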
@@ -462,7 +462,7 @@ Much of the state-of-the-art in natural language processing is achieved
 by fine-tuning pretrained language models on the downstream task.

 With NeMo, you can either `pretrain <https://github.com/NVIDIA/NeMo/blob/v1.0.0b1/examples/nlp/language_modeling/bert_pretraining.py>`_
-a BERT model on your data or use a pretrained lanugage model from `HuggingFace Transformers <https://github.com/huggingface/transformers>`_
+a BERT model on your data or use a pretrained language model from `HuggingFace Transformers <https://github.com/huggingface/transformers>`_
 or `NVIDIA Megatron-LM <https://github.com/NVIDIA/Megatron-LM>`_.

 To see the list of language models available in NeMo:
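
As a sketch of the listing call that line introduces, assuming the ``get_pretrained_lm_models_list()`` helper from NeMo's NLP collection (its exact module path may differ in v1.0.0b1):

.. code-block:: python

    from nemo.collections import nlp as nemo_nlp

    # Print the names of all pretrained language models NeMo can load,
    # covering both HuggingFace Transformers and Megatron-LM checkpoints.
    print(nemo_nlp.modules.get_pretrained_lm_models_list())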