
Commit b96f570: Resolve merge conflicts
2 parents: d54ea35 + 1026ceb

File tree: 71 files changed, +1281 / -942 lines


.github/workflows/ci_test-conda.yml

Lines changed: 3 additions & 2 deletions

@@ -53,8 +53,9 @@ jobs:
       - name: Upload pytest results
         uses: actions/upload-artifact@v2
         with:
-          name: pytest-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}
-          path: junit/test-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}.xml
+          name: pytest-results-${{ runner.os }}-torch${{ matrix.pytorch-version }}
+          path: junit/test-results-${{ runner.os }}-torch${{ matrix.pytorch-version }}.xml
+          if-no-files-found: error
         if: failure()

       - name: Statistics

.github/workflows/ci_test-full.yml

Lines changed: 3 additions & 2 deletions

@@ -147,8 +147,9 @@ jobs:
       - name: Upload pytest results
         uses: actions/upload-artifact@v2
         with:
-          name: pytest-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}-${{ matrix.release }}
-          path: junit/test-results-${{ runner.os }}-${{ matrix.python-version }}-${{ matrix.requires }}-${{ matrix.release }}.xml
+          name: pytest-results-${{ runner.os }}-py${{ matrix.python-version }}-${{ matrix.requires }}-${{ matrix.release }}
+          path: junit/test-results-${{ runner.os }}-py${{ matrix.python-version }}-${{ matrix.requires }}-${{ matrix.release }}.xml
+          if-no-files-found: error
         if: failure()

       - name: Statistics

.github/workflows/ci_test-slow.yml

Lines changed: 4 additions & 3 deletions

@@ -57,15 +57,16 @@ jobs:

       - name: Tests
         run: |
-          coverage run --source pytorch_lightning -m pytest tests -v --junitxml=junit/test-results-${{ runner.os }}-${{ matrix.python-version }}.xml
+          coverage run --source pytorch_lightning -m pytest tests -v --junitxml=junit/test-results-${{ runner.os }}-py${{ matrix.python-version }}.xml
         env:
           PL_RUN_SLOW_TESTS: 1

       - name: Upload pytest test results
         uses: actions/upload-artifact@v2
         with:
-          name: pytest-results-${{ runner.os }}-${{ matrix.python-version }}
-          path: junit/test-results-${{ runner.os }}-${{ matrix.python-version }}.xml
+          name: pytest-results-${{ runner.os }}-py${{ matrix.python-version }}
+          path: junit/test-results-${{ runner.os }}-py${{ matrix.python-version }}.xml
+          if-no-files-found: error
         if: failure()

       - name: Statistics
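The three workflow diffs above rename the uploaded artifacts using GitHub Actions expression templates such as `pytest-results-${{ runner.os }}-py${{ matrix.python-version }}`. As a rough illustration of how such a template expands (plain Python standing in for the Actions expression engine; the matrix values below are hypothetical examples, not taken from this commit):

```python
# Illustration only: GitHub Actions, not Python, performs this substitution
# at run time. The context values below are made-up examples.
def expand(template: str, context: dict) -> str:
    """Replace each `${{ key }}` placeholder with its value from `context`."""
    for key, value in context.items():
        template = template.replace("${{ " + key + " }}", value)
    return template

name = expand(
    "pytest-results-${{ runner.os }}-py${{ matrix.python-version }}",
    {"runner.os": "Linux", "matrix.python-version": "3.9"},
)
print(name)  # pytest-results-Linux-py3.9
```

With the old template, two matrix entries differing only in PyTorch version would have produced colliding artifact names; including the version in the name keeps each upload distinct, and `if-no-files-found: error` makes a missing results file fail loudly instead of silently uploading nothing.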

CHANGELOG.md

Lines changed: 34 additions & 1 deletion

@@ -9,7 +9,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Added

-- Add new `DETAIL` log level to provide useful logs for improving monitoring and debugging of batch jobs
+- Enable gradient accumulation using Horovod's `backward_passes_per_step` ([#11911](https://github.com/PyTorchLightning/pytorch-lightning/pull/11911))
+
+
+- Add new `DETAIL` log level to provide useful logs for improving monitoring and debugging of batch jobs ([#11008](https://github.com/PyTorchLightning/pytorch-lightning/pull/11008))


 - Added a flag `SLURMEnvironment(auto_requeue=True|False)` to control whether Lightning handles the requeuing ([#10601](https://github.com/PyTorchLightning/pytorch-lightning/pull/10601))

@@ -33,6 +36,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added a function to validate if fault tolerant training is supported. ([#10465](https://github.com/PyTorchLightning/pytorch-lightning/pull/10465))


+- Added a private callback to manage the creation and deletion of fault-tolerance checkpoints ([#11862](https://github.com/PyTorchLightning/pytorch-lightning/pull/11862))
+
+
 - Show a better error message when a custom `DataLoader` implementation is not well implemented and we need to reconstruct it ([#10719](https://github.com/PyTorchLightning/pytorch-lightning/pull/10719))


@@ -66,6 +72,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added a `LOGGER_REGISTRY` instance to register custom loggers to the `LightningCLI` ([#11533](https://github.com/PyTorchLightning/pytorch-lightning/pull/11533))


+- Added info message when the `Trainer` arguments `limit_*_batches`, `overfit_batches`, or `val_check_interval` are set to `1` or `1.0` ([#11950](https://github.com/PyTorchLightning/pytorch-lightning/pull/11950))
+
 - Added a `PrecisionPlugin.teardown` method ([#10990](https://github.com/PyTorchLightning/pytorch-lightning/pull/10990))


@@ -117,9 +125,13 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added `Accelerator.is_available` to check device availability ([#11797](https://github.com/PyTorchLightning/pytorch-lightning/pull/11797))


+- Enabled static type-checking on the signature of `Trainer` ([#11888](https://github.com/PyTorchLightning/pytorch-lightning/pull/11888))
+
+
 - Added utility functions for moving optimizers to devices ([#11758](https://github.com/PyTorchLightning/pytorch-lightning/pull/11758))


+
 ### Changed

 - Implemented a new native and rich format in `_print_results` method of the `EvaluationLoop` ([#11332](https://github.com/PyTorchLightning/pytorch-lightning/pull/11332))

@@ -296,6 +308,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 - Changed default logger name to `lightning_logs` for consistency ([#11762](https://github.com/PyTorchLightning/pytorch-lightning/pull/11762))

+
+- Rewrote `accelerator_connector` ([#11448](https://github.com/PyTorchLightning/pytorch-lightning/pull/11448))
+
 ### Deprecated

 - Deprecated `training_type_plugin` property in favor of `strategy` in `Trainer` and updated the references ([#11141](https://github.com/PyTorchLightning/pytorch-lightning/pull/11141))

@@ -400,6 +415,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Deprecated `pytorch_lightning.utilities.warnings.LightningDeprecationWarning` in favor of `pytorch_lightning.utilities.rank_zero.LightningDeprecationWarning`


+- Deprecated `agg_key_funcs` and `agg_default_func` parameters from `LightningLoggerBase` ([#11871](https://github.com/PyTorchLightning/pytorch-lightning/pull/11871))
+
+
+- Deprecated `LightningLoggerBase.update_agg_funcs` ([#11871](https://github.com/PyTorchLightning/pytorch-lightning/pull/11871))
+
+
 - Deprecated `LightningLoggerBase.agg_and_log_metrics` in favor of `LightningLoggerBase.log_metrics` ([#11832](https://github.com/PyTorchLightning/pytorch-lightning/pull/11832))


@@ -553,6 +574,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Removed `log_text` and `log_image` from the `LightningLoggerBase` API ([#11857](https://github.com/PyTorchLightning/pytorch-lightning/pull/11857))


+- Removed calls to `profile("model_forward")` in favor of profiling `training_step` ([#12032](https://github.com/PyTorchLightning/pytorch-lightning/pull/12032))
+
+
+- Removed `get_mp_spawn_kwargs` from `DDPSpawnStrategy` and `TPUSpawnStrategy` in favor of configuration in the `_SpawnLauncher` ([#11966](https://github.com/PyTorchLightning/pytorch-lightning/pull/11966))
+
+
 ### Fixed

 - Fixed an issue where `HorovodStrategy.teardown()` did not complete gracefully if an exception was thrown during callback setup [#11752](https://github.com/PyTorchLightning/pytorch-lightning/pull/11752)

@@ -605,6 +632,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Configure native Deepspeed schedulers with interval='step' ([#11788](https://github.com/PyTorchLightning/pytorch-lightning/pull/11788))


+- Update `RichProgressBarTheme` styles after detecting light theme on colab ([#10993](https://github.com/PyTorchLightning/pytorch-lightning/pull/10993))
+
+
 ## [1.5.10] - 2022-02-08

 ### Fixed

@@ -641,6 +671,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Disabled sampler replacement when using `IterableDataset` ([#11507](https://github.com/PyTorchLightning/pytorch-lightning/pull/11507))


+- Disable loading dataloaders if the corresponding `limit_batches=0` ([#11576](https://github.com/PyTorchLightning/pytorch-lightning/pull/11576))
+
+
 ## [1.5.8] - 2022-01-05

 ### Fixed
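The CHANGELOG's Horovod entry enables gradient accumulation through `backward_passes_per_step`. The underlying idea, sketched framework-free (the function name and numbers below are illustrative, not Lightning's or Horovod's API):

```python
# Framework-free sketch of gradient accumulation: sum the gradients of
# several backward passes, then apply a single optimizer step.
def accumulate_steps(batch_grads, passes_per_step):
    """Return the accumulated gradient applied at each optimizer step."""
    steps = []
    running = 0
    for i, grad in enumerate(batch_grads, start=1):
        running += grad  # backward pass: accumulate, no parameter update yet
        if i % passes_per_step == 0:
            steps.append(running)  # optimizer step uses the summed gradient
            running = 0
    return steps

print(accumulate_steps([1, 2, 3, 4], passes_per_step=2))  # [3, 7]
```

This is why accumulation raises the effective batch size without raising memory use: only one batch's activations are live at a time, while the summed gradient stands in for a larger batch.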

docs/source/accelerators/gpu.rst

Lines changed: 1 addition & 1 deletion

@@ -506,7 +506,7 @@ but Bagua can usually produce a higher training throughput due to its backend wr

 .. code-block:: python

-        # train on 2 GPUs (using Bagua mode)
+        # train on 4 GPUs (using Bagua mode)
         trainer = Trainer(strategy="bagua", accelerator="gpu", devices=4)


docs/source/advanced/profiler.rst

Lines changed: 44 additions & 2 deletions

@@ -19,7 +19,6 @@ PyTorch Lightning supports profiling standard actions in the training loop out o
 - on_train_epoch_start
 - on_train_epoch_end
 - on_train_batch_start
-- model_forward
 - model_backward
 - on_after_backward
 - optimizer_step

@@ -66,7 +65,6 @@ The profiler's results will be printed at the completion of a training ``trainer
 | run_training_epoch                          | 6.1558     | 6.1558     |
 | run_training_batch                          | 0.0022506  | 0.015754   |
 | [LightningModule]BoringModel.optimizer_step | 0.0017477  | 0.012234   |
-| model_forward                               | 0.00055868 | 0.0039108  |
 | [LightningModule]BoringModel.val_dataloader | 0.00024388 | 0.00024388 |
 | on_train_batch_start                        | 0.00014637 | 0.0010246  |
 | [LightningModule]BoringModel.teardown       | 2.15e-06   | 2.15e-06   |

@@ -210,6 +208,50 @@ To visualize the profiled operation, you can either:
     python -c 'import torch; print(torch.autograd.profiler.load_nvprof("trace_name.prof"))'

+XLA Profiler
+============
+
+:class:`~pytorch_lightning.profiler.xla.XLAProfiler` will help you debug and optimize training
+workload performance for your models using Cloud TPU performance tools.
+
+.. code-block:: python
+
+    # by passing the `XLAProfiler` alias
+    trainer = Trainer(..., profiler="xla")
+
+    # or by passing an instance
+    from pytorch_lightning.profiler import XLAProfiler
+
+    profiler = XLAProfiler(port=9001)
+    trainer = Trainer(..., profiler=profiler)
+
+Manual Capture via TensorBoard
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following instructions are for capturing traces from a running program:
+
+0. This `guide <https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm#tpu-vm>`_ will
+   help you set up a Cloud TPU with the required installations.
+
+1. Start a `TensorBoard <https://www.tensorflow.org/tensorboard>`_ server. You can view the TensorBoard
+   output at ``http://localhost:9001`` on your local machine, and then open the ``PROFILE`` plugin from
+   the top right dropdown or open ``http://localhost:9001/#profile``:
+
+   .. code-block:: bash
+
+       tensorboard --logdir ./tensorboard --port 9001
+
+2. Once the code you'd like to profile is running, click the ``CAPTURE PROFILE`` button. Enter
+   ``localhost:9001`` (the XLA Profiler's default port) as the Profile Service URL. Then enter
+   the number of milliseconds for the profiling duration, and click ``CAPTURE``.
+
+3. Make sure the code keeps running while you capture the traces. You will get better
+   performance insights if the profiling duration is longer than the step time.
+
+4. Once the capture is finished, the page will refresh and you can browse through the insights using
+   the ``Tools`` dropdown at the top left.
+
 ----------------

 ****************

docs/source/common/trainer.rst

Lines changed: 2 additions & 2 deletions

@@ -1544,8 +1544,8 @@ val_check_interval
 How often within one training epoch to check the validation set.
 Can specify as float or int.

-- use (float) to check within a training epoch
-- use (int) to check every n steps (batches)
+- pass a ``float`` in the range [0.0, 1.0] to check after that fraction of the training epoch.
+- pass an ``int`` to check after a fixed number of training batches.

 .. testcode::
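The float/int semantics in the rewritten bullets can be sketched as follows. This is a simplified illustration, not Lightning's actual resolution logic, and the helper name is made up:

```python
# Simplified sketch of how a `val_check_interval` value could resolve to a
# number of training batches between validation checks (not Lightning's code).
def resolve_val_check_interval(val_check_interval, batches_per_epoch):
    if isinstance(val_check_interval, float):
        if not 0.0 <= val_check_interval <= 1.0:
            raise ValueError("float val_check_interval must lie in [0.0, 1.0]")
        # a fraction of the epoch -> that many batches between checks
        return max(1, int(batches_per_epoch * val_check_interval))
    # an int is already a fixed number of training batches between checks
    return val_check_interval

print(resolve_val_check_interval(0.25, 100))  # 25
print(resolve_val_check_interval(50, 100))    # 50
```

So with 100 batches per epoch, `val_check_interval=0.25` validates four times per epoch, while `val_check_interval=50` validates every 50 batches regardless of epoch length.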
