
Commit ecfa60b (merge commit with parents bd965d9 and c3614f1)


106 files changed: 1,602 additions and 789 deletions (only a subset of the changed files is shown below).


.deepsource.toml

Lines changed: 0 additions & 26 deletions
This file was deleted.

CHANGELOG.md

Lines changed: 30 additions & 2 deletions
@@ -64,6 +64,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
   * Allow registering custom optimizers and learning rate schedulers without subclassing the CLI ([#9565](https://github.com/PyTorchLightning/pytorch-lightning/pull/9565))
   * Support shorthand notation to instantiate optimizers and learning rate schedulers ([#9565](https://github.com/PyTorchLightning/pytorch-lightning/pull/9565))
   * Support passing lists of callbacks via command line ([#8815](https://github.com/PyTorchLightning/pytorch-lightning/pull/8815))
+  * Support shorthand notation to instantiate models ([#9588](https://github.com/PyTorchLightning/pytorch-lightning/pull/9588))
+  * Support shorthand notation to instantiate datamodules ([#10011](https://github.com/PyTorchLightning/pytorch-lightning/pull/10011))
 
 
 - Fault-tolerant training:

@@ -193,24 +195,35 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Added `strategy` argument to Trainer ([#8597](https://github.com/PyTorchLightning/pytorch-lightning/pull/8597))
 
 
+- Added `init_meta_context`, `materialize_module` utilities ([#9920](https://github.com/PyTorchLightning/pytorch-lightning/pull/9920))
+
+
 - Added `TPUPrecisionPlugin` ([#10020](https://github.com/PyTorchLightning/pytorch-lightning/pull/#10020))
 
 
 - `torch.bfloat16` support:
   * Added bfloat16 support for Lightning Trainer ([#9049](https://github.com/PyTorchLightning/pytorch-lightning/pull/9049))
   * Renamed `TPUHalfPrecisionPlugin` to `TPUBf16PrecisionPlugin` ([#10026](https://github.com/PyTorchLightning/pytorch-lightning/pull/10026))
-
+  * Default to `precision=bf16` on CPU when `precision=16` is passed ([#10033](https://github.com/PyTorchLightning/pytorch-lightning/pull/10033))
 
 
 - Added `kfold` example for loop customization ([#9965](https://github.com/PyTorchLightning/pytorch-lightning/pull/9965))
 
 
 - LightningLite:
   * Added `PrecisionPlugin.forward_context`, making it the default implementation for all `{train,val,test,predict}_step_context()` methods ([#9988](https://github.com/PyTorchLightning/pytorch-lightning/pull/9988))
-  * Added `DDPSpawnPlugin.spawn()` for spawning new processes of a given function ([#10018](https://github.com/PyTorchLightning/pytorch-lightning/pull/10018))
+  * Added `DDPSpawnPlugin.spawn()` for spawning new processes of a given function ([#10018](https://github.com/PyTorchLightning/pytorch-lightning/pull/10018), [#10022](https://github.com/PyTorchLightning/pytorch-lightning/pull/10022))
   * Added `TrainingTypePlugin.{_setup_model, _setup_optimizer}` methods ([#9994](https://github.com/PyTorchLightning/pytorch-lightning/pull/9994))
   * Implemented `DataParallelPlugin._setup_model` ([#10010](https://github.com/PyTorchLightning/pytorch-lightning/pull/10010))
   * Implemented `DeepSpeedPlugin._setup_models_and_optimizers` ([#10009](https://github.com/PyTorchLightning/pytorch-lightning/pull/10009))
+  * Implemented `{DDPShardedPlugin,DDPShardedSpawnPlugin}._setup_models_and_optimizers` ([#10028](https://github.com/PyTorchLightning/pytorch-lightning/pull/10028))
+  * Added optional `model` argument to the `optimizer_step` methods in accelerators and plugins ([#10023](https://github.com/PyTorchLightning/pytorch-lightning/pull/10023))
+
+
+- Added `XLACheckpointIO` plugin ([#9972](https://github.com/PyTorchLightning/pytorch-lightning/pull/9972))
+
+
 
 ### Changed
 

@@ -508,6 +521,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Remove deprecated `distributed_backend` from `Trainer` ([#10017](https://github.com/PyTorchLightning/pytorch-lightning/pull/10017))
 
 
+- Removed `process_idx` from the `{DDPSpawnPlugin,TPUSpawnPlugin}.new_process` methods ([#10022](https://github.com/PyTorchLightning/pytorch-lightning/pull/10022))
+
+
+- Removed automatic patching of `{train,val,test,predict}_dataloader()` on the `LightningModule` ([#9764](https://github.com/PyTorchLightning/pytorch-lightning/pull/9764))
+
+
 ### Fixed
 
 

@@ -553,6 +572,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed `broadcast` in `DDPPlugin` and ``DDPSpawnPlugin` to respect the `src` input ([#9691](https://github.com/PyTorchLightning/pytorch-lightning/pull/9691))
 
 
+- Fixed `self.log(on_epoch=True, reduce_fx=sum))` for the `on_batch_start` and `on_train_batch_start` hooks ([#9791(https://github.com/PyTorchLightning/pytorch-lightning/pull/9791))
+
+
 - Fixed `self.log(on_epoch=True)` for the `on_batch_start` and `on_train_batch_start` hooks ([#9780](https://github.com/PyTorchLightning/pytorch-lightning/pull/9780))
 
 

@@ -585,6 +607,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed `train_dataloader` getting loaded twice when resuming from a checkpoint during `Trainer.fit()` ([#9671](https://github.com/PyTorchLightning/pytorch-lightning/pull/9671))
 
 
+- Fixed `LearningRateMonitor` logging with multiple param groups optimizer with no scheduler ([#10044](https://github.com/PyTorchLightning/pytorch-lightning/pull/10044))
+
+
+- Fixed undesired side effects being caused by `Trainer` patching dataloader methods on the `LightningModule` ([#9764](https://github.com/PyTorchLightning/pytorch-lightning/pull/9764))
+
+
 ## [1.4.9] - 2021-09-30
 
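The shorthand notation mentioned in the `LightningCLI` entries lets registered classes be selected by name from the command line instead of spelling out full `class_path`/`init_args` structures. A rough sketch of the idea, assuming a hypothetical CLI script with placeholder `DemoModel` and `DemoDataModule` classes; the exact flag spelling is described in the linked PRs, not in this commit.

.. code-block:: python

    # Hypothetical sketch. `DemoModel` and `DemoDataModule` are placeholder user classes;
    # the command-line flags in the comment below are illustrative only.
    from pytorch_lightning.utilities.cli import LightningCLI

    from my_project import DemoModel, DemoDataModule  # assumed user-defined classes

    cli = LightningCLI(DemoModel, DemoDataModule)

    # With shorthand notation, registered classes can then be picked by name, e.g. something like:
    #   python script.py fit --optimizer=Adam --optimizer.lr=0.01 --lr_scheduler=CosineAnnealingLR
    # rather than providing a full class_path/init_args config file.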

docs/source/advanced/advanced_gpu.rst

Lines changed: 23 additions & 23 deletions
@@ -71,9 +71,9 @@ To use Sharded Training, you need to first install FairScale using the command b
 .. code-block:: python
 
     # train using Sharded DDP
-    trainer = Trainer(plugins="ddp_sharded")
+    trainer = Trainer(strategy="ddp_sharded")
 
-Sharded Training can work across all DDP variants by adding the additional ``--plugins ddp_sharded`` flag.
+Sharded Training can work across all DDP variants by adding the additional ``--strategy ddp_sharded`` flag.
 
 Internally we re-initialize your optimizers and shard them across your machines and processes. We handle all communication using PyTorch distributed, so no code changes are required.
 

@@ -156,7 +156,7 @@ Below is an example of using both ``wrap`` and ``auto_wrap`` to create your mode
 
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="fsdp", precision=16)
+    trainer = Trainer(gpus=4, strategy="fsdp", precision=16)
     trainer.fit(model)
 
     trainer.test()

@@ -248,7 +248,7 @@ It is recommended to skip Stage 1 and use Stage 2, which comes with larger memor
     from pytorch_lightning import Trainer
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_1", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_1", precision=16)
     trainer.fit(model)
 

@@ -265,7 +265,7 @@ As a result, benefits can also be seen on a single GPU. Do note that the default
     from pytorch_lightning import Trainer
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_2", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_2", precision=16)
     trainer.fit(model)
 
 .. code-block:: bash

@@ -286,7 +286,7 @@ Below we show an example of running `ZeRO-Offload <https://www.deepspeed.ai/tuto
     from pytorch_lightning.plugins import DeepSpeedPlugin
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_2_offload", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_2_offload", precision=16)
     trainer.fit(model)
 

@@ -307,7 +307,7 @@ You can also modify the ZeRO-Offload parameters via the plugin as below.
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DeepSpeedPlugin(offload_optimizer=True, allgather_bucket_size=5e8, reduce_bucket_size=5e8),
+        strategy=DeepSpeedPlugin(offload_optimizer=True, allgather_bucket_size=5e8, reduce_bucket_size=5e8),
         precision=16,
     )
     trainer.fit(model)

@@ -340,7 +340,7 @@ For even more speed benefit, DeepSpeed offers an optimized CPU version of ADAM c
 
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_2_offload", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_2_offload", precision=16)
     trainer.fit(model)
 

@@ -383,7 +383,7 @@ Also please have a look at our :ref:`deepspeed-zero-stage-3-tips` which contains
 
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3", precision=16)
     trainer.fit(model)
 
     trainer.test()

@@ -403,7 +403,7 @@ You can also use the Lightning Trainer to run predict or evaluate with DeepSpeed
 
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3", precision=16)
     trainer.test(ckpt_path="my_saved_deepspeed_checkpoint.ckpt")
 

@@ -438,7 +438,7 @@ This reduces the time taken to initialize very large models, as well as ensure w
 
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3", precision=16)
     trainer.fit(model)
 
     trainer.test()

@@ -463,14 +463,14 @@ DeepSpeed ZeRO Stage 3 Offloads optimizer state, gradients to the host CPU to re
 
     # Enable CPU Offloading
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3_offload", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3_offload", precision=16)
     trainer.fit(model)
 
     # Enable CPU Offloading, and offload parameters to CPU
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DeepSpeedPlugin(
+        strategy=DeepSpeedPlugin(
             stage=3,
             offload_optimizer=True,
             offload_parameters=True,

@@ -492,14 +492,14 @@ Additionally, DeepSpeed supports offloading to NVMe drives for even larger model
 
     # Enable CPU Offloading
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3_offload", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3_offload", precision=16)
     trainer.fit(model)
 
     # Enable CPU Offloading, and offload parameters to CPU
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DeepSpeedPlugin(
+        strategy=DeepSpeedPlugin(
             stage=3,
             offload_optimizer=True,
             offload_parameters=True,

@@ -576,12 +576,12 @@ This saves memory when training larger models, however requires using a checkpoi
     model = MyModel()
 
 
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3_offload", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3_offload", precision=16)
 
     # Enable CPU Activation Checkpointing
     trainer = Trainer(
         gpus=4,
-        plugins=DeepSpeedPlugin(
+        strategy=DeepSpeedPlugin(
             stage=3,
             offload_optimizer=True,  # Enable CPU Offloading
             cpu_checkpointing=True,  # (Optional) offload activations to CPU

@@ -670,7 +670,7 @@ In some cases you may want to define your own DeepSpeed Config, to access all pa
     }
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins=DeepSpeedPlugin(deepspeed_config), precision=16)
+    trainer = Trainer(gpus=4, strategy=DeepSpeedPlugin(deepspeed_config), precision=16)
     trainer.fit(model)
 

@@ -682,7 +682,7 @@ We support taking the config as a json formatted file:
     from pytorch_lightning.plugins import DeepSpeedPlugin
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins=DeepSpeedPlugin("/path/to/deepspeed_config.json"), precision=16)
+    trainer = Trainer(gpus=4, strategy=DeepSpeedPlugin("/path/to/deepspeed_config.json"), precision=16)
     trainer.fit(model)
 

@@ -717,7 +717,7 @@ This can reduce peak memory usage and throughput as saved memory will be equal t
     from pytorch_lightning.plugins import DDPPlugin
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins=DDPPlugin(gradient_as_bucket_view=True))
+    trainer = Trainer(gpus=4, strategy=DDPPlugin(gradient_as_bucket_view=True))
     trainer.fit(model)
 
 DDP Communication Hooks

@@ -740,7 +740,7 @@ Enable `FP16 Compress Hook for multi-node throughput improvement <https://pytorc
     )
 
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins=DDPPlugin(ddp_comm_hook=default.fp16_compress_hook))
+    trainer = Trainer(gpus=4, strategy=DDPPlugin(ddp_comm_hook=default.fp16_compress_hook))
     trainer.fit(model)
 
 Enable `PowerSGD for multi-node throughput improvement <https://pytorch.org/docs/stable/ddp_comm_hooks.html#powersgd-communication-hook>`__:

@@ -758,7 +758,7 @@ Enable `PowerSGD for multi-node throughput improvement <https://pytorch.org/docs
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DDPPlugin(
+        strategy=DDPPlugin(
            ddp_comm_state=powerSGD.PowerSGDState(
                process_group=None,
                matrix_approximation_rank=1,

@@ -787,7 +787,7 @@ Combine hooks for accumulated benefit:
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DDPPlugin(
+        strategy=DDPPlugin(
            ddp_comm_state=powerSGD.PowerSGDState(
                process_group=None,
                matrix_approximation_rank=1,
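All of the documentation edits above follow one pattern: the deprecated `plugins=` argument becomes the new `strategy=` argument added to the Trainer in this release (see the CHANGELOG entry for #8597). A minimal migration sketch, reusing the `DeepSpeedPlugin` arguments shown in the diff; `MyModel` is the placeholder LightningModule used throughout these docs, not a class defined in this commit.

.. code-block:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import DeepSpeedPlugin

    model = MyModel()  # placeholder LightningModule, as in the docs above

    # Before (deprecated): Trainer(gpus=4, plugins="deepspeed_stage_2_offload", precision=16)
    # After:
    trainer = Trainer(gpus=4, strategy="deepspeed_stage_2_offload", precision=16)

    # The same rename applies when passing a configured plugin object:
    trainer = Trainer(
        gpus=4,
        strategy=DeepSpeedPlugin(offload_optimizer=True, allgather_bucket_size=5e8, reduce_bucket_size=5e8),
        precision=16,
    )
    trainer.fit(model)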

docs/source/advanced/ipu.rst

Lines changed: 5 additions & 5 deletions
@@ -83,7 +83,7 @@ IPUs provide further optimizations to speed up training. By using the ``IPUPlugi
     from pytorch_lightning.plugins import IPUPlugin
 
     model = MyLightningModule()
-    trainer = pl.Trainer(ipus=8, plugins=IPUPlugin(device_iterations=32))
+    trainer = pl.Trainer(ipus=8, strategy=IPUPlugin(device_iterations=32))
     trainer.fit(model)
 
 Note that by default we return the last device iteration loss. You can override this by passing in your own ``poptorch.Options`` and setting the AnchorMode as described in the `PopTorch documentation <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/reference.html#poptorch.Options.anchorMode>`__.

@@ -102,7 +102,7 @@ Note that by default we return the last device iteration loss. You can override
     training_opts.anchorMode(poptorch.AnchorMode.All)
     training_opts.deviceIterations(32)
 
-    trainer = Trainer(ipus=8, plugins=IPUPlugin(inference_opts=inference_opts, training_opts=training_opts))
+    trainer = Trainer(ipus=8, strategy=IPUPlugin(inference_opts=inference_opts, training_opts=training_opts))
     trainer.fit(model)
 
 You can also override all options by passing the ``poptorch.Options`` to the plugin. See `PopTorch options documentation <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/batching.html>`__ for more information.

@@ -124,7 +124,7 @@ Lightning supports dumping all reports to a directory to open using the tool.
     from pytorch_lightning.plugins import IPUPlugin
 
     model = MyLightningModule()
-    trainer = pl.Trainer(ipus=8, plugins=IPUPlugin(autoreport_dir="report_dir/"))
+    trainer = pl.Trainer(ipus=8, strategy=IPUPlugin(autoreport_dir="report_dir/"))
     trainer.fit(model)
 
 This will dump all reports to ``report_dir/`` which can then be opened using the Graph Analyser Tool, see `Opening Reports <https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html#opening-reports>`__.

@@ -174,7 +174,7 @@ Below is an example using the block annotation in a LightningModule.
 
 
     model = MyLightningModule()
-    trainer = pl.Trainer(ipus=8, plugins=IPUPlugin(device_iterations=20))
+    trainer = pl.Trainer(ipus=8, strategy=IPUPlugin(device_iterations=20))
     trainer.fit(model)
 

@@ -217,7 +217,7 @@ You can also use the block context manager within the forward function, or any o
 
 
    model = MyLightningModule()
-    trainer = pl.Trainer(ipus=8, plugins=IPUPlugin(device_iterations=20))
+    trainer = pl.Trainer(ipus=8, strategy=IPUPlugin(device_iterations=20))
     trainer.fit(model)
 
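The IPU examples receive the same rename, with `IPUPlugin` now passed through `strategy=`. A minimal sketch combining the PopTorch options shown in the diff; it assumes an IPU-enabled PopTorch environment, and `MyLightningModule` is the placeholder module from these docs.

.. code-block:: python

    import poptorch
    import pytorch_lightning as pl
    from pytorch_lightning.plugins import IPUPlugin

    model = MyLightningModule()  # placeholder module, as in the docs above

    # Custom PopTorch options are still configured on the plugin, which is now
    # handed to the Trainer via ``strategy=`` instead of ``plugins=``.
    training_opts = poptorch.Options()
    training_opts.anchorMode(poptorch.AnchorMode.All)
    training_opts.deviceIterations(32)

    trainer = pl.Trainer(ipus=8, strategy=IPUPlugin(training_opts=training_opts))
    trainer.fit(model)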

docs/source/advanced/mixed_precision.rst

Lines changed: 2 additions & 2 deletions
@@ -50,14 +50,14 @@ BFloat16 Mixed precision is similar to FP16 mixed precision, however we maintain
 Since BFloat16 is more stable than FP16 during training, we do not need to worry about any gradient scaling or nan gradient values that comes with using FP16 mixed precision.
 
 .. testcode::
-    :skipif: not _TORCH_BFLOAT_AVAILABLE
+    :skipif: not _TORCH_GREATER_EQUAL_DEV_1_10 or not torch.cuda.is_available()
 
     Trainer(gpus=1, precision="bf16")
 
 It is also possible to use BFloat16 mixed precision on the CPU, relying on MKLDNN under the hood.
 
 .. testcode::
-    :skipif: not _TORCH_CPU_AMP_AVAILABLE
+    :skipif: not _TORCH_GREATER_EQUAL_DEV_1_10
 
     Trainer(precision="bf16")
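The updated `:skipif:` guards tie these examples to a recent enough PyTorch build (a 1.10 development version) instead of the removed availability flags. For reference, the two configurations exercised by the testcode blocks are simply the following, assuming such a PyTorch build is installed:

.. code-block:: python

    from pytorch_lightning import Trainer

    # BFloat16 mixed precision on GPU (guarded above by CUDA availability)
    trainer = Trainer(gpus=1, precision="bf16")

    # BFloat16 mixed precision on CPU, backed by MKLDNN
    trainer = Trainer(precision="bf16")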
