
Commit 89c190f

Merge branch 'master' into add/training_type
2 parents 6a7d9d4 + 28fc8d2 commit 89c190f

File tree

77 files changed (+732 / -394 lines)


CHANGELOG.md

Lines changed: 34 additions & 0 deletions
@@ -175,9 +175,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Enabled automatic parameters tying for TPUs ([#9525](https://github.com/PyTorchLightning/pytorch-lightning/pull/9525))


+- Raise a `MisconfigurationException` when trainer functions are called with `ckpt_path="best"` but `checkpoint_callback` isn't configured ([#9841](https://github.com/PyTorchLightning/pytorch-lightning/pull/9841))
+
+
 - Added support for `torch.autograd.set_detect_anomaly` through `Trainer` constructor argument `detect_anomaly` ([#9848](https://github.com/PyTorchLightning/pytorch-lightning/pull/9848))


+- Added `enable_model_summary` flag to Trainer ([#9699](https://github.com/PyTorchLightning/pytorch-lightning/pull/9699))
+
+
 - Added `strategy` argument to Trainer ([#8597](https://github.com/PyTorchLightning/pytorch-lightning/pull/8597))


@@ -260,6 +266,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Changed `HorovodPlugin.all_gather` to return a `torch.Tensor` instead of a list ([#9696](https://github.com/PyTorchLightning/pytorch-lightning/pull/9696))


+- Changed Trainer connectors to be protected attributes:
+    * Configuration Validator ([#9779](https://github.com/PyTorchLightning/pytorch-lightning/pull/9779))
+
+
 - Restore `current_epoch` and `global_step` irrespective of trainer task ([#9413](https://github.com/PyTorchLightning/pytorch-lightning/pull/9413))


@@ -272,11 +282,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Update the logic to check for accumulation steps with deepspeed ([#9826](https://github.com/PyTorchLightning/pytorch-lightning/pull/9826))


+- Updated error message for interactive incompatible plugins ([#9896](https://github.com/PyTorchLightning/pytorch-lightning/pull/9896))
+
+
 ### Deprecated

 - Deprecated trainer argument `terminate_on_nan` in favour of `detect_anomaly`([#9175](https://github.com/PyTorchLightning/pytorch-lightning/pull/9175))


+- Deprecated `Trainer.terminate_on_nan` public attribute access ([#9849](https://github.com/PyTorchLightning/pytorch-lightning/pull/9849))
+
+
 - Deprecated `LightningModule.summarize()` in favor of `pytorch_lightning.utilities.model_summary.summarize()`


@@ -325,12 +341,18 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Deprecated Accelerator collective API `barrier`, `broadcast`, and `all_gather`, call `TrainingTypePlugin` collective API directly ([#9677](https://github.com/PyTorchLightning/pytorch-lightning/pull/9677))


+- Deprecated `checkpoint_callback` from the `Trainer` constructor in favour of `enable_checkpointing` ([#9754](https://github.com/PyTorchLightning/pytorch-lightning/pull/9754))
+
+
 - Deprecated the `LightningModule.on_post_move_to_device` method ([#9525](https://github.com/PyTorchLightning/pytorch-lightning/pull/9525))


 - Deprecated `pytorch_lightning.core.decorators.parameter_validation` in favor of `pytorch_lightning.utilities.parameter_tying.set_shared_parameters` ([#9525](https://github.com/PyTorchLightning/pytorch-lightning/pull/9525))


+- Deprecated passing `weights_summary` to the `Trainer` constructor in favor of adding the `ModelSummary` callback with `max_depth` directly to the list of callbacks ([#9699](https://github.com/PyTorchLightning/pytorch-lightning/pull/9699))
+
+
 ### Removed

 - Removed deprecated `metrics` ([#8586](https://github.com/PyTorchLightning/pytorch-lightning/pull/8586/))
@@ -435,6 +457,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Removed a redundant warning with `ModelCheckpoint(monitor=None)` callback ([#9875](https://github.com/PyTorchLightning/pytorch-lightning/pull/9875))


+- Remove `epoch` from `trainer.logged_metrics` ([#9904](https://github.com/PyTorchLightning/pytorch-lightning/pull/9904))
+
+
+- Removed `should_rank_save_checkpoint` property from Trainer ([#9433](https://github.com/PyTorchLightning/pytorch-lightning/pull/9433))
+
+
 ### Fixed


@@ -459,6 +487,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed `BasePredictionWriter` not returning the batch_indices in a non-distributed setting ([#9432](https://github.com/PyTorchLightning/pytorch-lightning/pull/9432))


+- Fixed an error when running in XLA environments with no TPU attached ([#9572](https://github.com/PyTorchLightning/pytorch-lightning/pull/9572))
+
+
 - Fixed check on torchmetrics logged whose `compute()` output is a multielement tensor ([#9582](https://github.com/PyTorchLightning/pytorch-lightning/pull/9582))


@@ -485,6 +516,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed missing arguments when saving hyperparameters from the parent class but not from the child class ([#9800](https://github.com/PyTorchLightning/pytorch-lightning/pull/9800))


+- Fixed DeepSpeed GPU device IDs ([#9847](https://github.com/PyTorchLightning/pytorch-lightning/pull/9847))
+
+
 - Reset `val_dataloader` in `tuner/batch_size_scaling` ([#9857](https://github.com/PyTorchLightning/pytorch-lightning/pull/9857))


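Several of the additions above are plain `Trainer` constructor flags. A minimal sketch of how two of them might be combined (assuming a PyTorch Lightning 1.5-era install; `model` is a placeholder for any `LightningModule`):

```python
from pytorch_lightning import Trainer

trainer = Trainer(
    detect_anomaly=True,        # wraps torch.autograd.set_detect_anomaly (#9848)
    enable_model_summary=True,  # new flag; `weights_summary` is deprecated in the same release (#9699)
    max_epochs=1,
)
# trainer.fit(model)  # `model` would be any LightningModule
```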
benchmarks/test_basic_parity.py

Lines changed: 1 addition & 1 deletion
@@ -159,7 +159,7 @@ def lightning_loop(cls_model, idx, device_type: str = "cuda", num_epochs=10):
         # as the first run is skipped, no need to run it long
         max_epochs=num_epochs if idx > 0 else 1,
         enable_progress_bar=False,
-        weights_summary=None,
+        enable_model_summary=False,
         gpus=1 if device_type == "cuda" else 0,
         checkpoint_callback=False,
         logger=False,
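The benchmark change is only the flag rename; the same one-line migration applies to any script that passed `weights_summary=None`. A minimal sketch (not the benchmark itself):

```python
from pytorch_lightning import Trainer

# before (deprecated in v1.5): Trainer(weights_summary=None, ...)
trainer = Trainer(enable_model_summary=False, logger=False, max_epochs=1)
```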

docs/source/advanced/multi_gpu.rst

Lines changed: 8 additions & 2 deletions
@@ -611,28 +611,34 @@ Let's say you have a batch size of 7 in your dataloader.
     def train_dataloader(self):
         return Dataset(..., batch_size=7)

-In DDP or Horovod your effective batch size will be 7 * gpus * num_nodes.
+In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size will be 7 * gpus * num_nodes.

 .. code-block:: python

     # effective batch size = 7 * 8
     Trainer(gpus=8, accelerator="ddp")
+    Trainer(gpus=8, accelerator="ddp_spawn")
+    Trainer(gpus=8, accelerator="ddp_sharded")
     Trainer(gpus=8, accelerator="horovod")

     # effective batch size = 7 * 8 * 10
     Trainer(gpus=8, num_nodes=10, accelerator="ddp")
+    Trainer(gpus=8, num_nodes=10, accelerator="ddp_spawn")
+    Trainer(gpus=8, num_nodes=10, accelerator="ddp_sharded")
     Trainer(gpus=8, num_nodes=10, accelerator="horovod")

-In DDP2, your effective batch size will be 7 * num_nodes.
+In DDP2 or DP, your effective batch size will be 7 * num_nodes.
 The reason is that the full batch is visible to all GPUs on the node when using DDP2.

 .. code-block:: python

     # effective batch size = 7
     Trainer(gpus=8, accelerator="ddp2")
+    Trainer(gpus=8, accelerator="dp")

     # effective batch size = 7 * 10
     Trainer(gpus=8, num_nodes=10, accelerator="ddp2")
+    Trainer(gpus=8, num_nodes=10, accelerator="dp")


 .. note:: Huge batch sizes are actually really bad for convergence. Check out:
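The arithmetic described in the two code blocks above can be spelled out explicitly. A small illustrative sketch (the helper name and the strategy grouping are distilled from the docs text, not part of the change):

```python
def effective_batch_size(loader_batch_size: int, gpus: int, num_nodes: int, accelerator: str) -> int:
    """Rule of thumb from the docs above: per-process strategies multiply by every GPU,
    while dp/ddp2 see the full batch on a node, so only the node count multiplies it."""
    per_process = {"ddp", "ddp_spawn", "ddp_sharded", "deepspeed", "horovod"}
    if accelerator in per_process:
        return loader_batch_size * gpus * num_nodes
    if accelerator in {"ddp2", "dp"}:
        return loader_batch_size * num_nodes
    raise ValueError(f"unknown accelerator: {accelerator}")


assert effective_batch_size(7, gpus=8, num_nodes=1, accelerator="ddp") == 56
assert effective_batch_size(7, gpus=8, num_nodes=10, accelerator="ddp_sharded") == 560
assert effective_batch_size(7, gpus=8, num_nodes=10, accelerator="ddp2") == 70
```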

docs/source/common/debugging.rst

Lines changed: 8 additions & 4 deletions
@@ -95,11 +95,14 @@ Print a summary of your LightningModule
 ---------------------------------------
 Whenever the ``.fit()`` function gets called, the Trainer will print the weights summary for the LightningModule.
 By default it only prints the top-level modules. If you want to show all submodules in your network, use the
-`'full'` option:
+``max_depth`` option:

 .. testcode::

-    trainer = Trainer(weights_summary="full")
+    from pytorch_lightning.callbacks import ModelSummary
+
+    trainer = Trainer(callbacks=[ModelSummary(max_depth=-1)])
+

 You can also display the intermediate input- and output sizes of all your layers by setting the
 ``example_input_array`` attribute in your LightningModule. It will print a table like this
@@ -115,8 +118,9 @@ You can also display the intermediate input- and output sizes of all your layers
 when you call ``.fit()`` on the Trainer. This can help you find bugs in the composition of your layers.

 See Also:
-    - :paramref:`~pytorch_lightning.trainer.trainer.Trainer.weights_summary` Trainer argument
-    - :class:`~pytorch_lightning.core.memory.ModelSummary`
+    - :class:`~pytorch_lightning.callbacks.model_summary.ModelSummary`
+    - :func:`~pytorch_lightning.utilities.model_summary.summarize`
+    - :class:`~pytorch_lightning.utilities.model_summary.ModelSummary`

 ----------------

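A self-contained sketch of the two features the updated page describes — the `ModelSummary` callback with `max_depth` and the `example_input_array` attribute. The toy module and sizes are illustrative assumptions:

```python
import torch
from torch import nn
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelSummary


class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
        # Enables the input/output size columns in the printed summary table.
        self.example_input_array = torch.zeros(1, 28 * 28)

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


# max_depth=-1 shows every submodule, replacing the old weights_summary="full".
trainer = pl.Trainer(callbacks=[ModelSummary(max_depth=-1)], max_epochs=1)
```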

docs/source/common/hyperparameters.rst

Lines changed: 1 addition & 1 deletion
@@ -201,7 +201,7 @@ To recap, add ALL possible trainer flags to the argparser and init the ``Trainer``
     trainer = Trainer.from_argparse_args(hparams)

     # or if you need to pass in callbacks
-    trainer = Trainer.from_argparse_args(hparams, checkpoint_callback=..., callbacks=[...])
+    trainer = Trainer.from_argparse_args(hparams, enable_checkpointing=..., callbacks=[...])

 ----------

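For context, a minimal end-to-end sketch of the argparse pattern the changed line sits in (a PyTorch Lightning ≥ 1.5 install is assumed; the `val_loss` monitor is an illustrative choice):

```python
from argparse import ArgumentParser

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

parser = ArgumentParser()
parser = Trainer.add_argparse_args(parser)  # expose every Trainer flag on the CLI
hparams = parser.parse_args([])             # pass real argv in an actual script

# Renamed flag: enable_checkpointing replaces checkpoint_callback.
trainer = Trainer.from_argparse_args(
    hparams,
    enable_checkpointing=True,
    callbacks=[ModelCheckpoint(monitor="val_loss")],
)
```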

docs/source/common/trainer.rst

Lines changed: 60 additions & 36 deletions
@@ -528,6 +528,38 @@ Example::
 checkpoint_callback
 ^^^^^^^^^^^^^^^^^^^

+Deprecated: This has been deprecated in v1.5 and will be removed in v1.7. Please use ``enable_checkpointing`` instead.
+
+default_root_dir
+^^^^^^^^^^^^^^^^
+
+.. raw:: html
+
+    <video width="50%" max-width="400px" controls
+        poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/default%E2%80%A8_root_dir.jpg"
+        src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/default_root_dir.mp4"></video>
+
+|
+
+Default path for logs and weights when no logger or
+:class:`pytorch_lightning.callbacks.ModelCheckpoint` callback passed. On
+certain clusters you might want to separate where logs and checkpoints are
+stored. If you don't then use this argument for convenience. Paths can be local
+paths or remote paths such as `s3://bucket/path` or 'hdfs://path/'. Credentials
+will need to be set up to use remote filepaths.
+
+.. testcode::
+
+    # default used by the Trainer
+    trainer = Trainer(default_root_dir=os.getcwd())
+
+distributed_backend
+^^^^^^^^^^^^^^^^^^^
+Deprecated: This has been renamed ``accelerator``.
+
+enable_checkpointing
+^^^^^^^^^^^^^^^^^^^^
+
 .. raw:: html

     <video width="50%" max-width="400px" controls
@@ -542,11 +574,11 @@ To disable automatic checkpointing, set this to `False`.

 .. code-block:: python

-    # default used by Trainer
-    trainer = Trainer(checkpoint_callback=True)
+    # default used by Trainer, saves the most recent model to a single checkpoint after each epoch
+    trainer = Trainer(enable_checkpointing=True)

     # turn off automatic checkpointing
-    trainer = Trainer(checkpoint_callback=False)
+    trainer = Trainer(enable_checkpointing=False)


 You can override the default behavior by initializing the :class:`~pytorch_lightning.callbacks.ModelCheckpoint`
@@ -563,38 +595,6 @@ See :doc:`Saving and Loading Weights <../common/weights_loading>` for how to cus
     # Add your callback to the callbacks list
     trainer = Trainer(callbacks=[checkpoint_callback])

-
-.. warning:: Passing a ModelCheckpoint instance to this argument is deprecated since
-    v1.1 and will be unsupported from v1.3. Use `callbacks` argument instead.
-
-
-default_root_dir
-^^^^^^^^^^^^^^^^
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-        poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/default%E2%80%A8_root_dir.jpg"
-        src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/default_root_dir.mp4"></video>
-
-|
-
-Default path for logs and weights when no logger or
-:class:`pytorch_lightning.callbacks.ModelCheckpoint` callback passed. On
-certain clusters you might want to separate where logs and checkpoints are
-stored. If you don't then use this argument for convenience. Paths can be local
-paths or remote paths such as `s3://bucket/path` or 'hdfs://path/'. Credentials
-will need to be set up to use remote filepaths.
-
-.. testcode::
-
-    # default used by the Trainer
-    trainer = Trainer(default_root_dir=os.getcwd())
-
-
-distributed_backend
-^^^^^^^^^^^^^^^^^^^
-Deprecated: This has been renamed ``accelerator``.
-
 fast_dev_run
 ^^^^^^^^^^^^

@@ -1589,6 +1589,11 @@ Example::
 weights_summary
 ^^^^^^^^^^^^^^^

+.. warning:: `weights_summary` is deprecated in v1.5 and will be removed in v1.7. Please pass :class:`~pytorch_lightning.callbacks.model_summary.ModelSummary`
+    directly to the Trainer's ``callbacks`` argument instead. To disable the model summary,
+    pass ``enable_model_summary = False`` to the Trainer.
+
+
 .. raw:: html

     <video width="50%" max-width="400px" controls
@@ -1611,6 +1616,25 @@ Options: 'full', 'top', None.
     # don't print a summary
     trainer = Trainer(weights_summary=None)

+
+enable_model_summary
+^^^^^^^^^^^^^^^^^^^^
+
+Whether to enable or disable the model summarization. Defaults to True.
+
+.. testcode::
+
+    # default used by the Trainer
+    trainer = Trainer(enable_model_summary=True)
+
+    # disable summarization
+    trainer = Trainer(enable_model_summary=False)
+
+    # enable custom summarization
+    from pytorch_lightning.callbacks import ModelSummary
+
+    trainer = Trainer(enable_model_summary=True, callbacks=[ModelSummary(max_depth=-1)])
+
 -----

 Trainer class API
@@ -1706,7 +1730,7 @@ The metrics sent to the logger (visualizer).
 .. code-block:: python

     def training_step(self, batch, batch_idx):
-        self.log("a_val", 2, log=True)
+        self.log("a_val", 2, logger=True)


     logged_metrics = trainer.logged_metrics

docs/source/guides/data.rst

Lines changed: 2 additions & 2 deletions
@@ -227,8 +227,8 @@ needs to wrap the DataLoaders with `CombinedLoader`.


     def val_dataloader(self):
-        loader_1 = DataLoader()
-        loader_2 = DataLoader()
+        loader_a = DataLoader()
+        loader_b = DataLoader()
         loaders = {"a": loader_a, "b": loader_b}
         combined_loaders = CombinedLoader(loaders, "max_size_cycle")
         return combined_loaders
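A runnable sketch of the renamed loaders outside a LightningModule; the dummy tensors and the import path (`pytorch_lightning.trainer.supporters`, as used in the 1.5-era docs) are assumptions beyond the diff itself:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning.trainer.supporters import CombinedLoader

loader_a = DataLoader(TensorDataset(torch.randn(64, 3)), batch_size=7)
loader_b = DataLoader(TensorDataset(torch.randn(16, 3)), batch_size=4)
loaders = {"a": loader_a, "b": loader_b}

# "max_size_cycle" iterates until the longest loader is exhausted, cycling the shorter one;
# each batch is a dict keyed like `loaders`.
combined_loaders = CombinedLoader(loaders, "max_size_cycle")
for batch in combined_loaders:
    (a,), (b,) = batch["a"], batch["b"]  # TensorDataset yields 1-tuples
    print(a.shape, b.shape)
    break
```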

pl_examples/bug_report_model.py

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ def run():
         limit_val_batches=1,
         num_sanity_val_steps=0,
         max_epochs=1,
-        weights_summary=None,
+        enable_model_summary=False,
     )
     trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
     trainer.test(model, dataloaders=test_data)

pl_examples/domain_templates/computer_vision_fine_tuning.py

Lines changed: 1 addition & 1 deletion
@@ -272,7 +272,7 @@ def add_arguments_to_parser(self, parser):
         parser.set_defaults(
             {
                 "trainer.max_epochs": 15,
-                "trainer.weights_summary": None,
+                "trainer.enable_model_summary": False,
                 "trainer.num_sanity_val_steps": 0,
             }
         )
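For context, the changed default lives inside a `LightningCLI` subclass; a hedged sketch of the surrounding pattern (the class name is illustrative, the import path is the 1.5-era one):

```python
from pytorch_lightning.utilities.cli import LightningCLI


class MyLightningCLI(LightningCLI):
    def add_arguments_to_parser(self, parser):
        # Defaults are addressed by config path; this commit renames
        # "trainer.weights_summary" to "trainer.enable_model_summary".
        parser.set_defaults(
            {
                "trainer.max_epochs": 15,
                "trainer.enable_model_summary": False,
                "trainer.num_sanity_val_steps": 0,
            }
        )
```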

pytorch_lightning/callbacks/base.py

Lines changed: 1 addition & 1 deletion
@@ -327,5 +327,5 @@ def on_before_optimizer_step(
         pass

     def on_before_zero_grad(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", optimizer: Optimizer) -> None:
-        """Called after ``optimizer.step()`` and before ``optimizer.zero_grad()``."""
+        """Called before ``optimizer.zero_grad()``."""
         pass
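The corrected docstring pins down when the hook fires. A hedged sketch of a user-defined callback using it (the gradient-norm printout is purely illustrative):

```python
from pytorch_lightning.callbacks import Callback


class GradNormPrinter(Callback):
    def on_before_zero_grad(self, trainer, pl_module, optimizer):
        # Fires right before optimizer.zero_grad(), so this step's gradients are still populated.
        total_sq = sum(
            p.grad.detach().norm(2).item() ** 2
            for p in pl_module.parameters()
            if p.grad is not None
        )
        print(f"grad L2 norm before zero_grad: {total_sq ** 0.5:.4f}")


# usage: Trainer(callbacks=[GradNormPrinter()])
```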
