
Commit 89c190f

Merge branch 'master' into add/training_type
2 parents 6a7d9d4 + 28fc8d2 commit 89c190f

File tree

77 files changed (+732 / -394 lines)


CHANGELOG.md

Lines changed: 34 additions & 0 deletions
@@ -175,9 +175,15 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Enabled automatic parameters tying for TPUs ([#9525](https://github.com/PyTorchLightning/pytorch-lightning/pull/9525))


+- Raise a `MisconfigurationException` when trainer functions are called with `ckpt_path="best"` but `checkpoint_callback` isn't configured ([#9841](https://github.com/PyTorchLightning/pytorch-lightning/pull/9841))
+
+
 - Added support for `torch.autograd.set_detect_anomaly` through `Trainer` constructor argument `detect_anomaly` ([#9848](https://github.com/PyTorchLightning/pytorch-lightning/pull/9848))


+- Added `enable_model_summary` flag to Trainer ([#9699](https://github.com/PyTorchLightning/pytorch-lightning/pull/9699))
+
+
 - Added `strategy` argument to Trainer ([#8597](https://github.com/PyTorchLightning/pytorch-lightning/pull/8597))


@@ -260,6 +266,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Changed `HorovodPlugin.all_gather` to return a `torch.Tensor` instead of a list ([#9696](https://github.com/PyTorchLightning/pytorch-lightning/pull/9696))


+- Changed Trainer connectors to be protected attributes:
+    * Configuration Validator ([#9779](https://github.com/PyTorchLightning/pytorch-lightning/pull/9779))
+
+
 - Restore `current_epoch` and `global_step` irrespective of trainer task ([#9413](https://github.com/PyTorchLightning/pytorch-lightning/pull/9413))


@@ -272,11 +282,17 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Update the logic to check for accumulation steps with deepspeed ([#9826](https://github.com/PyTorchLightning/pytorch-lightning/pull/9826))


+- Updated error message for interactive incompatible plugins ([#9896](https://github.com/PyTorchLightning/pytorch-lightning/pull/9896))
+
+
 ### Deprecated

 - Deprecated trainer argument `terminate_on_nan` in favour of `detect_anomaly`([#9175](https://github.com/PyTorchLightning/pytorch-lightning/pull/9175))


+- Deprecated `Trainer.terminate_on_nan` public attribute access ([#9849](https://github.com/PyTorchLightning/pytorch-lightning/pull/9849))
+
+
 - Deprecated `LightningModule.summarize()` in favor of `pytorch_lightning.utilities.model_summary.summarize()`


@@ -325,12 +341,18 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Deprecated Accelerator collective API `barrier`, `broadcast`, and `all_gather`, call `TrainingTypePlugin` collective API directly ([#9677](https://github.com/PyTorchLightning/pytorch-lightning/pull/9677))


+- Deprecated `checkpoint_callback` from the `Trainer` constructor in favour of `enable_checkpointing` ([#9754](https://github.com/PyTorchLightning/pytorch-lightning/pull/9754))
+
+
 - Deprecated the `LightningModule.on_post_move_to_device` method ([#9525](https://github.com/PyTorchLightning/pytorch-lightning/pull/9525))


 - Deprecated `pytorch_lightning.core.decorators.parameter_validation` in favor of `pytorch_lightning.utilities.parameter_tying.set_shared_parameters` ([#9525](https://github.com/PyTorchLightning/pytorch-lightning/pull/9525))


+- Deprecated passing `weights_summary` to the `Trainer` constructor in favor of adding the `ModelSummary` callback with `max_depth` directly to the list of callbacks ([#9699](https://github.com/PyTorchLightning/pytorch-lightning/pull/9699))
+
+
 ### Removed

 - Removed deprecated `metrics` ([#8586](https://github.com/PyTorchLightning/pytorch-lightning/pull/8586/))
@@ -435,6 +457,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Removed a redundant warning with `ModelCheckpoint(monitor=None)` callback ([#9875](https://github.com/PyTorchLightning/pytorch-lightning/pull/9875))


+- Remove `epoch` from `trainer.logged_metrics` ([#9904](https://github.com/PyTorchLightning/pytorch-lightning/pull/9904))
+
+
+- Removed `should_rank_save_checkpoint` property from Trainer ([#9433](https://github.com/PyTorchLightning/pytorch-lightning/pull/9433))
+
+
 ### Fixed


@@ -459,6 +487,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed `BasePredictionWriter` not returning the batch_indices in a non-distributed setting ([#9432](https://github.com/PyTorchLightning/pytorch-lightning/pull/9432))


+- Fixed an error when running in XLA environments with no TPU attached ([#9572](https://github.com/PyTorchLightning/pytorch-lightning/pull/9572))
+
+
 - Fixed check on torchmetrics logged whose `compute()` output is a multielement tensor ([#9582](https://github.com/PyTorchLightning/pytorch-lightning/pull/9582))


@@ -485,6 +516,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed missing arguments when saving hyperparameters from the parent class but not from the child class ([#9800](https://github.com/PyTorchLightning/pytorch-lightning/pull/9800))


+- Fixed DeepSpeed GPU device IDs ([#9847](https://github.com/PyTorchLightning/pytorch-lightning/pull/9847))
+
+
 - Reset `val_dataloader` in `tuner/batch_size_scaling` ([#9857](https://github.com/PyTorchLightning/pytorch-lightning/pull/9857))


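Several of the additions above are plain `Trainer` constructor flags. A minimal sketch of how two of them might be combined (assuming a PyTorch Lightning 1.5-era install; `model` is a placeholder for any `LightningModule`):

```python
from pytorch_lightning import Trainer

trainer = Trainer(
    detect_anomaly=True,        # wraps torch.autograd.set_detect_anomaly (#9848)
    enable_model_summary=True,  # new flag; `weights_summary` is deprecated in the same release (#9699)
    max_epochs=1,
)
# trainer.fit(model)  # `model` would be any LightningModule
```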
benchmarks/test_basic_parity.py

Lines changed: 1 addition & 1 deletion
@@ -159,7 +159,7 @@ def lightning_loop(cls_model, idx, device_type: str = "cuda", num_epochs=10):
         # as the first run is skipped, no need to run it long
         max_epochs=num_epochs if idx > 0 else 1,
         enable_progress_bar=False,
-        weights_summary=None,
+        enable_model_summary=False,
         gpus=1 if device_type == "cuda" else 0,
         checkpoint_callback=False,
         logger=False,
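The benchmark change is only the flag rename; the same one-line migration applies to any script that passed `weights_summary=None`. A minimal sketch (not the benchmark itself):

```python
from pytorch_lightning import Trainer

# before (deprecated in v1.5): Trainer(weights_summary=None, ...)
trainer = Trainer(enable_model_summary=False, logger=False, max_epochs=1)
```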

docs/source/advanced/multi_gpu.rst

Lines changed: 8 additions & 2 deletions
@@ -611,28 +611,34 @@ Let's say you have a batch size of 7 in your dataloader.
     def train_dataloader(self):
         return Dataset(..., batch_size=7)

-In DDP or Horovod your effective batch size will be 7 * gpus * num_nodes.
+In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size will be 7 * gpus * num_nodes.

 .. code-block:: python

     # effective batch size = 7 * 8
     Trainer(gpus=8, accelerator="ddp")
+    Trainer(gpus=8, accelerator="ddp_spawn")
+    Trainer(gpus=8, accelerator="ddp_sharded")
     Trainer(gpus=8, accelerator="horovod")

     # effective batch size = 7 * 8 * 10
     Trainer(gpus=8, num_nodes=10, accelerator="ddp")
+    Trainer(gpus=8, num_nodes=10, accelerator="ddp_spawn")
+    Trainer(gpus=8, num_nodes=10, accelerator="ddp_sharded")
     Trainer(gpus=8, num_nodes=10, accelerator="horovod")

-In DDP2, your effective batch size will be 7 * num_nodes.
+In DDP2 or DP, your effective batch size will be 7 * num_nodes.
 The reason is that the full batch is visible to all GPUs on the node when using DDP2.

 .. code-block:: python

     # effective batch size = 7
     Trainer(gpus=8, accelerator="ddp2")
+    Trainer(gpus=8, accelerator="dp")

     # effective batch size = 7 * 10
     Trainer(gpus=8, num_nodes=10, accelerator="ddp2")
+    Trainer(gpus=8, num_nodes=10, accelerator="dp")


 .. note:: Huge batch sizes are actually really bad for convergence. Check out:
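The arithmetic described in the two code blocks above can be spelled out explicitly. A small illustrative sketch (the helper name and the strategy grouping are distilled from the docs text, not part of the change):

```python
def effective_batch_size(loader_batch_size: int, gpus: int, num_nodes: int, accelerator: str) -> int:
    """Rule of thumb from the docs above: per-process strategies multiply by every GPU,
    while dp/ddp2 see the full batch on a node, so only the node count multiplies it."""
    per_process = {"ddp", "ddp_spawn", "ddp_sharded", "deepspeed", "horovod"}
    if accelerator in per_process:
        return loader_batch_size * gpus * num_nodes
    if accelerator in {"ddp2", "dp"}:
        return loader_batch_size * num_nodes
    raise ValueError(f"unknown accelerator: {accelerator}")


assert effective_batch_size(7, gpus=8, num_nodes=1, accelerator="ddp") == 56
assert effective_batch_size(7, gpus=8, num_nodes=10, accelerator="ddp_sharded") == 560
assert effective_batch_size(7, gpus=8, num_nodes=10, accelerator="ddp2") == 70
```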

docs/source/common/debugging.rst

Lines changed: 8 additions & 4 deletions
@@ -95,11 +95,14 @@ Print a summary of your LightningModule
 ---------------------------------------
 Whenever the ``.fit()`` function gets called, the Trainer will print the weights summary for the LightningModule.
 By default it only prints the top-level modules. If you want to show all submodules in your network, use the
-`'full'` option:
+``max_depth`` option:

 .. testcode::

-    trainer = Trainer(weights_summary="full")
+    from pytorch_lightning.callbacks import ModelSummary
+
+    trainer = Trainer(callbacks=[ModelSummary(max_depth=-1)])
+

 You can also display the intermediate input- and output sizes of all your layers by setting the
 ``example_input_array`` attribute in your LightningModule. It will print a table like this
@@ -115,8 +118,9 @@ You can also display the intermediate input- and output sizes of all your layers
 when you call ``.fit()`` on the Trainer. This can help you find bugs in the composition of your layers.

 See Also:
-    - :paramref:`~pytorch_lightning.trainer.trainer.Trainer.weights_summary` Trainer argument
-    - :class:`~pytorch_lightning.core.memory.ModelSummary`
+    - :class:`~pytorch_lightning.callbacks.model_summary.ModelSummary`
+    - :func:`~pytorch_lightning.utilities.model_summary.summarize`
+    - :class:`~pytorch_lightning.utilities.model_summary.ModelSummary`

 ----------------

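A self-contained sketch of the two features the updated page describes — the `ModelSummary` callback with `max_depth` and the `example_input_array` attribute. The toy module and sizes are illustrative assumptions:

```python
import torch
from torch import nn
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelSummary


class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
        # Enables the input/output size columns in the printed summary table.
        self.example_input_array = torch.zeros(1, 28 * 28)

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


# max_depth=-1 shows every submodule, replacing the old weights_summary="full".
trainer = pl.Trainer(callbacks=[ModelSummary(max_depth=-1)], max_epochs=1)
```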

docs/source/common/hyperparameters.rst

Lines changed: 1 addition & 1 deletion
@@ -201,7 +201,7 @@ To recap, add ALL possible trainer flags to the argparser and init the ``Trainer``
     trainer = Trainer.from_argparse_args(hparams)

     # or if you need to pass in callbacks
-    trainer = Trainer.from_argparse_args(hparams, checkpoint_callback=..., callbacks=[...])
+    trainer = Trainer.from_argparse_args(hparams, enable_checkpointing=..., callbacks=[...])

 ----------

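For context, a minimal end-to-end sketch of the argparse pattern the changed line sits in (a PyTorch Lightning ≥ 1.5 install is assumed; the `val_loss` monitor is an illustrative choice):

```python
from argparse import ArgumentParser

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

parser = ArgumentParser()
parser = Trainer.add_argparse_args(parser)  # expose every Trainer flag on the CLI
hparams = parser.parse_args([])             # pass real argv in an actual script

# Renamed flag: enable_checkpointing replaces checkpoint_callback.
trainer = Trainer.from_argparse_args(
    hparams,
    enable_checkpointing=True,
    callbacks=[ModelCheckpoint(monitor="val_loss")],
)
```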

docs/source/common/trainer.rst

Lines changed: 60 additions & 36 deletions
@@ -528,6 +528,38 @@ Example::
 checkpoint_callback
 ^^^^^^^^^^^^^^^^^^^

+Deprecated: This has been deprecated in v1.5 and will be removed in v1.7. Please use ``enable_checkpointing`` instead.
+
+default_root_dir
+^^^^^^^^^^^^^^^^
+
+.. raw:: html
+
+    <video width="50%" max-width="400px" controls
+        poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/default%E2%80%A8_root_dir.jpg"
+        src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/default_root_dir.mp4"></video>
+
+|
+
+Default path for logs and weights when no logger or
+:class:`pytorch_lightning.callbacks.ModelCheckpoint` callback passed. On
+certain clusters you might want to separate where logs and checkpoints are
+stored. If you don't then use this argument for convenience. Paths can be local
+paths or remote paths such as `s3://bucket/path` or 'hdfs://path/'. Credentials
+will need to be set up to use remote filepaths.
+
+.. testcode::
+
+    # default used by the Trainer
+    trainer = Trainer(default_root_dir=os.getcwd())
+
+distributed_backend
+^^^^^^^^^^^^^^^^^^^
+Deprecated: This has been renamed ``accelerator``.
+
+enable_checkpointing
+^^^^^^^^^^^^^^^^^^^^
+
 .. raw:: html

     <video width="50%" max-width="400px" controls
@@ -542,11 +574,11 @@ To disable automatic checkpointing, set this to `False`.

 .. code-block:: python

-    # default used by Trainer
-    trainer = Trainer(checkpoint_callback=True)
+    # default used by Trainer, saves the most recent model to a single checkpoint after each epoch
+    trainer = Trainer(enable_checkpointing=True)

     # turn off automatic checkpointing
-    trainer = Trainer(checkpoint_callback=False)
+    trainer = Trainer(enable_checkpointing=False)


 You can override the default behavior by initializing the :class:`~pytorch_lightning.callbacks.ModelCheckpoint`
@@ -563,38 +595,6 @@ See :doc:`Saving and Loading Weights <../common/weights_loading>` for how to cus
     # Add your callback to the callbacks list
     trainer = Trainer(callbacks=[checkpoint_callback])

-
-.. warning:: Passing a ModelCheckpoint instance to this argument is deprecated since
-    v1.1 and will be unsupported from v1.3. Use `callbacks` argument instead.
-
-
-default_root_dir
-^^^^^^^^^^^^^^^^
-
-.. raw:: html
-
-    <video width="50%" max-width="400px" controls
-        poster="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/thumb/default%E2%80%A8_root_dir.jpg"
-        src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/pl_docs/trainer_flags/default_root_dir.mp4"></video>
-
-|
-
-Default path for logs and weights when no logger or
-:class:`pytorch_lightning.callbacks.ModelCheckpoint` callback passed. On
-certain clusters you might want to separate where logs and checkpoints are
-stored. If you don't then use this argument for convenience. Paths can be local
-paths or remote paths such as `s3://bucket/path` or 'hdfs://path/'. Credentials
-will need to be set up to use remote filepaths.
-
-.. testcode::
-
-    # default used by the Trainer
-    trainer = Trainer(default_root_dir=os.getcwd())
-
-
-distributed_backend
-^^^^^^^^^^^^^^^^^^^
-Deprecated: This has been renamed ``accelerator``.
-
 fast_dev_run
 ^^^^^^^^^^^^

@@ -1589,6 +1589,11 @@ Example::
 weights_summary
 ^^^^^^^^^^^^^^^

+.. warning:: `weights_summary` is deprecated in v1.5 and will be removed in v1.7. Please pass :class:`~pytorch_lightning.callbacks.model_summary.ModelSummary`
+    directly to the Trainer's ``callbacks`` argument instead. To disable the model summary,
+    pass ``enable_model_summary = False`` to the Trainer.
+
+
 .. raw:: html

     <video width="50%" max-width="400px" controls
@@ -1611,6 +1616,25 @@ Options: 'full', 'top', None.
     # don't print a summary
     trainer = Trainer(weights_summary=None)

+
+enable_model_summary
+^^^^^^^^^^^^^^^^^^^^
+
+Whether to enable or disable the model summarization. Defaults to True.
+
+.. testcode::
+
+    # default used by the Trainer
+    trainer = Trainer(enable_model_summary=True)
+
+    # disable summarization
+    trainer = Trainer(enable_model_summary=False)
+
+    # enable custom summarization
+    from pytorch_lightning.callbacks import ModelSummary
+
+    trainer = Trainer(enable_model_summary=True, callbacks=[ModelSummary(max_depth=-1)])
+
 -----

 Trainer class API
@@ -1706,7 +1730,7 @@ The metrics sent to the logger (visualizer).
 .. code-block:: python

     def training_step(self, batch, batch_idx):
-        self.log("a_val", 2, log=True)
+        self.log("a_val", 2, logger=True)


     logged_metrics = trainer.logged_metrics

docs/source/guides/data.rst

Lines changed: 2 additions & 2 deletions
@@ -227,8 +227,8 @@ needs to wrap the DataLoaders with `CombinedLoader`.


     def val_dataloader(self):
-        loader_1 = DataLoader()
-        loader_2 = DataLoader()
+        loader_a = DataLoader()
+        loader_b = DataLoader()
         loaders = {"a": loader_a, "b": loader_b}
         combined_loaders = CombinedLoader(loaders, "max_size_cycle")
         return combined_loaders
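A runnable sketch of the renamed loaders outside a LightningModule; the dummy tensors and the import path (`pytorch_lightning.trainer.supporters`, as used in the 1.5-era docs) are assumptions beyond the diff itself:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning.trainer.supporters import CombinedLoader

loader_a = DataLoader(TensorDataset(torch.randn(64, 3)), batch_size=7)
loader_b = DataLoader(TensorDataset(torch.randn(16, 3)), batch_size=4)
loaders = {"a": loader_a, "b": loader_b}

# "max_size_cycle" iterates until the longest loader is exhausted, cycling the shorter one;
# each batch is a dict keyed like `loaders`.
combined_loaders = CombinedLoader(loaders, "max_size_cycle")
for batch in combined_loaders:
    (a,), (b,) = batch["a"], batch["b"]  # TensorDataset yields 1-tuples
    print(a.shape, b.shape)
    break
```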

pl_examples/bug_report_model.py

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ def run():
         limit_val_batches=1,
         num_sanity_val_steps=0,
         max_epochs=1,
-        weights_summary=None,
+        enable_model_summary=False,
     )
     trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
     trainer.test(model, dataloaders=test_data)

pl_examples/domain_templates/computer_vision_fine_tuning.py

Lines changed: 1 addition & 1 deletion
@@ -272,7 +272,7 @@ def add_arguments_to_parser(self, parser):
         parser.set_defaults(
             {
                 "trainer.max_epochs": 15,
-                "trainer.weights_summary": None,
+                "trainer.enable_model_summary": False,
                 "trainer.num_sanity_val_steps": 0,
             }
         )
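For context, the changed default lives inside a `LightningCLI` subclass; a hedged sketch of the surrounding pattern (the class name is illustrative, the import path is the 1.5-era one):

```python
from pytorch_lightning.utilities.cli import LightningCLI


class MyLightningCLI(LightningCLI):
    def add_arguments_to_parser(self, parser):
        # Defaults are addressed by config path; this commit renames
        # "trainer.weights_summary" to "trainer.enable_model_summary".
        parser.set_defaults(
            {
                "trainer.max_epochs": 15,
                "trainer.enable_model_summary": False,
                "trainer.num_sanity_val_steps": 0,
            }
        )
```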

pytorch_lightning/callbacks/base.py

Lines changed: 1 addition & 1 deletion
@@ -327,5 +327,5 @@ def on_before_optimizer_step(
         pass

     def on_before_zero_grad(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule", optimizer: Optimizer) -> None:
-        """Called after ``optimizer.step()`` and before ``optimizer.zero_grad()``."""
+        """Called before ``optimizer.zero_grad()``."""
         pass
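The corrected docstring pins down when the hook fires. A hedged sketch of a user-defined callback using it (the gradient-norm printout is purely illustrative):

```python
from pytorch_lightning.callbacks import Callback


class GradNormPrinter(Callback):
    def on_before_zero_grad(self, trainer, pl_module, optimizer):
        # Fires right before optimizer.zero_grad(), so this step's gradients are still populated.
        total_sq = sum(
            p.grad.detach().norm(2).item() ** 2
            for p in pl_module.parameters()
            if p.grad is not None
        )
        print(f"grad L2 norm before zero_grad: {total_sq ** 0.5:.4f}")


# usage: Trainer(callbacks=[GradNormPrinter()])
```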
