
Commit 4129e9e

Merge branch 'master' into refactor/is_slurm_managing_tasks
2 parents 2e005a8 + c15b84d commit 4129e9e


45 files changed: +297 −311 lines

.github/workflows/ci_test-base.yml

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ jobs:
         # this will install stable torch
         python-version: [3.9]
 
-    # Timeout: https://stackoverflow.com/a/59076067/4521646
+    # lower timeout as this should run very quickly
     timeout-minutes: 20
     steps:
     - uses: actions/checkout@v2

.github/workflows/ci_test-conda.yml

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@ jobs:
         python-version: ["3.8"] # previous to last Python version as that one is already used in test-full
         pytorch-version: ["1.7", "1.8", "1.9", "1.10"] # nightly: add when there's a release candidate
 
-    # Timeout: https://stackoverflow.com/a/59076067/4521646
     timeout-minutes: 35
     steps:
     - uses: actions/checkout@v2

.github/workflows/ci_test-full.yml

Lines changed: 0 additions & 2 deletions
@@ -29,8 +29,6 @@ jobs:
         # nightly: add when there's a release candidate
         #- {os: ubuntu-20.04, python-version: "3.10", requires: "latest", release: "pre"}
 
-    # Timeout: https://stackoverflow.com/a/59076067/4521646
-    # TODO: the macOS is taking too long, probably caching did not work...
     timeout-minutes: 40
 
     steps:

.github/workflows/probot-auto-cc.yml

Lines changed: 3 additions & 5 deletions
@@ -2,16 +2,14 @@ name: Probot
 
 on:
   issues:
-    types:
-      - labeled
+    types: [labeled]
   pull_request:
-    types:
-      - labeled
+    types: [labeled, ready_for_review]
 
 jobs:
   auto-cc:
-    if: ${{ github.repository_owner == 'PyTorchLightning' }}
     runs-on: ubuntu-latest
+    if: github.event_name == 'issue' || github.event.pull_request.draft == false
     steps:
       - uses: carmocca/probot@v1
         env:

CHANGELOG.md

Lines changed: 8 additions & 2 deletions
@@ -31,6 +31,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - The `monitor` argument in the `EarlyStopping` callback is no longer optional ([#10328](https://github.com/PyTorchLightning/pytorch-lightning/pull/10328))
 
 
+- Do not fail if batch size could not be inferred for logging when using DeepSpeed ([#10438](https://github.com/PyTorchLightning/pytorch-lightning/issues/10438))
+
+
 - Raise `MisconfigurationException` when `enable_progress_bar=False` and a progress bar instance has been passed in the callback list ([#10520](https://github.com/PyTorchLightning/pytorch-lightning/issues/10520))
 
 
@@ -133,6 +136,9 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Removed deprecated `CheckpointConnector.hpc_load` property in favor of `CheckpointConnector.restore` ([#10525](https://github.com/PyTorchLightning/pytorch-lightning/pull/10525))
 
 
+- Removed deprecated `reload_dataloaders_every_epoch` from `Trainer` in favour of `reload_dataloaders_every_n_epochs` ([#10481](https://github.com/PyTorchLightning/pytorch-lightning/pull/10481))
+
+
 ### Fixed
 
@@ -142,7 +148,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed `CombinedLoader` and `max_size_cycle` didn't receive a `DistributedSampler` ([#10374](https://github.com/PyTorchLightning/pytorch-lightning/issues/10374))
 
 
-- Fixed `to_torchscript()` causing false positive deprecation warnings ([#10470](https://github.com/PyTorchLightning/pytorch-lightning/issues/10470))
+- Fixed scripting causing false positive deprecation warnings ([#10470](https://github.com/PyTorchLightning/pytorch-lightning/pull/10470), [#10555](https://github.com/PyTorchLightning/pytorch-lightning/pull/10555))
 
 
 - Fixed `isinstance` not working with `init_meta_context`, materialized model not being moved to the device ([#10493](https://github.com/PyTorchLightning/metrics/pull/10493))
@@ -157,7 +163,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 - Fixed sampler replacement logic with `overfit_batches` to only replace the sample when `SequentialSampler` is not used ([#10486](https://github.com/PyTorchLightning/pytorch-lightning/issues/10486))
 
 
-
+- Fixed propagation of device and dtype information to submodules of LightningLite when they inherit from `DeviceDtypeModuleMixin` ([#10559](https://github.com/PyTorchLightning/pytorch-lightning/issues/10559))
 
 
 -

docs/source/_templates/layout.html

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 {% block footer %}
 {{ super() }}
 <script script type="text/javascript">
-    var collapsedSections = ['Best practices', 'Lightning API', 'Optional extensions', 'Tutorials', 'API References', 'Bolts', 'Examples', 'Partner Domain Frameworks', 'Community'];
+    var collapsedSections = ['Best practices', 'Optional extensions', 'Tutorials', 'API References', 'Bolts', 'Examples', 'Partner Domain Frameworks', 'Community'];
 </script>
 
 {% endblock %}

pl_examples/loop_examples/kfold.py

Lines changed: 1 addition & 1 deletion
@@ -205,7 +205,7 @@ def on_run_end(self) -> None:
         voting_model = EnsembleVotingModel(type(self.trainer.lightning_module), checkpoint_paths)
         voting_model.trainer = self.trainer
         # This requires to connect the new model and move it the right device.
-        self.trainer.accelerator.connect(voting_model)
+        self.trainer.training_type_plugin.connect(voting_model)
         self.trainer.training_type_plugin.model_to_device()
         self.trainer.test_loop.run()

pl_examples/loop_examples/yielding_training_step.py

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ def _training_step(self, generator):
         # Here, instead of calling `lightning_module.training_step()`
         # we call next() on the generator!
         training_step_output = next(generator)
-        self.trainer.accelerator.post_training_step()
+        self.trainer.training_type_plugin.post_training_step()
 
         training_step_output = self.trainer.call_hook("training_step_end", training_step_output)

pytorch_lightning/__init__.py

Lines changed: 0 additions & 4 deletions
@@ -1,7 +1,6 @@
 """Root package info."""
 
 import logging
-import os
 
 from pytorch_lightning.__about__ import *  # noqa: F401, F403
 
@@ -14,9 +13,6 @@
 _logger.addHandler(logging.StreamHandler())
 _logger.propagate = False
 
-_PACKAGE_ROOT = os.path.dirname(__file__)
-_PROJECT_ROOT = os.path.dirname(_PACKAGE_ROOT)
-
 from pytorch_lightning.callbacks import Callback  # noqa: E402
 from pytorch_lightning.core import LightningDataModule, LightningModule  # noqa: E402
 from pytorch_lightning.trainer import Trainer  # noqa: E402

pytorch_lightning/lite/wrappers.py

Lines changed: 2 additions & 1 deletion
@@ -24,6 +24,7 @@
 from torch.utils.data import DataLoader
 
 from pytorch_lightning.accelerators import Accelerator
+from pytorch_lightning.core.mixins import DeviceDtypeModuleMixin
 from pytorch_lightning.plugins import PrecisionPlugin
 from pytorch_lightning.utilities.apply_func import apply_to_collection, move_data_to_device
 
@@ -64,7 +65,7 @@ def step(self, closure: Optional[Callable] = None) -> None:
         )
 
 
-class _LiteModule(nn.Module):
+class _LiteModule(DeviceDtypeModuleMixin):
     def __init__(self, module: nn.Module, precision_plugin: PrecisionPlugin) -> None:
         """The LiteModule is a thin wrapper around the :class:`torch.nn.Module` and handles precision / autocast
         automatically for the forward pass.
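
With `_LiteModule` inheriting from `DeviceDtypeModuleMixin`, device and dtype information can propagate to submodules that also use the mixin, per the CHANGELOG entry for #10559. A rough illustration of the mixin's tracking; the `_TrackedSubmodule` class below is a hypothetical example, not part of this commit:

    import torch
    from pytorch_lightning.core.mixins import DeviceDtypeModuleMixin

    class _TrackedSubmodule(DeviceDtypeModuleMixin):  # hypothetical example class
        def __init__(self):
            super().__init__()
            self.register_buffer("state", torch.zeros(1))

    sub = _TrackedSubmodule()
    sub.to(torch.float16)           # the mixin intercepts .to() and records the new dtype
    print(sub.dtype, sub.state.dtype)  # both reflect torch.float16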

pytorch_lightning/loggers/tensorboard.py

Lines changed: 2 additions & 0 deletions
@@ -240,7 +240,9 @@ def log_graph(self, model: "pl.LightningModule", input_array=None):
 
         if input_array is not None:
             input_array = model._apply_batch_transfer_handler(input_array)
+            model._running_torchscript = True
             self.experiment.add_graph(model, input_array)
+            model._running_torchscript = False
         else:
             rank_zero_warn(
                 "Could not log computational graph since the"

pytorch_lightning/plugins/training_type/deepspeed.py

Lines changed: 13 additions & 8 deletions
@@ -618,11 +618,6 @@ def _format_batch_size_and_grad_accum_config(self):
             )
             self.config["gradient_accumulation_steps"] = self.lightning_module.trainer.accumulate_grad_batches
         if "train_micro_batch_size_per_gpu" not in self.config:
-            rank_zero_warn(
-                "Inferring the batch size for internal deepspeed logging from the `train_dataloader()`. "
-                "If you require skipping this, please pass "
-                "`Trainer(strategy=DeepSpeedPlugin(logging_batch_size_per_gpu=batch_size))`"
-            )
             batch_size = self._auto_select_batch_size()
             self.config["train_micro_batch_size_per_gpu"] = batch_size
         if "gradient_clipping" not in self.config:
@@ -634,9 +629,19 @@ def _auto_select_batch_size(self):
         batch_size = 1
         train_dl_source = self.lightning_module.trainer._data_connector._train_dataloader_source
         if train_dl_source.is_defined():
-            train_dataloader = train_dl_source.dataloader()
-            if hasattr(train_dataloader, "batch_sampler"):
-                batch_size = train_dataloader.batch_sampler.batch_size
+            try:
+                train_dataloader = train_dl_source.dataloader()
+                if hasattr(train_dataloader, "batch_sampler"):
+                    batch_size = train_dataloader.batch_sampler.batch_size
+            # broad exception on purpose as `source.dataloader()` will fail if the dataloader requires `setup`
+            # to have been called before
+            except Exception:
+                if self.global_rank == 0:
+                    deepspeed.utils.logging.logger.warning(
+                        "Tried to infer the batch size for internal deepspeed logging from the `train_dataloader()`. "
+                        "To ensure DeepSpeed logging remains correct, please manually pass the plugin with the "
+                        "batch size, `Trainer(strategy=DeepSpeedPlugin(logging_batch_size_per_gpu=batch_size))`."
+                    )
         return batch_size
 
     def _format_precision_config(self):
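
As the replacement warning suggests, the batch-size inference can be bypassed by passing the logging batch size to the plugin directly. A minimal sketch of that call; the value 32 and `gpus=1` are illustrative assumptions:

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import DeepSpeedPlugin

    # Pass the batch size your train dataloader actually uses so DeepSpeed's internal
    # logging reports the correct `train_micro_batch_size_per_gpu`.
    trainer = Trainer(strategy=DeepSpeedPlugin(logging_batch_size_per_gpu=32), gpus=1)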

pytorch_lightning/plugins/training_type/ipu.py

Lines changed: 12 additions & 8 deletions
@@ -237,21 +237,25 @@ def to_tensor(x):
         args = apply_to_collection(args, dtype=(int, float), function=to_tensor)
         return args
 
-    def training_step(self, *args, **kwargs):
+    def _step(self, stage: RunningStage, *args: Any, **kwargs: Any):
         args = self._prepare_input(args)
-        return self.poptorch_models[RunningStage.TRAINING](*args, **kwargs)
+        poptorch_model = self.poptorch_models[stage]
+        self.lightning_module._running_torchscript = True
+        out = poptorch_model(*args, **kwargs)
+        self.lightning_module._running_torchscript = False
+        return out
+
+    def training_step(self, *args, **kwargs):
+        return self._step(RunningStage.TRAINING, *args, **kwargs)
 
     def validation_step(self, *args, **kwargs):
-        args = self._prepare_input(args)
-        return self.poptorch_models[RunningStage.VALIDATING](*args, **kwargs)
+        return self._step(RunningStage.VALIDATING, *args, **kwargs)
 
     def test_step(self, *args, **kwargs):
-        args = self._prepare_input(args)
-        return self.poptorch_models[RunningStage.TESTING](*args, **kwargs)
+        return self._step(RunningStage.TESTING, *args, **kwargs)
 
     def predict_step(self, *args, **kwargs):
-        args = self._prepare_input(args)
-        return self.poptorch_models[RunningStage.PREDICTING](*args, **kwargs)
+        return self._step(RunningStage.PREDICTING, *args, **kwargs)
 
     def teardown(self) -> None:
         # undo dataloader patching

pytorch_lightning/trainer/connectors/data_connector.py

Lines changed: 0 additions & 8 deletions
@@ -64,7 +64,6 @@ def on_trainer_init(
         self,
         check_val_every_n_epoch: int,
         reload_dataloaders_every_n_epochs: int,
-        reload_dataloaders_every_epoch: bool,
         prepare_data_per_node: Optional[bool] = None,
     ) -> None:
         self.trainer.datamodule = None
@@ -83,13 +82,6 @@ def on_trainer_init(
 
         self.trainer.check_val_every_n_epoch = check_val_every_n_epoch
 
-        if reload_dataloaders_every_epoch:
-            reload_dataloaders_every_n_epochs = int(reload_dataloaders_every_epoch)
-            rank_zero_deprecation(
-                "`reload_dataloaders_every_epoch` is deprecated in v1.4 and will be removed in v1.6."
-                " Please use `reload_dataloaders_every_n_epochs` in Trainer."
-            )
-
         if not isinstance(reload_dataloaders_every_n_epochs, int) or (reload_dataloaders_every_n_epochs < 0):
             raise MisconfigurationException(
                 f"`reload_dataloaders_every_n_epochs` should be an int >= 0, got {reload_dataloaders_every_n_epochs}."

pytorch_lightning/trainer/trainer.py

Lines changed: 0 additions & 8 deletions
@@ -162,7 +162,6 @@ def __init__(
         benchmark: bool = False,
         deterministic: bool = False,
         reload_dataloaders_every_n_epochs: int = 0,
-        reload_dataloaders_every_epoch: bool = False,
         auto_lr_find: Union[bool, str] = False,
         replace_sampler_ddp: bool = True,
         detect_anomaly: bool = False,
@@ -341,12 +340,6 @@ def __init__(
 
             reload_dataloaders_every_n_epochs: Set to a non-negative integer to reload dataloaders every n epochs.
 
-            reload_dataloaders_every_epoch: Set to True to reload dataloaders every epoch.
-
-                .. deprecated:: v1.4
-                    ``reload_dataloaders_every_epoch`` has been deprecated in v1.4 and will be removed in v1.6.
-                    Please use ``reload_dataloaders_every_n_epochs``.
-
             replace_sampler_ddp: Explicitly enables or disables sampler replacement. If not specified this
                 will toggled automatically when DDP is used. By default it will add ``shuffle=True`` for
                 train sampler and ``shuffle=False`` for val/test sampler. If you want to customize it,
@@ -515,7 +508,6 @@ def __init__(
         self._data_connector.on_trainer_init(
             check_val_every_n_epoch,
             reload_dataloaders_every_n_epochs,
-            reload_dataloaders_every_epoch,
             prepare_data_per_node,
         )
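
The removed deprecation path converted `reload_dataloaders_every_epoch=True` to `reload_dataloaders_every_n_epochs=1` (`int(True)`), so existing callers migrate as in this minimal sketch:

    from pytorch_lightning import Trainer

    # Previously: Trainer(reload_dataloaders_every_epoch=True), removed by this commit.
    trainer = Trainer(reload_dataloaders_every_n_epochs=1)  # reload train dataloaders every epoch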
