@awaelchli @carmocca @kaushikb11 This would make #11756 easier to do for 1.7 by moving where the strategy processes the dataloader. Doing this before 1.6 ensures the distributed-specific dataloading logic can be moved to the strategies without changing the overall control flow.

AFAICT, TPU spawn is the only strategy that implements this function. Now that spawning happens earlier, calling `TPUSpawnStrategy.process_dataloader` should be safe to do when the dataloader is first initialized.
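For illustration, a spawn-style override of the hook might look like the sketch below. In Lightning, `TPUSpawnStrategy.process_dataloader` wraps the loader for the XLA device (via torch_xla's `MpDeviceLoader`); here `DeviceLoader` is a runnable stand-in and the device name is a placeholder, so this is not the actual implementation.

```python
# Illustration only: a spawn-style strategy overriding process_dataloader.
# DeviceLoader is a stand-in for a device-specific wrapper such as
# torch_xla's MpDeviceLoader, so the sketch runs without TPU dependencies.
class DeviceLoader:
    """Stand-in wrapper that would move batches to a device as they are read."""

    def __init__(self, dataloader, device):
        self.dataloader = dataloader
        self.device = device

    def __iter__(self):
        # A real wrapper would transfer each batch to self.device here.
        return iter(self.dataloader)


class TPUSpawnLikeStrategy:
    root_device = "xla:0"  # placeholder device name, for illustration

    def process_dataloader(self, dataloader):
        # Wrap the dataloader once; this is safe to do at
        # dataloader-initialization time as long as spawning has
        # already happened.
        return DeviceLoader(dataloader, self.root_device)
```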
Proposed refactor
This issue follows @ananthsub's #11756 to move strategy-specific dataloader logic to the strategies. During the discussion, we noticed the existing `strategy.process_dataloader()` hook, which aims to provide a place for additional strategy-specific data-loading logic:

https://github.com/PyTorchLightning/pytorch-lightning/blob/cf64f3443474a93d23b5afb0417e4a60298006e6/pytorch_lightning/strategies/strategy.py#L372-L378
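A minimal sketch of the base hook, simplified from the linked permalink (the real `Strategy` class in `pytorch_lightning/strategies/strategy.py` has many more members):

```python
# Simplified sketch of the base Strategy hook: a pass-through that
# strategies with device-specific loading needs (e.g. TPU spawn) override.
from typing import Iterable


class Strategy:
    def process_dataloader(self, dataloader: Iterable) -> Iterable:
        """Wrap or replace the dataloader with a strategy-specific one.

        The base implementation returns the dataloader unchanged.
        """
        return dataloader
```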
Currently, `strategy.process_dataloader()` is called in the `advance()` functions of `fit_loop`, `eval_loop`, and `prediction_loop`. For example:

https://github.com/PyTorchLightning/pytorch-lightning/blob/6309a59c3cf93e0bfc352efb7cbf6c50b4544372/pytorch_lightning/loops/fit_loop.py#L275
However, the dataloader has already been initialized in `_prepare_dataloader()` in `data_connector.py`, well before any loop starts. Therefore, we could refactor the call site of `strategy.process_dataloader()` by moving it into `_prepare_dataloader()` in `data_connector.py`:

https://github.com/PyTorchLightning/pytorch-lightning/blob/a52a6ea0301abf93a288c27ff6297ec36ca7630d/pytorch_lightning/trainer/connectors/data_connector.py#L278-L285
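A hypothetical sketch of the proposed call site, where `DataConnector` and the strategy below are illustrative stand-ins rather than Lightning's actual classes: `_prepare_dataloader()` would apply the hook once, when the dataloader is first set up, instead of every loop's `advance()` applying it on each call.

```python
# Hypothetical sketch of the proposed refactor; these classes are toy
# stand-ins, not Lightning's actual implementation.
class CountingStrategy:
    """Toy strategy that records how often its hook runs."""

    def __init__(self):
        self.calls = 0

    def process_dataloader(self, dataloader):
        self.calls += 1
        return dataloader


class DataConnector:
    def __init__(self, strategy):
        self.strategy = strategy

    def _prepare_dataloader(self, dataloader):
        # ...existing sampler / distributed preparation would happen here...
        # Proposed change: apply the strategy-specific processing exactly
        # once at preparation time, instead of inside each loop's advance().
        return self.strategy.process_dataloader(dataloader)
```

With this placement, repeated loop iterations never re-trigger the hook, which is the second benefit listed below.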
By doing so, we get two benefits:

- The dataloader is updated before any loop starts reading data.
- `advance()` is called multiple times within a loop, but the strategy-specific dataloader update only needs to happen once, so we save unnecessary function calls.

Motivation
Currently, `strategy.process_dataloader()` is called in the `advance()` functions of `fit_loop`, `eval_loop`, and `prediction_loop`.

However, the dataloader has already been initialized in `_prepare_dataloader()` in `data_connector.py`, well before any loop starts.

Therefore, we could refactor the call site of `strategy.process_dataloader()` by moving it into `_prepare_dataloader()` in `data_connector.py`.

Pitch
We could refactor the call site of `strategy.process_dataloader()` by moving it into `_prepare_dataloader()` in `data_connector.py`.

Additional context
cc: @edward-io @four4fish @ananthsub
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @justusschock @awaelchli @ninginthecloud