Refactor the call site of `strategy.process_dataloader()` from loop to data_connector #12213

ninginthecloud · 2022-03-03T22:00:48Z

Proposed refactor

This issue follows @ananthsub's #11756 to move strategy-specific dataloader logic to the strategies. During that discussion, we noticed the existing strategy.process_dataloader() hook.

strategy.process_dataloader() lets a strategy apply additional, strategy-specific changes to a dataloader.

https://github.com/PyTorchLightning/pytorch-lightning/blob/cf64f3443474a93d23b5afb0417e4a60298006e6/pytorch_lightning/strategies/strategy.py#L372-L378

Currently, strategy.process_dataloader() is called in the advance() functions of fit_loop, eval_loop, and prediction_loop.
For example,
https://github.com/PyTorchLightning/pytorch-lightning/blob/6309a59c3cf93e0bfc352efb7cbf6c50b4544372/pytorch_lightning/loops/fit_loop.py#L275

However, the dataloader has already been initialized in _prepare_dataloader() in data_connector.py, which happens well before any loop starts.
Therefore, we could refactor the call site of strategy.process_dataloader() by moving it to _prepare_dataloader() in data_connector.py.

https://github.com/PyTorchLightning/pytorch-lightning/blob/a52a6ea0301abf93a288c27ff6297ec36ca7630d/pytorch_lightning/trainer/connectors/data_connector.py#L278-L285

This has two benefits:

1. The dataloader is updated before the loop starts reading data.
2. advance() is called multiple times within a loop, but the strategy-specific dataloader update only needs to happen once, so we avoid unnecessary function calls.
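
To make the difference between the two call sites concrete, here is a minimal sketch. It does not use Lightning's actual classes: `CountingStrategy`, `LoopCallSite`, `ConnectorCallSite`, and `count_calls` are hypothetical stand-ins, and the hook is assumed to return the dataloader unchanged by default. It only illustrates how many times the hook runs under each placement.

```python
from typing import Iterable, List


class CountingStrategy:
    """Hypothetical stand-in for a strategy; counts how often the hook runs."""

    def __init__(self) -> None:
        self.calls = 0

    def process_dataloader(self, dataloader: Iterable) -> Iterable:
        # Assumed default behaviour: return the dataloader unchanged.
        self.calls += 1
        return dataloader


class LoopCallSite:
    """Current placement: the hook runs inside every advance()."""

    def __init__(self, strategy: CountingStrategy, dataloader: Iterable) -> None:
        self.strategy = strategy
        self.dataloader = dataloader

    def advance(self) -> List[int]:
        loader = self.strategy.process_dataloader(self.dataloader)
        return list(loader)


class ConnectorCallSite:
    """Proposed placement: the hook runs once, when the dataloader is prepared."""

    def __init__(self, strategy: CountingStrategy, dataloader: Iterable) -> None:
        self.dataloader = strategy.process_dataloader(dataloader)

    def advance(self) -> List[int]:
        return list(self.dataloader)


def count_calls(call_site_cls, num_advances: int) -> int:
    """Run advance() several times and report how often the hook was invoked."""
    strategy = CountingStrategy()
    site = call_site_cls(strategy, [1, 2, 3])
    for _ in range(num_advances):
        site.advance()
    return strategy.calls
```

With five calls to advance(), the loop placement invokes the hook five times, while the connector placement invokes it once. Both placements yield the same batches.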

Motivation

Currently, strategy.process_dataloader() is called in the advance() functions of fit_loop, eval_loop, and prediction_loop.
However, the dataloader has already been initialized in _prepare_dataloader() in data_connector.py, which happens well before any loop starts.
Therefore, we could refactor the call site of strategy.process_dataloader() by moving it to _prepare_dataloader() in data_connector.py.

Pitch

We could refactor the call site of strategy.process_dataloader() by moving it to _prepare_dataloader() in data_connector.py.

Additional context

cc: @edward-io @four4fish @ananthsub


cc @justusschock @awaelchli @ninginthecloud


ananthsub · 2022-03-03T22:43:53Z

@awaelchli @carmocca @kaushikb11 This would make #11756 easier to do for 1.7 by moving the point at which the strategy processes the dataloader. Doing this before 1.6 ensures that the distributed-specific dataloading logic can be moved to the strategies without changing the overall control flow.

AFAICT, TPU spawn is the only strategy that implements this function. Now that spawning is called earlier, calling TPUSpawnStrategy.process_dataloader should be safe when the dataloader is first initialized.
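
As a rough illustration of what "implementing this function" can look like, the sketch below overrides the hook to wrap the dataloader once, at preparation time. Everything here is hypothetical: `DeviceLoaderWrapper`, `BaseStrategy`, `WrappingStrategy`, `prepare_dataloader`, and the `"device:0"` label are invented for the example, and the real TPU strategy's wrapping logic (which depends on torch_xla) is not reproduced.

```python
from typing import Any, Iterable, Iterator, Tuple


class DeviceLoaderWrapper:
    """Hypothetical wrapper that tags each batch with a device label."""

    def __init__(self, loader: Iterable, device: str) -> None:
        self.loader = loader
        self.device = device

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        for batch in self.loader:
            # A real device loader would move the batch to the device;
            # this stand-in only pairs the batch with the device label.
            yield (self.device, batch)


class BaseStrategy:
    """Stand-in base strategy: the hook is assumed to be a no-op."""

    def process_dataloader(self, dataloader: Iterable) -> Iterable:
        return dataloader


class WrappingStrategy(BaseStrategy):
    """Stand-in for a strategy that needs to wrap the dataloader."""

    def process_dataloader(self, dataloader: Iterable) -> Iterable:
        return DeviceLoaderWrapper(dataloader, device="device:0")


def prepare_dataloader(strategy: BaseStrategy, dataloader: Iterable) -> Iterable:
    """Hypothetical preparation step: apply the strategy hook exactly once."""
    return strategy.process_dataloader(dataloader)
```

For example, `list(prepare_dataloader(WrappingStrategy(), [1, 2]))` yields the batches paired with the device label, while `BaseStrategy` returns the original dataloader untouched.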

ninginthecloud self-assigned this Mar 3, 2022

ninginthecloud added the strategy and data handling labels Mar 3, 2022

ninginthecloud added this to the 1.6 milestone Mar 3, 2022

ninginthecloud mentioned this issue Mar 7, 2022

Call Strategy.process_dataloader in data_connector.py #12251

Merged

12 tasks

ananthsub closed this as completed in #12251 Mar 23, 2022
