
Refactor distributed_sampler update in data_connector.py for distributed strategy #12217


Open
ninginthecloud opened this issue Mar 3, 2022 · 0 comments
Labels
data handling Generic data-related topic priority: 2 Low priority task refactor strategy
Milestone

Comments

@ninginthecloud
Contributor

ninginthecloud commented Mar 3, 2022

Proposed refactor

This issue follows @ananthsub's #11756 to move strategy-specific dataloader logic into the strategies.

Motivation

The _prepare_dataloader() function in data_connector.py contains the logic that decides whether a dataloader's sampler needs to be replaced with a distributed sampler when the user runs a distributed strategy (see the following code link):
https://github.com/PyTorchLightning/pytorch-lightning/blob/cc43d07db1ab77385feff04c01f040c5cad805a9/pytorch_lightning/trainer/connectors/data_connector.py#L360-L372

However, modifying the dataloader's sampler can be pushed directly into the strategy classes, similar to issue #12216.

Pitch

We can move this logic to the Strategy base class. Any strategy class with the is_distributed=True flag can then verify whether the dataloader's sampler needs to be updated.
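A minimal sketch of the proposed shape. The class and method names below (Strategy.process_dataloader, _needs_distributed_sampler, _update_sampler) are illustrative stand-ins chosen for this issue, not Lightning's actual implementation, and the stub Sampler/DataLoader classes replace the real torch.utils.data types so the example is self-contained:

```python
# Hypothetical sketch: stand-in stubs for torch.utils.data types so the
# example runs on its own; the real code would operate on actual
# DataLoader/DistributedSampler instances.

class Sampler:
    """Stand-in for a plain (sequential) sampler."""


class DistributedSampler(Sampler):
    """Stand-in for torch.utils.data.DistributedSampler."""


class DataLoader:
    def __init__(self, sampler=None):
        self.sampler = sampler if sampler is not None else Sampler()


class Strategy:
    """Base class: non-distributed strategies pass dataloaders through."""

    is_distributed = False

    def process_dataloader(self, dataloader):
        # Logic moved out of DataConnector._prepare_dataloader(): only
        # strategies flagged as distributed need to touch the sampler.
        if self.is_distributed and self._needs_distributed_sampler(dataloader):
            dataloader = self._update_sampler(dataloader)
        return dataloader

    def _needs_distributed_sampler(self, dataloader):
        # Replace the sampler only if it is not already distributed.
        return not isinstance(dataloader.sampler, DistributedSampler)

    def _update_sampler(self, dataloader):
        # Rebuild the dataloader with a distributed sampler; the real
        # implementation would also carry over batch size, workers, etc.
        return DataLoader(sampler=DistributedSampler())


class DDPStrategy(Strategy):
    is_distributed = True
```

With this shape, a single-device Strategy returns the dataloader unchanged, while a DDPStrategy swaps in a distributed sampler, so the data connector no longer needs strategy-specific branching.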

Additional context

cc: @edward-io @four4fish @ananthsub


cc @justusschock @awaelchli @rohitgr7 @ninginthecloud @otaj @tchaton @akihironitta @carmocca

@edward-io edward-io self-assigned this Mar 4, 2022
@ninginthecloud ninginthecloud added this to the 1.7 milestone Mar 7, 2022
@carmocca carmocca added refactor priority: 2 Low priority task and removed fault tolerance labels Jul 19, 2022
@carmocca carmocca modified the milestones: pl:1.7, pl:future Jul 19, 2022