Deprecate LightningDistributed and keep broadcast in ddp/ddpSpawn directly #9692

Closed
four4fish opened this issue Sep 24, 2021 · 2 comments · Fixed by #9691
Labels

  • deprecation: Includes a deprecation
  • distributed: Generic distributed-related topic
  • feature: Is an improvement or enhancement
  • refactor

Comments

@four4fish
Contributor

four4fish commented Sep 24, 2021

Proposed refactoring or deprecation

The LightningDistributed class is used only by DDP and DDPSpawn, and it has only one broadcast function for torch collectives. It is unnecessary to keep.
Also, we have to set its rank and device during the setup steps. If a subclass extends DDP or DDPSpawn and overrides the function where LightningDistributed.rank and device are set, this can cause silent failures (see the sketch after the list below).

  1. Currently, the src argument is not respected in the torch broadcast
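
A hypothetical sketch of that failure mode (the class and hook names below are illustrative, not the exact Lightning internals): a subclass overrides the setup hook without calling super(), so LightningDistributed.rank is never assigned and the broadcast later misbehaves without raising any error.

    class LightningDistributed:
        def __init__(self, rank=None, device=None):
            self.rank = rank      # stays None until a setup step assigns it
            self.device = device

    class DDPPlugin:
        def __init__(self, global_rank, root_device):
            self.global_rank = global_rank
            self.root_device = root_device
            self.dist = LightningDistributed()

        def pre_configure_ddp(self):
            # Lightning assigns rank/device on the helper during setup.
            self.dist.rank = self.global_rank
            self.dist.device = self.root_device

    class CustomDDP(DDPPlugin):
        def pre_configure_ddp(self):
            # Forgets to call super().pre_configure_ddp(): self.dist.rank
            # stays None on every process, so rank-dependent logic inside
            # LightningDistributed.broadcast silently goes wrong.
            pass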

Motivation

Simplify the code structure and reduce the possibility of silent failures.

Pitch

Deprecate LightningDistributed. It has only one function:

    def broadcast(self, obj: Any, group=_group.WORLD):
        # Always wrap into a list so it can be broadcast.
        obj = [obj]

        # Non-source ranks only provide placeholders to be filled in.
        if self.rank != 0:
            obj = [None] * len(obj)

        # The source rank is hard-coded to 0, so a caller cannot choose src.
        broadcast_object_list(obj, 0, group=group or _group.WORLD)

        return obj[0]

Move it into DDP and DDPSpawn directly:

    def broadcast(self, obj: object, src: int = 0) -> object:
        if not distributed_available():
            raise RuntimeError(
                "DDP is not initialized and torch.distributed is not available, cannot broadcast object"
            )
        obj = [obj]
        # Compare against src (not a hard-coded 0) so that a non-default
        # source rank keeps its payload instead of broadcasting None.
        if self.global_rank != src:
            obj = [None] * len(obj)
        broadcast_object_list(obj, src, group=_group.WORLD)
        return obj[0]
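
For reference, here is a minimal standalone sketch of the broadcast_object_list pattern the method relies on (the script name and payload are made up). It can be launched with, e.g., torchrun --nproc_per_node=2 demo.py:

    import torch.distributed as dist

    def main():
        dist.init_process_group("gloo")  # gloo also works on CPU-only machines
        rank = dist.get_rank()
        src = 0
        # Only the source rank provides the real payload; the rest pass a placeholder.
        obj = [{"answer": 42}] if rank == src else [None]
        dist.broadcast_object_list(obj, src=src)
        print(f"rank {rank} received {obj[0]}")
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()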

Additional context

Related to #7534



four4fish added the refactor, distributed, deprecation and feature labels Sep 24, 2021
@awaelchli
Contributor

The chosen sprint is in the past, updating it.

@awaelchli
Contributor

awaelchli commented Sep 24, 2021

I'm ok with this change. It is reasonable now that the majority of methods have disappeared from LightningDistributed, and after our accelerator rework there is no longer a need for this standalone class. In that sense, it will be replaced by the collective plugin.
