Support DDP with uneven input sizes without data duplication #5060
Labels: duplicate, feature, help wanted
🚀 Feature
Motivation
Right now, if you use the `ddp` accelerator, PyTorch Lightning automatically inserts `torch.utils.data.DistributedSampler`, which shards a dataset across multiple processes. This has non-obvious behavior when `len(dataset) % num_replicas != 0`: because the items cannot be evenly sharded across replicas, `DistributedSampler` duplicates certain samples to make the shards even. That means the number of samples that actually run through the network depends on `num_replicas`! IMO, this makes things quite confusing for validation datasets, where you can get different results depending on how you trained the model.

In our particular use case, we sometimes train models with a single-digit number of validation samples (see #5047), so any duplication makes a big difference in the validation score.
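For illustration, here is a minimal sketch of the duplication behavior described above (the dataset size and replica count are arbitrary; no process group is needed because `num_replicas` and `rank` are passed explicitly):

```python
# Sketch: DistributedSampler pads a dataset whose length is not divisible by
# the number of replicas, so some samples are assigned to more than one rank.
import torch
from torch.utils.data import DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(10))  # 10 samples, 3 replicas -> 10 % 3 != 0

for rank in range(3):
    sampler = DistributedSampler(dataset, num_replicas=3, rank=rank, shuffle=False)
    print(f"rank {rank}: {list(sampler)}")

# Each rank gets ceil(10 / 3) = 4 indices, i.e. 12 in total, so samples 0 and 1
# are duplicated to make the shards even. (Newer PyTorch versions also offer
# drop_last=True, which drops trailing samples instead -- either way the
# effective dataset size changes with num_replicas.)
```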
Pitch
Ideally, PyTorch Lightning would not have this "padding" behavior and would instead handle uneven DDP inputs correctly. Based on pytorch/pytorch#38174, it looks like this is now supported in upstream PyTorch with a new API.
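For reference, a hedged sketch of what using that upstream API might look like in raw PyTorch (not Lightning's internals). It relies on the `DistributedDataParallel.join()` context manager that came out of pytorch/pytorch#38174; the model, loader, and optimizer names are illustrative, and process-group setup is assumed to have happened already:

```python
# Sketch: DDP's join() context manager lets ranks with fewer batches finish
# early by shadowing the collective calls of the ranks that are still running.
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, loader, optimizer):
    ddp_model = DDP(model)  # assumes torch.distributed.init_process_group() has run
    with ddp_model.join():  # tolerates an uneven number of batches per rank
        for batch in loader:  # loader may yield a different count on each rank
            optimizer.zero_grad()
            loss = ddp_model(batch).sum()  # placeholder loss
            loss.backward()
            optimizer.step()
```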
Alternatives
Another alternative that might "solve" the problem is to add an option to make the validation and/or test steps single-process (i.e. run them only on one process and have the other processes stall). This would avoid the divisibility issue, but has the obvious downside that we lose the DDP speed-up for those steps.
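A rough sketch of what that single-process alternative could look like in plain torch.distributed terms (the function name and the metric computation are purely illustrative, not an existing Lightning API):

```python
# Sketch: run validation only on rank 0 while the other ranks wait at a
# barrier, then broadcast the result so every rank sees the same value.
import torch
import torch.distributed as dist

def validate_on_rank_zero(model, val_loader, device):
    metric = torch.zeros(1, device=device)
    if dist.get_rank() == 0:
        model.eval()
        with torch.no_grad():
            for batch in val_loader:
                metric += model(batch.to(device)).sum()  # placeholder metric
    dist.barrier()                 # other ranks stall here until rank 0 is done
    dist.broadcast(metric, src=0)  # share the rank-0 result with all ranks
    return metric.item()
```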