Support DDP with uneven input sizes without data duplication #5060

Closed · alanhdu opened this issue Dec 10, 2020 · 2 comments
Labels: duplicate · feature · help wanted

Comments

alanhdu (Contributor) commented on Dec 10, 2020

🚀 Feature

Motivation

Right now, if you use the ddp accelerator, PyTorch Lightning automatically inserts a torch.utils.data.DistributedSampler, which shards the dataset across the participating processes. This has non-obvious behavior when len(dataset) % num_replicas != 0: because the items cannot be divided evenly across replicas, DistributedSampler duplicates some samples so that every replica receives the same number.

That means the number of samples that actually run through the network depends on num_replicas! IMO this makes things quite confusing for validation datasets, where you can get different results depending on how you trained the model.
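A minimal sketch of the padding (the dataset size and replica count here are made up for illustration):

```python
import torch
from torch.utils.data import TensorDataset, DistributedSampler

# 10 samples sharded across 4 replicas: 10 % 4 != 0, so each shard is
# padded to ceil(10 / 4) = 3 samples and 2 samples are seen twice per epoch.
dataset = TensorDataset(torch.arange(10))

samplers = [
    DistributedSampler(dataset, num_replicas=4, rank=r, shuffle=False)
    for r in range(4)
]
print(sum(len(s) for s in samplers))  # 12, not 10
```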

In our particular use case, we sometimes train models with a single-digit number of validation samples (see #5047), so any duplication makes a big difference in the validation score.

Pitch

Ideally, PyTorch Lightning would not rely on this "padding" behavior and would instead handle uneven DDP inputs correctly. Based on pytorch/pytorch#38174, it looks like upstream PyTorch now supports this via a new API.
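If the new API referenced there is the DistributedDataParallel.join() context manager (my assumption from that issue), the usage would look roughly like this sketch; uneven_loader is a placeholder for a per-rank dataloader whose length may differ across ranks:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes the process group is already initialized and `uneven_loader`
# is this rank's dataloader, possibly shorter than on other ranks.
model = DDP(torch.nn.Linear(8, 1))

# Ranks that exhaust their data early keep shadowing the collective
# communication of the ranks still iterating, so no padding or
# duplication of samples is needed.
with model.join():
    for (batch,) in uneven_loader:
        loss = model(batch).sum()
        loss.backward()
```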

Alternatives

Another alternative that might "solve" the problem is to add an option to make the validation and/or test steps single-process only (i.e. run them on a single process while the other processes stall). That would sidestep the divisibility issue, but it has the obvious downside of losing the DDP speed-up for those steps. A rough sketch of that pattern is below.
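For illustration only, a bare-bones version of that alternative in plain PyTorch (evaluate and full_val_loader are hypothetical stand-ins, not Lightning hooks):

```python
import torch.distributed as dist

def run_validation_on_rank_zero(model, full_val_loader, evaluate):
    # Only rank 0 iterates over the full, un-sharded dataloader, so
    # DistributedSampler padding never comes into play.
    if dist.get_rank() == 0:
        model.eval()
        for batch in full_val_loader:
            evaluate(model, batch)  # hypothetical per-batch evaluation
    # Every other rank stalls here until rank 0 is done.
    dist.barrier()
```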

alanhdu added the feature and help wanted labels on Dec 10, 2020
carmocca (Contributor) commented on Dec 10, 2020

Duplicate of #3325

carmocca marked this as a duplicate of #3325 on Dec 10, 2020
carmocca added the duplicate label on Dec 10, 2020
alanhdu (Contributor, Author) commented on Dec 11, 2020

Yes, it's definitely a duplicate. Sorry about the noise.

alanhdu closed this as completed on Dec 11, 2020