Support DDP with uneven input sizes without data duplication #5060
Labels: duplicate, feature, help wanted
🚀 Feature
Motivation
Right now, if you use the `ddp` accelerator, PyTorch Lightning automatically inserts `torch.utils.data.DistributedSampler`, which shards a dataset across multiple processes. This has non-obvious behavior when `len(dataset) % num_replicas != 0`: because the items cannot be evenly sharded across replicas, `DistributedSampler` duplicates certain samples to make the shards even. That means the number of samples that actually run through the network depends on `num_replicas`! IMO, this makes things quite confusing for validation datasets, where you can get different results depending on how you trained the model.

In our particular use case, we sometimes train models with a single-digit number of validation samples (see #5047), so any duplication makes a big difference in the validation score.
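For illustration, here is a minimal sketch of the duplication behavior described above (the dataset size and replica count are arbitrary; no process group is needed because `num_replicas` and `rank` are passed explicitly):

```python
# Sketch: DistributedSampler pads a dataset whose length is not divisible by
# the number of replicas, so some samples are assigned to more than one rank.
import torch
from torch.utils.data import DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(10))  # 10 samples, 3 replicas -> 10 % 3 != 0

for rank in range(3):
    sampler = DistributedSampler(dataset, num_replicas=3, rank=rank, shuffle=False)
    print(f"rank {rank}: {list(sampler)}")

# Each rank gets ceil(10 / 3) = 4 indices, i.e. 12 in total, so samples 0 and 1
# are duplicated to make the shards even. (Newer PyTorch versions also offer
# drop_last=True, which drops trailing samples instead -- either way the
# effective dataset size changes with num_replicas.)
```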
Pitch
Ideally, PyTorch Lightning would not have this "padding" behavior and would instead handle uneven DDP inputs correctly. Based on pytorch/pytorch#38174, it looks like this is now supported in upstream PyTorch with a new API.
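For reference, a hedged sketch of what using that upstream API might look like in raw PyTorch (not Lightning's internals). It relies on the `DistributedDataParallel.join()` context manager that came out of pytorch/pytorch#38174; the model, loader, and optimizer names are illustrative, and process-group setup is assumed to have happened already:

```python
# Sketch: DDP's join() context manager lets ranks with fewer batches finish
# early by shadowing the collective calls of the ranks that are still running.
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, loader, optimizer):
    ddp_model = DDP(model)  # assumes torch.distributed.init_process_group() has run
    with ddp_model.join():  # tolerates an uneven number of batches per rank
        for batch in loader:  # loader may yield a different count on each rank
            optimizer.zero_grad()
            loss = ddp_model(batch).sum()  # placeholder loss
            loss.backward()
            optimizer.step()
```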
Alternatives
Another alternative that might "solve" the problem is to add an option to make the validation and/or test steps single-process (i.e. run them only on one process and have the other processes stall). This would avoid the divisibility issue, but has the obvious downside that we lose the DDP speed-up for those steps.
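A rough sketch of what that single-process alternative could look like in plain torch.distributed terms (the function name and the metric computation are purely illustrative, not an existing Lightning API):

```python
# Sketch: run validation only on rank 0 while the other ranks wait at a
# barrier, then broadcast the result so every rank sees the same value.
import torch
import torch.distributed as dist

def validate_on_rank_zero(model, val_loader, device):
    metric = torch.zeros(1, device=device)
    if dist.get_rank() == 0:
        model.eval()
        with torch.no_grad():
            for batch in val_loader:
                metric += model(batch.to(device)).sum()  # placeholder metric
    dist.barrier()                 # other ranks stall here until rank 0 is done
    dist.broadcast(metric, src=0)  # share the rank-0 result with all ranks
    return metric.item()
```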