Large Model Initialization with FSDP #20658
Unanswered
jthomy asked this question in DDP / multi-GPU / multi-node
With strategies like FSDP: if I have a model that is too large to fit into CPU memory (and especially too large to fit n_gpus times, once per process), just instantiating it in the configure_model() hook will run out of memory.
What is Lightning's intended way to initialize the model so that the full model is loaded only once overall? Ideally I would do this directly on the GPU for faster startup. I would load the model from a non-distributed checkpoint, but I'm happy to hear any suggestions.
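To make the memory concern concrete, here is a rough back-of-the-envelope sketch. The parameter count, dtype size, and GPU count are made-up illustrative numbers, not measurements from my setup, and `host_memory_gib` is just a hypothetical helper:

```python
def host_memory_gib(n_params: int, bytes_per_param: int, n_procs: int) -> float:
    """Approximate CPU memory (GiB) if each of n_procs processes
    materializes a full copy of the model weights."""
    return n_params * bytes_per_param * n_procs / 2**30


# Hypothetical example: 70B parameters stored in fp32 (4 bytes each),
# with one process per GPU on an 8-GPU node.
n_params = 70_000_000_000
single = host_memory_gib(n_params, bytes_per_param=4, n_procs=1)
per_node = host_memory_gib(n_params, bytes_per_param=4, n_procs=8)

print(f"one full copy:    {single:.0f} GiB")
print(f"eight full copies: {per_node:.0f} GiB")
```

Even a single full copy can exceed host RAM in this scenario, and one copy per process multiplies that by the number of GPUs on the node.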