Large Model Initialization with FSDP #20658

jthomy · 2025-03-19T19:49:08Z

With strategies like FSDP: if I have a model that is too large to fit into CPU memory (let alone n_gpus copies of it, one per process), simply instantiating it in the `configure_model()` hook runs out of memory.

What is Lightning's intended way to initialize the model so that the full model is only loaded once overall? Ideally I would do this directly on the GPU for faster startup times. I would load the model from a non-distributed checkpoint, but I'm happy to hear any suggestions.
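One generic PyTorch building block that is relevant here is meta-device initialization: the module is constructed without allocating storage, and memory is only allocated when the weights are loaded. The sketch below is plain PyTorch (assuming torch >= 2.0 for the `torch.device("meta")` context manager), not Lightning's documented answer. `build_model`, `init_on_meta`, and `materialize_from_checkpoint` are hypothetical helpers, and the tiny `nn.Sequential` stands in for a large model. It uses `device="cpu"` so it runs anywhere; on a GPU you would pass a CUDA device instead. On its own it does not shard the model or stop every rank from reading the full checkpoint.

```python
import torch
import torch.nn as nn


def build_model() -> nn.Module:
    # Hypothetical stand-in for a model too large to instantiate naively.
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))


def init_on_meta() -> nn.Module:
    # Inside the meta-device context, parameters get shapes and dtypes
    # but no backing storage, so constructing the model allocates no memory.
    with torch.device("meta"):
        return build_model()


def materialize_from_checkpoint(model: nn.Module, state_dict: dict, device: str) -> nn.Module:
    # to_empty() allocates uninitialized storage on the target device;
    # the values are arbitrary until they are overwritten below.
    model = model.to_empty(device=device)
    # Copy the checkpoint tensors into the newly allocated storage.
    model.load_state_dict(state_dict)
    return model


# Usage: a small, normally initialized model plays the role of the
# non-distributed checkpoint.
torch.manual_seed(0)
reference = build_model()
checkpoint = reference.state_dict()

model = init_on_meta()
meta_before = all(p.is_meta for p in model.parameters())  # no storage allocated yet
model = materialize_from_checkpoint(model, checkpoint, device="cpu")
```

The point of the split is that construction and allocation happen in separate steps, so the target device and the moment of allocation can be chosen after the module structure already exists.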

Replies: 0 comments