docs/source/advanced/multi_gpu.rst
1 addition & 13 deletions
@@ -813,12 +813,6 @@ Below we describe how to enable all of these to see benefit. **With all these im

 Also please have a look at our :ref:`deepspeed-zero-stage-3-tips` which contains a lot of helpful information when configuring your own models.

-.. note::
-    Currently we only support non-elastic checkpointing. This means saving the model across GPUs will save shards of the model on all processes, which will then require the same amount of GPUS to load.
-    This additionally means for inference you must use the ``Trainer.test`` or ``Trainer.predict`` functionality as described below, to ensure we set up the distributed environment correctly.
-
-    This limitation is actively being worked on and will be resolved in the near future.
-
 .. code-block:: python

     from pytorch_lightning import Trainer
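The code block that follows this hunk is truncated in the diff view, so here is only a minimal sketch of the workflow the removed note describes: training with DeepSpeed ZeRO Stage 3 and then running inference through the ``Trainer`` because checkpoints are saved as one shard per process. The ``MinimalModel`` module and the random dataloader are invented for illustration; only ``DeepSpeedPlugin``, ``Trainer.fit``, ``Trainer.test`` and ``Trainer.predict`` come from the documentation itself.

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning.plugins import DeepSpeedPlugin


class MinimalModel(pl.LightningModule):  # hypothetical stand-in for a real model
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self(x), y)

    def test_step(self, batch, batch_idx):
        x, y = batch
        self.log("test_loss", torch.nn.functional.cross_entropy(self(x), y))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


def make_loader():
    # Tiny random dataset, just so the sketch runs end to end.
    x, y = torch.randn(64, 32), torch.randint(0, 2, (64,))
    return DataLoader(TensorDataset(x, y), batch_size=8)


model = MinimalModel()
trainer = pl.Trainer(gpus=4, precision=16, plugins=DeepSpeedPlugin(stage=3))
trainer.fit(model, make_loader())

# Checkpoints are saved as one shard per process (non-elastic), so inference
# goes through the Trainer, which restores the same distributed environment.
trainer.test(model, make_loader())
```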
@@ -848,12 +842,6 @@ We expose a hook that layers initialized within the hook will be sharded instant

 This reduces the time taken to initialize very large models, as well as ensure we do not run out of memory when instantiating larger models. For more information you can refer to the DeepSpeed docs for `Constructing Massive Models <https://deepspeed.readthedocs.io/en/latest/zero3.html>`_.

-.. note::
-    When using the ``configure_sharded_model`` hook to shard models, note that ``LightningModule.load_from_checkpoint`` may not work for loading saved checkpoints. If you've trained on one GPU, you can manually instantiate the model and call the hook,
-    however when using multiple GPUs, this will not work as ``LightningModule.load_from_checkpoint`` doesn't support sharded checkpoints.
-
-    We recommend using ``Trainer.test`` or ``Trainer.predict`` for inference.
-
 .. code-block:: python

     from pytorch_lightning import Trainer
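The docs' own code block is again truncated here, so the following is a minimal sketch of the ``configure_sharded_model`` hook that the surrounding paragraph refers to. The module name, layer sizes and dataloader are invented; the hook name and ``DeepSpeedPlugin`` come from the documentation.

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset
from pytorch_lightning.plugins import DeepSpeedPlugin


class ShardedInitModel(pl.LightningModule):  # hypothetical example module
    def configure_sharded_model(self):
        # Layers created inside this hook are sharded across processes as they
        # are instantiated, rather than being fully materialized on every rank
        # first, which is what makes instantiating very large models feasible.
        self.block = torch.nn.Sequential(
            torch.nn.Linear(32, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 2),
        )

    def forward(self, x):
        return self.block(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


def make_loader():
    x, y = torch.randn(64, 32), torch.randint(0, 2, (64,))
    return DataLoader(TensorDataset(x, y), batch_size=8)


model = ShardedInitModel()
trainer = pl.Trainer(gpus=4, precision=16, plugins=DeepSpeedPlugin(stage=3))
trainer.fit(model, make_loader())
# As the removed note says, ``load_from_checkpoint`` may not work with sharded
# checkpoints, so prefer ``Trainer.test`` or ``Trainer.predict`` for inference.
```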
@@ -950,7 +938,7 @@ Here is some helpful information when setting up DeepSpeed ZeRO Stage 3 with Lig

 * If you're using Adam or AdamW, ensure to use FusedAdam or DeepSpeedCPUAdam (for CPU Offloading) rather than the default torch optimizers as they come with large speed benefits
 * Treat your GPU/CPU memory as one large pool. In some cases, you may not want to offload certain things (like activations) to provide even more space to offload model parameters
 * When offloading to the CPU, make sure to bump up the batch size as GPU memory will be freed
-
+* We also support sharded checkpointing. By passing ``save_full_weights=False`` to the ``DeepSpeedPlugin``, we'll save shards of the model which allows you to save extremely large models. However to load the model and run test/validation/predict you must use the Trainer object.
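To make the bullet list in this hunk concrete, here is a sketch of how those tips might be combined: a DeepSpeed optimizer from ``deepspeed.ops.adam`` returned by ``configure_optimizers``, plus CPU offloading and sharded checkpointing configured on the plugin. The module is invented, and the ``cpu_offload`` argument name is assumed to match the Lightning version this page documents; ``save_full_weights=False`` is taken from the added line above.

```python
import torch
import pytorch_lightning as pl
from deepspeed.ops.adam import DeepSpeedCPUAdam  # use FusedAdam instead when not offloading
from pytorch_lightning.plugins import DeepSpeedPlugin


class OffloadModel(pl.LightningModule):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        # DeepSpeedCPUAdam pairs with CPU offloading; FusedAdam is the faster
        # choice when parameters and optimizer state stay on the GPU.
        return DeepSpeedCPUAdam(self.parameters(), lr=1e-3)


trainer = pl.Trainer(
    gpus=4,
    precision=16,
    plugins=DeepSpeedPlugin(
        stage=3,
        cpu_offload=True,         # assumed flag name: offload optimizer state/params to CPU
        save_full_weights=False,  # save one shard per process instead of a single full checkpoint
    ),
)
# trainer.fit(OffloadModel(), train_dataloader); as above, load and test/predict
# the resulting sharded checkpoint through the Trainer rather than manually.
```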