Commit 7b0b6c3

Author: Sean Naren
Update DeepSpeed docs after single saving PR (#7036)
1 parent f852a4f commit 7b0b6c3

1 file changed: +1 -13 lines changed

docs/source/advanced/multi_gpu.rst

Lines changed: 1 addition & 13 deletions
@@ -813,12 +813,6 @@ Below we describe how to enable all of these to see benefit. **With all these im
 
 Also please have a look at our :ref:`deepspeed-zero-stage-3-tips` which contains a lot of helpful information when configuring your own models.
 
-.. note::
-    Currently we only support non-elastic checkpointing. This means saving the model across GPUs will save shards of the model on all processes, which will then require the same number of GPUs to load.
-    This additionally means for inference you must use the ``Trainer.test`` or ``Trainer.predict`` functionality as described below, to ensure we set up the distributed environment correctly.
-
-    This limitation is actively being worked on and will be resolved in the near future.
-
 .. code-block:: python
 
     from pytorch_lightning import Trainer
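
For context on the note removed above: it directed inference through the ``Trainer`` so the distributed environment is set up correctly. A minimal sketch of what that looks like with the DeepSpeed plugin (``MyModel`` is a hypothetical LightningModule, and the ``Trainer`` arguments are illustrative, not part of this commit):

.. code-block:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import DeepSpeedPlugin

    # MyModel is a placeholder for your own LightningModule.
    model = MyModel()

    # Running inference through the Trainer sets up the distributed
    # environment the same way it was set up for training.
    trainer = Trainer(gpus=4, plugins=DeepSpeedPlugin(stage=3), precision=16)
    trainer.test(model)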
@@ -848,12 +842,6 @@ We expose a hook that layers initialized within the hook will be sharded instant
 
 This reduces the time taken to initialize very large models, as well as ensuring we do not run out of memory when instantiating larger models. For more information you can refer to the DeepSpeed docs for `Constructing Massive Models <https://deepspeed.readthedocs.io/en/latest/zero3.html>`_.
 
-.. note::
-    When using the ``configure_sharded_model`` hook to shard models, note that ``LightningModule.load_from_checkpoint`` may not work for loading saved checkpoints. If you've trained on one GPU, you can manually instantiate the model and call the hook;
-    however, when using multiple GPUs, this will not work, as ``LightningModule.load_from_checkpoint`` doesn't support sharded checkpoints.
-
-    We recommend using ``Trainer.test`` or ``Trainer.predict`` for inference.
-
 .. code-block:: python
 
     from pytorch_lightning import Trainer
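
The ``configure_sharded_model`` hook mentioned in the removed note is defined on the LightningModule; layers created inside it are sharded as soon as they are instantiated. A minimal sketch, assuming a trivial model (the layer sizes are arbitrary):

.. code-block:: python

    import torch
    from pytorch_lightning import LightningModule

    class MyModel(LightningModule):
        def configure_sharded_model(self):
            # Layers created here are sharded across processes as they are
            # instantiated, so the full model never has to fit on one device.
            self.block = torch.nn.Sequential(
                torch.nn.Linear(32, 32),
                torch.nn.ReLU(),
            )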
@@ -950,7 +938,7 @@ Here is some helpful information when setting up DeepSpeed ZeRO Stage 3 with Lig
 * If you're using Adam or AdamW, be sure to use FusedAdam or DeepSpeedCPUAdam (for CPU offloading) rather than the default torch optimizers, as they come with large speed benefits
 * Treat your GPU/CPU memory as one large pool. In some cases, you may not want to offload certain things (like activations) to provide even more space to offload model parameters
 * When offloading to the CPU, make sure to bump up the batch size, as GPU memory will be freed
-
+* We also support sharded checkpointing. By passing ``save_full_weights=False`` to the ``DeepSpeedPlugin``, we'll save shards of the model, which allows you to save extremely large models. However, to load the model and run test/validation/predict you must use the Trainer object.
 
 Custom DeepSpeed Config
 """""""""""""""""""""""
