
Commit 56bc55d

kaushikb11, awaelchli, and carmocca authored
Update strategy flag in docs (#10000)
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Carlos Mocholi <[email protected]>
1 parent 4f19a4d commit 56bc55d

35 files changed: +196 -189 lines changed

docs/source/advanced/advanced_gpu.rst

Lines changed: 23 additions & 23 deletions
@@ -71,9 +71,9 @@ To use Sharded Training, you need to first install FairScale using the command b
 .. code-block:: python

     # train using Sharded DDP
-    trainer = Trainer(plugins="ddp_sharded")
+    trainer = Trainer(strategy="ddp_sharded")

-Sharded Training can work across all DDP variants by adding the additional ``--plugins ddp_sharded`` flag.
+Sharded Training can work across all DDP variants by adding the additional ``--strategy ddp_sharded`` flag.

 Internally we re-initialize your optimizers and shard them across your machines and processes. We handle all communication using PyTorch distributed, so no code changes are required.

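For readers following along outside the diff: the ``--strategy ddp_sharded`` flag mentioned above is the command-line counterpart of the ``strategy`` Trainer argument. A minimal sketch (not part of this commit) of how that flag can be wired up, assuming the ``Trainer.add_argparse_args``/``from_argparse_args`` helpers and a hypothetical ``MyModel`` LightningModule:

    from argparse import ArgumentParser

    import pytorch_lightning as pl

    parser = ArgumentParser()
    # expose Trainer arguments (including --strategy and --gpus) on the command line
    parser = pl.Trainer.add_argparse_args(parser)
    args = parser.parse_args()

    # e.g. launched as: python train.py --gpus 4 --strategy ddp_sharded
    model = MyModel()  # hypothetical LightningModule standing in for the docs' example model
    trainer = pl.Trainer.from_argparse_args(args)
    trainer.fit(model)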

@@ -156,7 +156,7 @@ Below is an example of using both ``wrap`` and ``auto_wrap`` to create your mode


     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="fsdp", precision=16)
+    trainer = Trainer(gpus=4, strategy="fsdp", precision=16)
     trainer.fit(model)

     trainer.test()
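The hunk above shows only the ``Trainer`` call; the ``wrap``/``auto_wrap`` usage it refers to is outside the diff context. A minimal sketch, assuming FairScale's ``wrap``/``auto_wrap`` helpers and the ``configure_sharded_model`` hook (layer sizes are illustrative only):

    import torch
    import pytorch_lightning as pl
    from fairscale.nn import auto_wrap, wrap


    class MyModel(pl.LightningModule):
        def configure_sharded_model(self):
            # wrap one layer explicitly and let auto_wrap recursively wrap the rest;
            # Trainer(strategy="fsdp") is expected to enable the wrapping context during setup
            self.linear = wrap(torch.nn.Linear(32, 32))
            self.block = auto_wrap(torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU()))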
@@ -248,7 +248,7 @@ It is recommended to skip Stage 1 and use Stage 2, which comes with larger memor
     from pytorch_lightning import Trainer

     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_1", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_1", precision=16)
     trainer.fit(model)

@@ -265,7 +265,7 @@ As a result, benefits can also be seen on a single GPU. Do note that the default
     from pytorch_lightning import Trainer

     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_2", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_2", precision=16)
     trainer.fit(model)

 .. code-block:: bash
@@ -286,7 +286,7 @@ Below we show an example of running `ZeRO-Offload <https://www.deepspeed.ai/tuto
     from pytorch_lightning.plugins import DeepSpeedPlugin

     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_2_offload", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_2_offload", precision=16)
     trainer.fit(model)

@@ -307,7 +307,7 @@ You can also modify the ZeRO-Offload parameters via the plugin as below.
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DeepSpeedPlugin(offload_optimizer=True, allgather_bucket_size=5e8, reduce_bucket_size=5e8),
+        strategy=DeepSpeedPlugin(offload_optimizer=True, allgather_bucket_size=5e8, reduce_bucket_size=5e8),
         precision=16,
     )
     trainer.fit(model)
@@ -340,7 +340,7 @@ For even more speed benefit, DeepSpeed offers an optimized CPU version of ADAM c


     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_2_offload", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_2_offload", precision=16)
     trainer.fit(model)

@@ -383,7 +383,7 @@ Also please have a look at our :ref:`deepspeed-zero-stage-3-tips` which contains


     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3", precision=16)
     trainer.fit(model)

     trainer.test()
@@ -403,7 +403,7 @@ You can also use the Lightning Trainer to run predict or evaluate with DeepSpeed


     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3", precision=16)
     trainer.test(ckpt_path="my_saved_deepspeed_checkpoint.ckpt")

@@ -438,7 +438,7 @@ This reduces the time taken to initialize very large models, as well as ensure w


     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3", precision=16)
     trainer.fit(model)

     trainer.test()
@@ -463,14 +463,14 @@ DeepSpeed ZeRO Stage 3 Offloads optimizer state, gradients to the host CPU to re

     # Enable CPU Offloading
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3_offload", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3_offload", precision=16)
     trainer.fit(model)

     # Enable CPU Offloading, and offload parameters to CPU
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DeepSpeedPlugin(
+        strategy=DeepSpeedPlugin(
             stage=3,
             offload_optimizer=True,
             offload_parameters=True,
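The diff context cuts the second ``Trainer`` call off mid-argument list. For readability, a hedged reconstruction of how the new form plausibly ends, using only the arguments visible in this hunk plus the ``precision=16`` setting used throughout the page:

    from pytorch_lightning import Trainer
    from pytorch_lightning.plugins import DeepSpeedPlugin

    model = MyModel()  # LightningModule from the surrounding docs
    trainer = Trainer(
        gpus=4,
        strategy=DeepSpeedPlugin(
            stage=3,
            offload_optimizer=True,
            offload_parameters=True,
        ),
        precision=16,
    )
    trainer.fit(model)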
@@ -492,14 +492,14 @@ Additionally, DeepSpeed supports offloading to NVMe drives for even larger model

     # Enable CPU Offloading
     model = MyModel()
-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3_offload", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3_offload", precision=16)
     trainer.fit(model)

     # Enable CPU Offloading, and offload parameters to CPU
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DeepSpeedPlugin(
+        strategy=DeepSpeedPlugin(
             stage=3,
             offload_optimizer=True,
             offload_parameters=True,
@@ -576,12 +576,12 @@ This saves memory when training larger models, however requires using a checkpoi
     model = MyModel()


-    trainer = Trainer(gpus=4, plugins="deepspeed_stage_3_offload", precision=16)
+    trainer = Trainer(gpus=4, strategy="deepspeed_stage_3_offload", precision=16)

     # Enable CPU Activation Checkpointing
     trainer = Trainer(
         gpus=4,
-        plugins=DeepSpeedPlugin(
+        strategy=DeepSpeedPlugin(
             stage=3,
             offload_optimizer=True,  # Enable CPU Offloading
             cpu_checkpointing=True,  # (Optional) offload activations to CPU
@@ -670,7 +670,7 @@ In some cases you may want to define your own DeepSpeed Config, to access all pa
     }

     model = MyModel()
-    trainer = Trainer(gpus=4, plugins=DeepSpeedPlugin(deepspeed_config), precision=16)
+    trainer = Trainer(gpus=4, strategy=DeepSpeedPlugin(deepspeed_config), precision=16)
     trainer.fit(model)

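Only the closing brace of ``deepspeed_config`` is visible in the hunk above. An illustrative config dict (not this commit's actual example), assuming standard DeepSpeed configuration keys documented by DeepSpeed itself:

    deepspeed_config = {
        "zero_allow_untested_optimizer": True,
        "optimizer": {"type": "Adam", "params": {"lr": 3e-5}},
        "zero_optimization": {
            "stage": 2,  # ZeRO Stage 2: partition optimizer state and gradients
            "contiguous_gradients": True,  # copy gradients into a contiguous buffer
            "overlap_comm": True,  # overlap reduction with backward compute
        },
    }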
@@ -682,7 +682,7 @@ We support taking the config as a json formatted file:
     from pytorch_lightning.plugins import DeepSpeedPlugin

     model = MyModel()
-    trainer = Trainer(gpus=4, plugins=DeepSpeedPlugin("/path/to/deepspeed_config.json"), precision=16)
+    trainer = Trainer(gpus=4, strategy=DeepSpeedPlugin("/path/to/deepspeed_config.json"), precision=16)
     trainer.fit(model)

@@ -717,7 +717,7 @@ This can reduce peak memory usage and throughput as saved memory will be equal t
     from pytorch_lightning.plugins import DDPPlugin

     model = MyModel()
-    trainer = Trainer(gpus=4, plugins=DDPPlugin(gradient_as_bucket_view=True))
+    trainer = Trainer(gpus=4, strategy=DDPPlugin(gradient_as_bucket_view=True))
     trainer.fit(model)

 DDP Communication Hooks
@@ -740,7 +740,7 @@ Enable `FP16 Compress Hook for multi-node throughput improvement <https://pytorc
     )

     model = MyModel()
-    trainer = Trainer(gpus=4, plugins=DDPPlugin(ddp_comm_hook=default.fp16_compress_hook))
+    trainer = Trainer(gpus=4, strategy=DDPPlugin(ddp_comm_hook=default.fp16_compress_hook))
     trainer.fit(model)

 Enable `PowerSGD for multi-node throughput improvement <https://pytorch.org/docs/stable/ddp_comm_hooks.html#powersgd-communication-hook>`__:
@@ -758,7 +758,7 @@ Enable `PowerSGD for multi-node throughput improvement <https://pytorch.org/docs
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DDPPlugin(
+        strategy=DDPPlugin(
             ddp_comm_state=powerSGD.PowerSGDState(
                 process_group=None,
                 matrix_approximation_rank=1,
@@ -787,7 +787,7 @@ Combine hooks for accumulated benefit:
     model = MyModel()
     trainer = Trainer(
         gpus=4,
-        plugins=DDPPlugin(
+        strategy=DDPPlugin(
             ddp_comm_state=powerSGD.PowerSGDState(
                 process_group=None,
                 matrix_approximation_rank=1,
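Both PowerSGD hunks end before the hook arguments close. A hedged, self-contained sketch of the combined-hooks pattern they point at, assuming PyTorch >= 1.9 (for ``ddp_comm_wrapper``), the ``DDPPlugin`` hook arguments shown above, and a hypothetical ``MyModel`` LightningModule:

    import pytorch_lightning as pl
    from pytorch_lightning.plugins import DDPPlugin
    from torch.distributed.algorithms.ddp_comm_hooks import (
        default_hooks as default,
        powerSGD_hook as powerSGD,
    )

    model = MyModel()  # LightningModule from the surrounding docs
    trainer = pl.Trainer(
        gpus=4,
        strategy=DDPPlugin(
            ddp_comm_state=powerSGD.PowerSGDState(
                process_group=None,
                matrix_approximation_rank=1,
                start_powerSGD_iter=5000,
            ),
            ddp_comm_hook=powerSGD.powerSGD_hook,  # PowerSGD gradient compression
            ddp_comm_wrapper=default.fp16_compress_wrapper,  # additionally cast communication to FP16
        ),
    )
    trainer.fit(model)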

docs/source/advanced/ipu.rst

Lines changed: 5 additions & 5 deletions
@@ -83,7 +83,7 @@ IPUs provide further optimizations to speed up training. By using the ``IPUPlugi
     from pytorch_lightning.plugins import IPUPlugin

     model = MyLightningModule()
-    trainer = pl.Trainer(ipus=8, plugins=IPUPlugin(device_iterations=32))
+    trainer = pl.Trainer(ipus=8, strategy=IPUPlugin(device_iterations=32))
     trainer.fit(model)

 Note that by default we return the last device iteration loss. You can override this by passing in your own ``poptorch.Options`` and setting the AnchorMode as described in the `PopTorch documentation <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/reference.html#poptorch.Options.anchorMode>`__.
@@ -102,7 +102,7 @@ Note that by default we return the last device iteration loss. You can override
     training_opts.anchorMode(poptorch.AnchorMode.All)
     training_opts.deviceIterations(32)

-    trainer = Trainer(ipus=8, plugins=IPUPlugin(inference_opts=inference_opts, training_opts=training_opts))
+    trainer = Trainer(ipus=8, strategy=IPUPlugin(inference_opts=inference_opts, training_opts=training_opts))
     trainer.fit(model)

 You can also override all options by passing the ``poptorch.Options`` to the plugin. See `PopTorch options documentation <https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/batching.html>`__ for more information.
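The hunk above picks up after ``inference_opts`` and ``training_opts`` have already been created. A minimal sketch of how those objects can be built with the ``poptorch.Options`` API referenced in the surrounding text (iteration counts are illustrative):

    import poptorch

    inference_opts = poptorch.Options()
    inference_opts.deviceIterations(32)

    training_opts = poptorch.Options()
    training_opts.anchorMode(poptorch.AnchorMode.All)  # return every device iteration's output, not just the last
    training_opts.deviceIterations(32)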
@@ -124,7 +124,7 @@ Lightning supports dumping all reports to a directory to open using the tool.
     from pytorch_lightning.plugins import IPUPlugin

     model = MyLightningModule()
-    trainer = pl.Trainer(ipus=8, plugins=IPUPlugin(autoreport_dir="report_dir/"))
+    trainer = pl.Trainer(ipus=8, strategy=IPUPlugin(autoreport_dir="report_dir/"))
     trainer.fit(model)

 This will dump all reports to ``report_dir/`` which can then be opened using the Graph Analyser Tool, see `Opening Reports <https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html#opening-reports>`__.
@@ -174,7 +174,7 @@ Below is an example using the block annotation in a LightningModule.


     model = MyLightningModule()
-    trainer = pl.Trainer(ipus=8, plugins=IPUPlugin(device_iterations=20))
+    trainer = pl.Trainer(ipus=8, strategy=IPUPlugin(device_iterations=20))
     trainer.fit(model)

@@ -217,7 +217,7 @@ You can also use the block context manager within the forward function, or any o


     model = MyLightningModule()
-    trainer = pl.Trainer(ipus=8, plugins=IPUPlugin(device_iterations=20))
+    trainer = pl.Trainer(ipus=8, strategy=IPUPlugin(device_iterations=20))
     trainer.fit(model)


docs/source/advanced/multi_gpu.rst

Lines changed: 28 additions & 28 deletions
@@ -253,11 +253,11 @@ Distributed modes
 -----------------
 Lightning allows multiple ways of training

-- Data Parallel (``accelerator='dp'``) (multiple-gpus, 1 machine)
-- DistributedDataParallel (``accelerator='ddp'``) (multiple-gpus across many machines (python script based)).
-- DistributedDataParallel (``accelerator='ddp_spawn'``) (multiple-gpus across many machines (spawn based)).
-- DistributedDataParallel 2 (``accelerator='ddp2'``) (DP in a machine, DDP across machines).
-- Horovod (``accelerator='horovod'``) (multi-machine, multi-gpu, configured at runtime)
+- Data Parallel (``strategy='dp'``) (multiple-gpus, 1 machine)
+- DistributedDataParallel (``strategy='ddp'``) (multiple-gpus across many machines (python script based)).
+- DistributedDataParallel (``strategy='ddp_spawn'``) (multiple-gpus across many machines (spawn based)).
+- DistributedDataParallel 2 (``strategy='ddp2'``) (DP in a machine, DDP across machines).
+- Horovod (``strategy='horovod'``) (multi-machine, multi-gpu, configured at runtime)
 - TPUs (``tpu_cores=8|x``) (tpu or TPU pod)

 .. note::
@@ -287,7 +287,7 @@ after which the root node will aggregate the results.
     :skipif: torch.cuda.device_count() < 2

     # train on 2 GPUs (using DP mode)
-    trainer = Trainer(gpus=2, accelerator="dp")
+    trainer = Trainer(gpus=2, strategy="dp")

 Distributed Data Parallel
 ^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -308,10 +308,10 @@ Distributed Data Parallel
 .. code-block:: python

     # train on 8 GPUs (same machine (ie: node))
-    trainer = Trainer(gpus=8, accelerator="ddp")
+    trainer = Trainer(gpus=8, strategy="ddp")

     # train on 32 GPUs (4 nodes)
-    trainer = Trainer(gpus=8, accelerator="ddp", num_nodes=4)
+    trainer = Trainer(gpus=8, strategy="ddp", num_nodes=4)

 This Lightning implementation of DDP calls your script under the hood multiple times with the correct environment
 variables:
@@ -356,7 +356,7 @@ In this case, we can use DDP2 which behaves like DP in a machine and DDP across
 .. code-block:: python

     # train on 32 GPUs (4 nodes)
-    trainer = Trainer(gpus=8, accelerator="ddp2", num_nodes=4)
+    trainer = Trainer(gpus=8, strategy="ddp2", num_nodes=4)

 Distributed Data Parallel Spawn
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -374,7 +374,7 @@ project module) you can use the following method:
 .. code-block:: python

     # train on 8 GPUs (same machine (ie: node))
-    trainer = Trainer(gpus=8, accelerator="ddp_spawn")
+    trainer = Trainer(gpus=8, strategy="ddp_spawn")

 We STRONGLY discourage this use because it has limitations (due to Python and PyTorch):

@@ -446,10 +446,10 @@ Horovod can be configured in the training script to run with any number of GPUs
 .. code-block:: python

     # train Horovod on GPU (number of GPUs / machines provided on command-line)
-    trainer = Trainer(accelerator="horovod", gpus=1)
+    trainer = Trainer(strategy="horovod", gpus=1)

     # train Horovod on CPU (number of processes / machines provided on command-line)
-    trainer = Trainer(accelerator="horovod")
+    trainer = Trainer(strategy="horovod")

 When starting the training job, the driver application will then be used to specify the total
 number of worker processes:
@@ -583,11 +583,11 @@ Below are the possible configurations we support.
 +-------+---------+----+-----+--------+------------------------------------------------------------+
 | Y     |         |    |     | Y      | `Trainer(gpus=1, precision=16)`                            |
 +-------+---------+----+-----+--------+------------------------------------------------------------+
-|       | Y       | Y  |     |        | `Trainer(gpus=k, accelerator='dp')`                        |
+|       | Y       | Y  |     |        | `Trainer(gpus=k, strategy='dp')`                           |
 +-------+---------+----+-----+--------+------------------------------------------------------------+
-|       | Y       |    | Y   |        | `Trainer(gpus=k, accelerator='ddp')`                       |
+|       | Y       |    | Y   |        | `Trainer(gpus=k, strategy='ddp')`                          |
 +-------+---------+----+-----+--------+------------------------------------------------------------+
-|       | Y       |    | Y   | Y      | `Trainer(gpus=k, accelerator='ddp', precision=16)`         |
+|       | Y       |    | Y   | Y      | `Trainer(gpus=k, strategy='ddp', precision=16)`            |
 +-------+---------+----+-----+--------+------------------------------------------------------------+

@@ -616,29 +616,29 @@ In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size
 .. code-block:: python

     # effective batch size = 7 * 8
-    Trainer(gpus=8, accelerator="ddp")
-    Trainer(gpus=8, accelerator="ddp_spawn")
-    Trainer(gpus=8, accelerator="ddp_sharded")
-    Trainer(gpus=8, accelerator="horovod")
+    Trainer(gpus=8, strategy="ddp")
+    Trainer(gpus=8, strategy="ddp_spawn")
+    Trainer(gpus=8, strategy="ddp_sharded")
+    Trainer(gpus=8, strategy="horovod")

     # effective batch size = 7 * 8 * 10
-    Trainer(gpus=8, num_nodes=10, accelerator="ddp")
-    Trainer(gpus=8, num_nodes=10, accelerator="ddp_spawn")
-    Trainer(gpus=8, num_nodes=10, accelerator="ddp_sharded")
-    Trainer(gpus=8, num_nodes=10, accelerator="horovod")
+    Trainer(gpus=8, num_nodes=10, strategy="ddp")
+    Trainer(gpus=8, num_nodes=10, strategy="ddp_spawn")
+    Trainer(gpus=8, num_nodes=10, strategy="ddp_sharded")
+    Trainer(gpus=8, num_nodes=10, strategy="horovod")

 In DDP2 or DP, your effective batch size will be 7 * num_nodes.
 The reason is that the full batch is visible to all GPUs on the node when using DDP2.

 .. code-block:: python

     # effective batch size = 7
-    Trainer(gpus=8, accelerator="ddp2")
-    Trainer(gpus=8, accelerator="dp")
+    Trainer(gpus=8, strategy="ddp2")
+    Trainer(gpus=8, strategy="dp")

     # effective batch size = 7 * 10
-    Trainer(gpus=8, num_nodes=10, accelerator="ddp2")
-    Trainer(gpus=8, accelerator="dp")
+    Trainer(gpus=8, num_nodes=10, strategy="ddp2")
+    Trainer(gpus=8, strategy="dp")


 .. note:: Huge batch sizes are actually really bad for convergence. Check out:
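A tiny worked example of the effective-batch-size arithmetic described in this hunk (the values are the ones used in the surrounding text):

    batch_size, gpus, num_nodes = 7, 8, 10

    # DDP / DDP_SPAWN / DDP_SHARDED / Horovod: each process sees its own batch
    effective_ddp = batch_size * gpus * num_nodes  # 7 * 8 * 10 = 560

    # DDP2 / DP: the full batch is shared by the GPUs within a node
    effective_ddp2 = batch_size * num_nodes  # 7 * 10 = 70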
@@ -652,7 +652,7 @@ Lightning supports the use of Torch Distributed Elastic to enable fault-tolerant

 .. code-block:: python

-    Trainer(gpus=8, accelerator="ddp")
+    Trainer(gpus=8, strategy="ddp")

 To launch a fault-tolerant job, run the following on all nodes.


docs/source/advanced/tpu.rst

Lines changed: 2 additions & 2 deletions
@@ -349,14 +349,14 @@ Don't use ``xm.xla_device()`` while working on Lightning + TPUs!

 PyTorch XLA only supports Tensor objects for CPU to TPU data transfer. Might cause issues if the User is trying to send some non-tensor objects through the DataLoader or during saving states.

-- **Using `tpu_spawn_debug` Plugin**
+- **Using `tpu_spawn_debug` Plugin alias**

 .. code-block:: python

     import pytorch_lightning as pl

     my_model = MyLightningModule()
-    trainer = pl.Trainer(tpu_cores=8, plugins="tpu_spawn_debug")
+    trainer = pl.Trainer(tpu_cores=8, strategy="tpu_spawn_debug")
     trainer.fit(my_model)

 Example Metrics report:
