Trainer: auto default #16847

Merged 18 commits on Feb 23, 2023

39 changes: 20 additions & 19 deletions docs/source-pytorch/accelerators/gpu_basic.rst
@@ -14,30 +14,31 @@ A Graphics Processing Unit (GPU), is a specialized hardware accelerator designed

----

Train on 1 GPU
--------------

Make sure you're running on a machine with at least one GPU. There's no need to specify any NVIDIA flags
as Lightning will do it for you.

.. testcode::
    :skipif: torch.cuda.device_count() < 1

    trainer = Trainer(accelerator="gpu", devices=1)

----------------


.. _multi_gpu:

Train on multiple GPUs
----------------------
Train on GPUs
-------------

To use multiple GPUs, set the number of devices in the Trainer or the index of the GPUs.
The Trainer will run on all available GPUs by default. Make sure you're running on a machine with at least one GPU.
There's no need to specify any NVIDIA flags as Lightning will do it for you.

.. code::
.. code-block:: python

    # run on as many GPUs as available by default
    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
    # equivalent to
    trainer = Trainer()

    trainer = Trainer(accelerator="gpu", devices=4)
    # run on one GPU
    trainer = Trainer(accelerator="gpu", devices=1)
    # run on multiple GPUs
    trainer = Trainer(accelerator="gpu", devices=8)
    # choose the number of devices automatically
    trainer = Trainer(accelerator="gpu", devices="auto")

.. note::
    Setting ``accelerator="gpu"`` will also automatically choose the ``"mps"`` device on Apple silicon GPUs.
    If you want to avoid this, you can set ``accelerator="cuda"`` instead.
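
For example, to pin the backend explicitly rather than letting ``"gpu"`` resolve it, a minimal sketch (assuming the ``"cuda"`` and ``"mps"`` accelerator options are available in your Lightning version):

.. code-block:: python

    # force NVIDIA/CUDA devices instead of letting "gpu" resolve to MPS
    trainer = Trainer(accelerator="cuda", devices=1)

    # or explicitly request the Apple silicon (MPS) backend
    trainer = Trainer(accelerator="mps", devices=1)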

Choosing GPU devices
^^^^^^^^^^^^^^^^^^^^
40 changes: 16 additions & 24 deletions docs/source-pytorch/accelerators/hpu_basic.rst
@@ -25,25 +25,30 @@ For more information, check out `Gaudi Architecture <https://docs.habana.ai/en/l

----

Run on 1 Gaudi
--------------
Run on Gaudi
------------

To enable PyTorch Lightning to utilize the HPU accelerator, simply provide the ``accelerator="hpu"`` parameter to the Trainer class.

.. code-block:: python

    trainer = Trainer(accelerator="hpu", devices=1)

    # run on as many Gaudi devices as available by default
    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
    # equivalent to
    trainer = Trainer()

    # run on one Gaudi device
    trainer = Trainer(accelerator="hpu", devices=1)
    # run on multiple Gaudi devices
    trainer = Trainer(accelerator="hpu", devices=8)
    # choose the number of devices automatically
    trainer = Trainer(accelerator="hpu", devices="auto")

----

Run on multiple Gaudis
----------------------
The ``devices=8`` and ``accelerator="hpu"`` parameters to the Trainer class enable the Habana accelerator for distributed training with 8 Gaudis.
It uses :class:`~pytorch_lightning.strategies.hpu_parallel.HPUParallelStrategy` internally which is based on DDP strategy with the addition of Habana's collective communication library (HCCL) to support scale-up within a node and scale-out across multiple nodes.

.. code-block:: python

    trainer = Trainer(devices=8, accelerator="hpu")

The ``devices>1`` parameter with HPUs enables the Habana accelerator for distributed training.
It uses :class:`~pytorch_lightning.strategies.hpu_parallel.HPUParallelStrategy` internally which is based on DDP
strategy with the addition of Habana's collective communication library (HCCL) to support scale-up within a node and
scale-out across multiple nodes.
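
If you prefer to configure the strategy explicitly rather than relying on this automatic selection, a minimal sketch could look as follows (it assumes ``HPUParallelStrategy`` is importable from ``pytorch_lightning.strategies``, as the class reference above suggests, and can be passed directly to the Trainer):

.. code-block:: python

    from pytorch_lightning.strategies import HPUParallelStrategy

    # roughly what Trainer(accelerator="hpu", devices=8) selects automatically
    trainer = Trainer(accelerator="hpu", devices=8, strategy=HPUParallelStrategy())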

----

@@ -81,19 +86,6 @@ On Node 2:

----

Select Gaudis automatically
---------------------------

Lightning can automatically detect the number of Gaudi devices to run on. This setting is enabled by default if the devices argument is missing.

.. code-block:: python

    # equivalent
    trainer = Trainer(accelerator="hpu")
    trainer = Trainer(accelerator="hpu", devices="auto")

----

How to access HPUs
------------------

25 changes: 14 additions & 11 deletions docs/source-pytorch/accelerators/ipu_basic.rst
@@ -24,23 +24,26 @@ See the `Graphcore Glossary <https://docs.graphcore.ai/projects/graphcore-glossa

----

Run on 1 IPU
------------
To use a single IPU, set the accelerator and devices arguments.
Run on IPU
----------

.. code-block:: python

    trainer = pl.Trainer(accelerator="ipu", devices=1)

----
To enable PyTorch Lightning to utilize the IPU accelerator, simply provide the ``accelerator="ipu"`` parameter to the Trainer class.

Run on multiple IPUs
--------------------
To use multiple IPUs, set the devices to a number that is a power of 2 (i.e., 2, 4, 8, 16, ...).

.. code-block:: python

    trainer = pl.Trainer(accelerator="ipu", devices=8)
    # run on as many IPUs as available by default
    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
    # equivalent to
    trainer = Trainer()

    # run on one IPU
    trainer = Trainer(accelerator="ipu", devices=1)
    # run on multiple IPUs
    trainer = Trainer(accelerator="ipu", devices=8)
    # choose the number of devices automatically
    trainer = Trainer(accelerator="ipu", devices="auto")
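
As a usage sketch, the IPU configuration can also be guarded behind an availability check (this assumes ``IPUAccelerator`` and its ``is_available()`` method are exposed under ``pytorch_lightning.accelerators`` in your installed version):

.. code-block:: python

    from pytorch_lightning.accelerators import IPUAccelerator

    # fall back to whatever hardware is present when no IPUs are attached
    if IPUAccelerator.is_available():
        trainer = Trainer(accelerator="ipu", devices=8)
    else:
        trainer = Trainer(accelerator="auto", devices="auto")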

----

40 changes: 15 additions & 25 deletions docs/source-pytorch/accelerators/tpu_basic.rst
@@ -32,36 +32,26 @@ some subset of those 2048 cores.

----

Run on 1 TPU core
-----------------
Set the following Trainer arguments to run on 1 TPU core.

.. code::

    trainer = Trainer(accelerator="tpu", devices=1)

----

Run on multiple TPU cores
-------------------------
For multiple TPU cores, change the value of the devices flag.

.. code::

    trainer = Trainer(accelerator="tpu", devices=8)

----

Run on a specific TPU core
--------------------------
Run on TPU cores
----------------

To run on a specific core, specify the index of the TPU core.
To run on different cores, modify the ``devices`` argument.

.. code-block:: python

    trainer = pl.Trainer(accelerator="tpu", devices=[5])

    # run on as many TPUs as available by default
    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
    # equivalent to
    trainer = Trainer()

    # run on one TPU core
    trainer = Trainer(accelerator="tpu", devices=1)
    # run on multiple TPU cores
    trainer = Trainer(accelerator="tpu", devices=8)
    # run on the 5th core
    trainer = Trainer(accelerator="tpu", devices=[5])
    # choose the number of cores automatically
    trainer = Trainer(accelerator="tpu", devices="auto")

The ``devices=[5]`` examples run on the 5th core, not on five cores.
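
To verify which device a run actually landed on, you can print the model's device from a training hook. A minimal sketch (the ``MyModel`` class here is purely illustrative; ``LightningModule.device`` and the ``on_train_start`` hook are standard Lightning APIs):

.. code-block:: python

    import pytorch_lightning as pl


    class MyModel(pl.LightningModule):
        def on_train_start(self):
            # prints the XLA device this process was assigned when running on TPU
            print(self.device)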

----

4 changes: 2 additions & 2 deletions docs/source-pytorch/common/trainer.rst
@@ -200,7 +200,7 @@ as well as custom accelerator instances.
    # Training with GPU Accelerator using the DistributedDataParallel strategy
    trainer = Trainer(devices=4, accelerator="gpu", strategy="ddp")

.. note:: The ``"auto"`` option recognizes the machine you are on, and selects the respective ``Accelerator``.
.. note:: The ``"auto"`` option recognizes the machine you are on, and selects the appropriate ``Accelerator``.

.. code-block:: python

@@ -417,7 +417,7 @@ Number of devices to train on (``int``), which devices to train on (``list`` or

.. code-block:: python

    # If your machine has GPUs, it will use all the available GPUs for training
    # Use whatever hardware your machine has available
    trainer = Trainer(devices="auto", accelerator="auto")

    # Training with CPU Accelerator using 1 process
4 changes: 4 additions & 0 deletions src/lightning/pytorch/CHANGELOG.md
@@ -52,6 +52,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Changed


- The `Trainer` now chooses `accelerator="auto", strategy="auto", devices="auto"` as defaults ([#16847](https://github.com/Lightning-AI/lightning/pull/16847))


- "Native" suffix removal ([#16490](https://github.com/Lightning-AI/lightning/pull/16490))
* `strategy="fsdp_native"` is now `strategy="fsdp"`
* `strategy="fsdp_native_full_shard_offload"` is now `strategy="fsdp_cpu_offload"`