Trainer: auto default #16847

Merged 18 commits on Feb 23, 2023

39 changes: 20 additions & 19 deletions docs/source-pytorch/accelerators/gpu_basic.rst
@@ -14,30 +14,31 @@ A Graphics Processing Unit (GPU), is a specialized hardware accelerator designed

----

Train on 1 GPU
--------------

Make sure you're running on a machine with at least one GPU. There's no need to specify any NVIDIA flags
as Lightning will do it for you.

.. testcode::
    :skipif: torch.cuda.device_count() < 1

    trainer = Trainer(accelerator="gpu", devices=1)

----------------


.. _multi_gpu:

Train on multiple GPUs
----------------------
Train on GPUs
-------------

To use multiple GPUs, set the number of devices in the Trainer or the index of the GPUs.
The Trainer will run on all available GPUs by default. Make sure you're running on a machine with at least one GPU.
There's no need to specify any NVIDIA flags as Lightning will do it for you.

.. code::
.. code-block:: python

    # run on as many GPUs as available by default
    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
    # equivalent to
    trainer = Trainer()

    trainer = Trainer(accelerator="gpu", devices=4)
    # run on one GPU
    trainer = Trainer(accelerator="gpu", devices=1)
    # run on multiple GPUs
    trainer = Trainer(accelerator="gpu", devices=8)
    # choose the number of devices automatically
    trainer = Trainer(accelerator="gpu", devices="auto")

.. note::
    Setting ``accelerator="gpu"`` will also automatically choose the ``"mps"`` device on Apple silicon GPUs.
    If you want to avoid this, you can set ``accelerator="cuda"`` instead.
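
For example, to pin the backend explicitly rather than letting ``"gpu"`` resolve it, a minimal sketch (assuming the ``"cuda"`` and ``"mps"`` accelerator options are available in your Lightning version):

.. code-block:: python

    # force NVIDIA/CUDA devices instead of letting "gpu" resolve to MPS
    trainer = Trainer(accelerator="cuda", devices=1)

    # or explicitly request the Apple silicon (MPS) backend
    trainer = Trainer(accelerator="mps", devices=1)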

Choosing GPU devices
^^^^^^^^^^^^^^^^^^^^
40 changes: 16 additions & 24 deletions docs/source-pytorch/accelerators/hpu_basic.rst
@@ -25,25 +25,30 @@ For more information, check out `Gaudi Architecture <https://docs.habana.ai/en/l

----

Run on 1 Gaudi
--------------
Run on Gaudi
------------

To enable PyTorch Lightning to utilize the HPU accelerator, simply provide the ``accelerator="hpu"`` parameter to the Trainer class.

.. code-block:: python

    trainer = Trainer(accelerator="hpu", devices=1)

    # run on as many Gaudi devices as available by default
    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
    # equivalent to
    trainer = Trainer()

    # run on one Gaudi device
    trainer = Trainer(accelerator="hpu", devices=1)
    # run on multiple Gaudi devices
    trainer = Trainer(accelerator="hpu", devices=8)
    # choose the number of devices automatically
    trainer = Trainer(accelerator="hpu", devices="auto")

----

Run on multiple Gaudis
----------------------
The ``devices=8`` and ``accelerator="hpu"`` parameters to the Trainer class enable the Habana accelerator for distributed training with 8 Gaudis.
It uses :class:`~pytorch_lightning.strategies.hpu_parallel.HPUParallelStrategy` internally which is based on DDP strategy with the addition of Habana's collective communication library (HCCL) to support scale-up within a node and scale-out across multiple nodes.

.. code-block:: python

    trainer = Trainer(devices=8, accelerator="hpu")

The ``devices>1`` parameter with HPUs enables the Habana accelerator for distributed training.
It uses :class:`~pytorch_lightning.strategies.hpu_parallel.HPUParallelStrategy` internally which is based on DDP
strategy with the addition of Habana's collective communication library (HCCL) to support scale-up within a node and
scale-out across multiple nodes.
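
If you prefer to configure the strategy explicitly rather than relying on this automatic selection, a minimal sketch could look as follows (it assumes ``HPUParallelStrategy`` is importable from ``pytorch_lightning.strategies``, as the class reference above suggests, and can be passed directly to the Trainer):

.. code-block:: python

    from pytorch_lightning.strategies import HPUParallelStrategy

    # roughly what Trainer(accelerator="hpu", devices=8) selects automatically
    trainer = Trainer(accelerator="hpu", devices=8, strategy=HPUParallelStrategy())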

----

@@ -81,19 +86,6 @@ On Node 2:

----

Select Gaudis automatically
---------------------------

Lightning can automatically detect the number of Gaudi devices to run on. This setting is enabled by default if the devices argument is missing.

.. code-block:: python

    # equivalent
    trainer = Trainer(accelerator="hpu")
    trainer = Trainer(accelerator="hpu", devices="auto")

----

How to access HPUs
------------------

25 changes: 14 additions & 11 deletions docs/source-pytorch/accelerators/ipu_basic.rst
@@ -24,23 +24,26 @@ See the `Graphcore Glossary <https://docs.graphcore.ai/projects/graphcore-glossa

----

Run on 1 IPU
------------
To use a single IPU, set the accelerator and devices arguments.
Run on IPU
----------

.. code-block:: python

    trainer = pl.Trainer(accelerator="ipu", devices=1)

----
To enable PyTorch Lightning to utilize the IPU accelerator, simply provide the ``accelerator="ipu"`` parameter to the Trainer class.

Run on multiple IPUs
--------------------
To use multiple IPUs, set the devices to a number that is a power of 2 (i.e., 2, 4, 8, 16, ...).

.. code-block:: python

    trainer = pl.Trainer(accelerator="ipu", devices=8)
    # run on as many IPUs as available by default
    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
    # equivalent to
    trainer = Trainer()

    # run on one IPU
    trainer = Trainer(accelerator="ipu", devices=1)
    # run on multiple IPUs
    trainer = Trainer(accelerator="ipu", devices=8)
    # choose the number of devices automatically
    trainer = Trainer(accelerator="ipu", devices="auto")
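
As a usage sketch, the IPU configuration can also be guarded behind an availability check (this assumes ``IPUAccelerator`` and its ``is_available()`` method are exposed under ``pytorch_lightning.accelerators`` in your installed version):

.. code-block:: python

    from pytorch_lightning.accelerators import IPUAccelerator

    # fall back to whatever hardware is present when no IPUs are attached
    if IPUAccelerator.is_available():
        trainer = Trainer(accelerator="ipu", devices=8)
    else:
        trainer = Trainer(accelerator="auto", devices="auto")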

----

40 changes: 15 additions & 25 deletions docs/source-pytorch/accelerators/tpu_basic.rst
@@ -32,36 +32,26 @@ some subset of those 2048 cores.

----

Run on 1 TPU core
-----------------
Set the following Trainer arguments to run on 1 TPU core.

.. code::

    trainer = Trainer(accelerator="tpu", devices=1)

----

Run on multiple TPU cores
-------------------------
For multiple TPU cores, change the value of the devices flag.

.. code::

    trainer = Trainer(accelerator="tpu", devices=8)

----

Run on a specific TPU core
--------------------------
Run on TPU cores
----------------

To run on a specific core, specify the index of the TPU core.
To run on different cores, modify the ``devices`` argument.

.. code-block:: python

    trainer = pl.Trainer(accelerator="tpu", devices=[5])

    # run on as many TPUs as available by default
    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
    # equivalent to
    trainer = Trainer()

    # run on one TPU core
    trainer = Trainer(accelerator="tpu", devices=1)
    # run on multiple TPU cores
    trainer = Trainer(accelerator="tpu", devices=8)
    # run on the 5th core
    trainer = Trainer(accelerator="tpu", devices=[5])
    # choose the number of cores automatically
    trainer = Trainer(accelerator="tpu", devices="auto")

The ``devices=[5]`` examples run on the 5th core, not on five cores.
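
To verify which device a run actually landed on, you can print the model's device from a training hook. A minimal sketch (the ``MyModel`` class here is purely illustrative; ``LightningModule.device`` and the ``on_train_start`` hook are standard Lightning APIs):

.. code-block:: python

    import pytorch_lightning as pl


    class MyModel(pl.LightningModule):
        def on_train_start(self):
            # prints the XLA device this process was assigned when running on TPU
            print(self.device)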

----

4 changes: 2 additions & 2 deletions docs/source-pytorch/common/trainer.rst
@@ -200,7 +200,7 @@ as well as custom accelerator instances.
    # Training with GPU Accelerator using the DistributedDataParallel strategy
    trainer = Trainer(devices=4, accelerator="gpu", strategy="ddp")

.. note:: The ``"auto"`` option recognizes the machine you are on, and selects the respective ``Accelerator``.
.. note:: The ``"auto"`` option recognizes the machine you are on, and selects the appropriate ``Accelerator``.

.. code-block:: python

@@ -417,7 +417,7 @@ Number of devices to train on (``int``), which devices to train on (``list`` or

.. code-block:: python

    # If your machine has GPUs, it will use all the available GPUs for training
    # Use whatever hardware your machine has available
    trainer = Trainer(devices="auto", accelerator="auto")

    # Training with CPU Accelerator using 1 process
4 changes: 4 additions & 0 deletions src/lightning/pytorch/CHANGELOG.md
@@ -52,6 +52,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

### Changed


- The `Trainer` now chooses `accelerator="auto", strategy="auto", devices="auto"` as defaults ([#16847](https://github.com/Lightning-AI/lightning/pull/16847))


- "Native" suffix removal ([#16490](https://github.com/Lightning-AI/lightning/pull/16490))
* `strategy="fsdp_native"` is now `strategy="fsdp"`
* `strategy="fsdp_native_full_shard_offload"` is now `strategy="fsdp_cpu_offload"`