PyTorch documentation updates #11739

Merged

merged 5 commits into from
Feb 21, 2022

Changes from 3 commits
31 changes: 15 additions & 16 deletions docs/source/starter/introduction_guide.rst
@@ -7,7 +7,7 @@
.. _introduction_guide:

#########################
Step-by-step walk-through
Step-by-step Walk-through
#########################
This guide will walk you through the core pieces of PyTorch Lightning.

@@ -16,7 +16,7 @@ We'll accomplish the following:
- Implement an MNIST classifier.
- Use inheritance to implement an AutoEncoder.

.. note:: Any DL/ML PyTorch project fits into the Lightning structure. Here we just focus on 3 types
.. note:: Any DL/ML PyTorch project fits into the Lightning structure. Here we just focus on three types
of research to illustrate.

--------------
@@ -30,7 +30,7 @@ Installing Lightning
====================


Lightning is trivial to install. We recommend using conda environments
Lightning is easy to install. We recommend using conda environments

.. code-block:: bash

@@ -51,7 +51,7 @@ Or conda.

-------------

The research
The Research
============

The Model
@@ -99,7 +99,7 @@ Let's first start with the model. In this case, we'll design a 3-layer neural ne
return x

Notice this is a :doc:`lightning module <../common/lightning_module>` instead of a ``torch.nn.Module``. A LightningModule is
equivalent to a pure PyTorch Module except it has added functionality. However, you can use it **EXACTLY** the same as you would a PyTorch Module.
equivalent to a pure PyTorch module except it has added functionality. However, you can use it **exactly** the same as you would a PyTorch module.

.. testcode::

@@ -122,7 +122,7 @@ equivalent to a pure PyTorch Module except it has added functionality. However,
torch.Size([1, 10])


Now we add the ``training_step`` which has all our training loop logic:
Now, we add the ``training_step``, which has all our training loop logic:

.. testcode::

@@ -137,7 +137,7 @@ Now we add the ``training_step`` which has all our training loop logic:
Optimizer
---------

Next we choose which optimizer to use for training our system.
Next, we choose which optimizer to use for training our system.
In PyTorch, we do it as follows:

.. code-block:: python
@@ -217,7 +217,7 @@ Lightning operates on pure dataloaders. Here's the PyTorch code for loading MNIS
Processing...
Done!

You can use DataLoaders in 3 ways:
You can use DataLoaders in three ways:

1. Pass DataLoaders to .fit()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
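As a minimal sketch of this first option, assuming a ``LitClassifier`` LightningModule and the ``mnist_train`` dataset from earlier in the guide (both names are placeholders here):

.. code-block:: python

    from torch.utils.data import DataLoader
    from pytorch_lightning import Trainer

    train_loader = DataLoader(mnist_train, batch_size=64)

    # Hand the dataloader straight to .fit(); the Trainer drives the loop
    model = LitClassifier()
    trainer = Trainer(max_epochs=3)
    trainer.fit(model, train_loader)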
@@ -412,7 +412,7 @@ This code is not restricted which means it can be as complicated as a full seq-2

----------------

The engineering
The Engineering
===============

Training
@@ -501,16 +501,15 @@ Once your training starts, you can view the logs by using your favorite logger o

tensorboard --logdir ./lightning_logs


Which will generate automatic tensorboard logs (or with the logger of your choice).
This generates automatic TensorBoard logs (or logs with the logger of your choice).

.. figure:: ../_static/images/mnist_imgs/mnist_tb.png
:alt: mnist CPU bar
:width: 500

|

But you can also use any of the :doc:`number of other loggers <../common/loggers>` we support.
You can also use any of the many :doc:`other loggers <../common/loggers>` we support.
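For example, a hedged sketch of swapping in the Weights & Biases logger (the project name is made up):

.. code-block:: python

    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import WandbLogger

    # Swap the default TensorBoard logger for any supported alternative
    trainer = Trainer(logger=WandbLogger(project="mnist-demo"))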


Train on CPU
@@ -1003,15 +1002,15 @@ Child Modules

----------

*********************
Why PyTorch Lightning
*********************
**********************
Why PyTorch Lightning?
**********************

a. Less boilerplate
===================

Research and production code starts with simple code, but quickly grows in complexity
once you add GPU training, 16-bit, checkpointing, logging, etc...
once you add GPU training, 16-bit, checkpointing, logging, and so on.

PyTorch Lightning implements these features for you and tests them rigorously to make sure you can
instead focus on the research idea.
58 changes: 29 additions & 29 deletions docs/source/starter/lightning_lite.rst
@@ -76,7 +76,7 @@ The ``run`` function contains custom training loop used to train ``MyModel`` on
Convert to LightningLite
========================

Here are 5 required steps to convert to :class:`~pytorch_lightning.lite.LightningLite`.
Here are five required steps to convert to :class:`~pytorch_lightning.lite.LightningLite`.

1. Subclass :class:`~pytorch_lightning.lite.LightningLite` and override its :meth:`~pytorch_lightning.lite.LightningLite.run` method.
2. Move the body of your existing ``run`` function into :class:`~pytorch_lightning.lite.LightningLite` ``run`` method.
@@ -127,7 +127,7 @@ Here are 5 required steps to convert to :class:`~pytorch_lightning.lite.Lightnin
That's all. You can now train on any kind of device and scale your training.

:class:`~pytorch_lightning.lite.LightningLite` takes care of device management, so you don't have to.
You should remove any device specific logic within your code.
You should remove any device-specific logic within your code.
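For orientation, here is a hedged sketch of what the converted class can look like end to end; ``MyModel`` and ``MyDataset`` stand in for your existing code:

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader
    from pytorch_lightning.lite import LightningLite


    class Lite(LightningLite):
        def run(self):
            model = MyModel()  # placeholder: your existing model
            optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

            # Lite places the model, optimizer, and data on the right device
            model, optimizer = self.setup(model, optimizer)
            dataloader = self.setup_dataloaders(DataLoader(MyDataset()))

            model.train()
            for batch in dataloader:
                optimizer.zero_grad()
                loss = model(batch)  # assumes the model returns its loss
                self.backward(loss)  # replaces loss.backward()
                optimizer.step()


    Lite(accelerator="cpu").run()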

Here is how to train on 8 GPUs with `torch.bfloat16 <https://pytorch.org/docs/1.10.0/generated/torch.Tensor.bfloat16.html>`_ precision:

@@ -189,7 +189,7 @@ If you require custom data or model device placement, you can deactivate
:class:`~pytorch_lightning.lite.LightningLite` automatic placement by doing
``self.setup_dataloaders(..., move_to_device=False)`` for the data and
``self.setup(..., move_to_device=False)`` for the model.
Futhermore, you can access the current device from ``self.device`` or
Furthermore, you can access the current device from ``self.device`` or
rely on the :meth:`~pytorch_lightning.lite.LightningLite.to_device`
utility to move an object to the current device.
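As a small sketch of that escape hatch inside :meth:`~pytorch_lightning.lite.LightningLite.run` (the dataloader is assumed to exist already):

.. code-block:: python

    # Keep data placement manual while still using Lite's sampler handling
    dataloader = self.setup_dataloaders(dataloader, move_to_device=False)

    for batch in dataloader:
        # Move each batch to the current device ourselves
        batch = self.to_device(batch)
        ...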

@@ -198,11 +198,11 @@

.. tip::

If you have hundreds or thousands of line within your :meth:`~pytorch_lightning.lite.LightningLite.run` function
If you have hundreds or thousands of lines within your :meth:`~pytorch_lightning.lite.LightningLite.run` function
    and you are feeling weird about it, then that is the right feeling.
Back in 2019, our :class:`~pytorch_lightning.core.lightning.LightningModule` was getting larger
and we got the same feeling. So we started to organize our code for simplicity, interoperability and standardization.
This is definitely a good sign that you should consider refactoring your code and / or switch to
In 2019, our :class:`~pytorch_lightning.core.lightning.LightningModule` was getting larger
and we got the same feeling, so we started to organize our code for simplicity, interoperability and standardization.
    This is definitely a good sign that you should consider refactoring your code and/or switching to
:class:`~pytorch_lightning.core.lightning.LightningModule` ultimately.


@@ -221,28 +221,28 @@ but there are several major challenges ahead of you now:
:header-rows: 0

* - Processes divergence
- This happens when processes execute a different section of the code due to different if/else conditions, race condition on existing files, etc., resulting in hanging.
- This happens when processes execute a different section of the code due to different if/else conditions, race conditions on existing files, etc., resulting in hanging.
* - Cross processes reduction
- Miscalculated metrics or gradients due to errors in their reduction.
* - Large sharded models
- Instantiation, materialization and state management of large models.
* - Rank 0 only actions
- Logging, profiling, etc.
- Logging, profiling, and so on.
* - Checkpointing / Early stopping / Callbacks / Logging
- Ability to easily customize your training behaviour and make it stateful.
- Ability to customize your training behavior easily and make it stateful.
* - Fault-tolerant training
- Ability to resume from a failure as if it never happened.


If you are facing one of those challenges then you are already meeting the limit of :class:`~pytorch_lightning.lite.LightningLite`.
If you are facing one of those challenges, then you are already hitting the limits of :class:`~pytorch_lightning.lite.LightningLite`.
We recommend converting to :doc:`Lightning <../starter/new-project>`, so you never have to worry about those.

----------

Convert to Lightning
====================

:class:`~pytorch_lightning.lite.LightningLite` is a stepping stone to fully transition to the Lightning API and benefit
:class:`~pytorch_lightning.lite.LightningLite` is a stepping stone to transition fully to the Lightning API and benefit
from its hundreds of features.

You can see our :class:`~pytorch_lightning.lite.LightningLite` class as a
@@ -333,7 +333,7 @@ Finally, change the :meth:`~pytorch_lightning.lite.LightningLite.run` into a
trainer.fit(LightningModel(), datamodule=BoringDataModule())


You have successfully converted to PyTorch Lightning and can now benefit from its hundred of features!
You have successfully converted to PyTorch Lightning and can now benefit from its hundreds of features!

----------

Expand All @@ -342,7 +342,7 @@ Lightning Lite Flags
********************

Lite is specialized in accelerated distributed training and inference. It offers you convenient ways to configure
your device and communication strategy, and to seamlessly switch from one to the other. The terminology and usage is
your device and communication strategy and to switch seamlessly from one to the other. The terminology and usage are
identical to Lightning, which means minimum effort for you to convert when you decide to do so.


Expand All @@ -365,7 +365,7 @@ Choose one of ``"cpu"``, ``"gpu"``, ``"tpu"``, ``"auto"`` (IPU support is coming
# Running with GPU Accelerator using the DistributedDataParallel strategy
lite = Lite(devices=4, accelerator="gpu", strategy="ddp")

The ``"auto"`` option recognizes the machine you are on, and selects the available accelerator.
The ``"auto"`` option recognizes the machine you are on and selects the available accelerator.

.. code-block:: python

@@ -416,7 +416,7 @@ Configure the devices to run on. Can be of type:
# equivalent
lite = Lite(devices=0)

# int: run on 2 GPUs
# int: run on two GPUs
lite = Lite(devices=2, accelerator="gpu")

# list: run on GPUs 1, 4 (by bus ordering)
@@ -436,7 +436,7 @@ Shorthand for setting ``devices=X`` and ``accelerator="gpu"``.

.. code-block:: python

# Run on 2 GPUs
# Run on two GPUs
lite = Lite(gpus=2)

# Equivalent
@@ -450,7 +450,7 @@ Shorthand for ``devices=X`` and ``accelerator="tpu"``.

.. code-block:: python

# Run on 8 TPUs
# Run on eight TPUs
lite = Lite(tpu_cores=8)

# Equivalent
@@ -479,7 +479,7 @@ precision
=========

Lightning Lite supports double precision (64), full precision (32), or half precision (16) operation (including `bfloat16 <https://pytorch.org/docs/1.10.0/generated/torch.Tensor.bfloat16.html>`_).
Half precision, or mixed precision, is the combined use of 32 and 16 bit floating points to reduce the memory footprint during model training.
Half precision, or mixed precision, is the combined use of 32-bit and 16-bit floating-point numbers to reduce the memory footprint during model training.
This can result in improved performance, achieving significant speedups on modern GPUs.

.. code-block:: python
@@ -536,7 +536,7 @@ Lightning Lite Methods
run
===

The run method servers two purposes:
The run method serves two purposes:

1. Override this method from the :class:`~pytorch_lightning.lite.lite.LightningLite` class and put your
training (or inference) code inside.
@@ -551,7 +551,7 @@ You can optionally pass arguments to the run method. For example, the hyperparam

class Lite(LightningLite):

# Input arguments are optional, put whatever you need
# Input arguments are optional; put whatever you need
def run(self, learning_rate, num_layers):
"""Here goes your training loop"""

@@ -563,15 +563,15 @@ setup
setup
=====

Setup a model and corresponding optimizer(s). If you need to setup multiple models, call ``setup()`` on each of them.
Set up a model and corresponding optimizer(s). If you need to set up multiple models, call ``setup()`` on each of them.
Moves the model and optimizer to the correct device automatically.

.. code-block:: python

model = nn.Linear(32, 64)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# Setup model and optimizer for accelerated training
# Set up model and optimizer for accelerated training
model, optimizer = self.setup(model, optimizer)

# If you don't want Lite to set the device
@@ -584,8 +584,8 @@ cast automatically.
setup_dataloaders
=================

Setup one or multiple dataloaders for accelerated operation. If you are running a distributed strategy (e.g., DDP), Lite
will replace the sampler automatically for you. In addition, the dataloader will be configured to move the returned
Set up one or multiple dataloaders for accelerated operation. If you are running a distributed strategy (e.g., DDP), Lite
replaces the sampler automatically for you. In addition, the dataloader will be configured to move the returned
data tensors to the correct device automatically.

.. code-block:: python
@@ -605,7 +605,7 @@ data tensors to the correct device automatically.
backward
========

This replaces any occurences of ``loss.backward()`` and will make your code accelerator and precision agnostic.
This replaces any occurrences of ``loss.backward()`` and makes your code accelerator and precision agnostic.

.. code-block:: python

Expand Down Expand Up @@ -683,7 +683,7 @@ save
====

Save contents to a checkpoint. Replaces all occurrences of ``torch.save(...)`` in your code. Lite will take care of
handling the saving part correctly, no matter if you are running single device, multi-device or multi-node.
handling the saving part correctly, no matter if you are running on a single device, multiple devices, or multiple nodes.

.. code-block:: python

@@ -694,8 +694,8 @@ handling the saving part correctly, no matter if you are running single device,
load
====

Load checkpoint contents from a file. Replaces all occurences of ``torch.load(...)`` in your code. Lite will take care of
handling the loading part correctly, no matter if you are running single device, multi-device or multi-node.
Load checkpoint contents from a file. Replaces all occurrences of ``torch.load(...)`` in your code. Lite will take care of
handling the loading part correctly, no matter if you are running on a single device, multiple devices, or multiple nodes.

.. code-block:: python
