
Commit 9eb2c22

Merge branch 'main' into angelayi/export4
2 parents: 74e9d00 + 0f40dc9

File tree

7 files changed (+368, −0 lines)


_static/img/distributed/fsdp_tp.png

Binary images added (250 KB, 290 KB, 774 KB); not rendered.

distributed/home.rst

Lines changed: 19 additions & 0 deletions

@@ -13,6 +13,7 @@ PyTorch with each method having their advantages in certain use cases:

 * `DistributedDataParallel (DDP) <#learn-ddp>`__
 * `Fully Sharded Data Parallel (FSDP) <#learn-fsdp>`__
+* `Tensor Parallel (TP) <#learn-tp>`__
 * `Device Mesh <#device-mesh>`__
 * `Remote Procedure Call (RPC) distributed training <#learn-rpc>`__
 * `Custom Extensions <#custom-extensions>`__

@@ -84,6 +85,24 @@ Learn FSDP

       +++
       :octicon:`code;1em` Code

+.. _learn-tp:
+
+Learn Tensor Parallel (TP)
+--------------------------
+
+.. grid:: 3
+
+   .. grid-item-card:: :octicon:`file-code;1em`
+      Large Scale Transformer model training with Tensor Parallel (TP)
+      :link: https://pytorch.org/tutorials/intermediate/TP_tutorial.html
+      :link-type: url
+
+      This tutorial demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel and Fully Sharded Data Parallel.
+      +++
+      :octicon:`code;1em` Code
+
 .. _device-mesh:

 Learn DeviceMesh
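The new card links to large-scale training with Tensor Parallel. As a math-only sketch of the idea behind column-wise tensor parallelism (plain tensors standing in for ranks; this is not the `torch.distributed.tensor.parallel` API itself), sharding a linear layer's output features across two "ranks" and concatenating the partial outputs reproduces the unsharded result:

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 4)   # full weight: out_features x in_features
x = torch.randn(2, 4)   # a small input batch

full = x @ W.T          # unsharded linear output, shape (2, 8)

# Column-wise TP: split the output dimension across 2 simulated ranks.
shards = W.chunk(2, dim=0)                 # two (4, 4) weight shards
partials = [x @ s.T for s in shards]       # each rank computes (2, 4)
combined = torch.cat(partials, dim=1)      # gather along the feature dim

assert torch.allclose(full, combined)
```

In a real deployment, `ColwiseParallel` applies this sharding for you and the gather happens across devices; the tensor math is the same.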

index.rst

Lines changed: 9 additions & 0 deletions

@@ -3,6 +3,7 @@ Welcome to PyTorch Tutorials

 What's new in PyTorch tutorials?

+* `Large Scale Transformer model training with Tensor Parallel <https://pytorch.org/tutorials/intermediate/TP_tutorial.html>`__
 * `PyTorch Inference Performance Tuning on AWS Graviton Processors <https://pytorch.org/tutorials/recipes/inference_tuning_on_aws_graviton.html>`__
 * `Using TORCH_LOGS python API with torch.compile <https://pytorch.org/tutorials/recipes/torch_logs.html>`__
 * `PyTorch 2 Export Quantization with X86 Backend through Inductor <https://pytorch.org/tutorials/prototype/pt2e_quant_x86_inductor.html>`__

@@ -696,6 +697,13 @@ What's new in PyTorch tutorials?

    :link: intermediate/dist_tuto.html
    :tags: Parallel-and-Distributed-Training

+.. customcarditem::
+   :header: Large Scale Transformer model training with Tensor Parallel
+   :card_description: Learn how to train large models with the Tensor Parallel package.
+   :image: _static/img/thumbnails/cropped/Large-Scale-Transformer-model-training-with-Tensor-Parallel.png
+   :link: intermediate/TP_tutorial.html
+   :tags: Parallel-and-Distributed-Training
+
 .. customcarditem::
    :header: Customize Process Group Backends Using Cpp Extensions
    :card_description: Extend ProcessGroup with custom collective communication implementations.

@@ -1081,6 +1089,7 @@ Additional Resources

    intermediate/dist_tuto
    intermediate/FSDP_tutorial
    intermediate/FSDP_adavnced_tutorial
+   intermediate/TP_tutorial
    intermediate/process_group_cpp_extension_tutorial
    intermediate/rpc_tutorial
    intermediate/rpc_param_server_tutorial
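The indexed tutorial pairs column-wise sharding with its row-wise counterpart. Under the same plain-tensor simulation (no actual collectives; this is the underlying math, not the library API), row-wise tensor parallelism shards the input dimension instead, and summing the per-rank partial products stands in for the `all-reduce`:

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 4)   # full weight: out_features x in_features
x = torch.randn(2, 4)

full = x @ W.T          # unsharded output, shape (2, 8)

# Row-wise TP: split the input dimension across 2 simulated ranks.
w_shards = W.chunk(2, dim=1)   # two (8, 2) weight shards
x_shards = x.chunk(2, dim=1)   # matching (2, 2) input shards
partials = [xs @ ws.T for xs, ws in zip(x_shards, w_shards)]
combined = sum(partials)       # stands in for all-reduce(sum)

assert torch.allclose(full, combined, atol=1e-6)
```

Chaining a column-wise layer into a row-wise one is the classic Megatron-style pattern: the intermediate activations stay sharded, and only the row-wise layer's output needs a reduction.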

intermediate_source/TP_tutorial.rst

Lines changed: 340 additions & 0 deletions (large diff not rendered)
