
Commit 9eb2c22

Merge branch 'main' into angelayi/export4
2 parents: 74e9d00 + 0f40dc9

File tree

7 files changed (+368, −0 lines)


_static/img/distributed/fsdp_tp.png

Binary images added (250 KB, 290 KB, 774 KB); not rendered.

distributed/home.rst

Lines changed: 19 additions & 0 deletions

@@ -13,6 +13,7 @@ PyTorch with each method having their advantages in certain use cases:

 * `DistributedDataParallel (DDP) <#learn-ddp>`__
 * `Fully Sharded Data Parallel (FSDP) <#learn-fsdp>`__
+* `Tensor Parallel (TP) <#learn-tp>`__
 * `Device Mesh <#device-mesh>`__
 * `Remote Procedure Call (RPC) distributed training <#learn-rpc>`__
 * `Custom Extensions <#custom-extensions>`__

@@ -84,6 +85,24 @@ Learn FSDP

       +++
       :octicon:`code;1em` Code

+.. _learn-tp:
+
+Learn Tensor Parallel (TP)
+--------------------------
+
+.. grid:: 3
+
+   .. grid-item-card:: :octicon:`file-code;1em`
+      Large Scale Transformer model training with Tensor Parallel (TP)
+      :link: https://pytorch.org/tutorials/intermediate/TP_tutorial.html
+      :link-type: url
+
+      This tutorial demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel and Fully Sharded Data Parallel.
+      +++
+      :octicon:`code;1em` Code
+
 .. _device-mesh:

 Learn DeviceMesh
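The new card links to large-scale training with Tensor Parallel. As a math-only sketch of the idea behind column-wise tensor parallelism (plain tensors standing in for ranks; this is not the `torch.distributed.tensor.parallel` API itself), sharding a linear layer's output features across two "ranks" and concatenating the partial outputs reproduces the unsharded result:

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 4)   # full weight: out_features x in_features
x = torch.randn(2, 4)   # a small input batch

full = x @ W.T          # unsharded linear output, shape (2, 8)

# Column-wise TP: split the output dimension across 2 simulated ranks.
shards = W.chunk(2, dim=0)                 # two (4, 4) weight shards
partials = [x @ s.T for s in shards]       # each rank computes (2, 4)
combined = torch.cat(partials, dim=1)      # gather along the feature dim

assert torch.allclose(full, combined)
```

In a real deployment, `ColwiseParallel` applies this sharding for you and the gather happens across devices; the tensor math is the same.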

index.rst

Lines changed: 9 additions & 0 deletions

@@ -3,6 +3,7 @@ Welcome to PyTorch Tutorials

 What's new in PyTorch tutorials?

+* `Large Scale Transformer model training with Tensor Parallel <https://pytorch.org/tutorials/intermediate/TP_tutorial.html>`__
 * `PyTorch Inference Performance Tuning on AWS Graviton Processors <https://pytorch.org/tutorials/recipes/inference_tuning_on_aws_graviton.html>`__
 * `Using TORCH_LOGS python API with torch.compile <https://pytorch.org/tutorials/recipes/torch_logs.html>`__
 * `PyTorch 2 Export Quantization with X86 Backend through Inductor <https://pytorch.org/tutorials/prototype/pt2e_quant_x86_inductor.html>`__

@@ -696,6 +697,13 @@ What's new in PyTorch tutorials?

    :link: intermediate/dist_tuto.html
    :tags: Parallel-and-Distributed-Training

+.. customcarditem::
+   :header: Large Scale Transformer model training with Tensor Parallel
+   :card_description: Learn how to train large models with the Tensor Parallel package.
+   :image: _static/img/thumbnails/cropped/Large-Scale-Transformer-model-training-with-Tensor-Parallel.png
+   :link: intermediate/TP_tutorial.html
+   :tags: Parallel-and-Distributed-Training
+
 .. customcarditem::
    :header: Customize Process Group Backends Using Cpp Extensions
    :card_description: Extend ProcessGroup with custom collective communication implementations.

@@ -1081,6 +1089,7 @@ Additional Resources

    intermediate/dist_tuto
    intermediate/FSDP_tutorial
    intermediate/FSDP_adavnced_tutorial
+   intermediate/TP_tutorial
    intermediate/process_group_cpp_extension_tutorial
    intermediate/rpc_tutorial
    intermediate/rpc_param_server_tutorial
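The indexed tutorial pairs column-wise sharding with its row-wise counterpart. Under the same plain-tensor simulation (no actual collectives; this is the underlying math, not the library API), row-wise tensor parallelism shards the input dimension instead, and summing the per-rank partial products stands in for the `all-reduce`:

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 4)   # full weight: out_features x in_features
x = torch.randn(2, 4)

full = x @ W.T          # unsharded output, shape (2, 8)

# Row-wise TP: split the input dimension across 2 simulated ranks.
w_shards = W.chunk(2, dim=1)   # two (8, 2) weight shards
x_shards = x.chunk(2, dim=1)   # matching (2, 2) input shards
partials = [xs @ ws.T for xs, ws in zip(x_shards, w_shards)]
combined = sum(partials)       # stands in for all-reduce(sum)

assert torch.allclose(full, combined, atol=1e-6)
```

Chaining a column-wise layer into a row-wise one is the classic Megatron-style pattern: the intermediate activations stay sharded, and only the row-wise layer's output needs a reduction.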

intermediate_source/TP_tutorial.rst

Lines changed: 340 additions & 0 deletions (large diff not rendered)
