Commit fd25e06

Merge branch 'main' into pruning_tutorial-fix

2 parents 06407ac + 53e4142
File tree

2 files changed: +5 -8 lines changed


Diff for: intermediate_source/FSDP_adavnced_tutorial.rst

+4 -4
@@ -74,8 +74,8 @@ summarization using WikiHow dataset. The main focus of this tutorial is to
 highlight different available features in FSDP that are helpful for training
 large scale model above 3B parameters. Also, we cover specific features for
 Transformer based models. The code for this tutorial is available in `Pytorch
-Examples
-<https://github.com/HamidShojanazeri/examples/tree/FSDP_example/distributed/FSDP/>`__.
+examples
+<https://github.com/pytorch/examples/tree/main/distributed/FSDP/>`__.


 *Setup*
@@ -97,13 +97,13 @@ Please create a `data` folder, download the WikiHow dataset from `wikihowAll.csv
 `wikihowSep.cs <https://ucsb.app.box.com/s/7yq601ijl1lzvlfu4rjdbbxforzd2oag>`__,
 and place them in the `data` folder. We will use the wikihow dataset from
 `summarization_dataset
-<https://github.com/HamidShojanazeri/examples/blob/FSDP_example/distributed/FSDP/summarization_dataset.py>`__.
+<https://github.com/pytorch/examples/blob/main/distributed/FSDP/summarization_dataset.py>`__.

 Next, we add the following code snippets to a Python script “T5_training.py”.

 .. note::
    The full source code for this tutorial is available in `PyTorch examples
-   <https://github.com/HamidShojanazeri/examples/tree/FSDP_example/distributed/FSDP>`__.
+   <https://github.com/pytorch/examples/tree/main/distributed/FSDP/>`__.

 1.3 Import necessary packages:

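The tutorial text edited above feeds its code snippets into a "T5_training.py" script. For orientation only, a minimal sketch of the FSDP wrapping step could look like the following; the model name, wrap policy, and torchrun-based launch are illustrative assumptions, not the tutorial's exact code.

# A minimal, illustrative FSDP wrapping sketch (assumptions noted below).
import functools
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import T5ForConditionalGeneration
from transformers.models.t5.modeling_t5 import T5Block

# Assumes launch via `torchrun --nproc_per_node=<gpus> T5_training.py`,
# which sets LOCAL_RANK for each worker process.
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = T5ForConditionalGeneration.from_pretrained("t5-base")  # model size is an assumption

# Shard at the granularity of T5 transformer blocks, the usual auto-wrap
# choice for Transformer-based models under FSDP.
wrap_policy = functools.partial(
    transformer_auto_wrap_policy, transformer_layer_cls={T5Block}
)
model = FSDP(model, auto_wrap_policy=wrap_policy, device_id=torch.cuda.current_device())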
Diff for: recipes_source/recipes/tuning_guide.py

+1 -4
@@ -193,15 +193,12 @@ def fused_gelu(x):
 #
 # numactl --cpunodebind=N --membind=N python <pytorch_script>

-###############################################################################
-# More detailed descriptions can be found `here <https://software.intel.com/content/www/us/en/develop/articles/how-to-get-better-performance-on-pytorchcaffe2-with-intel-acceleration.html>`_.
-
 ###############################################################################
 # Utilize OpenMP
 # ~~~~~~~~~~~~~~
 # OpenMP is utilized to bring better performance for parallel computation tasks.
 # ``OMP_NUM_THREADS`` is the easiest switch that can be used to accelerate computations. It determines number of threads used for OpenMP computations.
-# CPU affinity setting controls how workloads are distributed over multiple cores. It affects communication overhead, cache line invalidation overhead, or page thrashing, thus proper setting of CPU affinity brings performance benefits. ``GOMP_CPU_AFFINITY`` or ``KMP_AFFINITY`` determines how to bind OpenMP* threads to physical processing units. Detailed information can be found `here <https://software.intel.com/content/www/us/en/develop/articles/how-to-get-better-performance-on-pytorchcaffe2-with-intel-acceleration.html>`_.
+# CPU affinity setting controls how workloads are distributed over multiple cores. It affects communication overhead, cache line invalidation overhead, or page thrashing, thus proper setting of CPU affinity brings performance benefits. ``GOMP_CPU_AFFINITY`` or ``KMP_AFFINITY`` determines how to bind OpenMP* threads to physical processing units.

 ###############################################################################
 # With the following command, PyTorch run the task on N OpenMP threads.
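The retained comment block explains ``OMP_NUM_THREADS`` and CPU affinity in prose; as a hedged sketch of how those settings are applied, the snippet below sets them from Python before torch (and therefore OpenMP) initializes. The thread count and core list are assumptions for a 4-core binding, not tuned recommendations; in practice the variables are usually exported in the shell, optionally combined with the numactl pinning shown in the context lines above.

# Illustrative only: values are assumptions for a small 4-core binding.
import os

# OpenMP reads these at initialization, so set them before importing torch.
os.environ["OMP_NUM_THREADS"] = "4"        # number of OpenMP threads
os.environ["GOMP_CPU_AFFINITY"] = "0-3"    # GNU OpenMP; Intel OpenMP uses KMP_AFFINITY instead

import torch

print(torch.get_num_threads())  # intra-op pool size; should pick up OMP_NUM_THREADS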
