Add tutorial for user defined triton kernels #2783

Merged

svekars merged 17 commits into pytorch:main from user_defined_kernel on Apr 20, 2024

Conversation

Contributor

@oulgen oulgen commented Mar 1, 2024

No description provided.


pytorch-bot bot commented Mar 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2783

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8ee52f7 with merge base 5fbef68 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@oulgen oulgen force-pushed the user_defined_kernel branch from 3dae479 to e4ff64e Compare March 1, 2024 20:54
Comment on lines 74 to 78
if not has_triton:
print("Skipping because triton is not supported on this device.")
else:
import triton
from triton import language as tl
Contributor

@malfet malfet Mar 1, 2024

This is not very readable. Why not

if not has_triton:
    print("Skipping because triton is not supported on this device.")
    sys.exit(1)

Contributor Author

Other tutorials are also doing this: https://pytorch.org/tutorials/recipes/torch_logs.html

Contributor

This makes me sad, let me try to propose a bit more elegant solution

Contributor

@malfet malfet left a comment

It would be nice to add a larger preamble that references back to the documentation, explains what Triton is and that it only works with GPUs, explains what auto-tuning is for, and, again, links back to some document about dynamic shapes, etc.


@svekars svekars added the 2.3 label Mar 11, 2024
@svekars svekars changed the base branch from main to 2.3-RC-TEST March 19, 2024 22:51
…utorial.py


Merge a small fix to kick off the build
@oulgen oulgen marked this pull request as ready for review March 22, 2024 00:58
Contributor

@svekars svekars left a comment

Some editorial suggestions. Let me know if you have questions! Please add a card and an entry to recipes_index.html

# -*- coding: utf-8 -*-

"""
Using User Defined Triton Kernels with ``torch.compile``
Contributor

Suggested change
Using User Defined Triton Kernels with ``torch.compile``
Using User-Defined Triton Kernels with ``torch.compile``

"""

######################################################################
# This tutorial explains how to use user defined Triton kernels with ``torch.compile``.
Contributor

Need to add a better introduction here. Maybe something like this:

Suggested change
# This tutorial explains how to use user defined Triton kernels with ``torch.compile``.
# User-defined Triton kernels can be used to optimize specific parts of your
# model's computation. These kernels are written in Triton's language, which is designed
# to make it easier to achieve peak hardware performance. By using user-defined Triton
# kernels with ``torch.compile``, you can integrate these optimized computations into
# your PyTorch model, potentially achieving significant performance improvements.
#
# This recipe demonstrates how you can use user-defined Triton kernels with ``torch.compile``.

#
# In this example, we will use a simple vector addition kernel from the Triton documentation
# with ``torch.compile``.
# Reference: https://triton-lang.org/main/getting-started/tutorials/01-vector-add.html
Contributor

Suggested change
# Reference: https://triton-lang.org/main/getting-started/tutorials/01-vector-add.html
# For reference, see `Triton documentation <https://triton-lang.org/main/getting-started/tutorials/01-vector-add.html>`__.
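
For thread context, here is a minimal sketch of the kind of basic example being discussed: the vector-add kernel from the Triton tutorial launched from inside a ``torch.compile``-d function. The function names and the ``BLOCK_SIZE=4`` choice are illustrative, not necessarily the PR's exact code.

import torch
import triton
from triton import language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(output_ptr + offsets, x + y, mask=mask)

@torch.compile(fullgraph=True)
def add_fn(x, y):
    output = torch.zeros_like(x)
    n_elements = output.numel()
    # The grid is computed from the meta-parameters of the kernel launch.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=4)
    return output

x = torch.randn(4, device="cuda")
y = torch.randn(4, device="cuda")
print(add_fn(x, y))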


######################################################################
# Advanced Usage
# ------------
Contributor

The underline needs to be as long as or longer than the title

Suggested change
# ------------
# ----------------------

# Advanced Usage
# ------------
#
# It is also possible to triton.autotune with ``torch.compile``.
Contributor

Need to expand the intro a bit. Maybe something like this:

Suggested change
# It is also possible to triton.autotune with ``torch.compile``.
# Triton's autotune feature is a powerful tool that automatically optimizes the configuration
# parameters of your Triton kernels. It explores a range of possible configurations and
# selects the one that delivers the best performance for your specific use case.
#
# When used with ``torch.compile``, ``triton.autotune`` can help ensure that your PyTorch
# model is running as efficiently as possible. Here is an example of using ``torch.compile``
# and ``triton.autotune``.
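
As thread context, a hedged sketch of what that ``triton.autotune`` + ``torch.compile`` combination can look like, reusing the vector-add kernel body and imports from the sketch above; the particular configs and the empty ``key`` are illustrative only.

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 4}, num_stages=3, num_warps=8),
        triton.Config({"BLOCK_SIZE": 4}, num_stages=4, num_warps=4),
        triton.Config({"BLOCK_SIZE": 2}, num_stages=3, num_warps=8),
    ],
    key=[],
)
@triton.jit
def add_kernel_autotuned(x_ptr, y_ptr, output_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(output_ptr + offsets, x + y, mask=mask)

@torch.compile(fullgraph=True)
def add_fn_autotuned(x, y):
    output = torch.zeros_like(x)
    n_elements = output.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    # BLOCK_SIZE is chosen by the autotuner, so it is not passed at the call site.
    add_kernel_autotuned[grid](x, y, output, n_elements)
    return output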

Comment on lines 123 to 131
# As for PyTorch 2.3, the user defined triton kernel support in ``torch.compile``
# composes with dynamic shapes, ``torch.autograd.Function``, JIT inductor and
# AOT inductor.
#
# The support for tensor subclasses and other advanced features currently do
# not exist.
# Support for ``triton.heuristics`` exists when it is used by itself or before
# ``triton.autotune``; however, support for using ``triton.heuristic`` after
# ``triton.autotune`` is not yet supported.
Contributor

Suggested change
# As for PyTorch 2.3, the user defined triton kernel support in ``torch.compile``
# composes with dynamic shapes, ``torch.autograd.Function``, JIT inductor and
# AOT inductor.
#
# The support for tensor subclasses and other advanced features currently do
# not exist.
# Support for ``triton.heuristics`` exists when it is used by itself or before
# ``triton.autotune``; however, support for using ``triton.heuristic`` after
# ``triton.autotune`` is not yet supported.
# As of PyTorch 2.3, the support for user-defined Triton kernels in ``torch.compile``
# includes dynamic shapes, ``torch.autograd.Function``, JIT inductor, and AOT inductor.
# You can use these features together to build complex, high-performance models.
#
# However, there are certain limitations to be aware of:
#
# * **Tensor Subclasses:** Currently, there is no support for
# tensor subclasses and other advanced features.
# * **Triton Features:** While ``triton.heuristics`` can be used either standalone or
# before ``triton.autotune``, it cannot be used after ``triton.autotune``. This
# implies that if ``triton.heuristics`` and ``triton.autotune`` are to be used
# together, ``triton.heuristics`` must be used first.
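
Since composability with ``torch.autograd.Function`` comes up here, a hypothetical sketch of what that composition can look like, assuming the ``add_kernel`` defined in the earlier sketch; this is illustrative, not the tutorial's exact code.

class Add(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, y):
        output = torch.empty_like(x)
        n_elements = output.numel()
        grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=4)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        # d(x + y)/dx = d(x + y)/dy = 1, so the gradient passes through to both inputs.
        return grad_output, grad_output

@torch.compile
def compiled_add(x, y):
    return Add.apply(x, y)

x = torch.randn(4, device="cuda", requires_grad=True)
y = torch.randn(4, device="cuda", requires_grad=True)
compiled_add(x, y).sum().backward()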

# not exist.
# Support for ``triton.heuristics`` exists when it is used by itself or before
# ``triton.autotune``; however, support for using ``triton.heuristic`` after
# ``triton.autotune`` is not yet supported.
Contributor

Also, need to add a Conclusion:

Suggested change
# ``triton.autotune`` is not yet supported.
# ``triton.autotune`` is not yet supported.
#
# Conclusion
# -----------
# In this recipe, we explored how to utilize user-defined Triton kernels
# with ``torch.compile``. We delved into the basic usage of a simple
# vector addition kernel and advanced usage involving Triton's autotune
# feature. We also discussed the composability of user-defined Triton
# kernels with other PyTorch features and highlighted some current limitations.

Can you also add what else the user should read?

Comment on lines 12 to 13
# .. note::
# This tutorial requires PyTorch 2.3 or later and a GPU that supports Triton.
Contributor

We have this in the Prerequisites section

Suggested change
# .. note::
# This tutorial requires PyTorch 2.3 or later and a GPU that supports Triton.

@oulgen oulgen requested a review from zou3519 March 22, 2024 17:41
@oulgen oulgen force-pushed the user_defined_kernel branch from ecca06a to 5ca9fbf Compare March 22, 2024 17:46
Contributor

svekars commented Mar 22, 2024

LGTM from the publishing perspective, please get a technical reviewer to approve. We should not merge until 2.3 binaries are available on the main branch.

Contributor

@zou3519 zou3519 left a comment

looks great!

@svekars svekars deleted the branch pytorch:main April 19, 2024 15:59
@svekars svekars closed this Apr 19, 2024
@svekars svekars reopened this Apr 19, 2024
@svekars svekars changed the base branch from 2.3-RC-TEST to main April 19, 2024 16:29
Contributor

@svekars svekars left a comment

Minor editorial fixes

@svekars svekars merged commit 6771cf5 into pytorch:main Apr 20, 2024