Performance Tuning Guide is very out of date #2861

Closed
msaroufim opened this issue May 9, 2024 · 9 comments · Fixed by #2889 or #2912

Comments

@msaroufim
Member

msaroufim commented May 9, 2024

🚀 Describe the improvement or the new tutorial

The first thing you see when you Google PyTorch performance is this recipe. It's well written, but it's very much out of date today:
https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html

Some concrete things we should fix

  1. For fusions we should talk about torch.compile instead of jit.script
  2. We should mention overhead reduction with CUDA graphs
  3. We should point to the *-fast series as places people can learn more
  4. For CPU-specific optimizations, the most important one is launcher core pinning, so we should either make it a default or explain it in more detail
  5. Instead of the CPU section, we could go deeper into the Inductor CPU backend
  6. The AMP section is fine, but maybe expand it to cover quantization
  7. The DDP section should be moved elsewhere, together with an FSDP performance guide
  8. The GPU sync section is good
  9. Mention tensor cores, how to enable them, and why they're not enabled by default

cc @sekyondaMeta @svekars @kit1980 @drisspg who first made me aware of this with an internal note that was important enough to make public

Existing tutorials on this topic

No response

Additional context

No response

@svekars
Contributor

svekars commented May 15, 2024

Related: #2695

@orion160
Contributor

orion160 commented Jun 4, 2024

It's pretty interesting. I'll open a PR updating it in the coming days!

@msaroufim
Member Author

feel free to ping me here or on https://discord.gg/FBMQJQJn whenever you're ready for a review

@orion160
Contributor

orion160 commented Jun 4, 2024

/assigntome

@orion160
Contributor

orion160 commented Jun 5, 2024

@msaroufim What are the *-fast series?

@orion160
Contributor

orion160 commented Jun 6, 2024

[x] Ops fusion with torch.compile -> done by @desertfire
[ ] CUDA graphs
[ ] *-fast -> ???
[ ] Core pinning -> already in the doc; add a minor section explaining it
[ ] Inductor CPU backend -> ???
[ ] Tensor core explanation

@msaroufim
Member Author

@orion160
Contributor

orion160 commented Jun 7, 2024

Hmmm, I read it. I could mention it, but it feels a bit out of place... The tuning guide covers high-level tweaks, while these blog entries cover optimizations in an ad hoc, model-centric way.

@orion160
Contributor

orion160 commented Jun 7, 2024

It could be an epilogue mentioning these case studies.

@msaroufim Do you think the issue can be considered complete with those changes?
