-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[MachineLICM][AArch64] Hoist COPY instructions with other uses in the loop #71403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
475cc90
Squashed commit of the following:
rin-arm fa91086
Defer IsLoopInvariantInst check until after the cheaper register checks
rin-arm 0147f50
Value capture of CurLoop
rin-arm 55468a2
Merge branch 'main' into hoist_dups
616f3c7
Fix failing tests added after rebase with main
rin-arm File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For info: Downstream this caused regressions in some of our benchmarks. Not sure if you need to care about that (we will probably guard this with a check for our downstream target), but thought it might be nice to mention it.
Haven't fully investigated what happens, but I think that hoisting the COPY increases register pressure resulting in spill. The COPY instructions I see that are hoisted in that benchmark can be mapped to two categories:
So for example hoisting the cross register bank COPY results in increasing the register pressure for the general gn registers in the path leading up to the use. Similarly, by hoisting the subreg extract the register pressure on the gn registers increase.
I wonder if the heuristic here perhaps should look closer at the using instructions to see if they actually can be hoisted as well? After all, we are in the path where CanCauseHighRegPressure has returned true. So traditinonally we have been more careful here and only hoisted trivially rematerializable MI:s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello. From what I've seen in our benchmarks this has been positive, but there is often some noise from hoisting/sinking. You are right that this could be more conservative, but in our case cross register bank copies will be relatively expensive and we would want to hoist them if we could. I'm not sure about the subreg extracts, but a lot of COPYs are removed prior to register allocation and it would be good if it knew where best to re-add them, if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit tricky of course, depending on the properties of the target. For a VLIW target like ours the COPY could be really cheap, at least if it can be executed in parallel with something else without increasing latency. A cross-bank copy might actually be fee (zero cycles) even if it is inside the loop. OTOH. if we need to spill/reload a register, then that could be much more expensive, even if it is hoisted to some outer loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to check my PR #81735 that is intended to fix the regression caused by spilling in AMDGPU target. It might help you too.