Regalloc: optimize stack slots #1464
Looks good to me.
Please rebase this PR against main (it's currently against your private branch).
Comparing this PR to #1399, it looks like both PRs use the same algorithm and should have the same effect on the number of stack slots. This PR is a postprocessing step, so it is not limited to linscan, which is nice.
While I don't expect this optimization to improve the performance of generated code at all, I hope that this PR is enough to avoid long frames (#797) in most if not all of the cases we have seen. I checked #1399 on a couple of examples and they were all below the long-frames threshold.
In this PR, the Intervals.t and Buckets.t tables are proportional to the number of stack slots, which can be pretty big for long frames. It may also be a problem in #1399 with the free list, not sure.
Out of curiosity, what is the largest value you have seen?
I don't have the numbers any more, but after PR1339 it was still in the thousands (not sure if it was the stack size or number of slots).
Indeed, my hunch was that what matters for the limit
dune and CI only
This pull request adds a post-processing pass to
the register allocation, whose goal is to try and
reduce the number of stack slots.
The basic idea is that, within the same register
class, it is possible to use the very same slot for
several registers if their use intervals do not
overlap.
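
To illustrate the idea, here is a minimal sketch of interval-based slot sharing for a single register class. It is not the code from this pull request: the `interval` type, its field names, and `assign_slots` are hypothetical, and intervals are assumed to be closed ranges of instruction positions.

```ocaml
(* Hypothetical representation: a register and the first/last program
   positions at which it is used. Not the PR's actual Intervals.t. *)
type interval = { reg : string; first : int; last : int }

(* Greedy assignment for one register class: visit intervals by increasing
   start position and reuse the lowest-numbered slot whose previous occupant
   ends strictly before the new interval starts; otherwise allocate a fresh
   slot. Returns the (register, slot) pairs and the number of slots used. *)
let assign_slots (intervals : interval list) : (string * int) list * int =
  let sorted =
    List.sort (fun a b -> compare (a.first, a.last) (b.first, b.last)) intervals
  in
  (* [busy_until] maps a slot index to the last position at which it is used. *)
  let busy_until : (int, int) Hashtbl.t = Hashtbl.create 16 in
  let num_slots = ref 0 in
  let find_free start =
    let rec loop slot =
      if slot >= !num_slots then None
      else if Hashtbl.find busy_until slot < start then Some slot
      else loop (slot + 1)
    in
    loop 0
  in
  let assignment =
    List.map
      (fun itv ->
        let slot =
          match find_free itv.first with
          | Some slot -> slot
          | None ->
            let slot = !num_slots in
            incr num_slots;
            slot
        in
        Hashtbl.replace busy_until slot itv.last;
        (itv.reg, slot))
      sorted
  in
  (assignment, !num_slots)
```

With intervals `a = [0, 3]`, `b = [4, 7]` and `c = [2, 5]`, registers `a` and `b` can share a slot because their intervals do not overlap, while `c` overlaps both and needs its own, so two slots suffice instead of three.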
The new pass is essentially a simplified version
of linscan; the main differences are that:
registers;
(arg/res/live) when computing the intervals;
there is never a reason to restart the computation.
The new pass is guarded by a new "regalloc-param",
namely STACK_SLOTS_OPTIM, to ease testing, but the
pass is expected to be cheap enough (in particular
because the number of slots is often fairly low) to
be always enabled once we are confident it is correct.
On that topic, the pass is technically part of the
register allocation and hence covered by the validator.
The effect of this pull request has been measured by
counting the total number of slots in all the functions
of the compiler distribution. When comparing to upstream:
uses 40% more slots;
uses 6% fewer slots.