Skip to content

Regalloc: optimize stack slots #1464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 16, 2023
Merged

Regalloc: optimize stack slots #1464

merged 7 commits into from
Jun 16, 2023

Conversation

xclerc
Copy link
Contributor

@xclerc xclerc commented Jun 8, 2023

This pull request adds a post-processing pass to
the register allocation, whose goal is to try and
reduce the number of stack slots.

The basic idea is that, within the same register
class, it is possible to use the very same slot for
several registers if their use intervals do not
overlap.

The new pass is essentially a simplified version
of linscan; the main differences are that:

  • we are interested in stack slots rather than
    registers;
  • we do not distinguish the different kinds of uses
    (arg/res/live) when computing the intervals;
  • we do not track potential "holes" in the intervals;
  • we know that, at worst, we do not save any slots but
    there is never a reason to restart the computation.

The new pass is guarded by a new "regalloc-param",
namely STACK_SLOTS_OPTIM, to ease testing, but the
pass is expected to be cheap enough (in particular
because the number of slots is often fairly low) to
be always enabled once we are confident it is correct.
On that topic, the pass is technically part of the
register allocation and hence covered by the validator.

The effect of this pull request has been measured by
counting the total number of slots in all the functions
of the compiler distribution. When comparing to upstream:

  • when the optimization is disabled, the IRC allocator
    uses 40% more slots;
  • when the optimization is enabled, the IRC allocator
    uses 6% fewer slots.

@xclerc
Copy link
Contributor Author

xclerc commented Jun 8, 2023

note: b023145 only move stack slots to
a new file and e770bde introduces the
new pass

Copy link
Contributor

@gretay-js gretay-js left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Please rebase this PR against main (it's currently against your private branch).

Comparing this PR to #1399, looks like both PRs use the same algorithm and should have the same effect on the number of stack slots, and this PR is a postprocessing step so not limited to linscan, which is nice.

While I don't expect performance improvements in generated code at all from this optimization, I hope that this PR is enough to avoid long frames #797 in most if not all cases we have seen. I checked it for #1399 on a couple of examples and they were all below long frames threshold.

In this PR, Intervals.t and Buckets.t tables are proportional to the number of stack slots, which can be pretty big for long frames. It may also be a problem in 1399 with the free list, not sure.

@xclerc
Copy link
Contributor Author

xclerc commented Jun 9, 2023

In this PR, Intervals.t and Buckets.t tables are proportional to the number of stack slots, which can be pretty big for long frames. It may also be a problem in 1399 with the free list, not sure.

Out of curiosity, what is the largest value you have seen?
(on the compiler distribution, I saw nothing above 400.)

@xclerc xclerc mentioned this pull request Jun 14, 2023
@gretay-js
Copy link
Contributor

In this PR, Intervals.t and Buckets.t tables are proportional to the number of stack slots, which can be pretty big for long frames. It may also be a problem in 1399 with the free list, not sure.

Out of curiosity, what is the largest value you have seen? (on the compiler distribution, I saw nothing above 400.)

I don't have the numbers any more, but after PR1339 it was still in the thousands (not sure if it was the stack size or number of slots).

@xclerc
Copy link
Contributor Author

xclerc commented Jun 14, 2023

(not sure if it was the stack size or number of slots)

Indeed, my hunch was that what matters for the limit
is ~the number of points multiplied by the average
number of slots, and that the latter would remain low.

@xclerc xclerc force-pushed the remote-old-irc-split branch 2 times, most recently from 77169fd to a60e772 Compare June 16, 2023 09:17
@xclerc xclerc force-pushed the optmize-stack-slots branch from a7ee462 to b01633d Compare June 16, 2023 09:32
@xclerc xclerc changed the base branch from remote-old-irc-split to main June 16, 2023 09:32
@xclerc xclerc force-pushed the optmize-stack-slots branch from 6c14159 to cc5ee3c Compare June 16, 2023 10:31
Copy link
Collaborator

@mshinwell mshinwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dune and CI only

@xclerc xclerc merged commit 36eb121 into main Jun 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants