Skip to content

Remove local sharing logic #2540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Oct 23, 2022
Merged

Remove local sharing logic #2540

merged 20 commits into from
Oct 23, 2022

Conversation

dcodeIO
Copy link
Member

@dcodeIO dcodeIO commented Oct 21, 2022

New version of #1260, that drops all the logic to add and free temporary locals, instead using a new and unique local for each var, let, const respectively temporary. Has a couple advantages:

  • Compiler becomes much simpler
    • No more need to keep track of temporaries to then free locals and scopes etc.
    • No more weird side-effects from local reuse if compilation order is special
    • Generally more reliably because less can go wrong
  • Named locals, even if scoped, maintain their name in debug info now, since there's no more sharing
    • Can now simply use let everywhere, as var is no longer necessary to have debug names
  • Unique locals aid future implementation of local-dependent features, say closures

There is one drawback: The sharing logic we had before was something an optimizer would typically do (similar to optimize-locals). When it gets to the shadow stack, it helped to keep stack frames compact before Binaryen, since Binaryen can no longer reason about local spilling (which just becomes loads and stores) from just the instructions. Frame size about doubles it appears, while performance should remain about the same given other more significant costs.

Closes #1260

  • I've read the contributing guidelines
  • I've added my name and email to the NOTICE file

Comment on lines -1649 to +1651
i64.const 0
i64.store $0
local.get $0
i64.const 0
i64.store $0 offset=8
i32.const 0
i32.const 108
memory.fill $0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it looks like this signifintally affect to some cases like this one. When we got memory.fill instead of two stores

Copy link
Member Author

@dcodeIO dcodeIO Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a side-effect of the shadow stack now being less compact, so where previously it had two slots it now has more, producing a fill. I also had to increase the default stack size because of it. Would say the stack effects are in fact the most notable downside.

In a sense, the compiler was doing some sort of local optimization before, even when not optimizing, which now is gone, yet Binaryen cannot reason about the shadow stack (which is just memory accesses), leading to the effects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After increased stacksize this downgrade change still present, hmm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are more locals now, so shadow stack frames become larger. The cause is that locals are no longer shared, but there is one per named local now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's a good change. If binaryen could handle all this unique locals into the same number as before (at least for release) well maybe this simplification would make sense, but apparently this simplification will increase optimization time, increase size and degrade speed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shadow stack is already quite a suffering on its own (that's why using the minimal runtime has its benefits), and it appears that frame size ultimately doesn't matter much except that the shadow stack becomes larger when there are many scoped locals. On the contrary, this change unblocks quite a few annoying things, like that we now can have actual debug names on all local, and ofc that all this complexity, that like one person on the planet has an idea how to approach, is gone from the compiler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're share perf data for bootstraping. But as far as I remember it doesn't use incremental GC and therefore shadow stack right?

It uses incremental GC:

"runtime": "incremental",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To give an outlook towards what this ultimately enables: Closures need to retroactively lift locals to a captured environment, and the obvious way to do this in an almost-single-pass compiler is to compile a function, find that it closed over a variable, and then amend any function also using that closed over variable. This is easy if each local has a unique index, but extremely hard with local sharing in place. In a sense, we previously performed optimizations that lost critical information, preventing us from creating passes that modify already generated code. No sane compiler would actually do this, I'd say, and would actually do the opposite, like utilize SSA to actually be able to reason about this stuff. Now we don't need SSA for closures, but compacting locals would make this task incredibly hard. Makes a lot of sense to remove this optimization, as it simply happens too early in the pipeline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see a reason. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course you are right that the ideal opportunity to remove this is once we generate Wasm GC code, so Binaryen has all the information, optimizing before we'd actually downlevel to a bundled runtime with a shadow stack. Then, Binaryen does the compaction, and we only generate the shadow stack after, yielding about the same result as before. That's the ultimate goal of course, but still we'll have to start somewhen, and the data suggests that this somewhen can as well be now so we can move forward.

@dcodeIO dcodeIO marked this pull request as ready for review October 22, 2022 22:54
@dcodeIO dcodeIO requested a review from MaxGraey October 23, 2022 11:24
@MaxGraey
Copy link
Member

It looks like top-function var is not recommended anymore and let / const more preferable? Perhaps we should add according lint rule?

@dcodeIO
Copy link
Member Author

dcodeIO commented Oct 23, 2022

Using let everywhere, now that all locals get a proper debug name, makes sense to aid refactoring, i.e. no more switching between var and let when moving code in or out of an inner scope. Makes sense now to stick to let throughout the compiler's codebase, though tests may differ of course so we test both let and var. Fine with a lint rule for this.

@dcodeIO dcodeIO merged commit 8bb4fae into main Oct 23, 2022
@HerrCai0907 HerrCai0907 deleted the simplify-locals branch October 17, 2023 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants