-
-
Notifications
You must be signed in to change notification settings - Fork 670
Remove local sharing logic #2540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
i64.const 0 | ||
i64.store $0 | ||
local.get $0 | ||
i64.const 0 | ||
i64.store $0 offset=8 | ||
i32.const 0 | ||
i32.const 108 | ||
memory.fill $0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it looks like this signifintally affect to some cases like this one. When we got memory.fill
instead of two stores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a side-effect of the shadow stack now being less compact, so where previously it had two slots it now has more, producing a fill. I also had to increase the default stack size because of it. Would say the stack effects are in fact the most notable downside.
In a sense, the compiler was doing some sort of local optimization before, even when not optimizing, which now is gone, yet Binaryen cannot reason about the shadow stack (which is just memory accesses), leading to the effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After increased stacksize this downgrade change still present, hmm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are more locals now, so shadow stack frames become larger. The cause is that locals are no longer shared, but there is one per named local now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that's a good change. If binaryen could handle all this unique locals into the same number as before (at least for release) well maybe this simplification would make sense, but apparently this simplification will increase optimization time, increase size and degrade speed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shadow stack is already quite a suffering on its own (that's why using the minimal runtime has its benefits), and it appears that frame size ultimately doesn't matter much except that the shadow stack becomes larger when there are many scoped locals. On the contrary, this change unblocks quite a few annoying things, like that we now can have actual debug names on all local, and ofc that all this complexity, that like one person on the planet has an idea how to approach, is gone from the compiler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're share perf data for bootstraping. But as far as I remember it doesn't use incremental GC and therefore shadow stack right?
It uses incremental GC:
assemblyscript/src/asconfig.json
Line 10 in 78b2d1a
"runtime": "incremental", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To give an outlook towards what this ultimately enables: Closures need to retroactively lift locals to a captured environment, and the obvious way to do this in an almost-single-pass compiler is to compile a function, find that it closed over a variable, and then amend any function also using that closed over variable. This is easy if each local has a unique index, but extremely hard with local sharing in place. In a sense, we previously performed optimizations that lost critical information, preventing us from creating passes that modify already generated code. No sane compiler would actually do this, I'd say, and would actually do the opposite, like utilize SSA to actually be able to reason about this stuff. Now we don't need SSA for closures, but compacting locals would make this task incredibly hard. Makes a lot of sense to remove this optimization, as it simply happens too early in the pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I see a reason. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course you are right that the ideal opportunity to remove this is once we generate Wasm GC code, so Binaryen has all the information, optimizing before we'd actually downlevel to a bundled runtime with a shadow stack. Then, Binaryen does the compaction, and we only generate the shadow stack after, yielding about the same result as before. That's the ultimate goal of course, but still we'll have to start somewhen, and the data suggests that this somewhen can as well be now so we can move forward.
It looks like top-function |
Using |
New version of #1260, that drops all the logic to add and free temporary locals, instead using a new and unique local for each
var
,let
,const
respectively temporary. Has a couple advantages:let
everywhere, asvar
is no longer necessary to have debug namesThere is one drawback: The sharing logic we had before was something an optimizer would typically do (similar to optimize-locals). When it gets to the shadow stack, it helped to keep stack frames compact before Binaryen, since Binaryen can no longer reason about local spilling (which just becomes loads and stores) from just the instructions. Frame size about doubles it appears, while performance should remain about the same given other more significant costs.
Closes #1260