Skip to content

Propose (Load,Store)WithOffset #28

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 5, 2015
Merged

Conversation

lukewagner
Copy link
Member

This is a proposal for discussion, relevant to the separate heap/OOB discussion in #23.

A minor side benefit is that, because most loads/stores have a non-zero immediate offset (based on a trivial pattern match), the polyfill prototype shows a 4% size win from this opcode (compared to Load(Add(index, Lit(offset)))).

@titzer
Copy link

titzer commented May 4, 2015

Would it be useful to allow offsets that weren't literals as well?

@lukewagner
Copy link
Member Author

Good question. The worry is that we'd impose extra work for some OS/platform/impl-strategy. The good thing about a constant offset is that, when using dynamic branching, we can always just do 1 bounds check (against $(heapLength - imm)) and this should be GVNable (by taking max(imm), subject to certain dominance requirements). But, if the offset is dynamic, I don't see how we'd get around, in general, needing an additional jump-if-overflow (after the add) followed by the normal bounds check.

@kripken
Copy link
Member

kripken commented May 4, 2015

Nice.

@titzer
Copy link

titzer commented May 4, 2015

W.r.t combining and hoisting bounds checks: we explored many such tricks in TurboFan. It gets really tricky if you want to guarantee the out-of-bounds error occurs at exactly the right instruction. You also don't want to generate errors early (i.e. outside of loops) if the program would not actually have executed an out-of-bounds access.

@lukewagner
Copy link
Member Author

Yes, those have been concerns for us as well and that specifically was the motivation for the "cannot execute module after OOB" proposal. That way, you can hoist the bounds check caring only about dominance and external visibility (via calls to imports); once the OOB is detected, noone can see exactly where it happened or what had already executed. For the loop case, the "loop rotation" technique Dan mentioned in #27 would allow the bounds check to be pulled out of the loop but still be guarded by the condition.

@titzer
Copy link

titzer commented May 4, 2015

The check could still be hoisted above a write to the buffer, which would be visible to the outside.

@lukewagner
Copy link
Member Author

By "poisoning the heap" (from V1.md#heap), the idea is that any external JS ArrayBuffer aliasing the heap would also be subsequently unavailable (detached, which is what is proposed to happen upon resize as well).

@jfbastien
Copy link
Member

I would also hold off this change if it's just a size saving, and aim towards automated compression on real-world code.

@lukewagner
Copy link
Member Author

I'm not sure if you read most of the comments, but it's specifically not motivated by size.

@jfbastien
Copy link
Member

Isn't that the same as reg = mov imm ; load var + reg ?

@lukewagner
Copy link
Member Author

I'm not sure what you're asking. The way LoadWithOffset(base, offset) is different than Load(Add(base, Imm(offset))) is that the latter can wrap-around (when base is out-of-bounds) to be in-bounds so you can't simply throw if 'base' is out of bounds.

@jfbastien
Copy link
Member

Hmm, that's an interesting point: overflow on the int is defined, but at the C++ level it certainly isn't defined for addressing purpose.

@titzer
Copy link

titzer commented May 5, 2015

We explored bounds check combining extensively in TurboFan in the early days. There be dragons here if you don't have a backup path to run the "slow version" up to the point where the actual OOB access occurs. For example, if you try to lift bounds checks out of branches to a dominator (with loop invariant code motion just being a special case of this), you can actually introduce checks that fail earlier that would not necessarily have failed in the execution (in particular because the program had a branch that guarded that OOB access). This is easy to see if you have accesses i+1 and i+2, where i+1 is inbounds and i+2 is out of bounds. Then combining the check for both i+1 and i+2 is tricky; if you check i+2 is in bounds, it'll fail--but where? At the access to i+1? That's surprising, especially if the program never executes the i+2 access. The way you'd do this with deoptimization is to do the check but just deopt if i+2 is out of bounds, then run slow code up to the point where the i+2 access happens, if at all. If you have such a sophisticated system I don't see that having special instructions really makes it easier.

@lukewagner
Copy link
Member Author

So I've been thinking about a combination of approaches for just that use case you mentioned: on 32-bit, you can have an extra guard page such that, as long as the base is in-bounds, base+offset (where offset < guard page length) will either be in-bounds or fault. Thus, you could hoist the bounds check (assuming we have the no-overflow property of loadWithOffset!) in the case you talked about.

That being said, I am hoping that, even without these advanced techniques, just optimizing straight-line (load p; load p+1; load p+2; load p+3) and loop (where we can hoist into the loop-rotated head) cases will remove the vast majority of bounds checks (at least dynamically executed ones). We plan to do more experimentation to measure this % to help make the case for deterministic OOB.

@titzer
Copy link

titzer commented May 5, 2015

On Tue, May 5, 2015 at 5:39 PM, Luke Wagner [email protected]
wrote:

So I've been thinking about a combination of approaches for just that use
case you mentioned: on 32-bit, you can have an extra guard page such that,
as long as the base is in-bounds, base+offset (where offset < guard page
length) will either be in-bounds or fault. Thus, you could hoist the bounds
check (assuming we have the no-overflow property of loadWithOffset!) in the
case you talked about.

Tried multiple versions of that, with and without faulting. With special
chaos areas for writing, etc. It doesn't really work (at least for array
buffers), because the bit pattern you get for int reads (0) is not a NaN
pattern.

That being said, I am hoping that, even without these advanced techniques,
just optimizing straight-line (load p; load p+1; load p+2; load p+3) and
loop (where we can hoist into the loop-rotated head) cases will remove the
vast majority of bounds checks (at least dynamically executed ones). We
plan to do more experimentation to measure this % to help make the case for
deterministic OOB.


Reply to this email directly or view it on GitHub
WebAssembly/spec#28 (comment).

@lukewagner
Copy link
Member Author

Well, I'm saying the guard page is PROT_NONE so you fault and the signal handler does The Right Thing. (This is something we already do for asm.js but it would be way easier with throw-on-OOB semantics: the signal handler just moves $pc to the throw trampoline.)

@jfbastien
Copy link
Member

NaCl uses guard pages on x83-32, x86-64, ARM and MIPS, exactly for the purpose of preventing out-of-bounds reg+imm accesses. There reason NaCl can do it easily is because it's in a separate process.

I'm wary of designing the spec with such implementation requirements. I think we should leave this as implementation-defined behavior, and try to converge when we have better experience.

@titzer
Copy link

titzer commented May 5, 2015

We actually implemented guard pages with combining bounds checks almost
exactly as Luke described. We had a prototype working in TurboFan, and we
did a security review with the Chrome Security Team and they had some real
concerns about it...Eventually we decided not to ship the signal-handler
based approach because the speedup was not as big as expected, and it added
a lot of complexity. The hoisting is also a potential performance cliff,
since it could cause out-of-bounds accesses in the cases I described above.
Without deoptimizing to slower code that checks every access, an errant
program (or attacker program) could really hammer the signal handler.

On Tue, May 5, 2015 at 6:24 PM, JF Bastien [email protected] wrote:

NaCl uses guard pages on x83-32, x86-64, ARM and MIPS, exactly for the
purpose of preventing out-of-bounds reg+imm accesses. There reason NaCl can
do it easily is because it's in a separate process.

I'm wary of designing the spec with such implementation requirements. I
think we should leave this as implementation-defined behavior, and try to
converge when we have better experience.


Reply to this email directly or view it on GitHub
WebAssembly/spec#28 (comment).

@lukewagner
Copy link
Member Author

@jfbastien Of course the implementation would be an implementation-defined behavior :) Also, having a separate process is (demonstrably) not a requirement for using signal handlers.

@titzer I'd be interested to hear about the security concerns. One real one we take seriously is not accidentally squelching crash reports. Also, if the guard pages are only hit when there is a real OOB (at which point you jump to throw; asm.js doesn't have this property, it might wrap back around to be in-bounds, hence this proposal) I don't see how there could be a performance cliff for non-OOB code.

So bringing the conversation back from possible optimization techniques to the main PR: I think loadWithOff() should help regardless of the impl technique used. Any concerns on this point?

@jfbastien
Copy link
Member

Separate process isn't an issue with signal handling, I was mentioning it because I worry about address space availability and fragmentation when in-process.

@titzer
Copy link

titzer commented May 5, 2015

We had a couple ideas on what to do when the signal handler hit. Both were premised on the idea that we wanted to continue back to the code that faulted after fixing things. The easiest one was to look up a handler in a table based on the PC of the fault and just return to there. The handler would materialize either 0, NaN, undefined, or whatever was necessary for the out of bounds read. The other option was to have metadata that basically encoded which value and where to materialize it. The signal handler would interpret that and simply jump over the faulting instruction.

The security teams concerns centered around attacks to either the metadata table or other data in the system that could trick the handler into jumping into attacker-controlled code (or using ROP some juicy code already in Chrome).

@lukewagner
Copy link
Member Author

@jfbastien We were talking about an extra guard page (a few KB) which doesn't significantly change address space availability.

@titzer Odin currently does the latter (fills in 0/NaN and jumps to the next pc). Anyhow, it sounds like these security concerns wouldn't be an issue with WebAssembly since, with the loadWithOffset() semantics, the signal handler always sets PC to a fixed (throw trampoline) address. This whole operation can be (is, in Odin) guarded on the PC being inside WebAssembly code and at a known heap access.

@jfbastien
Copy link
Member

Luke: it does on x86-64 :-)

@lukewagner
Copy link
Member Author

@jfbastien I think we must be talking about different things. I'm talking about a single extra guard region (say 4kb) tacked onto the end of the big heap used on 32-bit platforms that allows the hoisting of dynamic bounds checks. Given heaps in the 64-512mb range, this is insignificant. I'm not talking about the "map 4gb" trick for 64-bit.

@jfbastien
Copy link
Member

Understood. I'm just not sure where we limit the redzone size? I want to make sure we don't assume a certain implementation in the spec. What's the maximum immediate size, if any? What do we do if bigger is provided?

@pizlonator
Copy link
Contributor

On May 5, 2015, at 9:39 AM, titzer [email protected] wrote:

We had a couple ideas on what to do when the signal handle hit. Both were premised on the idea that we wanted to continue back to the code that faulted after fixing things. The easiest one was to look up a handler in a table based on the PC of the fault and just return to there. The handler would materialize either 0, NaN, undefined, or whatever was necessary for the out of bounds read. The other option was to have metadata that basically encoded which value and where to materialize it. The signal handler would interpret that and simply jump over the faulting instruction.

The security teams concerns centered around attacks to either the metadata table or other data in the system that could trick the handler into jumping into attacker-controlled code (or using ROP some juicy code already in Chrome).

FWIW, we (WebKit) are unlikely to have signal handling magic - mainly because we live in everyone’s apps as a framework, and signal handler registration isn’t really invisible to a client app. The client app could have its own signal handler tricks, and it may not be doing chaining properly. So, it would probably break apps if we did signal tricks.

Therefore, I’m in favor of any spec feature that lets a signal-magic-free system on mainstream hardware get good performance. This makes me like (Load,Store)WithOffset and heap poisoning, though I haven’t thought about either of them very deeply.

-Filip

@lukewagner
Copy link
Member Author

@jfbastien The spec would not talk about redzone size and would not, e.g., limit the immediate size other than to something reasonably large (e.g., int32).

@pizlonator Totally agreed; we definitely want something that runs well on platforms where signal tricks aren't possible, don't work, don't make sense, etc. It still requires proof, but I am quite hopeful that (Load,Store)WithOffset and heap poisoning will allow aggressive-enough hoisting that deterministic halt-on-oob semantics won't be significantly slower than the non-deterministic semantics we were discussing in #23.

@pizlonator
Copy link
Contributor

On May 5, 2015, at 12:59 PM, Luke Wagner [email protected] wrote:

@jfbastien https://github.com/jfbastien The spec would not talk about redzone size and would not, e.g., limit the immediate size other than to something reasonably large (e.g., int32).

@pizlonator https://github.com/pizlonator Totally agreed; we definitely want something that runs well on platforms where signal tricks aren't possible, don't work, don't make sense, etc. It still requires proof, but I am quite hopeful that (Load,Store)WithOffset and heap poisoning will allow aggressive-enough hoisting that deterministic halt-on-oob semantics won't be significantly slower than the non-deterministic semantics we were discussing in #23 WebAssembly/spec#23.

I want to think about this poisoning more. Have you written this up in any more detail?

My initial take is:

Option #1: Speculatively hoist checks to a dominating but not control-flow-equivalent position and recompile if that hurts.

Option #2: Hoist checks to a control-flow-equivalent dominating position and use OSR exit to get to the real point of failure.

Option #3: Hoist checks to a control-flow-equivalent dominating position and claim in the webasm spec that this is OK.

I’m assuming you’re trying to tackle #3. I’m assuming that #1 and #2 shouldn’t be required by the spec.

-Filip

@lukewagner
Copy link
Member Author

@pizlonator I haven't written it up; it's actually a recent idea @sunfishcode proposed when we were thinking about the same bounds-check-hoisting scenarios that it sounds like we've all be thinking about. You're right, option 3 is the goal and we'd like to avoid 1 and 2.

@pizlonator
Copy link
Contributor

Something else to consider: we could further reduce the burden on webasm implementations by allowing the webasm generator perform check hoisting.

Imagine if webasm had a checkAddress(base, offset) operation. In my strawman, this operation would be required to do a bounds check, and throw/abort/whatever at that point, if it fails. load/store would still semantically require bounds checks, but with the caveat that checkAddress(base, offset1) subsumes a dominated checkAddress(base, offset2) if offset2 <= offset1. Then, eliminating the checks in the webasm implementation would just be a matter of doing CSE.

This certainly increases the amount of work that the webasm generator must do, but maybe it’s simpler for the spec?

-Filip

On May 5, 2015, at 1:19 PM, Luke Wagner [email protected] wrote:

@pizlonator https://github.com/pizlonator I haven't not written it up; it's actually a recent idea @sunfishcode https://github.com/sunfishcode proposed when we were thinking about the same bounds-check-hoisting scenarios that it sounds like we've all be thinking about. You're right, option 3 is the goal and we'd like to avoid 1 and 2.


Reply to this email directly or view it on GitHub WebAssembly/spec#28 (comment).

@titzer
Copy link

titzer commented May 5, 2015

On Tue, May 5, 2015 at 10:12 PM, pizlonator [email protected]
wrote:

On May 5, 2015, at 12:59 PM, Luke Wagner [email protected]
wrote:

@jfbastien https://github.com/jfbastien The spec would not talk about
redzone size and would not, e.g., limit the immediate size other than to
something reasonably large (e.g., int32).

@pizlonator https://github.com/pizlonator Totally agreed; we
definitely want something that runs well on platforms where signal tricks
aren't possible, don't work, don't make sense, etc. It still requires
proof, but I am quite hopeful that (Load,Store)WithOffset and heap
poisoning will allow aggressive-enough hoisting that deterministic
halt-on-oob semantics won't be significantly slower than the
non-deterministic semantics we were discussing in #23 <
https://github.com/WebAssembly/spec/pull/23>.

I want to think about this poisoning more. Have you written this up in any
more detail?

My initial take is:

Option #1: Speculatively hoist checks to a dominating but not
control-flow-equivalent position and recompile if that hurts.

Option #2: Hoist checks to a control-flow-equivalent dominating position
and use OSR exit to get to the real point of failure.

Option #3: Hoist checks to a control-flow-equivalent dominating position
and claim in the webasm spec that this is OK.

What do you mean exactly with #3? You can pull a check out of a branch to
its dominator? If the check failing aborts the program, then the program
cannot even check a condition that guards the access is safe. #2 is at
least better, but it still punishes the program with poor performance for
something it didn't actually do (but the implementation assumed it was
likely to do).

I’m assuming you’re trying to tackle #3. I’m assuming that #1 and #2
shouldn’t be required by the spec.

-Filip


Reply to this email directly or view it on GitHub
WebAssembly/spec#28 (comment).

@pizlonator
Copy link
Contributor

On May 5, 2015, at 1:39 PM, titzer [email protected] wrote:

On Tue, May 5, 2015 at 10:12 PM, pizlonator [email protected]
wrote:

On May 5, 2015, at 12:59 PM, Luke Wagner [email protected]
wrote:

@jfbastien https://github.com/jfbastien The spec would not talk about
redzone size and would not, e.g., limit the immediate size other than to
something reasonably large (e.g., int32).

@pizlonator https://github.com/pizlonator Totally agreed; we
definitely want something that runs well on platforms where signal tricks
aren't possible, don't work, don't make sense, etc. It still requires
proof, but I am quite hopeful that (Load,Store)WithOffset and heap
poisoning will allow aggressive-enough hoisting that deterministic
halt-on-oob semantics won't be significantly slower than the
non-deterministic semantics we were discussing in #23 <
https://github.com/WebAssembly/spec/pull/23>.

I want to think about this poisoning more. Have you written this up in any
more detail?

My initial take is:

Option #1: Speculatively hoist checks to a dominating but not
control-flow-equivalent position and recompile if that hurts.

Option #2: Hoist checks to a control-flow-equivalent dominating position
and use OSR exit to get to the real point of failure.

Option #3: Hoist checks to a control-flow-equivalent dominating position
and claim in the webasm spec that this is OK.

What do you mean exactly with #3? You can pull a check out of a branch to
its dominator? If the check failing aborts the program, then the program
cannot even check a condition that guards the access is safe. #2 is at
least better, but it still punishes the program with poor performance for
something it didn't actually do (but the implementation assumed it was
likely to do).

#3 refers to Luke’s vague poisoning proposal. It requires control-flow-equivalence. So, we wouldn’t pull a check out of a branch, and your problem doesn’t manifest.

-Filip

I’m assuming you’re trying to tackle #3. I’m assuming that #1 and #2
shouldn’t be required by the spec.

-Filip


Reply to this email directly or view it on GitHub
WebAssembly/spec#28 (comment).


Reply to this email directly or view it on GitHub WebAssembly/spec#28 (comment).

@lukewagner
Copy link
Member Author

That's an interesting option and one we've discussed (without conclusion) before. I guess the big question question for me is if there are any big opportunities that this opens up where LLVM knows something (from C++ semantics) that allows it to hoist the checkAddress() up higher than the wasm backend could (esp. with poisoning/WithOffset semantics). On case where the wasm backend is limited is the control-flow-equivalence/domination requirement, so that leads to the question of whether the C++ spec buys us any UB we can leverage here. It seems like, since you can write if (foo) *bar and foo could imply *bar is valid that by default LLVM wouldn't have this opportunity. Of course non-LLVM wasm code generators could for their own domain-specific reasons.

One concern is that this depends on the wasm code generator having a pretty precise model of the CSE capabilities of the wasm engine. Otherwise, the code generator could be generating checkAddress()s that weren't being eliminated and thus creating more work. Also, some wasm compilers (e.g., the baseline compiler we might use for fast startup) won't have any CSE to each checkAddress() would cost.

One variation of this idea and possible fix we've considered is having checkAddress() return some "validated pointer" type that is consumed by loads/stores and part of the type system. The advantage here is that you could have a guarantee that even a dumb (read: baseline) wasm compiler would be able to elide the bounds checks. It would also allow doing some potentially fancy hoisting (inter-procedural, but not heap, since you can't trust anything coming out of the heap).

Another concern is that, in cases where we are using signal magic, the checkAddress would actually be imposing additional requirements, likely requiring a spurious load (to trigger the fault). If code generators generated tons of checkAddress, this could end up hurting.

@titzer
Copy link

titzer commented May 5, 2015

On Tue, May 5, 2015 at 10:42 PM, pizlonator [email protected]
wrote:

On May 5, 2015, at 1:39 PM, titzer [email protected] wrote:

On Tue, May 5, 2015 at 10:12 PM, pizlonator [email protected]
wrote:

On May 5, 2015, at 12:59 PM, Luke Wagner [email protected]
wrote:

@jfbastien https://github.com/jfbastien The spec would not talk
about
redzone size and would not, e.g., limit the immediate size other than
to
something reasonably large (e.g., int32).

@pizlonator https://github.com/pizlonator Totally agreed; we
definitely want something that runs well on platforms where signal
tricks
aren't possible, don't work, don't make sense, etc. It still requires
proof, but I am quite hopeful that (Load,Store)WithOffset and heap
poisoning will allow aggressive-enough hoisting that deterministic
halt-on-oob semantics won't be significantly slower than the
non-deterministic semantics we were discussing in #23 <
https://github.com/WebAssembly/spec/pull/23>.

I want to think about this poisoning more. Have you written this up in
any
more detail?

My initial take is:

Option #1: Speculatively hoist checks to a dominating but not
control-flow-equivalent position and recompile if that hurts.

Option #2: Hoist checks to a control-flow-equivalent dominating
position
and use OSR exit to get to the real point of failure.

Option #3: Hoist checks to a control-flow-equivalent dominating
position
and claim in the webasm spec that this is OK.

What do you mean exactly with #3? You can pull a check out of a branch to
its dominator? If the check failing aborts the program, then the program
cannot even check a condition that guards the access is safe. #2 is at
least better, but it still punishes the program with poor performance for
something it didn't actually do (but the implementation assumed it was
likely to do).

#3 refers to Luke’s vague poisoning proposal. It requires
control-flow-equivalence. So, we wouldn’t pull a check out of a branch, and
your problem doesn’t manifest.

It would definitely require poisoning the heap, since the check could still
be moved above stores, which would then be observable.

For error reporting and debugging modes, moving checks above stores, or
with respect to each other, sounds less than ideal. In those modes I think
you want a precise position and a precise state of the heap at the point of
the error so that the tools can report a precise stack trace and source
position corresponding to the failure.

The only advantage that I see with (Load|Store)WithOffset is guaranteeing
no wraparound. One could imagine doing the same check hoisting trick with
regular loads and stores whether the input happens to be an add with an
immediate. You'd have to either first prove it didn't wrap around or have
the hoisted check also check for overflow.

In general I think we should avoid surprising behaviors for small gains in
performance. Our experiments with hoisting checks in a similar manner
indicates it was usually less than a 5% win. Compilers can do ABCD and/or
loop versioning which will pretty much subsumption this.

-Filip

I’m assuming you’re trying to tackle #3. I’m assuming that #1 and #2
shouldn’t be required by the spec.

-Filip


Reply to this email directly or view it on GitHub
WebAssembly/spec#28 (comment).


Reply to this email directly or view it on GitHub <
https://github.com/WebAssembly/spec/pull/28#issuecomment-99213901>.


Reply to this email directly or view it on GitHub
WebAssembly/spec#28 (comment).

@lukewagner
Copy link
Member Author

@titzer Agreed poisoning is mostly orthogonal to *WithOffset. What is the surprising behavior you're referring to? Having the heap (really, the module) not be accessible after an OOB? This could be considered (rightly, I think) a security measure as the module may be in a corrupt state making it vulnerable to XSS-type attacks.

@pizlonator
Copy link
Contributor

Yup, all good points.

-Filip

On May 5, 2015, at 1:53 PM, Luke Wagner [email protected] wrote:

That's an interesting option and one we've discussed (without conclusion) before. I guess the big question question for me is if there are any big opportunities that this opens up where LLVM knows something (from C++ semantics) that allows it to hoist the checkAddress() up higher than the wasm backend could (esp. with poisoning/WithOffset semantics). On case where the wasm backend is limited is the control-flow-equivalence/domination requirement, so that leads to the question of whether the C++ spec buys us any UB we can leverage here. It seems like, since you can write if (foo) *bar and foo could imply *bar is valid that by default LLVM wouldn't have this opportunity. Of course non-LLVM wasm code generators could for their own domain-specific reasons.

One concern is that this depends on the wasm code generator having a pretty precise model of the CSE capabilities of the wasm engine. Otherwise, the code generator could be generating checkAddress()s that weren't being eliminated and thus creating more work. Also, some wasm compilers (e.g., the baseline compiler we might use for fast startup) won't have any CSE to each checkAddress() would cost.

One variation of this idea and possible fix we've considered is having checkAddress() return some "validated pointer" type that is consumed by loads/stores and part of the type system. The advantage here is that you could have a guarantee that even a dumb (read: baseline) wasm compiler would be able to elide the bounds checks. It would also allow doing some potentially fancy hoisting (inter-procedural, but not heap, since you can't trust anything coming out of the heap).

Another concern is that, in cases where we are using signal magic, the checkAddress would actually be imposing additional requirements, likely requiring a spurious load (to trigger the fault). If code generators generated tons of checkAddress, this could end up hurting.


Reply to this email directly or view it on GitHub WebAssembly/spec#28 (comment).

@pizlonator
Copy link
Contributor

On May 5, 2015, at 1:53 PM, titzer [email protected] wrote:

On Tue, May 5, 2015 at 10:42 PM, pizlonator [email protected]
wrote:

On May 5, 2015, at 1:39 PM, titzer [email protected] wrote:

On Tue, May 5, 2015 at 10:12 PM, pizlonator [email protected]
wrote:

On May 5, 2015, at 12:59 PM, Luke Wagner [email protected]
wrote:

@jfbastien https://github.com/jfbastien The spec would not talk
about
redzone size and would not, e.g., limit the immediate size other than
to
something reasonably large (e.g., int32).

@pizlonator https://github.com/pizlonator Totally agreed; we
definitely want something that runs well on platforms where signal
tricks
aren't possible, don't work, don't make sense, etc. It still requires
proof, but I am quite hopeful that (Load,Store)WithOffset and heap
poisoning will allow aggressive-enough hoisting that deterministic
halt-on-oob semantics won't be significantly slower than the
non-deterministic semantics we were discussing in #23 <
https://github.com/WebAssembly/spec/pull/23>.

I want to think about this poisoning more. Have you written this up in
any
more detail?

My initial take is:

Option #1: Speculatively hoist checks to a dominating but not
control-flow-equivalent position and recompile if that hurts.

Option #2: Hoist checks to a control-flow-equivalent dominating
position
and use OSR exit to get to the real point of failure.

Option #3: Hoist checks to a control-flow-equivalent dominating
position
and claim in the webasm spec that this is OK.

What do you mean exactly with #3? You can pull a check out of a branch to
its dominator? If the check failing aborts the program, then the program
cannot even check a condition that guards the access is safe. #2 is at
least better, but it still punishes the program with poor performance for
something it didn't actually do (but the implementation assumed it was
likely to do).

#3 refers to Luke’s vague poisoning proposal. It requires
control-flow-equivalence. So, we wouldn’t pull a check out of a branch, and
your problem doesn’t manifest.

It would definitely require poisoning the heap, since the check could still
be moved above stores, which would then be observable.

For error reporting and debugging modes, moving checks above stores, or
with respect to each other, sounds less than ideal. In those modes I think
you want a precise position and a precise state of the heap at the point of
the error so that the tools can report a precise stack trace and source
position corresponding to the failure.

The only advantage that I see with (Load|Store)WithOffset is guaranteeing
no wraparound. One could imagine doing the same check hoisting trick with
regular loads and stores whether the input happens to be an add with an
immediate. You'd have to either first prove it didn't wrap around or have
the hoisted check also check for overflow.

In general I think we should avoid surprising behaviors for small gains in
performance. Our experiments with hoisting checks in a similar manner
indicates it was usually less than a 5% win. Compilers can do ABCD and/or
loop versioning which will pretty much subsumption this.

I think I agree with you. My reading of what you wrote sounds like what I was trying to say earlier.

The question is: should the spec allow some fudge or features to make some check hoisting easier if the implementation doesn’t use versioning or deopt? My understanding of poisoning is that it is something that the spec would have to say things about, to loosen semantics to allow a check to float above a store. I’m trying to decide if I like poisoning being in the spec and to do that, I’d like to hear more about the specifics.

-Filip

-Filip

I’m assuming you’re trying to tackle #3. I’m assuming that #1 and #2
shouldn’t be required by the spec.

-Filip


Reply to this email directly or view it on GitHub
WebAssembly/spec#28 (comment).


Reply to this email directly or view it on GitHub <
https://github.com/WebAssembly/spec/pull/28#issuecomment-99213901>.


Reply to this email directly or view it on GitHub
WebAssembly/spec#28 (comment).


Reply to this email directly or view it on GitHub WebAssembly/spec#28 (comment).

@lukewagner
Copy link
Member Author

Yes, poisoning would definitely need to be in the spec if it were accepted. Experimentation is needed (microbenchmarks being totally different than large C++ apps), so I'm really just interested in putting the ideas out until the implementations get to a state of maturity where we can experiment directly.

Back to the PR, everyone seems positive on overflow-free (Load|Store)WithOfffset that I can tell, so I'll merge that. Happy to continue discussing the broader issue of bounds checking, though, here or in a new issue.

lukewagner added a commit that referenced this pull request May 5, 2015
@lukewagner lukewagner merged commit 911917c into master May 5, 2015
@lukewagner lukewagner deleted the load-store-with-offset branch May 5, 2015 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants