-
Notifications
You must be signed in to change notification settings - Fork 696
Request for opcode: dup #1365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi @windelbouwman, I took the liberty of transferring this issue to the design repo, where spec proposals usually start. You can see the full process all changes would need to go through to become standardized here. Can you explain the motivation for this proposal a bit more? |
Not only the code size is improved, but also the readability of the code, the speed of the direct interpretation, also there will be no need of a local variable (that further obscures the simple operation of a duplication). This |
Hi! Thanks for moving this request. I understand the request for a concrete use case for this instruction. The specific use case where this would be handy, is when implementing a for loop. As a gimmick, I tried to make a python -> wasm compiler, and when implementing the for loop, it would be handy to leave the loop index value on the stack. Now I solved this by creating a new local per for loop, but this results potentially in sloppy code, when for loops are not nested. Some extra logic is required to generate the least amount of locals for the for loops. Exact spot of the use case is here: https://github.com/windelbouwman/corepython/blob/master/corepython/src/compile.rs#L205 This is one use case, but I can imagine there are more use cases. Notice also the JVM has a It might be worth noting that the JVM machine and webassembly machine are pretty similar, so there might be more ideas to copy from the JVM. Hope this clarifies the request a bit. Regards, |
No, I do not. Interesting question though, how would one create such an estimation? I can see compilers exploiting this instruction, and spec implementations implementing this pretty straight forward. |
The function-references proposal introduces a |
I remember discussing with @rossberg a few years ago that the typing of syntactically dead code would become much more interesting if a e.g. the code fragment
is well-typed, but the code fragment
would be ill-typed. But this requires some more sophisticated book-keeping than has been necessary up to this point, The smallest solution might be to require that |
You are right, a |
Still my primary use case for this would be implementation of a for loop without having to generate locals. |
I think that a combination of the validation logic that applies to |
Similar discussion in the interface-types proposal: WebAssembly/interface-types#118 |
@verbessern Another solution, avoiding the explicit type annotation, would be to weaken Wasm's validation so that syntactically dead code is no longer validated. i.e. introduce a
I've been told that there were extensive discussions over this during Wasm's initial design. FWIW, the same issue applies to |
Ignoring any dead code would be a wonderful simplification to make, yes. Failing that, add That wastes opcodes, but certainly it being polymorphic wasn't useful anyway. |
It might be most natural to have it be one opcode with a static parameter, otherwise it wouldn't scale to reference/GC types. This slightly reduces the impact of code size savings though. Also, in the scheme I gave above, dead code would still have to be parsed for "malformedness"/encoding errors. The spec's distinction between parse/validation-time errors has cropped up as a gotcha in tests before and this would become another example. Flinging out 4 (really 3) ways forward:
(also, all of the above but for |
One thing I didn't initially think about. For
In an archetypal single-pass validator like the OCaml reference implementation, typing the |
I'm not convinced that the losing of the stack polymorphism is a good idea. I think its a nice thing to have. |
What's the advantage of stronger dead code validation? It's a corner case that shouldn't come up very often in actual wasm uses, right? |
@taralx I believe the original motivation for the current approach to typing dead code was to ensure that a well-typed sequence of instructions can be arbitrarily split, and the resulting contiguous subsequences are guaranteed to be well-typed. I don't know if any tooling is relying on this. @rossberg might have a first-hand memory of the decision-making process. |
Some browser vendors raised concerns at the time against allowing unvalidated code. I believe there were fears that it could become part of a potential attack vector, which is a reasonable concern. Another option that was discussed was disallowing dead code altogether. The rationale for the current design is summarised here. The simplest solution would be a dup with a type annotation or a pick instruction with a stack type annotation of the appropriate depth. However, |
What about the use case of wanting to square or cube a number that is the result of the previous operation? One example is computing the distance between two points. It seems odd to have to put the result of x1 - x2 in a local variable and then get it twice in order to computing the square. |
Since WebAssembly is meant to be a compilation target rather than a convenient language to write by hand, it’s generally not a problem that certain code patterns require the use of locals. Using stack instructions instead of locals would not affect the performance of optimizing engines. The primary reasons to potentially add stack duplications instructions are to handle values that could not reasonably be stored in normal locals, such as non-nullable reference types (depending on your point of view). |
I understand that WebAssembly is not typically hand written, but what is the down-side in making that practice easier/nicer by adding an instruction? Beyond |
Besides the typing issue discussed above, there's probably no downside in the sense you're thinking of. It takes a surprising amount of work and time to spec new instructions and get them implemented in every tool and engine out there, though, so generally only changes with significant benefits get all the way through the process. Unfortunately that means that there are a lot of "nice to have" proposals, even tiny ones like adding a single This is one of the costs of standards-based work; if Wasm were controlled by a single party, it would be easy to add a single instruction like |
Thanks for explaining that Thomas! |
There may be a hidden benefit to not supporting a dup instruction: it makes reasoning with linear types a great deal easier. Of course, linear types is also a thing for the potential future. |
what @tlively said should be the intro for "So you want to propose a feature?" faq :P |
Coming back to this, it's unclear to me whether the general consensus was that Actively shipping a JIT right now, having these opcodes would make it possible to generate smaller and more efficient code, since right now it seems like the use of temporary locals to duplicate a value can in many cases cause the value to unnecessarily get stored to the wasm shadow stack in memory. The need to have temporary locals to house values the user intends to duplicate or swap also makes the stack frame of every function larger, and I feel like that is another good justification for having opcodes that allow skipping the temporary locals. In an SSA form, I think each time that temporary local is overwritten for a dup or swap, that would introduce another SSA variable and complicate things, right? To provide a couple specific examples (in pseudocode), null checks look like this:
This is awkward and bloated, especially if I'm throwing around multiple nullable pointers. But one can live with it, and it's just one reusable local (that sadly is messy to optimize or analyze since it's constantly being overwritten). Overflow/underflow checks get worse though, like if I want to convert a float to a uint16 it looks something like this:
We now have two loads from a named temp local, and every new type added to the wasm type system means at least one of these locals that needs to be dealt with separately. Once we have custom struct types or gc types etc I can imagine having dozens of these. To go further, we have to overflow/underflow check binary operations, like multiplies or divides - which means we need two temporary locals of each type, one to cache the lhs and one to cache the rhs. It gets out of hand quickly. For the single-value case, The validation concerns make sense and it seems like having to explicitly pass an expected type to dup and swap would not be a big problem. It would potentially remove most of the code size advantage since Is this something that is too difficult to implement in engines like v8 and spidermonkey? Is there no appetite for it since most people are just generating code with llvm and it deals with all of this under the hood? |
If anything, that may be an argument for introducing |
Now that |
I assume If we only get one shot at having this kind of opcode added it feels like |
Hi,
I would like to request a webassembly opcode:
dup
. The semantics would be that the top of stack is duplicated.Regards,
Windel
The text was updated successfully, but these errors were encountered: