Design of API for setting JS prototypes #2
Comments
That seems actively harmful to me. It's going to quickly break any codegen whose first descriptor field has, by chance, that type. For example, because of the type of JS string builtins, strings typically get that type. The first field of a descriptor might be the class name, for run-time reflection. That could very well have the type |
Another idea due to @syg is that we could simply have each custom descriptor object start out with an associated empty prototype object that will be the JS prototype of the objects described by the descriptor. After instantiation, the prototype could be retrieved from an exported custom descriptor and then populated with methods. The benefits of this idea are that it does not require custom sections and it still allows custom descriptors to be used in constant expressions. The downsides are the overhead of allocating an associated prototype object for every custom descriptor (but I don't expect this to be a problem in practice) and the need to use |
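A minimal sketch of the idea above, written as a plain TypeScript model rather than against any real engine or Wasm API. The names `Descriptor`, `allocDescriptor`, and `prototypeOf` are hypothetical and only illustrate the shape of the proposal: every descriptor gets an empty prototype when it is allocated, and the host fills that prototype in after instantiation.

```typescript
// Hypothetical model: a custom descriptor is just an opaque object here.
type Descriptor = { readonly label: string };
type Prototype = Record<string, unknown>;

// Spec-level view (assumed): one global map from descriptor identity to prototype.
const prototypes = new WeakMap<Descriptor, Prototype>();

// Allocating a descriptor eagerly allocates its associated empty prototype.
function allocDescriptor(label: string): Descriptor {
  const desc: Descriptor = { label };
  prototypes.set(desc, {});
  return desc;
}

// Model of the prototype lookup for objects described by `desc`.
function prototypeOf(desc: Descriptor): Prototype {
  const proto = prototypes.get(desc);
  if (proto === undefined) throw new Error("unknown descriptor");
  return proto;
}

// "After instantiation": retrieve the prototype through the (exported)
// descriptor and populate it. The prototype's identity never changes.
const pointDesc = allocDescriptor("Point");
const pointProto = prototypeOf(pointDesc);
pointProto["describe"] = () => "Point";
```

In this model the prototype is mutated in place, so anything that already observed it keeps seeing the same object.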
That is exactly what I was thinking as I read the message. Doesn't that trigger severe deoptimization paths in the main engines? |
Yes, I believe so. A workaround would be to provide the supertype's custom descriptor when allocating the subtype's custom descriptor and have an annotation telling the engine to make the proper connection between the associated prototypes, but that gets us back to where we started with the custom sections. |
Yeah, I think that making the prototype of a custom descriptor immutably set at creation is the ideal way to go here. As yet another alternative to using a custom section here, what if we added a new struct creation instruction for descriptors that let us pass embedding specific host information? e.g.
It takes an extra optional This would be minimally invasive on the core semantics, and I believe it would be useful for other embeddings than just JS. JS isn't going to be the only host for wasm that might want to have these wasm GC objects interop with their host type system. |
@rossberg, how would you feel about adding new allocation instructions for descriptor types that take an arbitrary externref as an extension point for host interop? |
As a quick follow up to And also that there's no |
@tlively, to be honest, it strikes me as a bit random. Is there any concrete host use case outside JS's peculiar prototype mechanism? If we had type imports, we could perhaps introduce a js builtin module that defines an opaque type |
I don't think it's just about JS prototypes, there are other parts of how a wasm-GC struct is reflected to JS that we'll want to control. At least one thing I think we'll want is to have specified wasm fields get reflected as JS own properties with specific names. The For non-JS hosts, I could imagine similar use-cases. If you embedded Wasm-GC code in a Java VM or .NET VM, you'd probably want to have the ability to control how these objects are reflected to the rest of the code in the system.
I think that might help out here, but I haven't been able to fully think through how type imports + compile-time imports (basically what js builtins gives us) work together. |
Okay, yes, but as you say: it is about reflecting Wasm values in JS. It is not relevant to Wasm itself, so would be strange as a Wasm instruction.
I am not so sure. The expectation to be able to directly use foreign values as if they were "natively implemented" host language objects is a very unique JavaScriptism. I doubt that anything like that will be done in any other embedding. It's not how embedded VMs or data representations usually work. You normally have an API or wrapper objects for accessing embedded values. |
I share the concern about internal shape descriptors flowing out to JS as first-class values; I don't think we would want that (mostly for code robustness reasons). I think our implementation will use an indirection there: Wasm struct points to internal shape descriptor, internal shape descriptor points to Wasm-level "custom descriptor". That way the custom descriptor can flow out to JS without the internal shape descriptor ever floating around userspace code as a first-class value. To make the custom descriptor be the thing that's passed to
I strongly share the concern that replacing a prototype after any object using that descriptor has already been exposed to user code (e.g. via the "start" function) is a no-go: we'd have to allocate a fresh internal shape descriptor for the new prototype, and updating all pointers to that would be somewhere between "a big mess" and "infeasible". (Having the prototype itself be mutable is fine, it just needs to preserve its object identity.)
For scaling reasons (many shapes with many methods installed on prototypes) I believe that we'll really want a declarative way to specify them, which AFAICS will mean some kind of section (because that's the declarative mechanism that Wasm has). So I think we should be spending our brainstorming time on coming up with a kind of section (core Wasm? traditional "custom section"? something new in between? something... else?) that we can agree on for this purpose.
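The indirection described in this comment can be pictured with a small TypeScript model. This is only a sketch of the pointer chain under the stated assumptions, not engine code; `CustomDescriptor`, `ShapeDescriptor`, `WasmStruct`, and the helper functions are all hypothetical names.

```typescript
// Wasm-level descriptor: allowed to flow out to JS as a first-class value.
interface CustomDescriptor { readonly label: string }

// Engine-internal shape descriptor: never exposed to user code.
interface ShapeDescriptor {
  readonly custom: CustomDescriptor;
  // The prototype's identity is fixed when the shape is allocated.
  readonly prototype: Record<string, unknown> | null;
}

interface WasmStruct { readonly shape: ShapeDescriptor; readonly fields: unknown[] }

function allocShape(
  custom: CustomDescriptor,
  prototype: Record<string, unknown> | null,
): ShapeDescriptor {
  // Frozen so the struct -> shape -> prototype link cannot be re-pointed later.
  return Object.freeze({ custom, prototype });
}

function allocStruct(shape: ShapeDescriptor, fields: unknown[]): WasmStruct {
  return { shape, fields };
}

// What user code gets when it asks for a struct's descriptor:
// the custom descriptor, never the internal shape.
function exposedDescriptor(s: WasmStruct): CustomDescriptor {
  return s.shape.custom;
}

function getPrototypeOf(s: WasmStruct): Record<string, unknown> | null {
  return s.shape.prototype;
}

const proto: Record<string, unknown> = {};
const shape = allocShape({ label: "Point" }, proto);
const point = allocStruct(shape, [1, 2]);

// Mutating the prototype's contents is fine; its identity is preserved.
proto["norm"] = () => 0;
```

Replacing the prototype would require a fresh `ShapeDescriptor` and an update of every struct pointing at the old one, which is the cost the comment calls out.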
|
I consider these to be the requirements constraining the solution space here:
These requirements imply that if a solution sets up an association between a custom descriptor and a particular prototype object, the information about that association must be encapsulated in an imported value passed to the allocation of the custom RTT. More details about these requirements and their implications are included in the spoiler at the bottom of this post.
Furthermore, the specified algorithm for [[GetPrototypeOf]] must be able to look up the prototype from some combination of the custom descriptor's identity, type, and value. This seems to give four families of solutions, although there might be other solutions I am missing. All are described below, although the last solution is the only new one.
1. Trivial Prototypes
The simplest solution is to use only the custom descriptor's identity to look up the associated prototype. In the spec, this would mean there is a global map from custom descriptor addresses to prototype objects. In implementations, this would mean that every custom descriptor has an eagerly allocated empty associated prototype. One downside of this approach is the (probably negligible) overhead of allocating an empty prototype for every custom descriptor, whether it needs one or not.
All other solutions import values to use during custom descriptor allocation to set up the association with the prototypes. These imported values could be the prototypes themselves, but it is probably better for them to be option bags containing the prototypes to allow us to additionally configure own properties or other things in the future.
2. Field Order Convention
If we're not using the custom descriptor's identity to look up the prototype, we have to use some combination of its type and value. In fact, we must use the type to at least ensure we are getting the prototype from an immutable field.
This ensures that the spec algorithm that respects Wasm abstraction boundaries and what real engines would do, namely eagerly copy the prototype into the prototype slot at custom descriptor allocation time, will produce the same value. The simplest way to look up the prototype from the custom descriptor is to use some fixed convention of where to look. For example:
We can bikeshed the precise convention later. No matter what the convention is, it is something producers have to be aware of. In some cases, they might need to insert a null placeholder field to prevent some other field from being unintentionally picked as holding the prototype.
3. Distinguished Types
We can alternatively allow producers to communicate their intention for a particular field to contain the prototype by having a distinguished type (i.e. not With this solution, the spec algorithm would use the first immutable field with the distinguished builtin type to look up the prototype. Real engines would do a similar iteration, considering only fields referring to imported subtypes of
4. Distinguished Values (✨NEW!✨)
Rather than iterating through the type of the custom descriptor to find the field used to look up the prototype, it would be possible to iterate through the values of the fields instead. We could have an API that sets an internal sentinel property on objects intended to be imported for use as prototypes (or option bags containing prototypes), and the spec algorithm could iterate through immutable fields looking for the first value that had that sentinel property set. Real engines would do the same search when allocating custom descriptors. The advantage of this approach is that it lets producers signal their intent explicitly, just like with distinguished types, but it doesn't depend on type imports. The downside is that there might be more fields to search through than there would be with the distinguished types approach, but I doubt this would make much of a difference in practice.
Declarative APIs
All of the above solutions admit a declarative API where a custom section describes the intended shape and contents of the prototypes and the engine automatically realizes that plan via some combination of synthesizing imports and mutating prototypes at the end of instantiation.
In all the solutions, methods cannot be attached to the prototypes until after instantiation finishes because the exported functions the methods should call are not available until then. The start function is either able to observe that the methods don't yet work or is able to observe that this core Wasm abstraction has been broken, and we don't want to allow the latter.
On the other hand, all solutions but the first allow things that do not depend on Wasm exports, such as own properties or prototype chains, to be configured eagerly when the imported prototypes (or option bags containing prototypes) are first created.
The first solution, Trivial Prototypes, does not allow any eager configuration at all. A declarative API for this solution would require mapping custom descriptor identities to intended prototype configurations, but the custom descriptor identities are not available outside the instance until exports are made available at the end of instantiation. The prototypes would all have to have the same, default shape during the start function.
Explanation of requirements and implications
Immutable Prototype Identity
Ultimately, engines need to install the prototype on the underlying shape descriptor for the WasmGC object. This engine-managed shape descriptor may or may not be the same as the user-visible "custom descriptor" object in core Wasm. Both SpiderMonkey and V8 require the prototype identity associated with a particular shape descriptor to be immutable. The prototype object itself can be mutable, but its identity must be the same for the lifetime of the shape descriptor. In particular, this means that the prototype object must be available when the shape descriptor is allocated.
The shape descriptor must be allocated before the shape of a described object can be observed, and in particular before a described object can flow into a cast or out to JS.
In practice, this means that the shape descriptor must be allocated at the same time as the custom descriptor. Trying to allocate it lazily while ensuring it is allocated before it is needed would be too complicated and would introduce new work at the Wasm/JS boundary and on casts. The prototype objects must therefore be available when custom descriptors are allocated.
Custom Descriptors Available in Constant Expressions
Today, vtables are allocated in immutable Wasm globals. This allows Binaryen optimizations to safely propagate their fields and eventually devirtualize many method calls. With the Custom RTTs proposal, the vtable structs will become custom descriptors and will be associated with the JS prototypes for the objects they describe. To keep Binaryen's devirtualization optimizations working, the vtable custom descriptors must also be allocated in immutable Wasm globals. The custom descriptors must therefore be allocatable in constant expressions or imported. Due to the previous requirement, this means that the JS prototypes must be available when constant expressions, and in particular global initializers, are evaluated.
Maintain Core Wasm Abstractions
As a matter of design hygiene, the WebAssembly CG requires that the design of the mechanism for associating JS prototypes with custom descriptors be specifiable in terms of the existing Core Wasm embedding API. Implementations of the mechanism may of course punch through any abstraction boundaries they want, but they must not be forced to do so by the nature of the design. In particular this means that the specification of this mechanism in the JS embedder spec may:
However, because the JS prototypes must be associated with descriptor objects and the objects they describe at all times, and because these prototypes can be observed before instantiation is finished via calls to imports from the start function, the specification of the mechanism cannot depend on an instance's exports. Since separate instances of a module must be able to have different custom descriptors and different JS prototypes to access the separate state of each instance, no solution can depend solely on a module's import and export definitions or custom sections. The solution must therefore depend on observing and possibly modifying the imports provided at instantiation time since that's the only capability that has not been ruled out.
Since the custom descriptors must be available in constant expressions, they may themselves be imported or the information necessary to describe their association with a JS prototype may be imported and then used to allocate the custom descriptors. But custom descriptors must contain non-nullable references to functions in the same instance to support devirtualization, so it is not possible to initialize them from outside the instance before instantiation has finished and the functions can be made available as exports. That rules out importing the custom descriptors. Instead, we must import information sufficient to form the association with the JS prototypes themselves when a custom descriptor is allocated.
No Additions to Core Wasm
As another matter of design hygiene, the WebAssembly CG also requires that we not add anything to core Wasm only for the benefit of particular embedders. Building on existing proposals or proposing new mechanisms that have independent value to core Wasm is acceptable, though.
No Slowdown for Unrelated Code
The proposed mechanism should not have performance penalties for code that does not use the new mechanism.
In particular, the design should require no new work on the Wasm/JS boundary or for casts. This is the requirement that rules out lazily allocating shape descriptors.
Admits a Declarative API
The proposed mechanism must be declarative in nature or allow a declarative API to be layered on top of it. This is a requirement from the V8 team, which expects users to configure thousands of prototypes with tens of thousands of methods and expects that doing so declaratively will be the only way to get reasonable startup performance. |
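To make the "Distinguished Values" option more concrete, here is a rough TypeScript model of the search it describes. Everything in it is an assumption for illustration: `Field`, `markPrototype`, and `findPrototype` are hypothetical names, and a `WeakSet` stands in for the internal sentinel property that a real API would set.

```typescript
// Hypothetical model of one field of a custom descriptor.
interface Field { readonly mutable: boolean; readonly value: unknown }

// Stand-in for the "internal sentinel property": membership in a WeakSet.
const markedAsPrototype = new WeakSet<object>();

// Hypothetical host API: mark an object (a prototype, or an option bag
// containing one) before passing it to the module as an import.
function markPrototype<T extends object>(proto: T): T {
  markedAsPrototype.add(proto);
  return proto;
}

// Model of the lookup: the first immutable field whose value carries the
// sentinel wins; if there is none, the prototype is left as null.
function findPrototype(fields: readonly Field[]): object | null {
  for (const field of fields) {
    if (field.mutable) continue; // only immutable fields are considered
    const v = field.value;
    if (typeof v === "object" && v !== null && markedAsPrototype.has(v)) {
      return v;
    }
  }
  return null;
}

const protoA = markPrototype({ kind: "A" });
const protoB = markPrototype({ kind: "B" });
const descriptorFields: Field[] = [
  { mutable: true, value: protoA },       // skipped: mutable
  { mutable: false, value: "ClassName" }, // skipped: no sentinel
  { mutable: false, value: protoB },      // selected
];
const found = findPrototype(descriptorFields);
```

An engine would run this search once, when the custom descriptor is allocated, and copy the result into the shape descriptor's prototype slot.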
The distinguished value idea is a clever option that could work. Both options 3 and 4 could be combined with the most constrained version of option 2 to avoid any search through the fields. That is, only consider the very first field, and if that does not have the right attribute+type+value, the magic behaviour does not apply and the prototype is left as null. |
For module size reasons, it would actually be interesting to not require functions that are installed as prototype methods to also be exported on their own. I fully expect that "design hygiene" will take precedence over practical concerns like module size, but wanted to have this consideration on the record nevertheless. |
Note that exporting all 200k functions will also completely kill wasm-opt's ability to meaningfully optimize types, so the long-term solution for this will be to do a combined JS-Wasm callgraph analysis to remove unused exports. Binaryen has a tool for this called wasm-metadce that is used by Emscripten, so there is precedent for this kind of thing. |
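The kind of combined callgraph analysis mentioned here boils down to reachability from the JS-side roots. The sketch below is a generic illustration in TypeScript; it does not use wasm-metadce's actual input format, and the node names (`js:main`, `export:area`, and so on) are made up for the example.

```typescript
// Edges of a combined JS + Wasm callgraph: node -> nodes it can reach.
type Graph = ReadonlyMap<string, readonly string[]>;

// Depth-first reachability from a set of root nodes.
function reachable(graph: Graph, roots: readonly string[]): Set<string> {
  const seen = new Set<string>();
  const stack = roots.slice();
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === undefined || seen.has(node)) continue;
    seen.add(node);
    for (const next of graph.get(node) ?? []) stack.push(next);
  }
  return seen;
}

// Hypothetical graph: JS calls one export; another export is never used.
const graph: Graph = new Map<string, readonly string[]>([
  ["js:main", ["export:area"]],
  ["export:area", ["wasm:$area"]],
  ["export:unused", ["wasm:$unused"]],
  ["wasm:$area", []],
  ["wasm:$unused", []],
]);

const live = reachable(graph, ["js:main"]);

// Exports not reachable from any JS root are candidates for removal.
const removableExports = Array.from(graph.keys()).filter(
  (n) => n.startsWith("export:") && !live.has(n),
);
```

Once the unused exports are dropped, the functions behind them are no longer externally visible, so the optimizer is free to refine their types again.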
Originally posted by @eqrion in ea16da7
This seems to suggest that the original API from the design issue was actually a better fit. That API design set the prototype immutably at the point where the custom descriptor was allocated, and it did so in a way that the custom descriptor could be used from subsequent constant initializers. Alternative designs that e.g. import functions to allocate custom descriptors would lack this last property.
The only downside of the original API was that it depended on a custom section for identifying which field of the custom descriptor was meant to be the prototype. We could get rid of the custom section by standardizing a convention like "the first field is used as a prototype if it has a type that matches (ref null extern)," but that would be super ad hoc, and I don't think that's any better than using a custom section.
Additional ideas would be very welcome here!