A mechanism for virtual overloads #693
Comments
I'm also not very experienced with the different approaches, but I agree with your notes and will also add my own:
|
Sounds reasonable to me. Let's see whether @willemneal has something to add. |
In this case, the functions would look like:

function MyClass~virtual(memberId: u32): u32 { // classId implicitly is idof<MyClass>
switch (memberId) {
case 0: return SOME_CONST;
case 1: return SOME_CONST;
case 2: return SOME_CONST;
default: unreachable();
}
}

Questions are:
The alternative is to reverse classId/memberId by using a single function that first switches over memberId, and only then over classId:

function ~virtual(memberId: u32, classId: u32): u32 {
switch (memberId) { // guaranteed to be adjacent because memberId is a global counter
case 0: {
switch (classId) {
// might become a br_table or a series of ifs depending on classId distribution here
}
}
case 1: {
...
}
default: unreachable();
}
}

In this case, the most important question is whether such a function will optimize well. At first glance it "looks" like Binaryen could re-reloop a lot of stuff, but I haven't seen it in action yet, of course. Hmm... |
My first question is: are we going to now implicitly inherit from an Also, do we want interfaces to be structural and nominal like TypeScript? Personally, I think we should. I think the per-class method would be a good first step; you could use the class id as the index into the function table or a memory address it's What do you mean by "each subclass would need one case even if the member isn't overloaded deeper in the tree"? I also have another question: when are class ids computed currently? After parsing? Or during compilation? |
I think we should do this in the long term, yeah. If we can do this now, sure, why not.
Depends on whether a static compiler can do that. Consider these two classes, for example:

class Foo { a: i32; b: f32; }
class Bar { b: f32; a: i32; }

These are statically incompatible even though structurally compatible, but interfaces might not suffer from this due to all members being virtual. So I guess there might be ways to make something structurally compatible sometimes, but not every time, which leaves the question of whether we should do it at all. It might even be that structural things like these don't work well with static typing.
In a single global function there can be switch fall-throughs, while if there's one function per class every function must either implement all the cases up the tree or fall-through by means of calling the overloaded virtual table function, leading to more code overhead than just a big switch (which might also optimize better because it's one large blob of code that can possibly take advantage of re-relooping).
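A rough sketch of that per-class variant (hypothetical classes A and B extends A with assumed table indexes, following the thread's MyClass~virtual naming) to show where the extra overhead comes from:

// hypothetical per-class dispatchers; B does not override A#foo, so it must fall through
function A~virtual(memberId: u32): u32 {
  switch (memberId) {
    case 0: return 0; // A#foo, assumed at functionIndex=0
    default: unreachable();
  }
}
function B~virtual(memberId: u32): u32 {
  switch (memberId) {
    case 1: return 1;                     // B#bar, assumed at functionIndex=1, introduced by B
    default: return A~virtual(memberId);  // fall through to the cases up the tree
  }
}

Every level of the hierarchy either repeats or forwards its inherited cases, which is the code overhead a single global switch avoids.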
The class ids of ArrayBuffer, String, ArrayBufferView are generated when the program is initialized, as these are static, while all the others are generated as soon as a concrete class is resolved (from type arguments), which happens during compilation when the compiler sees the use of a type or expression that requires resolving a class. |
If we are going to make fields, i.e. getter functions, have a unique member ID, we will always be able to determine structural compatibility. Consider this interface:

interface AB {
a: i32;
b: f32;
}

Then both of the above classes would implement it structurally even if their fields aren't in the same order. If there is ever flow where either class could be used where an
But the use of virtual functions means it doesn't matter. I understand the desire to keep everything in one function for compactness and potential optimizations, but one downside is a lack of modularity. For example, assuming we could merge wasm binaries (which will be a future runtime feature of the loader), we couldn't merge the two big virtual method lookup functions, whereas we could if each class were responsible for its own table. Perhaps we should add a step between parsing and compiling, where we resolve classes/interfaces: a special transform that starts at the entries and produces a condensed program object, which only contains statements that will be compiled, and handles the type checking. |
If the field layout isn't statically compatible, accessing any of its members structurally must resort to doing a virtual lookup. So I assume what you are proposing here is to have something like hidden interfaces for class types that are used interchangeably in a structural way? This incurs some overhead (that might of course be worth it).
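As a sketch of what such a virtual field access could look like (hypothetical class ids and offsets, following the note elsewhere in this thread that the lookup returns a memory offset rather than a function table index):

function AB#a~virtual(classId: u32): usize {
  switch (classId) {
    case FOO_CLASS_ID: return 0; // Foo { a: i32; b: f32 } keeps a at offset 0 (assumed layout)
    case BAR_CLASS_ID: return 4; // Bar { b: f32; a: i32 } keeps a at offset 4 (assumed layout)
    default: unreachable();
  }
}
// a structural access to ab.a would then lower to something like:
// let a = load<i32>(changetype<usize>(ab) + AB#a~virtual(classId));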
Not so sure about this point. There are already things that cannot be merged easily having just the executables, like RC's More generally speaking, the way to go here seems to be to first get interfaces up and running, and only then think about structural compatibility of other types, like using hidden interfaces, which we can always do. Does that sound reasonable? |
Yeah, that's what we should do for sure. My point was that once we have nominal interfaces, implementing structural ones becomes easy. Assuming we have a virtual lookup for each method of a class that implements the interface, all it would take is to add all classes that match it structurally as implementing the interface. Regardless, I think the place to start is what you first suggested:

function MyClass~virtual(memberId: u32): u32 { // classId implicitly is idof<MyClass>
switch (memberId) {
case 0: return SOME_CONST;
case 1: return SOME_CONST;
case 2: return SOME_CONST;
default: unreachable();
}
} Each

Then each class has its methods in the table, and the function above would map the method ids to their table entries:

let func = call_indirect(load<u32>(ref - 8), memberid); // Calls MyClass~virtual.
call_indirect(func, ...) // make the actual function call

Then after we get this to work, interfaces are just classes with static methods that do the above lookup on a constant

Does this make sense? |
Another pro for a single function might be that multiple levels of indirect calls can be avoided, since we don't have to look up the respective virtual function first. I also wonder if storing something (for methods) in the class instance's memory can be avoided, e.g.

let func = ~virtual(memberId, classId); // direct call returning the function index, memberId is static
call_indirect(func, this, ...args) // make the actual function call

where |
How about static polymorphism? And CRTP, desugared (generated) by the compiler? |
So this code:

class B {
i: i32 = 1;
add(i: i32): void { // abstract method
}
}
class D extends B {
add(i: i32): void { // overrides the base method add
this.i += i;
}
}
let boo: B = new D();
boo.add(1);
log(boo.i); // should be 2

should generate as:

class B {
i: i32 = 1;
add(i: i32): void { // abstract method
}
}
class _B<T extends D> {
i: i32 = 1;
add(i: i32): void {
changetype<T>(this).add(i); // generated in compile time
}
}
class D extends _B<D> {
add(i: i32): void {
this.i += i;
}
}
let boo: _B<D> = new D(); // _B<D> generated in compile time
boo.add(1);
log(boo.i); // -> 2

Of course it is, as always, a speed/size trade-off, so the main disadvantage of this approach is quickly growing code size and slower compilation, but I guess this could be partially solved. |
When we have

function callMe(actuallyD: B): void {
actuallyD.add(1);
}

how would the compiler not have to do a lookup there? I might be missing something, but CRTP seems like an alternative pattern that can be used in some situations in C++; it doesn't necessarily solve our inheritance model. |
We don't cast to See this approach in C++ fiddle. I don't use |
Isn't the use case there more like this, with interfaces, with |
That, I guess, is called F-bounded quantification. |
What I'm trying to get at with my fiddle is that in your example the class For instance, how would that turn out?

class Foo {
hi(): void {}
}
class Bar extends Foo {
hi(): void {}
}
function callMe(actuallyBar: Foo): void {
actuallyBar.hi();
} |
Yeah, CRTP has some limitations, as does a "static" instanceof. In the remaining scenarios we could use double dispatch (two indirect calls), for example: https://gieseanw.wordpress.com/2018/12/29/stop-reimplementing-the-virtual-table-and-start-using-double-dispatch I'm trying to imagine whether we could avoid the vtable in some form or whether this is totally impossible. |
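For comparison, the "two indirect calls" shape of double dispatch roughly matches the earlier pseudocode in this thread, assuming (as in that sketch) the object's header at ref - 8 holds the table index of its class's MyClass~virtual dispatcher:

let dispatcher = load<u32>(ref - 8);              // table index of MyClass~virtual (assumed header layout)
let target = call_indirect(dispatcher, memberId); // first indirect call: resolve the member
call_indirect(target, this, ...args);             // second indirect call: the actual method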
Maybe it makes sense to elaborate a bit on my idea from above so we can compare the performance of different alternative approaches to this one. What I was proposing is the following mechanism: Every object in AssemblyScript has a

class Foo { // classId=1
hi(): void {} // memberId=1, placed at functionIndex=0
}
class Bar extends Foo { // classId=2
hi(): void {} // memberId=1, placed at functionIndex=1
}
class Baz extends Bar {} // classId=3
function callMe(actuallyBar: Foo): void {
actuallyBar.hi();
}

we can see inside of We can generate the following lookup function:

function ~virtual(classId=2, memberId=1): u32 {
switch (memberId) {
case 1: { // Foo|Bar|Baz#hi
switch (classId) {
case 1: return 0; // Foo#hi
case 2: return 1; // Bar#hi
case 3: return 1; // Bar#hi
}
}
}
unreachable();
// the same goes for virtual fields, just that these don't return
// a function table index, but a memory offset to load from
}

doing the following call inside of

call_indirect(~virtual(classId, memberId), []);

leading to exactly one level of indirection through two br_tables, always, with no memory wasted on the actual instances to store function table indexes. Also, if a class does not have any subclasses, like in

function callMe(baz: Baz): void {
baz.hi();
}

which will be the case quite often, we can look up the So, if there are better approaches, that's about the baseline to beat: load -> direct call -> br_table -> br_table -> call_indirect |
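Spelled out as pseudocode for actuallyBar.hi() (the classId-in-header load at -8 is an assumption about the RTTI layout, and memberId=1 is the static id of hi from the example above), that baseline chain reads:

let classId = load<u32>(changetype<usize>(actuallyBar) - 8); // load
let index = ~virtual(classId, 1);                            // direct call -> br_table -> br_table
call_indirect(index, actuallyBar);                           // call_indirect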
I see. Two jump tables. Probably this could have the same speed as a vtable. By the way, I wonder why compilers codegen a linear search for jump tables instead of a binary search, which outperforms vtables in performance? |
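For illustration, a binary-search flavored dispatcher (hypothetical, reusing the Foo/Bar/Baz ids from above) would compare id ranges instead of testing every id in sequence:

function Foo|Bar|Baz#hi~virtual(classId: u32): u32 {
  // ids assumed dense and sorted: 1 = Foo, 2 = Bar, 3 = Baz
  if (classId <= 2) {
    return classId == 1 ? 0 : 1; // Foo#hi -> 0, Bar#hi -> 1
  }
  return 1;                      // Baz inherits Bar#hi -> 1
}

Whether that beats a br_table presumably depends on how dense the class ids are, as noted earlier about classId distribution.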
Also, what about using one switch table instead of two?

function ~virtual(classId: i32, memberId: i32): u32 {
switch ((memberId << 16) | classId) {
case (1 << 16) | 1: return 0; // Foo#hi
case (1 << 16) | 2: return 1; // Bar#hi
case (1 << 16) | 3: return 1; // Bar#hi
}
unreachable();
// the same goes for virtual fields, just that these don't return
// a function table index, but a memory offset to load from
}

or

function ~virtual(classId: i32, memberId: i32): u32 {
switch ((<u64>memberId << 32) | <u64>classId) {
case (<u64>1 << 32) | 1: return 0; // Foo#hi
case (<u64>1 << 32) | 2: return 1; // Bar#hi
case (<u64>1 << 32) | 3: return 1; // Bar#hi
}
unreachable();
// the same goes for virtual fields, just that these don't return
// a function table index, but a memory offset to load from
} |
The idea about the double switch is that we can guarantee that For a But actually, I see now why making multiple functions has an advantage (we talked about that above): we can skip the first-level switch if we make one function per unique member:

function Foo|Bar|Baz#hi~virtual(classId): u32 {
switch (classId) {
case 1: return 0; // Foo#hi
case 2: return 1; // Bar#hi
case 3: return 1; // Bar#hi
}
unreachable();
}

bringing this down to: load -> direct call -> br_table/br_ifs -> indirect call |
Hmm, I created an example with two tables and with a single table, and I see Binaryen generates a better table for the single-table one. Bench results are pretty much the same (in Chrome & Firefox):
Also, asm codegen:

0x000000: ; 0x000000 from: [0x000007, 0x000010, 0x000019, 0x00001f, 0x000027, 0x000030, 0x000039, 0x00003f, 0x000046, 0x00004e, 0x000057, 0x000060, 0x000066, 0x00006d, 0x000075, 0x00007e, 0x00008c, 0x0000a1, 0x0000ab, 0x0000b0]
sub rsp, 8 ; 0x000000 48 83 ec 08
cmp esi, 1 ; 0x000004 83 fe 01
je 0x24 ; 0x000007 0f 84 17 00 00 00
0x00000d:
cmp esi, 6 ; 0x00000d 83 fe 06
je 0x4b ; 0x000010 0f 84 35 00 00 00
0x000016:
cmp esi, 8 ; 0x000016 83 fe 08
jne 0xb0 ; 0x000019 0f 85 91 00 00 00
0x00001f:
jmp 0x72 ; 0x00001f e9 4e 00 00 00
0x000024:
cmp edi, 1 ; 0x000024 83 ff 01
je 0x44 ; 0x000027 0f 84 17 00 00 00
0x00002d:
cmp edi, 2 ; 0x00002d 83 ff 02
je 0xb5 ; 0x000030 0f 84 7f 00 00 00
0x000036:
cmp edi, 3 ; 0x000036 83 ff 03
jne 0x4b ; 0x000039 0f 85 0c 00 00 00
0x00003f:
jmp 0xb5 ; 0x00003f e9 71 00 00 00
0x000044:
xor eax, eax ; 0x000044 33 c0
jmp 0xba ; 0x000046 e9 6f 00 00 00
0x00004b:
cmp edi, 1 ; 0x00004b 83 ff 01
je 0xb5 ; 0x00004e 0f 84 61 00 00 00
0x000054:
cmp edi, 2 ; 0x000054 83 ff 02
je 0x6b ; 0x000057 0f 84 0e 00 00 00
0x00005d:
cmp edi, 3 ; 0x00005d 83 ff 03
jne 0x72 ; 0x000060 0f 85 0c 00 00 00
0x000066:
jmp 0xb5 ; 0x000066 e9 4a 00 00 00
0x00006b:
xor eax, eax ; 0x00006b 33 c0
jmp 0xba ; 0x00006d e9 48 00 00 00
0x000072:
cmp edi, 1 ; 0x000072 83 ff 01
je 0xb5 ; 0x000075 0f 84 3a 00 00 00
0x00007b:
cmp edi, 2 ; 0x00007b 83 ff 02
je 0x9f ; 0x00007e 0f 84 1b 00 00 00
0x000084:
sub edi, 3 ; 0x000084 83 ef 03
mov ecx, edi ; 0x000087 8b cf
cmp ecx, 3 ; 0x000089 83 f9 03
jae 0xb0 ; 0x00008c 0f 83 1e 00 00 00
0x000092:
movabs rax, 0 ; 0x000092 48 b8 00 00 00 00 00 00 00 00
jmp qword ptr [rax + rcx*8] ; 0x00009c ff 24 c8
0x00009f:
xor eax, eax ; 0x00009f 33 c0
jmp 0xba ; 0x0000a1 e9 14 00 00 00
0x0000a6:
mov eax, 2 ; 0x0000a6 b8 02 00 00 00
jmp 0xba ; 0x0000ab e9 0a 00 00 00
0x0000b0:
jmp 0xfd ; 0x0000b0 e9 48 00 00 00
0x0000b5:
mov eax, 1 ; 0x0000b5 b8 01 00 00 00
nop ; 0x0000ba 66 90
add rsp, 8 ; 0x0000bc 48 83 c4 08
ret ; 0x0000c0 c3

and wasm-function[1]:
0x000000: ; 0x000000 from: [0x00000f, 0x00001b, 0x000027, 0x000033, 0x00003f, 0x00004b, 0x000057, 0x000063, 0x00006f, 0x00007b, 0x000087, 0x00008d, 0x000094, 0x00009b, 0x0000a2, 0x0000ac, 0x0000b1]
sub rsp, 8 ; 0x000000 48 83 ec 08
shl esi, 0x10 ; 0x000004 c1 e6 10
or esi, edi ; 0x000007 0b f7
cmp esi, 0x10001 ; 0x000009 81 fe 01 00 01 00
je 0x92 ; 0x00000f 0f 84 7d 00 00 00
0x000015:
cmp esi, 0x10002 ; 0x000015 81 fe 02 00 01 00
je 0xb6 ; 0x00001b 0f 84 95 00 00 00
0x000021:
cmp esi, 0x10003 ; 0x000021 81 fe 03 00 01 00
je 0xb6 ; 0x000027 0f 84 89 00 00 00
0x00002d:
cmp esi, 0x60001 ; 0x00002d 81 fe 01 00 06 00
je 0xb6 ; 0x000033 0f 84 7d 00 00 00
0x000039:
cmp esi, 0x60002 ; 0x000039 81 fe 02 00 06 00
je 0x99 ; 0x00003f 0f 84 54 00 00 00
0x000045:
cmp esi, 0x60003 ; 0x000045 81 fe 03 00 06 00
je 0xb6 ; 0x00004b 0f 84 65 00 00 00
0x000051:
cmp esi, 0x80001 ; 0x000051 81 fe 01 00 08 00
je 0xb6 ; 0x000057 0f 84 59 00 00 00
0x00005d:
cmp esi, 0x80002 ; 0x00005d 81 fe 02 00 08 00
je 0xa0 ; 0x000063 0f 84 37 00 00 00
0x000069:
cmp esi, 0x80003 ; 0x000069 81 fe 03 00 08 00
je 0xb6 ; 0x00006f 0f 84 41 00 00 00
0x000075:
cmp esi, 0x80004 ; 0x000075 81 fe 04 00 08 00
je 0xa7 ; 0x00007b 0f 84 26 00 00 00
0x000081:
cmp esi, 0x80005 ; 0x000081 81 fe 05 00 08 00
jne 0xb1 ; 0x000087 0f 85 24 00 00 00
0x00008d:
jmp 0xb6 ; 0x00008d e9 24 00 00 00
0x000092:
xor eax, eax ; 0x000092 33 c0
jmp 0xbb ; 0x000094 e9 22 00 00 00
0x000099:
xor eax, eax ; 0x000099 33 c0
jmp 0xbb ; 0x00009b e9 1b 00 00 00
0x0000a0:
xor eax, eax ; 0x0000a0 33 c0
jmp 0xbb ; 0x0000a2 e9 14 00 00 00
0x0000a7:
mov eax, 2 ; 0x0000a7 b8 02 00 00 00
jmp 0xbb ; 0x0000ac e9 0a 00 00 00
0x0000b1:
jmp 0xe3 ; 0x0000b1 e9 2d 00 00 00
0x0000b6:
mov eax, 1 ; 0x0000b6 b8 01 00 00 00
nop ; 0x0000bb 66 90
add rsp, 8 ; 0x0000bd 48 83 c4 08
ret ; 0x0000c1 c3 |
The Wasm codegen there seems off. Maybe because Unrelated: I also noticed that something is wrong with Wasm Studio in that if one outputs x86 code, asc breaks. Maybe that conflicts with Binaryen versions or something. |
Btw, a great article on how this is solved in different languages (C++, Java and C#): |
So it seems that they are all in-memory tables. I've been working on it, and I've currently updated the compiler so that whenever there is a conversion from a concrete type to an interface (e.g. passed into a function that has an interface as the parameter type), the concrete class is added to the interface's list of implementing classes (the class is also checked for whether it adheres to the interface, meaning that it handles structural interfaces as well). Then at call sites, the call converts into calling Then at the end of compilation, each compiled interface method looks up the corresponding function prototypes of the implementing classes, compiles them if necessary, adds them to the function table, and collects the table entries. I'm stuck figuring out the relooper, but am getting close. However, after reading the article I am drawn back to adding in-memory vtables, and fat pointers seem like the best way to add mixins/traits/default interface methods. Also, as above, once the interface methods are known, we could reassign their method ids so that they are sequential and then recompile the interface methods using the new ids. All in all I think that there is some middle ground here. I think we should start with the table as is and then see how it affects performance. |
Isn't one important difference to the mechanisms in the above article that we have a |
I've finally got the relooper working and am close to pushing what I have. It depends on whether we want to leave ourselves open to adding new features that JS doesn't support. Also, in the example you give, isn't that more of a syntax issue? Since JS is dynamic, you could imagine the syntax |
What I am trying to get at is that JS with its prototypes is more like those languages at the bottom of that article that "do method lookup by name" and we should use that to our advantage. For instance, the type before |
One additional aspect is that we should abstract these internals in a way that one can still call virtual methods from the outside normally. For example, in a case like

export abstract class Foo { // or even an interface
abstract bar(): i32;
}
class Baz extends Foo {
bar(): i32 { return 42; }
}
export var foo = new Baz();

one might want to call

function Foo#bar(this): i32 {
switch (LOAD_RT_ID(this)) {
case Baz_ID: return Baz#bar(this);
}
return unreachable();
}
function Baz#bar(this): i32 {
return 42;
}

Notice how this doesn't even have to |
My current method in #862 still allows for calling the method directly and passing the instance, but does bury the class lookup in the virtual method. So it's a simple change to move that logic into each method. |
A little bit about de-virtualization in LLVM: |
Different approaches for dynamic dispatching with benchmarks (if-else sequence, binary tree dispatch, switch case and vtable): |
fixed by #862 |
In #862 I just added a Is this still the case? If so, it wouldn't be that hard to change now, but I think using the decorator approach allows for more flexibility. |
After some feedback from @MaxGraey I made all overloaded methods virtual by default and as |
Figured that another issue here is that overloading instance fields/properties is problematic:

interface Foo {
foo: i32;
}

Both a field and a property implementation meet the criteria as far as TS is concerned, but in AS we are compiling these differently. Field accesses are just loads and do not retain for efficiency purposes (we can assume that there's at least one remaining ref), while property getters do retain since these are functions (they can return something dropping to RC=0). This must be taken into account by either enforcing overriding with properties or doing some magic under the hood. |
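To illustrate (hypothetical classes), both of the following satisfy that interface as far as TS is concerned, yet compile to different access patterns in AS:

class FieldImpl implements Foo {
  foo: i32 = 1; // plain field: the access compiles to a direct load
}
class PropertyImpl implements Foo {
  private _foo: i32 = 1;
  get foo(): i32 { return this._foo; } // property: the access compiles to a call of this getter
  set foo(value: i32) { this._foo = value; }
}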
Currently, the compiler does not support virtual overloads of class members, as one would expect from JS, but instead resolves the respective member statically from the contextual class type. This is noted in the docs so far but we should aim at resolving this.
What's necessary here is to decide on a mechanism to resolve the virtual function or field at runtime, using the unique class ids we already have in RTTI plus, possibly, unique ids for each virtual member.
Let's say we have A#foo() with foo possibly being overloaded by a class B extends A; the implementation must call B#foo() here if the actual instance we are dealing with is a B, not an A. Interfaces are similar.

One mechanism I thought about so far is to use the unique class ids (here: of A and B), and give the foo member another internal unique id, so we can generate a runtime lookup function that performs a (memberId, classId) -> functionIndex or fieldOffset mapping through a compiler-generated big switch. The implementation on the compiler side should skip making a runtime lookup if it is not necessary (that is: the class has no subclasses or respective interfaces) for perf reasons. If it is necessary, calling a virtually overloaded function becomes a call_indirect.

This isn't exactly scientific but is solely based on what I think would work, so if this problem has already been solved in better ways in the past, feel free to comment :)
@MaxGraey, @willemneal, @bowenwang1996