Skip to content

When are generic arguments resolved? #43

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
nikic opened this issue Jan 22, 2020 · 9 comments
Open

When are generic arguments resolved? #43

nikic opened this issue Jan 22, 2020 · 9 comments

Comments

@nikic
Copy link
Collaborator

nikic commented Jan 22, 2020

In PHP, types are usually resolved at the latest possible point. For example, if you write

function foo(): Bar {}

then we will not load class Bar when the function is declared. The type is only loaded when the type is checked, and even then only when necessary. This is fairly important, because we do not want to load a large number of classes that are ultimately unused (apart from appearing in type checks).

Generic arguments are just types, and the question is now: When should those be resolved? I think it's pretty clear that something like

function foo(): Bar<Baz> {}

should resolve neither Bar nor Baz when the function is declared and only do so when the type is checked. The most interesting case is what happens to generic class uses outside type declarations, for example when inheriting from a generic class:

abstract class Foo<T> {} // generic type param unused
class Bar extends Foo<UnknownClass> {}
$bar = new Bar;

Note that the type parameter T is unused, so even if UnknownClass doesn't exist, this code could still work fine. Should it though? This would be consistent with types being lazy-resolved in general and prevents unnecessary type loading, but based on chat feedback, people seem to prefer getting an undefined class error here.

A variation on this is something like Foo<DateTime|IntlDateTimeZone>, where say the first type is known (builtin) but the second one may or may not be available (optional extension). Writing this kind of type would be precluded if types are resolved eagerly.

From an implementation point of view, it would probably be easier to early resolve types that are involved in inheritance, because it allows canonicalizing bound generic arguments...

@nikic
Copy link
Collaborator Author

nikic commented Jan 22, 2020

cc @bwoebi

@morrisonlevi
Copy link
Collaborator

My thoughts:

  1. Types involved directly in inheritance should probably be eager loaded.
  2. Methods used in inheritance should be loaded when those types get checked, hopefully using the same mechanism we currently use for delayed inheritance checks.
  3. Freestanding functions only need runtime checks, so no need to load anything... except don't we have an edge case with aliased classes?

@nikic
Copy link
Collaborator Author

nikic commented Jan 22, 2020

Types involved directly in inheritance should probably be eager loaded.

I think there are basically two areas we have: Those where class names are currently not immediately loaded, which is type declarations and instanceof, and those where they are, which is everything else :) We'd probably want to have the same behavior in inheritance as anywhere else that immediately loads classes.

This does mean that it affects class instantiation as well, and in particular writing something like new Collection<DateTime|IntlDateTimeZone> where the latter class doesn't exist will not be possible.

Is that something we're okay with?

@markrandall
Copy link

As mentioned in chat, I believe that the emphasis should be on reporting the error at the earliest opportunity, and be aggressive in doing so. Anything which can be errored out at compile time, rather than runtime, is big benefit for reliability, and presumably allows shifting at least some of the work onto preloading (where available).

As this is new functionality, I am heavily in favour of going straight to check-on-declare.

If this behaviour is used from the start, it will allow certain future modifications without changing loading behaviour, for example a template class used for inheritance could be fully resolved and then baked so it could be stored in opcache.

@nikic
Copy link
Collaborator Author

nikic commented Jan 22, 2020

Also related question: When are type bounds an default types resolved? That is class Foo<T : Bound> and class Foo<T = Default>. I guess that if we eagerly resolve generic arguments, then it would also make sense to eagerly resolve type bounds and defaults when creating the class.

@bwoebi
Copy link

bwoebi commented Jan 22, 2020

We generally try to resolve as less as possible to avoid issues with:
a) circular dependencies and order of declaration
b) Allow usage of actual unknown, not required, types.

a) I remember what a mess that was to cheeck variance...

Regarding b) e.g. Amp accepts arrays of react and amp promises for its combinators, effectively array<AmpPromise|ReactPromise> , though many amp users do not actually pull react. This would effectively disallow proper typing at that place. Now you might argue that it's probably most problematic for arrays and we could allow it there, but that sounds like a weird asymmetry.

And ultimately it will have some overhead, which, I think, we should avoid, in particular in the short-lived world of PHP requests. I'm unable to estimate how much code would be referenced by this (loads triggering even more loads). (In particular with generics nearly everything can be and most probably will be typed.)

I guess the desire to resolve them immediately is understandable (you want to catch errors as soon as possible), though I see significant drawbacks here.

I would resolve bounds and defaults at exact the same time the type is needed for the first time, just like for explicitly defined types.

@nikic
Copy link
Collaborator Author

nikic commented Jan 23, 2020

Regarding b) e.g. Amp accepts arrays of react and amp promises for its combinators, effectively array<AmpPromise|ReactPromise> , though many amp users do not actually pull react. This would effectively disallow proper typing at that place. Now you might argue that it's probably most problematic for arrays and we could allow it there, but that sounds like a weird asymmetry.

Depending on where we draw the line, this could still work. If we say that generic parameters in type hints are lazy resolved, and we say that array<out T> thanks to by-val, then array<AmpPromise|ReactPromise> could still accept an array<AmpPromise> from the consumer without requiring that ReactPromise be known. It would only not be possible to actually instantiate an array<AmpPromise|ReactPromise> if ReactPromise is not known.

That's one possible choice of semantics, not saying that it's necessarily the right one.

And ultimately it will have some overhead, which, I think, we should avoid, in particular in the short-lived world of PHP requests. I'm unable to estimate how much code would be referenced by this (loads triggering even more loads). (In particular with generics nearly everything can be and most probably will be typed.)

The question of overhead is not so simple. First, how likely is it that new Collection<User>(), on further use, will actually get away with not loading the User class? Doesn't seem particularly likely. Second, this allows us to guarantee that the bound generic parameters are always in resolved (and likely canonicalized) form, which is going to make all further subtyping checks on them faster.

@markrandall
Copy link

I think this one has a answer found not in generics, but in union types themselves, and that is to modify them to accept optional types which if unable to be loaded / resolved, are either silently discarded from the list or replaced with an inbuilt MISSING type which will never match anything.

array<AmpPromise|@ReactPromise>

I'm not so super worried about lots of loading classes, I suspect there's maybe additional optimisations to make in the autoloader (skipping calling userland code) and perhaps even taking a page out of preloading's book and keeping them cached (and suitably namespaced), provided that enabling that cache is done before any autoload calls.

@bwoebi
Copy link

bwoebi commented Jan 23, 2020

I agree with you there - except that we actually should be able to instantiate such a collection (if we don't focus on the array special case here). Like a list of pending promises. I agree with the idea by @marandall of possibly prefixing the type with @ to indicate some optional type, which does not trigger a load then.

I guess the combination instantiation time plus explicitly as optional declared unknown types for instantiations would be most flexible. That way round you still leave all doors open while being safe by default.

From a performance perspective I think its fair to force a class load on instantiation time. (But not at compile or link time.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants