Skip to content

Lazy loading of types #557

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 61 commits into from
Jun 19, 2020
Merged

Lazy loading of types #557

merged 61 commits into from
Jun 19, 2020

Conversation

shmax
Copy link
Contributor

@shmax shmax commented Oct 17, 2019

Implements #425
Proof of concept of full support for lazy loading of types.

In a nutshell, you can substitute a callable any place that you would ordinarily supply a Type, and it won't be executed by the library until it is actually needed.

For example, you can create a Types class that looks like this:

class Types {
	public static function get(string $classname)
	{
		return function() use ($classname) {
			return static::fromCache($classname);
		};
	}

	protected static function fromCache(string $classname) {
		$parts = explode("\\", $classname);
		$cacheName = strtolower(preg_replace('~Type$~', '', $parts[count($parts) - 1]));
		$type = null;

		if (!isset(self::$types[$cacheName])) {
			if (class_exists($classname)) {
				$type = new $classname();
			}

			self::$types[$cacheName] = $type;
		}

		$type = self::$types[$cacheName];

		if (!$type) {
			throw new Exception("Unknown graphql type: " . $classname);
		}
		return $type;
	}

    public static function currency() { return static::get(CurrencyType::class);}
    // ... I explicitly stub out these type methods for the benefit of intellisense and [stanning](https://github.com/phpstan/phpstan), but they're not necessary
}
    

And then your field/arg definitions look the same as usual:

...
	'currency' => [
		'type' => Types::currency(), // this won't load until it is specifically needed!
		'description' => 'A currency record',

It may be difficult to imagine this being of any appreciable benefit for small or toy programs, but in my real project I dynamically generate my mutation types (using this) and doing even a small mutation triggers a flood of activity and the instantiation of 656 types(!).

With this change, that number drops to 1.

I have run this in the profiler:
without lazy loading
with lazy loading

For me, this equates to a savings of around a quarter of a second and 3MB of memory.

So, what do you think? It's only around 50 lines of mostly trivial code. If you think it looks promising I'll start in on tests and documentation and further polishing.

@shmax shmax changed the title lazy load types Lazy loading of types Oct 17, 2019
@vladar
Copy link
Member

vladar commented Oct 17, 2019

Thanks for the PR. Ironically we had this in place before and removed in #35

I realize that true lazy types are a MUST but one other option worth exploring is to allow defining types using SDL instead (and maybe always require type loader to be defined):

$type = new ObjectType([
    'name' => 'Foo',
    'fields' => function() {
        return [
            'bar' => [
                'type' => 'String!',
            ],
        ];
    }
]);

This will likely require a significant rework but it provides some other benefits as well in addition to real lazy-loading (less typing, also automatically ensure there is only one instance of a type in schema).

I guess the approach we choose should be weighted carefully, so I encourage everyone interested to post their opinions here.

@mcg-web
Copy link
Collaborator

mcg-web commented Oct 17, 2019

That exactly the way we has designed the GraphQL bundle and this really ease types declaration. I'm 100% for this change! This will not be a big change since we can make both declaration lives together by requiring type loader only if type is string instead of an object.

@shmax
Copy link
Contributor Author

shmax commented Oct 17, 2019

Thanks much for the quick feedback!

(less typing, also automatically ensure there is only one instance of a type in schema)

I don't personally see "less typing" happening, at least not with my coding style--I would never type string literals directly into my field values; too much room for human error. I would just wind up using a constant, anyway, so it would just look like this:

...
	'currency' => [
		'type' => CurrencyType::class,
		'description' => 'A currency record',  

also automatically ensure there is only one instance of a type in schema

I believe your library already throws an exception if the same name pops up twice.

Finally, in the case of my mutation library, the name of the generated type isn't known until it's generated, so I must have callable support.

All that said, I think it would be pretty trivial to also support string values. If you look at my Types::get method, you can see it already operates on a string, so it would just be a matter of routing to the type loader, as you suggested.

@vladar
Copy link
Member

vladar commented Oct 17, 2019

I don't personally see "less typing" happening

Compare [String!]! and Type::nonNull(Type::listOf(Type::nonNull(Type::string())))

FYI Blackfire profiles seem to be unavailable for global access (maybe require a login)

@shmax
Copy link
Contributor Author

shmax commented Oct 17, 2019

Compare [String!]! and Type::nonNull(Type::listOf(Type::nonNull(Type::string())))

Well, that's a built-in type, and it wouldn't benefit from lazy-loading, right? Certainly we should consider if any of this makes sense for SDL, but I'm not sure I see the wisdom in mixing the two things together. And you still have the human error angle in your approach.

@mcg-web
Copy link
Collaborator

mcg-web commented Oct 17, 2019

See how with some light checks we can make this less error prone.

@shmax
Copy link
Contributor Author

shmax commented Oct 17, 2019

FYI Blackfire profiles seem to be unavailable for global access (maybe require a login)

Ah, sorry, I'm still sort of new to Blackfire. They should be public, now.

@shmax
Copy link
Contributor Author

shmax commented Oct 17, 2019

See how with some light checks we can make this less error prone.

I appreciate the thought, but it doesn't seem wise to me to be repeating the work that is already performed by the lexer.

But why would I be parsing and lexing? I define my schemas in PHP instead of SDL because they tend to be dynamic and I want what little safety can be supplied by the language checker itself.

We may have different ideas about what it means to pass a string to the type loader--to me, it's just any identifier that has meaning to you; it's up to you to interpret it. In my sample, the string is the namespace of the class itself (and I instantiate it directly when the time comes), so there is little room for human error.

If I wanted to use SDL, I would use SDL. Trying to mix it in with schemas defined with PHP idioms seems like a bad idea to me; for one thing, you're now adding more work back into the equation in the form of additional parsing and lexing that doesn't need to be there.

But what about full SDL? Does any of this make sense? I don't know, yet; but I'll find out.

@shmax shmax force-pushed the lazy-load-type branch 2 times, most recently from 234cb87 to 9d0babb Compare October 19, 2019 19:20
@shmax
Copy link
Contributor Author

shmax commented Oct 20, 2019

Hi @vladar and @mcg-web, I'm working away on tests and cleanup, but I have a question about this:

https://github.com/webonyx/graphql-php/blob/master/src/Type/Definition/FieldDefinition.php#L180

This is annotated as Type for the static checker, but then you have an entire test class that tries writing different kooky values to it (for example, here).

This seems to weaken the effectiveness of the static type checking. If you're serious about FieldDefinition::$type being a Type (or with this PR, a callable), would it not make more sense to throw an error in the FieldDefinition constructor, rather than waiting for the validation step?

Please let me know your thoughts so I can make the appropriate changes to the constructor and tests.

@shmax
Copy link
Contributor Author

shmax commented Oct 21, 2019

I took a stab at writing some tests, and fixed enough of the scrutinizer noise for it to pass.

Regarding SDL string values for types: it may or may not be a good idea, but unless I'm mistaken it can't replace function-based loading. At best it would have to be an alternative option to what I'm proposing, here. If we can agree on that, then I'd just as soon let someone else figure it out if they can come up with a good use case for it.

In short, it's a different feature than this one, and the two ideas seem to be orthogonal to one another, so it's not necessary to solve both of them at the same time.

I also had a look at full SDL-defined schemas, and as far as I can tell they don't overlap with lazy type loading, either.

So, please tell me what we need to do move forward with this. Thanks!

@vladar
Copy link
Member

vladar commented Oct 21, 2019

This is annotated as Type for the static checker, but then you have an entire test class that tries writing different kooky values to it (for example, here).

Those fields are effectively user input. The idea was that we didn't want to validate user input (provided as config arrays) on every request.

Instead, you choose when to validate them (you can run schema->validate() in the dev environment only + maybe during some build/deploy step)

Static type checking doesn't help much when it comes to validating config arrays.

@vladar
Copy link
Member

vladar commented Oct 21, 2019

but unless I'm mistaken it can't replace function-based loading

Why do you think so? The idea is that typeLoader remains the same, but you specify field types as SDL expressions vs. object references (or functions returning those object references)

In short, it's a different feature than this one, and the two ideas seem to be orthogonal to one another, so it's not necessary to solve both of them at the same time.

For me, those are mutually exclusive, mostly because it is a pain to maintain multiple ways of defining type relations, especially when people start mixing different styles.

And I see how SDL-based types fix the lazy-loading (as well as some other things mentioned above). Obviously, it has its downsides (parsing overhead, no strict typing, etc.)

The question is about the trade-offs we choose.

So, please tell me what we need to do to move forward with this. Thanks!

It is highly likely that we'll go the route you propose (as it's easier and backwards compatible). But I'd like to hear more feedback from people using the library before making a decision.

@shmax
Copy link
Contributor Author

shmax commented Oct 22, 2019

Those fields are effectively user input. The idea was that we didn't want to validate user input (provided as config arrays) on every request.

Instead, you choose when to validate them (you can run schema->validate() in the dev environment only + maybe during some build/deploy step)

It's not user input, though. If inline schemas are an alternative to SDL, and SDL is not user input, then neither are inline schemas. This kind of "validation" is ideally handled by the PHP language, or static checking.

Static type checking doesn't help much when it comes to validating config arrays.

It does help. You have to raise your PHP stan level to 3, and use array shapes in your doc blocks (The feature isn't quite ready for prime time; for some reason it doesn't support multiline array definitions, which is probably a deal-breaker for the kind of complexity we have). I've been experimenting with it in a separate branch, and it seems to work well.

But that doesn't matter in this case. Ideally, I should be able to use a native PHP typehint of type Type for my Type::resolveLazyType method, but I can't at the moment because you allow all those weird values through. If you threw an exception early I wouldn't have to worry about it.

Granted, this can probably wait until we work on PHPStan level 3, which I wouldn't personally want to tackle without the array shape stuff (which I'm currently investigating; will let you know).

@shmax
Copy link
Contributor Author

shmax commented Oct 22, 2019

Why do you think so?

I assume we're talking about something like this, right?

type: "MySpecialLazyLoadedType"

This could work for trivial cases, but the problem here is that if you require me to know the name of a type, then it's not truly being lazy-loaded. The power of inline schemas is that they are programmatic. We don't necessarily know the fields of our type until we create it, and we don't necessarily know the name until we create it.

It is highly likely that we'll go the route you propose (as it's easier and backwards compatible). But I'd like to hear more feedback from people using the library before making a decision.

Wonderful! Completely understandable, and there's no real rush, here (I have my fork to work with, after all). But is there anyone we should ping?

@vladar
Copy link
Member

vladar commented Oct 23, 2019

It does help. You have to raise your PHP stan level to 3, and use array shapes in your doc blocks

This is interesting! But there is a bunch of projects not using PHPStan who still use this library (for example in the old-school CMS world). So given the wide range of use-cases, this is not really an option for the validation at the library level.

We don't necessarily know the fields of our type until we create it, and we don't necessarily know the name until we create it.

This doesn't sound right to me. Lazy types are somewhat similar to PHP classes autoloading. If your clients know what they are querying (relying on a static schema), how does it happen that you do not know it?

But is there anyone we should ping?

I guess we'll have to create a separate issue (with options listed here), mention it in the next release notes to collect some feedback.

@shmax
Copy link
Contributor Author

shmax commented Oct 23, 2019

This is interesting! But there is a bunch of projects not using PHPStan who still use this library (for example in the old-school CMS world). So given the wide range of use-cases, this is not really an option for the validation at the library level.

Right, you can't assume people are using PHPStan, but we still have native PHP typehints. I should be able to just typehint my return as Type, but I currently can't because of the architectural decision to not do any kind of runtime type-checking until the validation stage. I'm not going to press the issue right now--let's save it for PHPStan level 3, which will force us to reevaluate a lot of things (for example, OutputType does not itself implement AbstractType, so you can't really use it in places expecting Type).

This doesn't sound right to me. Lazy types are somewhat similar to PHP classes autoloading. If your clients know what they are querying (relying on a static schema), how does it happen that you do not know it?

Clients don't generally use the names of types in their queries (there are exceptions, such as specifying query variables). They use the names of fields and args. Where do they get the names of the fields and args? Well, they inspect the API, which is generated from an introspection query, which forces all types to load.

The concept is similar to--but not identical--to PHP auto-loading. In auto-loading, there is no wiggle room--the requested name must match the name of the class, which must match the name of the file (of course, you can write your own auto-loader to do something different). With GraphQl types, the name is just one of many properties on the type. It doesn't have to match the name of the class (although that's what will happen if you allow it to be inferred)--you can use anything you like. Since it is up to the author of the inline schema to come up with a name, it is programmatic. There is no rule that says it must match the name of the class, or even that your type must subclass anything.

Hopefully you can agree with all that so far. In summary:

  1. Inline schemas are programmatic
  2. The name is just one of many properties that will be decided by the developer.
  3. There are no hard rules about what the name should be.

Therefore, a truly lazy-loaded type can't require advance knowledge of the name any more than it can require advance knowledge of the fields.

I wrote an entire library around automatically generating types: https://github.com/shmax/graphql-php-validation-toolkit

Lazy loading these generated types is simply not possible with your string-based idea.

@shmax
Copy link
Contributor Author

shmax commented Oct 23, 2019

By the way, I have a PR open to support multiline array shape syntax: phpstan/phpdoc-parser#33

Seems to be doing okay. The next roadblock is that slevomat/coding-standard has not done a release of the array shape support (or any release since March), and the maintainer is being bizarre about it

Copy link
Member

@vladar vladar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! I'd like to merge it before the long-delayed 14.0.0 version release. Requested couple changes to make it happen.

$typeName,
Utils::printSafe($type)
)
);
}

$type = Type::resolveLazyType($type);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should allow typeLoader to return callable. It is basically useless because when it's called - we already need the actual type. Do you have any reasons to allow it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand--what are you suggesting?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest disallowing typeLoader to return callable. It should always return a specific type.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you mean @vladar

There are two levels of lazy loading going on here:

  1. Calling $typeLoader($typeName)
  2. Checking if the result of that is yet another callable and calling that again

Why would 2. be necessary? As you noted, the two calls happen right after each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what you guys had in mind? 484d2dd

I dunno, I guess I'm okay with it, but I kind of like the idea that executing a lazy type is entirely the library's responsibility, and all the client has to do is deliver a callable or a type, but I suppose I can go either way. Let me know what you think.

Copy link
Collaborator

@spawnia spawnia Jun 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shmax thanks for showing the impact of changing the approach in that commit.

I can see how it can more convenient to implement if the type loading is lazy in all circumstances. It allows using the same lazy type reference in any place that expects a type, including the type loader.

The performance impact should be pretty minimal, there is one extra call to check is_callable() for every type loaded during a request. I think the extra convenience in development and the simplicity are worth it.

@vladar can you make a final call? My opinion is that the initial approach that @shmax took was fine and we can revert 484d2dd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I see how this improves DX, let's restore it in a follow-up PR!

@shmax
Copy link
Contributor Author

shmax commented Nov 2, 2019

@vladar Thanks for all the comments! If you don't mind, I think I would like to wait for the PHPStan Level 3 branch to merge, then I'll do another pass over this and clean everything up.

@vladar vladar mentioned this pull request Nov 3, 2019
@vladar
Copy link
Member

vladar commented Nov 16, 2019

Not sure if it's ready for another review pass? Please ping me when you are done.

@shmax
Copy link
Contributor Author

shmax commented Jun 13, 2020

@vladar I think we're ready for another review when you're ready.

Copy link
Member

@vladar vladar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. We definitely need to update documentation though and some follow-up work may be required. But let's ship this PR as is 🚢

Great work! 👍

@vladar vladar merged commit fdc5808 into webonyx:master Jun 19, 2020
@shmax
Copy link
Contributor Author

shmax commented Jun 19, 2020

@vladar Yee-haw! Thanks for reviewing. I'm available to do touch-ups, and I'll get started on the assert stuff I've been yapping about. Did you have any final input on #557 (comment) ?

@vladar
Copy link
Member

vladar commented Jun 19, 2020

Yeah, I see your point of doing this now. I guess it makes sense to restore the original version allowing callbacks. Mind creating another follow-up PR for this?

@shmax
Copy link
Contributor Author

shmax commented Jun 19, 2020

Sure, will work on it this weekend. Thanks for your support.

@shmax shmax deleted the lazy-load-type branch June 20, 2020 01:05
@unclecheese
Copy link

Love this. Thank you, everyone. Can someone tell me, is there a performance benefit to storing the type definitions in separate files and adding a require_once() call to the cache fetch? Seems like if your query only needs one or two types, it would save a lot of parsing time.

@shmax
Copy link
Contributor Author

shmax commented Jul 13, 2020

Love this. Thank you, everyone. Can someone tell me, is there a performance benefit to storing the type definitions in separate files and adding a require_once() call to the cache fetch? Seems like if your query only needs one or two types, it would save a lot of parsing time.

That's an interesting idea--I haven't gotten that far, myself, but I love the idea. As our first lazy load test pilot, why don't you give it a try and let us know?

@unclecheese
Copy link

Yup, will do. My schema is 28,000 lines of generated PHP, so if it has an effect, I'm sure I'll see it. :)

@shmax
Copy link
Contributor Author

shmax commented Jul 13, 2020

save a lot of parsing time.

By the way, are you not using opcache?

@unclecheese
Copy link

I haven't gotten that far yet. Clearly that would make a big difference, but I'm just hacking away on a Vagrant box ATM, and I'm not sure what it has configured.

@shmax
Copy link
Contributor Author

shmax commented Jul 14, 2020

Well, it would probably be easier than refactoring your code, and it would benefit everything. All the cool kids are doing it...

vladar pushed a commit that referenced this pull request Jul 21, 2020
* Implement safe lazy initialization

Let's continue the discussion we are having here: #557 (comment)

* Fix codestyle

* Fix bugs

* Fix codestyle, again

* Fix phpstan issue and regen baseline

* Ensure the baseline strictly shrinks

* Typehint

* Fix codestyle

* baseline

* baseline
@spawnia spawnia mentioned this pull request Aug 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants