-
-
Notifications
You must be signed in to change notification settings - Fork 564
Horizontal eval for lazy loading same-type objects #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Some kind of preview would also be great: after parsing the query, without any eval, ask it which types are being requested. Or on any level, ask which types are on the next level(s). Any idea? |
Yes this is a well known N+1 problem. And you are correct - executor uses Depth-First algorithm vs Breadh-First (same as reference implementation). I posted this comment one day about how to workaround this in the user land. As for solution in core - there is no such thing yet. But I do understand that we need one. The question is which approach will work best (and if I can find enough time to implement it). One option is to use breadth-first execution as you suggest. But this is really major change and I never found enough time to experiment with it (I actually tracked some other implementations, like .NET which also had plans for breadth-first execution, but never started too). Other option is to add rather minor modification to support Deferred (Lazy) values which will enable batching. Then your resolver would look like this: 'resolve' => function($obj) {
$this->someBuffer[] = $obj->someId;
return new GraphQL\Deferred(function() use ($obj) {
$values = DataLoader::loadAll($this->someBuffer); // must load stuff only once obviously
return isset($values[$obj->someId]) ? $obj->someId : null;
});
} Other suggested syntax: 'buffer' => function($obj) {
$this->someBuffer[] = $obj>someId;
},
'resolve' => function($obj) {
$values = DataLoader::loadAll($this->someBuffer); // must load stuff only once obviously
return isset($values[$obj->someId]) ? $obj->someId : null;
} (see also this thread for related discussion) This is actually how JS version works (Facebook created separate project called DataLoader to help implementing this approach). This way of doing things is probably less powerful than BFS, but it is realsitic in terms of finding time to implement at some point %) Plus it is rather easy to understand. Anyway, if you find it interesting to experiment with breadth-first execution, I am open to ideas and PRs. |
I don't understand how any of these solutions would work. If current implementation really is depth-first, it would have to resolve the first item before evaluating the second, right? So the buffer would never be used. I can't see how this would be possible with depth-first... The core would have to support it, no? I like the Deferred solution. It gives a lot of power to the implementor (me). But same problem (or I don't get it): if the executor needs the first object before resolving the second, how can it be deferred? |
The linked issue/comment talks about |
Those solutions are just proposed. They are not implemented yet. Obviously they will require changes to Executor code. Actually it will be a mix of depth-first and breadth-first in such case (JS version does similar thing with promises while still logically working as depth-first algorithm). At this moment the only existing way to workaround this issue is the one described in this comment As for It allowed to optimize some cases (like yours), but it didn't work for many other situations. Which means it was not general enough to maintain, hence reverted. |
I've found another way to userland it. It's also more response size efficient:
Since every With a little more polishing, it could even be very generic and easy on the server, defining it. It's a little more work for the client implementing the results, but it's fewer bytes to download... Is this a known 'solution'? |
It would be even better, for the client mostly, if GraphQL could return an assoc list, to key |
I'd say don't do it %) While it might seem tempting, but you place yourself into very dark territory by doing this. The reason is that a) it makes query dependent of execution order, b) stateful For example in JS app order of fields execution is not guaranteed (they might run parallel executors for different fields some day) and you may end up with very weird situations which are very hard to debug. While it is unlikely that we'll have parallel executors in PHP world, but still - good workaround shouldn't force you to rewrite queries. Have you tried original workaround proposed? While it is also not perfect, but at least you don't have to rewrite queries like this. |
Another option is to analyze nested fields in some parent field (e.g. in For example: 'resolve' => function($user, $args, $context, ResolveInfo $info) {
$nestedFields = $info->getFieldSelection(/* to depth */ 3);
// analyze $nestedFields here to prefetch data
if (isset($nestedFields['resource'])) {
// prefetch by user id
}
if (isset($nestedFields['resource']['expensive']) {
// prefetch even deeper
}
} Not sure if it will work for your case with interfaces, but still may be useful. |
I haven't tried your workaround yet =) I didn't understand it before, but I think I do now. Just to be clear: remember the list of But I like I'll try both soon. |
I already encountered dark territory btw =) It works decently for to-1 relationships like Thanks for all the quick help! |
Ah crap. |
but I'm probably missing complex something in That would make
and that works for me. This allows eager loading, which makes a big difference in the number of db queries. Is there a reason |
I guess we simply missed that when implemented ResolveInfo. Feel free to add a PR for this or I can add it based on your comment some time later. |
As for cases like |
In my framework, I can eager load all kind of relationships, to-1 and to-many. The active records know where to fetch what for which ids:
If I know what to expect (with We're getting somewhere! Thanks. I'm still trying your original workaround suggestion some time soon. |
@vladar @rudiedirkx I tried yesterday to quickly implement dataloader in PHP with ReactPHP and it works quite nicely : https://github.com/lordthorzonus/php-dataloader (Still quite much a work in progress though ). I'm interested in trying it out with the package. Basically I would need to edit the Executors Then it should be just this at the endpoint : $response = null;
$eventLoop // get this from somewhere (the same event loop should be provided for the dataprovider).
GraphQL::execute(...)->then(function($result) use (&$response) {
$response = $result;
});
$eventLoop->run();
echo json_encode($response); I think I'll have time to play around with this later this week :). |
Nice. Feel free to send a PR if you make it work. Also note that Executor follows closely JS version which already supports promises, so it might make sense to simply follow their solution. The problem though is how to test this feature with PHPUnit / Travis CI. |
@lordthorzonus I like how they did it in the python graphql-core package: |
Promises in PHP. Makes no sense to me at all. If #61 gets in, or something like it, the server implemenation can eager load most efficiently with existing architecture. Works for any number of levels down. Some kind of data loader would be nice too, but faking promises in PHP seems silly. |
So many different opinions and best/good/decent practices! Speed is most important to me. The more awesome and more options, the slower, probably. |
@vladar I'm starting to work on a solution using the pre-fetch idea you pitched here and using a central repository to store/retrieve the data in the resolve functions. However, using |
@chrissm79 Full-featured support for such things as arguments and directives actually means that you need to visit related ASTs %) It makes little sense to introduce new structures that will basically duplicate existing AST. So the simple answer - just use Also note that |
@chrissm79 After some thinking I realized that I may be wrong and convenient look-ahead tool in the core could be useful (at least it is requested often). So I created #65 to discuss possible implementation. Feel free to post your idea / use-cases there. Yet I doubt that I'll find time for this anytime soon, so any help would be appreciated. |
@vladar Really excited about the possibilities here! Adjusting the /**
* Prefetch data.
*
* @param ResolveInfo $info
* @param int $depth
* @return void
*/
public function prefetch(ResolveInfo $info, $depth = 1)
{
$fields = [];
/** @var Field $fieldAST */
foreach ($info->fieldASTs as $fieldAST) {
$fields = array_merge_recursive($fields, $this->foldSelectionSet($fieldAST->selectionSet, $depth));
}
return $fields;
}
/**
* Fold field selection set.
*
* @param SelectionSet $selectionSet
* @param int $descend
* @return array
*/
private function foldSelectionSet(SelectionSet $selectionSet, $descend)
{
$fields = [];
foreach ($selectionSet->selections as $selectionAST) {
if ($selectionAST instanceof Field) {
$fields[$selectionAST->name->value] = $descend > 0 && !empty($selectionAST->selectionSet)
? $this->buildOutput($selectionAST, ['children' => $this->foldSelectionSet($selectionAST->selectionSet, $descend - 1)])
: $this->buildOutput($selectionAST);
} else if ($selectionAST instanceof FragmentSpread) {
$spreadName = $selectionAST->name->value;
if (isset($this->fragments[$spreadName])) {
/** @var FragmentDefinition $fragment */
$fragment = $this->fragments[$spreadName];
$fields += $this->foldSelectionSet($fragment->selectionSet, $descend);
}
}
}
return $fields;
}
/**
* Build field output.
*
* @param Field $field
* @param array $data
* @return array
*/
private function buildOutput(Field $field, $data = [])
{
$args = [];
foreach ($field->arguments as $argument) {
$args[$argument->name->value] = $argument->value->value;
}
return array_merge([
'parent' => isset($data['children']),
'args' => $args,
], $data);
} |
When I first learned about GraphQL there was a performance chapter. If every relation loads every single relation, it requires many queries to load a few levels of data, even if the actual data is only on the last level. The solution was to lazy load objects: first collect ids (horizontal), and then load all objects at once, and then evaluate the next level.
Just in case. This query:
would take 20 queries to find 10 reservations > 10x 1 resource > 10x 1 sport. If it would eval horizontally, it would do 2 queries: load 10 resources, then load 10 sports. (Maybe not 2 levels, but even 1 could be a big difference.)
Is that a thing? It seems like the query is evaled vertically first, so it loads the entire 1 reservation (1 resource + 1 sport), and then the entire 1 next (1 resource + 1 sport) etc.
The text was updated successfully, but these errors were encountered: