
Adding select_related and prefetch_related optimizations #220


Closed

Conversation

spockNinja
Contributor

Using a combination of Model meta and the GraphQL AST, we are able to use Django's queryset.select_related() and queryset.prefetch_related() to reduce the number of queries run during GraphQL query resolution.

These optimizations help a lot with the N+1 problem faced when using an ORM.
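
To make the N+1 problem concrete, here is a toy illustration (not graphene-django or Django code; the fake in-memory "database" and query counter are invented for demonstration). Naive resolution runs one query per parent row; batching related lookups the way prefetch_related does keeps the count constant:

```python
# Toy N+1 illustration: a fake in-memory "database" with a query counter.
ARTICLES = [{"id": i, "author_id": i % 2} for i in range(4)]
AUTHORS = {0: {"id": 0, "name": "ada"}, 1: {"id": 1, "name": "bob"}}

query_count = 0

def run_query(fn):
    """Pretend to hit the database, counting each round trip."""
    global query_count
    query_count += 1
    return fn()

# Naive resolution: one query for the articles, then one per article (1 + N).
query_count = 0
articles = run_query(lambda: list(ARTICLES))
naive = [(a["id"], run_query(lambda a=a: AUTHORS[a["author_id"]])["name"])
         for a in articles]
naive_queries = query_count

# Batched resolution (prefetch_related-style): one query for the articles,
# one bulk query for all needed authors -- always 2, regardless of N.
query_count = 0
articles = run_query(lambda: list(ARTICLES))
author_ids = {a["author_id"] for a in articles}
authors = run_query(lambda: {i: AUTHORS[i] for i in author_ids})
batched = [(a["id"], authors[a["author_id"]]["name"]) for a in articles]
batched_queries = query_count
```

With 4 articles the naive path issues 5 queries while the batched path issues 2, and both produce identical results.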

I have written the primary entry point (optimizations.optimize_queryset) such that it can be imported and used in any graphql resolution function. (See resolve_article in the tests). I then used it in several core places, including DjangoConnectionField, DjangoListField, and DjangoObjectType.

There will always be cases where the GraphQL representation does not match the model meta, such as an alias or a complicated model @property. In those cases, you can specify manual optimizations in the DjangoObjectType Meta. Those optimizations look like:

class MyObject(DjangoObjectType):
    special_property = graphene.SomeField()

    class Meta:
        model = MyObjectModel
        optimizations = {
            'special_property': {
                'prefetch': ['relations__to__prefetch'],
                'select': ['relation__to__select']
            }
        }

Closes #57

@coveralls

coveralls commented Jul 20, 2017

Coverage Status

Coverage increased (+0.4%) to 93.092% when pulling 1230da5 on spockNinja:feature/optimization into 0588f89 on graphql-python:master.


@spockNinja spockNinja force-pushed the feature/optimization branch from 0defa3a to 629abd0 Compare July 20, 2017 04:30

@spockNinja
Contributor Author

@syrusakbary

I added a few high-level tests to assert query counts. All other tests are passing.

If you like this improvement in general, I will whip up some docs. Also let me know if there is anything else that you would like tests written for.

@mjtamlyn

Hi! This looks like a really good start; I personally would be keen to see something like this merged. A couple of notes:

Personally, I tend to prefer the prefetch in most cases. It should certainly be possible to override this behaviour on certain FK links, as the joins can become quite expensive. My version of this problem ignored select_related completely.

I've found it necessary to apply more complex optimisations than a simple prefetch_related, by using a Prefetch object. It's also quite common for custom fields on a child object to depend on an attribute of the parent. I think you've touched on that with Meta.optimizations, but that probably needs a more flexible API. I came up with an API of a classmethod called optimize_<field_name>(cls, queryset, context). This also allows you to apply optimizations only for certain user types: for example, it doesn't matter whether the request asks for a "liked this post" field if the user is anonymous, because we know the answer is false and don't need to optimise it. This could actually be a more general API that isn't so dependent on the ORM or Django.
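
A minimal sketch of the per-field hook API described above (the class, field names, and "select_related:"/"prefetch_related:" markers are all illustrative inventions, not graphene-django or Django APIs; a plain list stands in for a QuerySet):

```python
class PostType:
    """Hypothetical object type exposing optimize_<field_name> hooks."""

    @classmethod
    def optimize_author(cls, queryset, context):
        # In real Django code this would be queryset.select_related("author").
        return queryset + ["select_related:author"]

    @classmethod
    def optimize_liked(cls, queryset, context):
        # Context-dependent: anonymous users never need this prefetch.
        if context.get("user") is None:
            return queryset
        return queryset + ["prefetch_related:likes"]

def apply_field_optimizations(obj_type, queryset, requested_fields, context):
    """For each requested field, apply its optimize_<field> hook if defined."""
    for field in requested_fields:
        hook = getattr(obj_type, f"optimize_{field}", None)
        if hook is not None:
            queryset = hook(queryset, context)
    return queryset

# Anonymous request: the "liked" hook decides no optimisation is needed,
# and "title" has no hook at all.
qs = apply_field_optimizations(
    PostType, [], ["author", "liked", "title"], {"user": None}
)
```

The point of the design is that each hook sees both the queryset and the request context, so optimisations can vary per user as well as per field.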

My project is starting to grow some microservices, and I recently had cause to add a post_process_<field> hook alongside the preprocessing optimize_<field> hook, so that prefetch_related-style optimizations can be applied over the microservice boundary: run a bulk request to the service after the queryset has loaded, and annotate the queryset objects with the relevant information. This is probably out of scope for this particular project, but I thought it was interesting to mention.

I'm concerned about your field extraction - I have a suspicion that it won't handle fragments correctly. For users of Relay, almost every request will include an extensive number of fragments, so extracting the actual requested fields from these is important.

I know the documentation now recommends a data-loader clone style approach using promises and lazy resolution. To me, this isn't a very "Django-flavoured" approach to the N+1 problem, and I've not had the time to experiment with it to see if it works. I would be quite interested in Syrus' thoughts.

@spockNinja
Contributor Author

@mjtamlyn

Thanks for the input!

I will definitely take a look at how fragments show up in the AST. That is certainly something I did not account for.

It may not be immediately apparent at first glance, but the optimization does follow child dependencies and selects/prefetches as much as can be inferred from the Model metas.

So a query like:

query {
  topObject {
    firstRelations {
      edges {
        node {
          id
          nestedOneToOne {
            id
            attribute
          }
        }
      }
    }
    firstOneToOne {
      id
      otherAttr
      anotherOneToOne {
        id
      }
    }
  }
}

will result in prefetch_related('first_relations__nested_one_to_one') and select_related('first_one_to_one__another_one_to_one')
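
A rough sketch of that inference step (the dict-based selection tree, the skip list, and the helper names are illustrative stand-ins, not the PR's actual code; real code would walk the GraphQL AST and consult the model meta to decide between select_related and prefetch_related):

```python
import re

def camel_to_snake(name):
    """Convert a GraphQL camelCase field name to a Django snake_case one."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def relation_paths(selection, prefix=""):
    """Walk a nested field selection and emit double-underscore relation paths.

    Relay's edges/node wrappers and scalar leaves contribute no path segment;
    the deepest path subsumes its parents (as select_related chaining does).
    """
    paths = []
    for field, sub in selection.items():
        if field in ("edges", "node", "id"):
            if isinstance(sub, dict):
                paths.extend(relation_paths(sub, prefix))
            continue
        if isinstance(sub, dict) and sub:
            path = prefix + camel_to_snake(field)
            nested = relation_paths(sub, path + "__")
            paths.extend(nested or [path])
    return paths

# The query from the comment above, flattened to a selection tree.
query = {
    "firstRelations": {"edges": {"node": {
        "id": None,
        "nestedOneToOne": {"id": None, "attribute": None},
    }}},
    "firstOneToOne": {
        "id": None,
        "otherAttr": None,
        "anotherOneToOne": {"id": None},
    },
}
paths = relation_paths(query)
```

This yields the two relation paths quoted above; the remaining (harder) step is classifying each path as a to-one or to-many relation.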

I will admit to not knowing much about the underlying Prefetch object. What kinds of things have you done with it that can't be done with a plain prefetch_related? Along the same lines, what kinds of optimizations have you had to make that are outside the prefetch/select related API?

What is your reason for avoiding select_related? It results in fewer queries than its prefetch_related counterpart.

I played with the data-loader a little. It could potentially be useful for some things, I'm sure. My goal with this PR is to make simple, inferable optimizations based on matches between the GraphQL AST and Model meta information.

@mjtamlyn

select_related results in a join, which in certain circumstances can be much more expensive if the tables are very large and the result set is small. Especially when you start chaining 4-5 joins across tables with millions of rows each, in order to select half a dozen items from each, the multiple-query approach is much faster for very little performance loss. YMMV.

Prefetch objects are a fully documented feature of Django. Fundamentally, they allow you to customise the prefetched queryset, so for example you can prefetch only "active" objects rather than all of them. Even more crazy things are possible, but might not be a good idea.

My main concern is that overactive inference here, without the ability to customise it further, could clobber existing performance optimisations being applied by users.

@spockNinja
Contributor Author

Awesome. Those are all fantastic points. I combed over your gist and I'm pretty sure I understand how it's working now. It took me a moment to figure out how you were handling nested prefetches, but I see now that apply_optimizers is recursive and the Prefetch objects are given an optimized queryset themselves.

Would it be okay if I incorporate parts of that gist into this PR to achieve the desired api?

I'm currently aware of the following changes that need to be made.

  • Replace the Meta.optimizations with def prefetch_* and def optimize_* hooks.
  • Utilize the Prefetch API for more flexibility.
  • Only use prefetch. Let optimize_* handle the case where a select_related is desired.
  • Inspect fragments (FragmentSpread and InlineFragment) when finding fields in the AST.
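
The last bullet can be sketched like this (a minimal dict-based stand-in for the GraphQL AST; node shapes and key names are illustrative, not graphql-core's actual node classes). A selection is either a plain field, an inline fragment, or a named fragment spread that must be resolved against the document's fragment definitions:

```python
def collect_fields(selections, fragments):
    """Flatten a selection set into field names, expanding fragments."""
    fields = []
    for sel in selections:
        kind = sel["kind"]
        if kind == "Field":
            fields.append(sel["name"])
        elif kind == "InlineFragment":
            fields.extend(collect_fields(sel["selections"], fragments))
        elif kind == "FragmentSpread":
            # Named fragments live in the document, keyed by fragment name.
            fragment = fragments[sel["name"]]
            fields.extend(collect_fields(fragment["selections"], fragments))
    return fields

fragments = {
    "articleFields": {"selections": [
        {"kind": "Field", "name": "headline"},
        {"kind": "InlineFragment",
         "selections": [{"kind": "Field", "name": "reporter"}]},
    ]},
}
selections = [
    {"kind": "Field", "name": "id"},
    {"kind": "FragmentSpread", "name": "articleFields"},
]
fields = collect_fields(selections, fragments)
```

Without the two fragment branches, a Relay-style query would appear to request only its top-level fields, which is exactly the failure mode raised earlier in the thread.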

@mjtamlyn

Go for it, that code is absolutely fine to use.

(Do note I'm not a contributor to this repo, so we'll have to see whether @syrusakbary likes the ideas)

@syrusakbary syrusakbary force-pushed the master branch 2 times, most recently from bdc7189 to f93251b Compare September 1, 2017 08:12
@MichaelrMentele

This is awesome! We have a home-grown graphene-like piece of code that I was looking to replace with Graphene, but I was concerned about N+1 queries. Would this be compatible with 1.7?

@spockNinja
Contributor Author

@MichaelrMentele I'm guessing you mean Django 1.7? I'm not 100% positive, as the Travis build for graphene-django now only runs for 1.8+

@chdsbd

chdsbd commented Feb 24, 2018

What's the status on this? Fixing the N+1 query problem would be a huge improvement.

@syrusakbary

@ramusus

ramusus commented Feb 25, 2018

@spockNinja, @mjtamlyn guys, take a look at this approach.
I prefer to apply select_related and prefetch_related automatically using model introspection https://github.com/cinemanio/backend/blob/master/cinemanio/api/utils.py#L45-L58

And here is an example of schema: https://github.com/cinemanio/backend/blob/master/cinemanio/api/schema/movie.py#L33-L39.
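
The introspection idea boils down to classifying each requested relation by its field type and routing to-one relations to select_related and to-many relations to prefetch_related. A minimal sketch, where the field-kind mapping is a hand-written stand-in for what Django's Model._meta.get_field() would report (all names here are illustrative):

```python
# Relation kinds that a join can follow (to-one) vs. those that need a
# separate batched query (to-many). "reverse_fk" stands in for a reverse
# ForeignKey accessor.
TO_ONE = {"ForeignKey", "OneToOneField"}
TO_MANY = {"ManyToManyField", "reverse_fk"}

def split_relations(requested, field_kinds):
    """Split requested field names into select_related / prefetch_related lists."""
    select, prefetch = [], []
    for name in requested:
        kind = field_kinds.get(name)
        if kind in TO_ONE:
            select.append(name)
        elif kind in TO_MANY:
            prefetch.append(name)
        # Plain columns (e.g. CharField) need neither.
    return select, prefetch

field_kinds = {"author": "ForeignKey", "tags": "ManyToManyField",
               "title": "CharField"}
select, prefetch = split_relations(["author", "tags", "title"], field_kinds)
```

In real code the mapping would be read from the model meta rather than hard-coded, which is what makes the approach fully automatic.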

@nyejon

nyejon commented Mar 16, 2018

Is there an example somewhere of how to implement these optimisations manually?

@27medkamal

@ramusus Love your implementation. Would be great if it also allowed for prefetch filters using Django's Prefetch object.

@vinayan3

I have some hacks in my schema that do preloading via a DataLoader. It's not ideal, but I'd love to find out whether this or another issue will help with optimizations.

@KonstantinSchubert

KonstantinSchubert commented Jun 4, 2018

@spockNinja Are you still working on this PR or would you prefer if somebody else took over?
@syrusakbary Could you give an indication if there is any interest in merging this PR so we know if it's worth putting time into it?

model_fields = model_fields_as_dict(model)
selections = find_model_selections(graphql_ast)

graphene_obj_type = REGISTRY.get_type_for_model(model)
Collaborator


What's the behaviour of this when there are multiple types mapped to the same model? As far as I remember that case isn't prohibited.


REGISTRY = get_global_registry()
SELECT = 'select'
PREFETCH = 'prefetch'
Collaborator


Could make this an Enum instead.

@@ -75,6 +76,7 @@ def connection_resolver(cls, resolver, connection, default_manager, max_limit,
data=filter_kwargs,
queryset=default_manager.get_queryset()
).qs
qs = optimize_queryset(qs, info)
Collaborator


Note that currently these optimisations will be lost when line 66 in this file runs. That intersects the filtered queryset here with the resolved one, and loses the optimisations in the process.

We've "solved" this by instead resolving in this method, then passing the resolved queryset to the filterset (defaulting to the default_manager.get_queryset() if the resolver doesn't give us a value), and then dropping the merge behaviour above. This may or may not work in the general case, I really don't know, but it's working for us (we've implemented a similar function to your optimize_queryset).

if iterable is None:
    iterable = default_manager
iterable = maybe_queryset(iterable)
if isinstance(iterable, QuerySet):
    if iterable is not default_manager:
        default_queryset = maybe_queryset(default_manager)
        iterable = cls.merge_querysets(default_queryset, iterable)
    iterable = optimize_queryset(iterable, info)
Collaborator


Interestingly, this optimisation does work, because it comes after the call to merge_querysets!

@firaskafri
Collaborator

@spockNinja Still interested in this pull request?

@phalt
Contributor

phalt commented May 3, 2019

I'd like to tidy up some of the PRs, so if we still want to work on this let me know or I will close this in 1 week.

@phalt phalt closed this May 8, 2019
@cansin

cansin commented Feb 22, 2021

@spockNinja @phalt I'd like to raise this PR from the dead. Did you end up introducing a similar mechanism eventually? We are looking for the most elegant way to reduce the number of SQL queries our Graphene-Django instance sends to the DB at the moment.


Successfully merging this pull request may close these issues.

select_related and prefetch_related