
Adding select_related and prefetch_related optimizations #220


Closed

Conversation

spockNinja
Contributor

Using a combination of Model meta and the GraphQL AST, we are able to use Django's queryset.select_related() and queryset.prefetch_related() to reduce the number of queries run during GraphQL query resolution.

These optimizations help a lot with the N+1 problem faced when using an ORM.
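
To make the N+1 problem concrete, here is a toy illustration (not graphene-django or Django code; the fake in-memory "database" and query counter are invented for demonstration). Naive resolution runs one query per parent row; batching related lookups the way prefetch_related does keeps the count constant:

```python
# Toy N+1 illustration: a fake in-memory "database" with a query counter.
ARTICLES = [{"id": i, "author_id": i % 2} for i in range(4)]
AUTHORS = {0: {"id": 0, "name": "ada"}, 1: {"id": 1, "name": "bob"}}

query_count = 0

def run_query(fn):
    """Pretend to hit the database, counting each round trip."""
    global query_count
    query_count += 1
    return fn()

# Naive resolution: one query for the articles, then one per article (1 + N).
query_count = 0
articles = run_query(lambda: list(ARTICLES))
naive = [(a["id"], run_query(lambda a=a: AUTHORS[a["author_id"]])["name"])
         for a in articles]
naive_queries = query_count

# Batched resolution (prefetch_related-style): one query for the articles,
# one bulk query for all needed authors -- always 2, regardless of N.
query_count = 0
articles = run_query(lambda: list(ARTICLES))
author_ids = {a["author_id"] for a in articles}
authors = run_query(lambda: {i: AUTHORS[i] for i in author_ids})
batched = [(a["id"], authors[a["author_id"]]["name"]) for a in articles]
batched_queries = query_count
```

With 4 articles the naive path issues 5 queries while the batched path issues 2, and both produce identical results.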

I have written the primary entry point (optimizations.optimize_queryset) such that it can be imported and used in any graphql resolution function. (See resolve_article in the tests). I then used it in several core places, including DjangoConnectionField, DjangoListField, and DjangoObjectType.

There will always be cases where the GraphQL representation does not match the model meta, such as an alias or a complicated model @property. In those cases, you can specify manual optimizations in the DjangoObjectType Meta. Those optimizations look like:

class MyObject(DjangoObjectType):
    special_property = graphene.SomeField()

    class Meta:
        model = MyObjectModel
        optimizations = {
            'special_property': {
                'prefetch': ['relations__to__prefetch'],
                'select': ['relation__to__select']
            }
        }

Closes #57

@coveralls

coveralls commented Jul 20, 2017

Coverage Status

Coverage increased (+0.4%) to 93.092% when pulling 1230da5 on spockNinja:feature/optimization into 0588f89 on graphql-python:master.


@spockNinja spockNinja force-pushed the feature/optimization branch from 0defa3a to 629abd0 Compare July 20, 2017 04:30

@spockNinja
Contributor Author

@syrusakbary

I added a few high-level tests to assert query counts. All other tests are passing.

If you like this improvement in general, I will whip up some docs. Also let me know if there is anything else that you would like tests written for.

@mjtamlyn

Hi! This looks like a really good start; I personally would be keen to see something like this merged. A couple of notes:

Personally, I tend to prefer the prefetch in most cases. It should certainly be possible to override this behaviour on certain FK links, as the joins can become quite expensive. My version of this problem ignored select_related completely.

I've found it necessary to apply more complex optimisations than a simple prefetch_related, by using a Prefetch object. It's also quite common for custom fields on a child object to depend on an attribute of the parent. I think you've touched on that with Meta.optimizations, but that probably needs a more flexible API. I came up with an API of a classmethod called optimize_<field_name>(cls, queryset, context). This also allows you to apply optimizations only for certain user types: for example, it doesn't matter whether the request asks for a "liked this post" field if the user is anonymous, because we know the answer is false and don't need to optimise it. This could actually be a more general API that isn't so dependent on the ORM or Django.
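
A minimal sketch of the per-field hook API described above (the class, field names, and "select_related:"/"prefetch_related:" markers are all illustrative inventions, not graphene-django or Django APIs; a plain list stands in for a QuerySet):

```python
class PostType:
    """Hypothetical object type exposing optimize_<field_name> hooks."""

    @classmethod
    def optimize_author(cls, queryset, context):
        # In real Django code this would be queryset.select_related("author").
        return queryset + ["select_related:author"]

    @classmethod
    def optimize_liked(cls, queryset, context):
        # Context-dependent: anonymous users never need this prefetch.
        if context.get("user") is None:
            return queryset
        return queryset + ["prefetch_related:likes"]

def apply_field_optimizations(obj_type, queryset, requested_fields, context):
    """For each requested field, apply its optimize_<field> hook if defined."""
    for field in requested_fields:
        hook = getattr(obj_type, f"optimize_{field}", None)
        if hook is not None:
            queryset = hook(queryset, context)
    return queryset

# Anonymous request: the "liked" hook decides no optimisation is needed,
# and "title" has no hook at all.
qs = apply_field_optimizations(
    PostType, [], ["author", "liked", "title"], {"user": None}
)
```

The point of the design is that each hook sees both the queryset and the request context, so optimisations can vary per user as well as per field.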

My project is starting to grow some microservices, and I recently had cause to add a post_process_<field> hook alongside the preprocessing optimize_<field> hook, so that prefetch_related-style optimizations can be applied over the microservice boundary: run a bulk request to the service after the queryset has loaded, and annotate the queryset objects with the relevant information. This is probably out of scope for this particular project, but I thought it was interesting to mention.

I'm concerned about your field extraction - I have a suspicion that it won't handle fragments correctly. For users of Relay, almost every request will include an extensive number of fragments, so extracting the actual requested fields from these is important.

I know the documentation now recommends a data-loader clone style approach using promises and lazy resolution. To me, this isn't a very "Django-flavoured" approach to the N+1 problem, and I've not had the time to experiment with it to see if it works. I would be quite interested in Syrus' thoughts.

@spockNinja
Contributor Author

@mjtamlyn

Thanks for the input!

I will definitely take a look at how fragments show up in the AST. That is certainly something I did not account for.

It may not be immediately apparent at first glance, but the optimization does follow child dependencies and selects/prefetches as much as can be inferred from the Model metas.

So a query like:

query {
  topObject {
    firstRelations {
      edges {
        node {
          id
          nestedOneToOne {
            id
            attribute
          }
        }
      }
    }
    firstOneToOne {
      id
      otherAttr
      anotherOneToOne {
        id
      }
    }
  }
}

will result in prefetch_related('first_relations__nested_one_to_one') and select_related('first_one_to_one__another_one_to_one')
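
A rough sketch of that inference step (the dict-based selection tree, the skip list, and the helper names are illustrative stand-ins, not the PR's actual code; real code would walk the GraphQL AST and consult the model meta to decide between select_related and prefetch_related):

```python
import re

def camel_to_snake(name):
    """Convert a GraphQL camelCase field name to a Django snake_case one."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def relation_paths(selection, prefix=""):
    """Walk a nested field selection and emit double-underscore relation paths.

    Relay's edges/node wrappers and scalar leaves contribute no path segment;
    the deepest path subsumes its parents (as select_related chaining does).
    """
    paths = []
    for field, sub in selection.items():
        if field in ("edges", "node", "id"):
            if isinstance(sub, dict):
                paths.extend(relation_paths(sub, prefix))
            continue
        if isinstance(sub, dict) and sub:
            path = prefix + camel_to_snake(field)
            nested = relation_paths(sub, path + "__")
            paths.extend(nested or [path])
    return paths

# The query from the comment above, flattened to a selection tree.
query = {
    "firstRelations": {"edges": {"node": {
        "id": None,
        "nestedOneToOne": {"id": None, "attribute": None},
    }}},
    "firstOneToOne": {
        "id": None,
        "otherAttr": None,
        "anotherOneToOne": {"id": None},
    },
}
paths = relation_paths(query)
```

This yields the two relation paths quoted above; the remaining (harder) step is classifying each path as a to-one or to-many relation.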

I will admit to not knowing much about the underlying Prefetch object. What kinds of things have you done with it that can't be done with a plain prefetch_related? Along the same lines, what kinds of optimizations have you had to make that are outside the prefetch/select related API?

What is your reason for avoiding select_related? It results in fewer queries than its prefetch_related counterpart.

I played with the data-loader a little. It could potentially be useful for some things, I'm sure. My goal with this PR is to make simple, inferable optimizations based on matches between the GraphQL AST and Model meta information.

@mjtamlyn

select_related results in a join, which in certain circumstances can be much more expensive if the tables are very large and the result set is small. Especially when you start chaining 4-5 joins across tables with millions of rows each, in order to select half a dozen items from each, the multiple-query approach is much faster for very little performance loss. YMMV.

Prefetch objects are a fully documented feature of Django. Fundamentally, they allow you to customise the prefetched queryset, so for example you can prefetch only "active" objects rather than all of them. Even more crazy things are possible, but might not be a good idea.

My main concern is that overactive inference here, without the ability to customise it further, could clobber existing performance optimisations being applied by users.

@spockNinja
Contributor Author

Awesome. Those are all fantastic points. I combed over your gist and I'm pretty sure I understand how it's working now. It took me a moment to figure out how you were handling nested prefetches, but I see now that apply_optimizers is recursive and the Prefetch objects are given an optimized queryset themselves.

Would it be okay if I incorporate parts of that gist into this PR to achieve the desired api?

I'm currently aware of the following changes that need to be made.

  • Replace the Meta.optimizations with def prefetch_* and def optimize_* hooks.
  • Utilize the Prefetch API for more flexibility.
  • Only use prefetch. Let optimize_* handle the case where a select_related is desired.
  • Inspect fragments (FragmentSpread and InlineFragment) when finding fields in the AST.
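
The last bullet can be sketched like this (a minimal dict-based stand-in for the GraphQL AST; node shapes and key names are illustrative, not graphql-core's actual node classes). A selection is either a plain field, an inline fragment, or a named fragment spread that must be resolved against the document's fragment definitions:

```python
def collect_fields(selections, fragments):
    """Flatten a selection set into field names, expanding fragments."""
    fields = []
    for sel in selections:
        kind = sel["kind"]
        if kind == "Field":
            fields.append(sel["name"])
        elif kind == "InlineFragment":
            fields.extend(collect_fields(sel["selections"], fragments))
        elif kind == "FragmentSpread":
            # Named fragments live in the document, keyed by fragment name.
            fragment = fragments[sel["name"]]
            fields.extend(collect_fields(fragment["selections"], fragments))
    return fields

fragments = {
    "articleFields": {"selections": [
        {"kind": "Field", "name": "headline"},
        {"kind": "InlineFragment",
         "selections": [{"kind": "Field", "name": "reporter"}]},
    ]},
}
selections = [
    {"kind": "Field", "name": "id"},
    {"kind": "FragmentSpread", "name": "articleFields"},
]
fields = collect_fields(selections, fragments)
```

Without the two fragment branches, a Relay-style query would appear to request only its top-level fields, which is exactly the failure mode raised earlier in the thread.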

@mjtamlyn

Go for it, that code is absolutely fine to use.

(Do note I'm not a contributor to this repo, so we'll have to see whether @syrusakbary likes the ideas)

@syrusakbary syrusakbary force-pushed the master branch 2 times, most recently from bdc7189 to f93251b Compare September 1, 2017 08:12
@MichaelrMentele

This is awesome! We have a home-grown graphene-like piece of code that I was looking to replace with Graphene, but I was concerned about N+1 queries. Would this be compatible with 1.7?

@spockNinja
Contributor Author

@MichaelrMentele I'm guessing you mean Django 1.7? I'm not 100% positive, as the Travis build for graphene-django now only runs for 1.8+

@chdsbd

chdsbd commented Feb 24, 2018

What's the status on this? Fixing the N+1 query problem would be a huge improvement.

@syrusakbary

@ramusus

ramusus commented Feb 25, 2018

@spockNinja, @mjtamlyn guys, take a look at this approach.
I prefer to apply select_related and prefetch_related automatically using model introspection https://github.com/cinemanio/backend/blob/master/cinemanio/api/utils.py#L45-L58

And here is an example of schema: https://github.com/cinemanio/backend/blob/master/cinemanio/api/schema/movie.py#L33-L39.
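
The introspection idea boils down to classifying each requested relation by its field type and routing to-one relations to select_related and to-many relations to prefetch_related. A minimal sketch, where the field-kind mapping is a hand-written stand-in for what Django's Model._meta.get_field() would report (all names here are illustrative):

```python
# Relation kinds that a join can follow (to-one) vs. those that need a
# separate batched query (to-many). "reverse_fk" stands in for a reverse
# ForeignKey accessor.
TO_ONE = {"ForeignKey", "OneToOneField"}
TO_MANY = {"ManyToManyField", "reverse_fk"}

def split_relations(requested, field_kinds):
    """Split requested field names into select_related / prefetch_related lists."""
    select, prefetch = [], []
    for name in requested:
        kind = field_kinds.get(name)
        if kind in TO_ONE:
            select.append(name)
        elif kind in TO_MANY:
            prefetch.append(name)
        # Plain columns (e.g. CharField) need neither.
    return select, prefetch

field_kinds = {"author": "ForeignKey", "tags": "ManyToManyField",
               "title": "CharField"}
select, prefetch = split_relations(["author", "tags", "title"], field_kinds)
```

In real code the mapping would be read from the model meta rather than hard-coded, which is what makes the approach fully automatic.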

@nyejon

nyejon commented Mar 16, 2018

Is there an example somewhere of how to implement these optimisations manually?

@27medkamal

@ramusus Love your implementation. Would be great if it also allowed for prefetch filters using Django's Prefetch object.

@vinayan3

I have some hacks in my schema that do preloading via a DataLoader. It's not ideal, but I'd love to find out whether this or another issue will help with optimizations.

@KonstantinSchubert

KonstantinSchubert commented Jun 4, 2018

@spockNinja Are you still working on this PR or would you prefer if somebody else took over?
@syrusakbary Could you give an indication if there is any interest in merging this PR so we know if it's worth putting time into it?

model_fields = model_fields_as_dict(model)
selections = find_model_selections(graphql_ast)

graphene_obj_type = REGISTRY.get_type_for_model(model)
Collaborator


What's the behaviour of this when there are multiple types mapped to the same model? As far as I remember that case isn't prohibited.


REGISTRY = get_global_registry()
SELECT = 'select'
PREFETCH = 'prefetch'
Collaborator


Could make this an Enum instead.

@@ -75,6 +76,7 @@ def connection_resolver(cls, resolver, connection, default_manager, max_limit,
data=filter_kwargs,
queryset=default_manager.get_queryset()
).qs
qs = optimize_queryset(qs, info)
Collaborator


Note that currently these optimisations will be lost when line 66 in this file runs. That intersects the filtered queryset here with the resolved one, and loses the optimisations in the process.

We've "solved" this by instead resolving in this method, then passing the resolved queryset to the filterset (defaulting to the default_manager.get_queryset() if the resolver doesn't give us a value), and then dropping the merge behaviour above. This may or may not work in the general case, I really don't know, but it's working for us (we've implemented a similar function to your optimize_queryset).

if iterable is None:
    iterable = default_manager
iterable = maybe_queryset(iterable)
if isinstance(iterable, QuerySet):
    if iterable is not default_manager:
        default_queryset = maybe_queryset(default_manager)
        iterable = cls.merge_querysets(default_queryset, iterable)
    iterable = optimize_queryset(iterable, info)
Collaborator


Interestingly, this optimisation does work, because it comes after the call to merge_querysets!

@firaskafri
Collaborator

@spockNinja Still interested in this pull request?

@phalt
Contributor

phalt commented May 3, 2019

I'd like to tidy up some of the PRs, so if we still want to work on this let me know or I will close this in 1 week.

@phalt phalt closed this May 8, 2019
@cansin

cansin commented Feb 22, 2021

@spockNinja @phalt I'd like to raise this PR from the dead. Did you end up introducing a similar mechanism eventually? We are looking for the most elegant way to reduce the number of SQL queries our Graphene-Django instance sends to the DB at the moment.


Successfully merging this pull request may close these issues.

select_related and prefetch_related