Fix N+1 problem for one-to-one and many-to-one relationships #253

jnak · 2019-10-29T23:53:26Z

Hi everyone,

This PR fixes the N+1 problem for one-to-one and many-to-one relationships. If we think this approach makes sense, I'll send a separate PR to address one-to-many and many-to-many relationships.

Design goals

Be completely seamless to end-users. You should not need to modify your SQLAlchemy models or your SQLAlchemyObjectType models to benefit from this.
Work with any SQLAlchemy relationships, including ones that have complex primaryjoin and secondaryjoin conditions.
Avoid generating one large SQL query using nested JOINs in the resolver of the root field of the query. For non-trivial queries, the joined SQL statement is complex and its performance is hard to predict. See SQLAlchemy's comparison of selectin with joined loading for more context.
Fully work in schemas that include non-SQLAlchemyObjectType Graphene types. This batching optimization should compose well with other types, including those that use Dataloader under the hood.
Avoid building SQL queries manually. It's non-trivial to generate correct SQL statements that support complex joins and composite primary keys across a variety of DBs. We should lean as much as possible on SQLAlchemy to do the heavy lifting.
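To make the N+1 pattern and the selectin-style alternative concrete, here is a minimal sketch that uses only the standard library's sqlite3 module rather than SQLAlchemy. The `reporters` / `articles` tables and the helper names are made up for illustration and are not part of this PR; the point is only to count the statements each approach issues.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE reporters (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articles (id INTEGER PRIMARY KEY, headline TEXT, reporter_id INTEGER);
    INSERT INTO reporters VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO articles VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 1);
    """
)

executed = []  # every SQL statement issued through run()


def run(sql, params=()):
    """Execute one statement and record it so the statements can be counted."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def load_n_plus_one():
    """One query for the articles, then one extra query per article (N+1)."""
    executed.clear()
    articles = run("SELECT id, headline, reporter_id FROM articles ORDER BY id")
    result = []
    for _id, headline, reporter_id in articles:
        (name,) = run("SELECT name FROM reporters WHERE id = ?", (reporter_id,))[0]
        result.append((headline, name))
    return result, len(executed)


def load_selectin_style():
    """One query for the articles, then a single IN query for all their reporters."""
    executed.clear()
    articles = run("SELECT id, headline, reporter_id FROM articles ORDER BY id")
    reporter_ids = sorted({reporter_id for _, _, reporter_id in articles})
    placeholders = ", ".join("?" for _ in reporter_ids)
    rows = run(
        f"SELECT id, name FROM reporters WHERE id IN ({placeholders})",
        reporter_ids,
    )
    names = dict(rows)  # reporter id -> reporter name
    result = [(headline, names[reporter_id]) for _, headline, reporter_id in articles]
    return result, len(executed)
```

With three articles, the first function issues four statements and the second issues two, and the second count stays at two however many articles there are.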

Non-goals

Only select the columns / fields that are being queried. This will be addressed in a separate PR.
Caching records so we don't fetch the same record twice from the DB. This is tricky to get right. We can revisit this later if there is a need.

Solution

The idea here is to have SQLAlchemyObjectType automatically generate batch resolvers for relationships. The resolvers use Dataloader to collect all the similar relationship property accesses. From there, we ask SQLAlchemy to load all the relationships of those parents as one SQL query.
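The batching idea can be sketched roughly as follows. This is not the Dataloader implementation the PR uses; `TinyBatchLoader` and `load_reporters` are hypothetical, standard-library-only stand-ins that show how individual `load()` calls made in the same event-loop tick end up in a single batch call.

```python
import asyncio


class TinyBatchLoader:
    """Collects keys requested in one event-loop tick and resolves them together."""

    def __init__(self, batch_load_fn):
        self._batch_load_fn = batch_load_fn
        self._pending = []  # (key, future) pairs waiting for the next dispatch

    def load(self, key):
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        if not self._pending:
            # First key of a new batch: schedule one dispatch for the whole batch.
            loop.call_soon(self._dispatch)
        self._pending.append((key, future))
        return future

    def _dispatch(self):
        pending, self._pending = self._pending, []
        keys = [key for key, _ in pending]
        # One call for all collected keys; values must come back in key order.
        values = self._batch_load_fn(keys)
        for (_, future), value in zip(pending, values):
            future.set_result(value)


batches = []  # records each batch so we can see how many calls were made


def load_reporters(article_ids):
    """Stand-in for 'load the relationship for all these parents in one query'."""
    batches.append(list(article_ids))
    return [f"reporter-for-{article_id}" for article_id in article_ids]


async def main():
    loader = TinyBatchLoader(load_reporters)
    # Three independent "resolver" calls, as a GraphQL executor might make.
    return await asyncio.gather(*(loader.load(i) for i in (1, 2, 3)))


results = asyncio.run(main())
```

Here the three `load()` calls produce a single entry in `batches`, which is the property the generated batch resolvers rely on.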

SQLAlchemy does not have a public API to do this but we can piggyback on some internal APIs of the selectin eager loading strategy. It's a bit hacky but I think it's better than re-implementing and maintaining a big chunk of the selectin loader logic ourselves. You can find more details about this approach in the docstring of the batch loader. Hopefully we can get some feedback from the SQLAlchemy maintainers (cc @zzzeek).

If we decide to move forward with this PR, I'll make sure to add more test cases so we can catch regressions in the event the internal SQLAlchemy APIs we rely on change.

Cheers,
J

coveralls · 2019-10-29T23:56:18Z

Coverage increased (+0.1%) to 97.464% when pulling d7d90f0 on jnak:batch-sql into 89c3726 on graphql-python:master.

zzzeek

The function here does not seem to participate in the query loading process, because no context is being passed in. It would therefore be easier to maintain if you called SelectInLoader._load_for_path() directly, sending in the most minimal state that this single function needs in order to proceed, rather than trying to reconstruct a harness around it using lots of other private APIs that can change.

Based on that, you could instead go to SQLAlchemy and propose a distilled utility function that calls into SelectInLoader._load_for_path given simple external arguments like a Session, a list of instances, and a target relationship. That function can include test coverage on the SQLAlchemy side, so that it continues to work the same way publicly going forward. You would no longer rely on private arrangements within SQLAlchemy, and I wouldn't have to worry about rearranging internals.

zzzeek · 2019-10-30T02:37:18Z

graphene_sqlalchemy/types.py

+    class NonListRelationshipLoader(dataloader.DataLoader):
+        cache = False
+
+        def batch_load_fn(self, parents):  # pylint: disable=method-hidden


IIUC the purpose of this code is to get to the SelectInLoader._load_for_path method alone; the rest of the private API access here tries to call into that function the way it is usually called, but this ends up pulling in lots of private SQLAlchemy internals that aren't guaranteed to stay the same. It is much easier for me to provide a public API version of a single call like SelectInLoader._load_for_path (with a fixed single-token path, as is the case here) than to worry about whether changes to the overall structure of objects like PostLoad / post_load_paths / PostLoad.loaders etc., which are actually all fairly recent additions to the internals, will impact external projects. So below I will try to remark on private APIs that don't seem to be needed based on the inputs to this function.

zzzeek · 2019-10-30T02:38:37Z

graphene_sqlalchemy/types.py

+                assert parent not in session.dirty
+
+            load = Load(parent_mapper.entity).selectinload(model_attr)
+            query = session.query(parent_mapper.entity).options(load)


So we are starting with a brand new Query here, and it has no state or joins at all. That is, there are no paths or options coming in. This means a lot of what happens below is much simpler.

zzzeek · 2019-10-30T02:43:01Z

graphene_sqlalchemy/types.py

+
+            # Taken from orm.query.Query.__iter__
+            # https://git.io/JeuBi
+            context = query._compile_context()


Here we are making a QueryContext because it eventually gets passed into _load_for_path, where it is used to determine three things from the original query: what the other MapperOptions on that query are (we know from above there are none), whether the Query has _populated_existing on it (we know it does not), and what the current Session is (we have it right here). That seems to be about it. So we don't actually need a fully compiled context, we just need an object with basically the Session on it. You could make a QueryContext directly, or just a short object that has the attributes that the loader needs. That still makes some private API assumptions, but they would be local to just this one loader method, and if I provide a public function it wouldn't need these things.

zzzeek · 2019-10-30T02:44:02Z

graphene_sqlalchemy/types.py

+
+            # Taken from orm.loading.instances
+            # https://git.io/JeuBR
+            context.post_load_paths = {}


Used by PostLoad, but if you weren't using PostLoad you wouldn't need to know about this.

zzzeek · 2019-10-30T02:45:08Z

graphene_sqlalchemy/types.py

+
+            # Taken from orm.strategies.SelectInLoader.__init__
+            # https://git.io/JeuBd
+            selectin_strategy = getattr(parent_mapper.entity, model_attr).property._get_strategy(load.strategy)


You can more directly just instantiate a SelectInLoader object, which means some assumptions about the constructor, but fewer than are needed when calling _get_strategy(load.strategy).

zzzeek · 2019-10-30T02:47:19Z

graphene_sqlalchemy/types.py

+
+            # Taken from orm.loading._instance_processor._instance
+            # https://git.io/JeuBq
+            post_load = PostLoad()


PostLoad is likely where I'm most uncomfortable with the private API stuff. These are very esoteric internals that look as strange as they do because they try to do as much work up front as possible, since they wait for thousands of rows to come in and we want to minimize the call overhead. As a result they are hard to follow and they change all the time. Going straight to _load_for_path, if possible, is likely less maintenance, even though that function could change too.

zzzeek · 2019-10-30T02:48:16Z

graphene_sqlalchemy/types.py

+            # https://git.io/Jeu4j
+            context.partials = {}
+            for parent in parents:
+                post_load.add_state(parent._sa_instance_state, True)


_sa_instance_state is accessed publicly by using sqlalchemy.inspect(parent).

zzzeek · 2019-10-30T02:50:14Z

graphene_sqlalchemy/types.py

+            # Taken from orm.loading._instance_processor._instance
+            # https://git.io/JeuBn
+            # https://git.io/Jeu4j
+            context.partials = {}


More scary internal stuff that I don't look at unless I have to. I'm not sure if one of these functions hits upon this; I'm not seeing the codepath that does, but this is a longer codepath than I think you need here.

zzzeek · 2019-10-30T02:52:22Z

graphene_sqlalchemy/types.py

+
+            # Taken from orm.strategies.SelectInLoader.create_row_processor
+            # https://git.io/Jeu4F
+            selectin_path = context.query._current_path + parent_mapper._path_registry


context.query._current_path is always going to be the "root" path, since you created the Query above with no "path"; this is PathRegistry.root. You are then adding parent_mapper._path_registry to it, so this whole expression can be shortened to parent_mapper._path_registry.

ghost

I don't think I can add anything beyond @zzzeek's comments.

ghost · 2019-10-30T14:28:41Z

graphene_sqlalchemy/types.py

+        # TODO Batch many-to-many and one-to-many relationships
+        return _get_attr_resolver(obj_type, model_attr, model_attr)
+
+    class NonListRelationshipLoader(dataloader.DataLoader):


Why create a new class for each call to _get_relationship_resolver()?

Because each relationship is different. The selectin loader expects a specific relationship (vs a specific child and / or parent).

Couldn't you pass the relevant local variables into the constructor rather than use them as closure variables?
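The two options being discussed can be sketched as follows. This is only an illustration with hypothetical names (`make_loader_class`, `RelationshipLoader`, `Article`); the real loader subclasses dataloader.DataLoader and calls into SQLAlchemy, which is replaced here by a plain `getattr` so the example stays self-contained.

```python
def make_loader_class(attr_name):
    """Closure style: each call defines a new class that captures attr_name."""

    class ClosureLoader:
        def batch_load_fn(self, parents):
            # attr_name comes from the enclosing function's scope.
            return [getattr(parent, attr_name) for parent in parents]

    return ClosureLoader


class RelationshipLoader:
    """Constructor style: one class, per-relationship state on the instance."""

    def __init__(self, attr_name):
        self.attr_name = attr_name

    def batch_load_fn(self, parents):
        return [getattr(parent, self.attr_name) for parent in parents]


class Article:
    """Toy parent object with two 'relationships'."""

    def __init__(self, reporter, editor):
        self.reporter = reporter
        self.editor = editor


articles = [Article("Ada", "Eve"), Article("Grace", "Eve")]

# Closure style: two relationships mean two distinct classes.
closure_reporter_cls = make_loader_class("reporter")
closure_editor_cls = make_loader_class("editor")

# Constructor style: two relationships mean two instances of one class.
reporter_loader = RelationshipLoader("reporter")
editor_loader = RelationshipLoader("editor")
```

Both styles keep a separate loader per relationship, which is the constraint mentioned above; the difference is only whether that state lives in a closure or on the instance.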

zzzeek

Looks great. You still have some exposure if I change that method, but overall I think this is easier to adapt if that were to happen. A public API function in SQLAlchemy should be pursued.

zzzeek · 2019-10-30T14:54:47Z

graphene_sqlalchemy/types.py

+            # For our purposes, the query_context will only be used to get the session
+            query_context = QueryContext(session.query(parent_mapper.entity))
+
+            loader._load_for_path(


Great, so you still have some private API here, but it's way more contained and we can always make a public function out of this. Feel free to propose it to SQLAlchemy.

jnak

@zzzeek Thank you so much for your feedback. That's super helpful. If we decide to go forward with this overall approach, I'll definitely pursue making a public utility in the SQLAlchemy repo. Thanks!

jnak · 2019-10-30T15:05:24Z

graphene_sqlalchemy/types.py

+        # TODO Batch many-to-many and one-to-many relationships
+        return _get_attr_resolver(obj_type, model_attr, model_attr)
+
+    class NonListRelationshipLoader(dataloader.DataLoader):


Because each relationship is different. The selectin loader expects a specific relationship (vs a specific child and / or parent).

Nabellaleen

I find your PR very interesting, and the help from @zzzeek is really priceless!

--

Just the build to fix :)

jnak · 2019-11-12T18:46:11Z

@Nabellaleen Great that you're on board with the approach. I fixed the tests by disabling batching for SQLAlchemy versions < 1.2 since selectin was introduced in 1.2. Can you re-approve the diff so I can merge it?
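A version gate of this kind could look roughly like the sketch below. The helper names `parse_version` and `is_batching_supported` are hypothetical and not taken from this PR; in real code the string passed in would presumably be `sqlalchemy.__version__`.

```python
import re


def parse_version(version):
    """Return the leading numeric components, e.g. '1.2.0b1' -> (1, 2, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def is_batching_supported(sqlalchemy_version):
    """Batching relies on the selectin strategy, which needs SQLAlchemy >= 1.2."""
    return parse_version(sqlalchemy_version) >= (1, 2)
```

Comparing tuples of integers avoids the string-comparison pitfall where "1.10" would sort before "1.2".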

I'm going to send a separate PR soon for batching one-to-many and many-to-many relationships.

cglacet · 2020-01-16T18:36:19Z

Will this be in the next release?

jnak · 2020-01-22T21:39:43Z

@cglacet Most likely yes. But we're currently blocked by #254 and a few other dependent changes. Would you mind taking a look at it? That would be really helpful to push this over the finish line.

jnak mentioned this pull request Oct 30, 2019

Limiting SQL query to defined fields/columns #134

Closed

zzzeek reviewed Oct 30, 2019

ghost reviewed Oct 30, 2019

zzzeek previously approved these changes Oct 30, 2019

jnak dismissed zzzeek’s stale review via ac316ff October 30, 2019 14:59

jnak commented Oct 30, 2019

jnak mentioned this pull request Oct 30, 2019

N + 1 round trip problem #35

Open

Nabellaleen previously approved these changes Nov 12, 2019

jnak dismissed Nabellaleen’s stale review via 7e50367 November 12, 2019 18:35

jnak added 6 commits November 13, 2019 12:01

Fix N+1 problem for one-to-one and many-to-one relationships.

bd5a68a

address zzzeek comments

cc0de2a

simplify path

60b6df9

update comment

c8f39bd

bump sqlalchemy

0d0067a

disable batching for sqlalchemy < 1.2

d7d90f0

jnak force-pushed the batch-sql branch from 56caa38 to d7d90f0 Compare November 13, 2019 17:02

Nabellaleen approved these changes Nov 14, 2019

jnak merged commit 98e6fe7 into graphql-python:master Nov 18, 2019

jnak mentioned this pull request Nov 19, 2019

Fix N+1 problem for one-to-many and many-to-many relationships #254

Merged
