Discussion: General Mechanism to Selectively Override Automatic Field Creations #209

jnak · 2019-04-25T01:48:06Z

Overview

In order to convert SQLAlchemy object models (SAM) into a full-blown GraphQL schema, SQLAlchemyObjectType (SAOT) does a lot of automatic mapping and conversions:

it automatically names Graphene fields after their corresponding SAM column names
it automatically converts SAM column types to common Graphene types
it automatically converts SAM relationships to Relay-style paginated fields
it automatically uses SAM descriptions to populate Graphene field's descriptions
it automatically converts SQLAlchemy enums to Graphene enums (see Discussion: Auto-creation of Graphene Enums #208)

While it is convenient that SAOT does all this out-of-the-box, developers should ultimately be in control of the Graphene types generated. Currently we can override SAOT default's behavior in 2 ways.

1. Use Meta

The Meta of SAOT allows the overriding of a few select automatic behaviors. For example, we can restrict what columns are exposed via the only_fields and exclude_fields parameters. In this example, Meta works fine because those parameters essentially act as toggles that completely disable or enable SOAM behavior for each field.

Similarly, we can currently override SAOT default connection field factory via connection_field_factory. But this is not flexible enough because we may want to have SAOT use different factories for different fields. For example, it may make sense to have only one paginated field be sortable (hence use SQLAlchemyConnectionField) and all the other fields be regular Relay-style collections (hence use UnsortedSQLAlchemyConnectionField).

One simple workaround would be to have a have a connection_field_factories parameter that takes a dictionary of field names to factories:

class MyType(SQLAlchemyObjectType):
    class Meta:
         model = MyModel
         default_connection_field_factory = MyConnectionField
         connection_field_factories = {
             'my_connection_field': SortedConnectionField,
         }

While that works fine, it leads to the creation of many parameters in the Meta class (eg #178 and #208 (comment)) and we may end up with a Meta class that becomes hard to maintain.

My main concern though with this pattern is that scatters the logic for a field in separate Meta parameters. For example, we would need to do something like to rename a field, specify a connection factory and deprecate it:

class MyType(SQLAlchemyObjectType):
    class Meta:
         model = MyModel
         connection_field_factories = {
             'my_connection_field': SortedConnectionField,
             ...
         }
         descriptions = {
             'my_connection_field': "Some description",
             ...
         }
         rename = {
             'connection_field': 'my_connection_field',
            ...
         }
         deprecated = (
            'my_connection_field,
           ...
         )

We need to look at multiple places just to understand how we are overriding the SAOT default for a given field. I much prefer Graphene's pattern of grouping all the logic for a field into a Field definition on the ObjectType class.

2. Redefine the Graphene field

Since SAOT is a subclass of graphene ObjectType, we can always redefine the field auto-generated by SAOT. For example, we can change the type of a field like this:

class MyType(SQLAlchemyObjectType):
    class Meta:
         model = MyModel

   id = graphene.String()

Here it works pretty well because SAOT does not do much to the id column. The only downside is that we lose the description automatically generated by SAOT based on the SQLAlchemy column.

But the more logic SAOT does when converting a column to a field, the more we lose when we redefine a field from scratch. That's why I expect this pattern to become more of a problem as we improve SAOT. For example, one performance improvement that we probably want to do eventually is to only select the SAM columns for fields that have been queried. To do this, SAOT needs to build a mapping of columns to fields when it auto-generates fields. Regular Graphene fields (not generated by SAOT) will need to be manually annotated with the SAM fields they depend on. That's one more thing that we would lose" by redefining manually the Graphene field.

Proposed Solution

We could have a sql_field function that would look a lot like a graphene Field. The only difference is that instead of instantiating a field from scratch, it would use a combination of SOAT default and the user defined overrides:

class MyType(SQLAlchemyObjectType):
    class Meta:
        model = MyModel
  
  # Deprecate a field
  # The type and description are automatically set by SAOT
  id = sql_field(deprecated=True)  
  
  # Duplicate and rename a field
  id_v2 = sql_field(name='id') 

  # Override the default description
  column_1 = sql_field(description='This is the main column')

  # Override the default type / serializer 
  uuid = sql_field(type=graphene.String)

  # Override the default enum name
  kind = sql_field(enum_name='my_enum')

  # Override the default connection factory
  connection = sql_field(connection_factory=SortableConnectionField)

The main benefits of that approach are:

It is very clear and consistent with Graphene API
We only specify what we want to override
All the logic for a given field is grouped in a sql_field call
We can still use Meta to set default values such as connection_field_factory

One interesting side effect is that fields are automatically whitelisted:

class MyType(SQLAlchemyObjectType):
    class Meta:
        model = MyModel
        only_fields = ()   # Blacklist all fields by default
  
  # Both columns are whitelisted
  id = sql_field()  
  my_column = sql_field(description='Some description')

From an API design perspective I can't think of any downside. Maybe a little more typing in some cases but I think we should optimize for clarity over the number of keystrokes.

What do you think of this pattern? Do you see any downsides? Do you have any ideas on how to make it better? I would love to hear your thoughts. I will send a PR if we agree this the right pattern to follow.

The text was updated successfully, but these errors were encountered:

Cito · 2019-04-25T11:23:48Z

@jnak - thanks for starting that discussion. I like your proposed solution. Some thoughts from my side:

The last example is a bit unclear - are you suggesting that fields are automatically whitelisted or that they can be by setting only_fields = ()? I think we should not require whitelisting by default.

When you say field = sql_field(name='foo') I guess this refers to the foo Column in the SAM, not to the foo column in the database table - the SAM could use a different column name than the database. We may also want to refer to properties on the SAM which are not backed by a column in the database. So I suggest we use the name orm_field instead of sql_field.

I also suggest to provide something simpler than connection = sql_field(connection_factory=SortableConnectionField). Maybe just connection = sql_field(sortable=True). The user doesn't want to be bothered with the technical implementation using factories and their names. That also smells too much like Java.

Btw, you always write SOAT instead of SAOT :) Seems to be a bad choice of acronym - had to fix the same typo in my own last comment, too.

Nabellaleen · 2019-04-25T12:50:17Z

I love the general idea, @jnak !

And I've the same suggestion than @Cito about the sql_field which seems to not be a good naming. orm_field seems clear.

jnak · 2019-04-26T03:02:36Z

Thanks guys for the feedback.

Good point regarding sql_field. In my mind, it was standing for SQLAlchemy field but I agree it's confusing. Let's use orm_field instead.

Regarding whitelisting what I meant is that orm_field defines a field the same way Graphene Field does, and because of this, orm_field should take precedence over only_fields or exclude_fields. While there would no point in excluding a field via exclude_fields and then redefining it below via orm_field, the precedence of orm_field over only_fields is actually going to be useful. For example, at my company we don't want to publish SAM columns automatically for various reasons. We currently enforce that by using a shared SAOT subclass:

# in a shared module

class BaseSQLAlchemyObjectType(graphene_sqlalchemy.SQLAlchemyObjectType):
    class Meta(object):
        abstract = True

    @classmethod
    def __init_subclass_with_meta__(cls, **kwargs): 
        super(OscarSQLAlchemyType, cls).__init_subclass_with_meta__(
            only_fields=(),
            **kwargs
        )  

...
# in a different module 

class MyType(BaseSQLAlchemyObjectType):
    class Meta:
         model = MyModel
         only_fields = (
             'a_public_column',
             ...  
         )

It does the job but it felt a little awkward to have sometimes long list of column names in only_fields. Now we are going to be able to do this instead:

class MyType(BaseSQLAlchemyObjectType)
    class Meta:
         model = MyModel

    a_public_column = orm_field()
    ...

That seems much more natural in my opinion. I also like that it decreases the "cost" of adding a description to a column / field - it's just a matter of adding a description param to the corresponding orm_field line.

Regarding connection fields, I realize that my example was poorly chosen because connection_field_factory expects a connection field factory (see paragraph below) and the sorted SQLAlchemyConnectionField cannot be auto-generated by SAOT at the moment (see #163). A better example would have been:

# special_connection_factory is a factory that instantiates a user defined connection class
my_connection = orm_field(connection_factory=special_connection_factory)

This is because there are only 2 types of connection fields: the default UnsortedSQLAlchemyConnectionField and a custom user-defined class. Given that we want to give the flexibility to override the connection class with any custom class, we need to have a parameter that takes a connection field factory (vs a boolean).

I agree though that connection_factory sounds a bit like Java but I can't think of a better name. The function that we are passing in is not a connection field class. It is a function that takes the registry and the relationship and that instantiates a connection field (see #187). We can potentially shorten it to connection but I think that is slightly misleading. Another option would be to re-use the type parameter and have it expect a connection factory when the orm field is a SAM relationship. But I think it's confusing.

Anyway I'm not too concerned about this ugly name because it's quite an advanced feature. It's only for users that have defined their own connection class and need to override the SAOT connection behavior selectively because I'm not going to remove the connection_field_factory in Meta. If those advanced users think it's too verbose or ugly, they can easily make their own orm_field wrappers like this:

def orm_field(special_connection=None, **kwargs):
    if special_connection:
        kwargs['connection_factory'] = special_connection_factory
    return graphene_sqlalchemy.orm_field(**kwargs)

Btw I'm going to be offline for a week. I'll address your feedback if there is any when I get back.
Cheers

Cito · 2019-04-26T09:54:51Z

Thanks for the additional info. The orm_fields should take precedence over only_fields, and your usage example makes sense. My point was that you would still need a custom BaseSQLAlchemyObjectType that sets only_fields=() to achieve whitelisting, that would not (and should not) be the default.

Regarding the connection_factory I agree that it should be called that way if you use it to pass a custom factory. As you say, that would be done by advanced users only anyway. But maybe sorted=False/True could be added as a shorthand for the builtin factories.

jnak · 2019-05-16T13:57:18Z

Thanks for your feedback @Cito and @Nabellaleen.

I've sent the first PR. I'm going to break up the work in multiple PRs because this one is already quite large. I'll do the enum overrides and the connection overrides in separate PRs as soon as this is one is approved.

Cheers,
J

…#214) This implements the overriding mechanism discussed in #209 . Changes include: - Add ORMField class - The main overridable parameters are: type, description, deprecation_reason and required - We can name fields differently using prop_name. This was preferred over name to avoid confusion / collision with graphene.Field parameters. - Add tests for all types of SQLAlchemy properties: columns, relationships, column properties, hybrid properties and and composite properties. - Cleanups and re-organize some tests.

jnak changed the title ~~Discussion: General Mechanism to Selectively Override Default Behavior~~ Discussion: General Mechanism to Selectively Override Automatic Field Creations Apr 25, 2019

jnak added the discussion label Apr 25, 2019

jnak mentioned this issue May 3, 2019

Improve creation of column and sort enums #210

Merged

This was referenced May 14, 2019

Implement Mechanism to Selectively Override Automatic Field Creations #214

Merged

Adding SQLAlchemyInputObjectType #213

Open

c8y3 mentioned this issue Mar 22, 2024

Issue415 graph ql first steps dfir-iris/iris-web#444

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discussion: General Mechanism to Selectively Override Automatic Field Creations #209

Discussion: General Mechanism to Selectively Override Automatic Field Creations #209

jnak commented Apr 25, 2019 •

edited

Loading

Cito commented Apr 25, 2019 •

edited

Loading

Nabellaleen commented Apr 25, 2019

jnak commented Apr 26, 2019

Cito commented Apr 26, 2019

jnak commented May 16, 2019

Discussion: General Mechanism to Selectively Override Automatic Field Creations #209

Discussion: General Mechanism to Selectively Override Automatic Field Creations #209

Comments

jnak commented Apr 25, 2019 • edited Loading

Overview

Proposed Solution

Cito commented Apr 25, 2019 • edited Loading

Nabellaleen commented Apr 25, 2019

jnak commented Apr 26, 2019

Cito commented Apr 26, 2019

jnak commented May 16, 2019

jnak commented Apr 25, 2019 •

edited

Loading

Cito commented Apr 25, 2019 •

edited

Loading