Skip to content

Discussion: Auto-creation of Graphene Enums #208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Cito opened this issue Apr 18, 2019 · 16 comments
Closed

Discussion: Auto-creation of Graphene Enums #208

Cito opened this issue Apr 18, 2019 · 16 comments

Comments

@Cito
Copy link
Member

Cito commented Apr 18, 2019

In PR #98 I made an attempt to improve the auto generation of Graphene Enums, but it turned out that this feature still needs some discussion. Let's clarify here what we really want to achieve first.

In the following:

  • g-enum = Graphene Enum types
  • sa-enum = SQLAlchemy Enum type
  • py-enum = Python Enum type
  • sql-enum = Database enum data type

The sql-enums are supported as actual data types by some databases, otherwise implemented using name constraints. Note that both g-enums and sa-enums can be based either on py-enums or on simple lists of values.

For the examples below, I use the following imports:

from enum import Enum as PyEnum
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base

Base = sa.ext.declarative.declarative_base()

When g-enums are created from sa-enums by graphene_sqlalchemy, they must be given a GraphQL type name. We must decide how these name are generated.

1) sa-enums based on py-enums

Example 1:

class PetKind(PyEnum):
    cat = 1
    dog = 2

class Pet(Base):
    __tablename__ = "pets"
    name = sa.Column(sa.String(30), primary_key=True)
    kind = sa.Column(sa.Enum(PetKind))

I think it's clear that in this case, the g-enum should be derived from PetKind and thus get the same name PetKind. Since GraphQL and Python have the same conventions for type/class names, this case is all good and fine.

1a) sa-enums based on py-enums, with SQL name

Example 1a:

class Pet(Base):
    __tablename__ = "pets"
    name = sa.Column(sa.String(30), primary_key=True)
    kind = sa.Column(sa.Enum(PetKind, name='kind_of_pet'))

The sa-enums can be declared with a name argument which is used for name constraints or as sql-enum type name on the database. In this case, I think the name of the py-enum should take precedence for us; the generated g-enum should still be derived from PetKind, and thus have the name PetKind, not KindOfPet.

2) sa-enums based on values, with SQL name

Example 2:

class Pet(Base):
    __tablename__ = "pets"
    name = sa.Column(sa.String(30), primary_key=True)
    kind = sa.Column(sa.Enum('cat', 'dog', name='kind_of_pet'))

Note: The name here is passed as argument to Enum, not Column. I.e. the sql-enum name is kind_of_pet, while the column name is kind. This case is interesting.

First, no matter which name we choose, it will not follow GraphQL type name conventions according to which type names are written in PascalCase (though I don't find it officially required or recommended, I think it's a strong convention that also resonates with the Python class name convention). So I believe an automatic name conversion should happen; the GraphQL name should be Kind or KindOfPet rather than kind or kind_of_pet.

Second, I think in this case the sql-enum name should take precedence, since it describes the enum itself and should be characteristic for the enum, while the column name only describes its usage relative to the model. Also, different column names could be used for the same enum. For example, we could have another table:

class Person(Base):
    __tablename__ = "persons"
    name = sa.Column(sa.String(30), primary_key=True)
    favorite_pet_kind = sa.Column(sa.Enum('cat', 'dog', name='kind_of_pet'))

So in this case, I suggest the g-enum should be created like:

graphene.Enum('KindOfPet', [('CAT', 'cat'), ('DOG', 'cat')])

I.e. the g-enum should be named KindOfPet and not Kind or FavoritePetKind. Note that in addition to the type name, the symbol names have also been converted, but this is a separate issue I will discuss below as point 7.

3) sa-enums based on values, without SQL name

Example 3:

class Pet(Base):
    __tablename__ = "pets"
    name = sa.Column(sa.String(30), primary_key=True)
    kind = sa.Column(sa.Enum('cat', 'dog'))

In this case, I think it would make sense to create a g-enum type named Kind after the column, since we have no other clue how the type should be named. However, as in the example above, the same type could be also used in a differently named column:

class Person(Base):
    __tablename__ = "persons"
    name = sa.Column(sa.String(30), primary_key=True)
    favorite_pet_kind = sa.Column(sa.Enum('cat', 'dog'))

We have two options here: First, we create another g-enum type named FavoritePetKind with the same values. Or we re-use the Kind type from above. But in that case it would not be clear which name to use: The first name, Kind, or the last name, FavoritePetKind, or the shortest name or the longest name? We would need to specify an artificial rule for picking the name. So I suggest simply creating two g-enum types with the same values, but different names, for the two columns.

But we still have a problem. What if column names are equal, but values differ, like in the following

Example 3b:

class Pet(Base):
    __tablename__ = "pets"
    name = sa.Column(sa.String(30), primary_key=True)
    kind = sa.Column(sa.Enum('cat', 'dog'))
    eye_color = sa.Column(sa.Enum('amber', 'brown'))

class Person(Base):
    __tablename__ = "persons"
    name = sa.Column(sa.String(30), primary_key=True)
    kind = sa.Column(sa.Enum('thinker', 'doer'))
    favorite_pet_kind = sa.Column(sa.Enum('cat', 'dog'))
    eye_color = sa.Column(sa.Enum('blue', 'brown'))

My suggestion to solve this problem is to add the model name as a prefix and auto generate the g-enum types PetKind, PetEyeColor, PersonKind, PersonFavoritePetKind and PersonEyeColor. We could try to be smart and reuse g-enums with the same values, adding the prefix only when there are value clashes, but I think for consistency and simplicity sake we should always add the model as prefix, not only when there are conflicts.

4) Should we enforce an Enum postfix?

Should we also enforce a postfix of Enum for g-enum names? Note that this is not a GraphQL convention, but it may make sense to avoid name clashes with other types in the GraphQL schema, particularly when we auto-create names. I.e. in the example above, the generated g-enum would be PetEyeColorEnum instead of PetEyeColor. And do we want to use the postfix only for auto generated names or also add one to g-enums derived from py-enums or generated from sql-enum names? Of course, the postfix would not be added when it already exists.

Currently I think we should not add such a prefix, particularly if we auto create types with model name as prefix as suggested above, since then name clashes are much less likely. And when enums already have names, they would normally also not clash with model class names. I can't think of an example where this would be a problem.

5) Retrieving g-enums

It is sometimes necessary to refer to the auto-generated g-enums, like when you are using them in GraphQL input types. In the example above, you may want to provide a query that takes a PetKind as argument and returns all pets of that kind. To do that, you need to refer to the g-enum generated from the pets.kind column.

I think we should provide two methods for this, one using the g-enum name as argument, another one using the column as argument. The following calls would then return the same g-enum:

get_enum_by_name('PetKind')

get_enum_for_column(Pet.pet_kind)

The first method could also be used for retrieving sort enums for models that are generated by graphene_sqlalchemy to be used as argument for sorting query results. There should be also a method for getting the sort g-enum that takes the model as parameter. These should return the same sort g-enum:

get_enum_by_name('PetSortEnum')

get_sort_enum_for_model(Pet)

Should the get_enum_for_column and get_sort_enum_for_model automatically create and register a g-enum when nothing has been registered yet? I think so, that's better than throwing an error. This would allow retrieving a column g-enum for usage in an argument used before the corresponding column, or a sort g-enum without explicitly creating it first using a set_sort_enum_for_model function. Of course the latter should be provided to allow customizing the generated sort enum, but otherwise you would not even need it.

6) Use the registry or utility or enum module?

To always retrieve the same g-enum that has been generated for a column or as a sort enum, the g-enums need to be stored in the graphene_sqlalchemy registry. The methods for retrieving will therefore naturally be methods of the Registry class. Should we also provide functions in the utility module with the same names, that take the registry as an optional parameter, using the global registry as default? Or maybe we should put all the enum support in a separate enum module instead? The functions for creating sort enums could then also live there. I think that makes sense.

7) Values of g-enums

GraphQL has the convention (recommended in the specs) that enum values are named like ENUM_VALUE instead of enum_value or EnumValue. On the database, no such convention exists, and sql-enums often use lowercase names. Python enums also often use uppercase member names, but this is not really a convention, and they are also often lowercase.

So the question is: Should we alter the names of enum values to be uppercase in the g-enum? I think so. Graphene also adapts the names of fields from Python to GraphQL already. Graphene does not rename the enum values, and leaves it up to you to follow the conventions, which I think makes sense. However, in the case of graphene_sqlalchemy we cannot define the names as we like, but must take those from the database. So I think in that case it makes sense to transform the names of g-enum values from the database to Graphene so that they follow GraphQL conventions. The values of the g-enum values should not be transformed. See the example above:

graphene.Enum('KindOfPet', [('CAT', 'cat'), ('DOG', 'cat')])  # names transformed, values not

For those who really don't want to transform enum value names, we could support a registry attribute convert_enum_value_names: bool that would be True by default but could be set to False to deactivate the automatic conversion.

Please let me hear your opinions, particulary if you have different opinions or issues I have forgotten to consider. I will then create a PR with the functionality we agreed upon.

@jnak
Copy link
Collaborator

jnak commented Apr 19, 2019

@Cito I'll give you a full answer tomorrow. For now I wanted to tell you that I really appreciate you starting this design discussion. I think these sort of upfront discussions will significantly increase the overall quality of this library. You can expect me to start similar convos in the next few weeks.

@thejcannon
Copy link

For 3), there's also the option of erroring and telling the user to provide more information.

@jnak
Copy link
Collaborator

jnak commented Apr 23, 2019

1. Makes sense.

1.a I agree it makes sense to use the py-enums here because the name parameter of an sa-enum normally needs to stay in sync with the DB. It's usually much easier to update Python only values such as the name of the py-enum.

2 I agree it makes sense. It's good that we can easily override the name parameter by passing a py-enum (i.e. 1.a) if that's necessary.

3 I'm a little torn here between @Cito's and @thejcannon's suggestions.

On the one hand, I don't like the idea of duplicating enums that conceptually represent the same thing since it mostly defeats the purposes of enums. I worry that people might not realize this is happening until their schemas are published and they have to introduce a backward incompatible change to fix it.

On the other end, raising an error breaks the auto-magical behavior that graphene-sqlalchemy that some people have come to expect. That said, if we provide a clear error message, it would be trivial to fix the error assuming we follow @Cito's suggestion for 2 (i.e. you just add a name parameter). I also like that raising an error makes the implementation and the library behavior simpler.

Overall I think prefer @thejcannon's suggestion but I would love to hear what you both prefer.

4

And when enums already have names, they would normally also not clash with model class names.

Given that SQLAlchemy classes can be defined in multiple modules, there is still a chance that there is a name collision. There could even be different SQLAlchemy models with the same class name. Currently these situation are easily resolved by naming the SQLAlchemyObjectType subclasses with different names.

I'm ok not adding the Enum postfix as long as we follow @Cito's suggestion for 1 and 1.a so it's easy to resolve name collisions should they arise.

5 I'm in favor of having get_enum_for_column and get_sort_enum_for_model transparently register the Enum so we can easily use them as argument definition in fields.

But I don't see the benefits of having get_enum_by_name at all:

  • I can't think of a case where we would need to use get_enum_by_name because it was not possible to use get_enum_for_column or get_enum_by_name.
  • It's more explicit and robust to refer to enums by columns than by their generated names. Especially given the subtleties of our naming logic.
  • From an API design perspective, I think it's clearer to have one way of doing things instead of 2 ways. All the more as get_enum_by_name is less capable - it cannot transparently register the enum.

Am I missing anything?

6 If we are going to create enum functions that uses the the global registry, I think we should be consistent and do the same for other public methods of the registry. In that case, where do we want to put the public functions for register, get_type_for_model, register_composite_converter and get_converter_for_composite? One option would to be to put all of them in the registry module to:

  • convey that these functions implicitly rely on a global registry
  • make it clear that each public method on the registry should have their public equivalent

Given that those public helper functions are completely equivalent to calling the methods on the registry and that the current pattern of using get_global_registry works well, I'd be in favor of punting this discussion for now. It would be easy to revisit all this in a separate issue / PR.

7 I agree that it makes sense to rename the values of the enum automatically and that we should have a simple way to override this behavior.

That said I think the overriding mechanism should:

  • work on a column-by-column basis. The current Meta system works at the SQLALchemy model level.
  • be fully customizable. The same way we can rename a py-enum to change the name of a g-enum, I'd like an easy way to redefine how each value is serialized.

I realize graphene-sqlalchemy does not have yet a pattern to do fine-grained overrides. At the moment, the only way to override its default behavior is to redefine the field and completely back out of the automatic behavior of graphene-sqlalchemy. I've been thinking about a general pattern to selectively override part of graphene-sqlalchemy default behavior on a field-by-field basis. I'm going to open a separate issue about this.

For the purpose of this issue, I think it's fine to punt on the name conversion override for now.

Thanks again for starting this discussion.

@Cito
Copy link
Member Author

Cito commented Apr 23, 2019

@jnak and @thejcannon thanks for your feedback. Yes, raising errors and asking the user for more information is always a good option. However, the question here is: Where can the user provide that information? Is it necessary to touch the SA model to do that? Conceptually, I think we should assume the worst case that the user has no influence on the SA models and that these can only be imported, but not modified. So if an enum in a column doesn't have a name, we need to either deal with that case by inventing a name or provide an option to set a name on the graphene-sqlalchemy side.

Regarding get_enum_by_name in point 4) I agree this could/should be excluded from the public API.

I also agree that we should move the creation of additional convenience functions that would use the global registry by default into a separate issue. Maybe the whole mechanism for providing the global registry should be changed, so that the global registry object always exists as a singleton that would never be None and could simply be imported and used. Then get_global_registry could go away. And if the Registry class gets a reset() method, reset_global_registry would not be needed either.

Regarding the auto-naming and the get_enum_for_column and get_sort_enum_for_model methods, I fear we still haven't thought this through deeply enough.

To explain what I mean let me introduce two more abbreviations: SAM = SQLAlchemy model class, SAOT = SQLAlchemyObjectType.

What I realized is that we actually don't want to get an enum for a SAM column, but for a SAOT field. Similarly, we want a sort enum not for a SAM, but for a SAOT: SAOT names could be different from SAM names and SAOT field names could be different from SAM column names via #178.

The prefix discussed in 3) should then be the name of the SAOT, not the name of the SAM (and would then be automatically unique), and the name of the enum itself should be derived from the SAOT field, not from the SAM column name.

Note that there is a small difficulty here, because at the time of creation of the column enum, graphene-sqlalchemy only knows the SAM, not the SAOT. But it is doable anyway: We could create the enum type as a lambda expression which Graphene will evaluate only later later when the SAOT is known (Graphene already provides that mechanism for postponing type definitions).

To sum up, I think we have two ways to proceed and solve 3) from above:

  1. Throw an error when enums don't have a name, but provide a way of letting the user specify a name via the SAOT meta options (not only by changing the SAM column type options). For instance, we could provide a dictionary field_options as meta option, with keys like enum_type_name, convert_enum_value_names, and maybe also keys like field_name or exclude which then could replace the existing exclude_fields and rename_field (Add a meta option to allow field names renaming #178) meta options.

  2. Auto-generate the enum type name, derived from the SAOT field name, optionally with the SAOT name as prefix and "Enum" as postfix (these could be config options of the registry). In this case we can postpone the config option for individual fields.

Personally, I prefer 2. We can and should still add an option later to override the auto-generated enum names as discussed in 1. But this could be created as another issue and postponed to a later version.

Irrespective of which way we choose, I now think that instead of get_enum_for_column and get_sort_enum_for_model which I proposed earlier in 5) we should actually provide get_enum_for_field and get_sort_enum_for_object, where you pass a SAOT (field), not a SAM (column).

Does this all make sense or am I overthinking the issue?

@Cito
Copy link
Member Author

Cito commented Apr 23, 2019

Just realized that contrary to get_enum_for_column, get_enum_for_field would need to accept two parameters - the SAOT and the field name. Ugly. We could require the SAOT field itself, that would be even uglier because it can only be specified as SAOT._meta.fields(fieldName), not as SAOT.fieldName as is possible with SAM columns. But the fields do not exist as attributes on the SAOTs. Sigh.

@Cito
Copy link
Member Author

Cito commented Apr 23, 2019

An alternative idea just proposed by @thejcannon on the Slack channel is to get the sort enums and column enums via a method provided on the SAOT itself. Eg. SAOT.sort_enum() and SAOT.enum_for_field(fieldName). The model could then call (its) registry. I really like that idea. All the clumsiness of going through the registry module and/or instance and the problem mentioned in my last comment would go away.

@jnak
Copy link
Collaborator

jnak commented Apr 23, 2019

@Cito:

Conceptually, I think we should assume the worst case that the user has no influence on the SA models and that these can only be imported, but not modified.

I'm not sure we should assume that people cannot modify their SQLAlchemy models. If that's the case that means it will be very hard for them to rename enums / resolve name collisions. I'd rather assume that only the DBs cannot be changed. What would be the legitimate cases where one can import SQLAlchemy models but cannot modify anything about it?

@Cito:

The model could then call (its) registry. I really like that idea. All the clumsiness of going through the registry module and/or instance and the problem mentioned in my last comment would go away.

I agree with that. I would also prefer using the actual field (vs fieldName) but there does not seem to be a clean way to do this with graphene.

I would prefer sort_enum not to be defined on the main SOAT since it's an optional feature. Does it make sense to accomplish this via a mixin or a subclass? That would be in-line with how SQLAlchemyConnectionField subclasses UnsortedSQLAlchemyConnectionField .

@Cito
Copy link
Member Author

Cito commented Apr 24, 2019

What would be the legitimate cases where one can import SQLAlchemy models but cannot modify anything about it?

I can imagine using models from a 3rd party package, e.g. a CMS like kotti.resources. Or you could use automap to reflect the database schema. I just tested that you can in fact create the enums from the database that way and then be sure they always reflect what's really in the database. You don't want to redefine the enums in Python in that case.

Does it make sense to accomplish this via a mixin or a subclass?

I don't think that this is convenient. For the ConnectionFields a sort argument is added if it's not an UnsortedSQLAlchemyConnectionField, so there is a difference. But for the SAOT itself no functionality is added automatically. It would just provide the sort_enum method and you can make use of it or not.

@jnak
Copy link
Collaborator

jnak commented Apr 24, 2019

I can imagine using models from a 3rd party package, e.g. a CMS like kotti.resources. Or you could use automap to reflect the database schema. I just tested that you can in fact create the enums from the database that way and then be sure they always reflect what's really in the database. You don't want to redefine the enums in Python in that case.

Fair enough. That is pretty uncommon though. How about we expose the function we use to generate a g-enum tuple from an sa-enum? This way we can resolve those situations by redefining the field like this:

kind = graphene.Enum('KindEnum', to_enum_tuple(MyModel.kind))

It's not the most succinct pattern but it is pretty straightforward and keeps our core logic simple and predictable. Given it's a pretty uncommon situation, that seems like a reasonable tradeoff.

I don't think that this is convenient. For the ConnectionFields a sort argument is added if it's not an UnsortedSQLAlchemyConnectionField, so there is a difference. But for the SAOT itself no functionality is added automatically. It would just provide the sort_enum method and you can make use of it or not.

I agree it's not the most convenient but I would prefer to keep this feature separated from SOAT like it is as the moment because there are a number of issues with that feature:

  • SQLAlchemyConnectionField can only be defined manually and on the Query type. A relationship is never automatically converted by SOAT into a connection field.
  • The Sort enum generated expose all columns of a SAM. That includes the columns not exposed by Graphene and the columns not indexed. This is both a security and a performance issue.

How about we just keep get_sort_enum_for_object on the registry for now and make it more convenient later once we resolved all these issues?

@jnak
Copy link
Collaborator

jnak commented Apr 24, 2019

Regarding my last point, we can also have define a public get_sort_enum_for_object function (vs method) that gets the registry from the SOAT passed an an attribute. In that case, it will be almost exactly as verbose and we still get the benefit of not having to pass the registry: get_sort_enum_for_object(MyModel).

@Cito
Copy link
Member Author

Cito commented Apr 24, 2019

The Sort enum generated expose all columns of a SAM. That includes the columns not exposed by Graphene and the columns not indexed. This is both a security and a performance issue.

This is what I tried to explain above and why the idea is to provide SAOT.sort_enum(), not SAM.sort_enum() or get_sort_enum_for_model(SAM). The sort enum would the only contain the fields that exist in the SAOT which may have different names than the columns in the model, and would not contain the excluded columns. And nothing is created automatically. You must call that method explicitly and use the result as type of a sort argument.

it will be almost exactly as verbose and we still get the benefit of not having to pass the registry: get_sort_enum_for_object(MyModel).

SAOT.sort_enum() is still shorter, and it has the advantage that it can use the correct registry associated with the SAOT, not the globale one.

@jnak
Copy link
Collaborator

jnak commented Apr 24, 2019

SAOT.sort_enum() is still shorter, and it has the advantage that it can use the correct registry associated with the SAOT, not the global one.

Sorry I meant get_sort_enum_for_object(SAOT)

@Cito
Copy link
Member Author

Cito commented Apr 24, 2019

Yes, that could also pick the registry from the SAOT. But I still don't understand what would be bad about having SAOT.sort_enum(). The SAOT already has other methods like SAOT.get_query() and SAOT.get_node(). And what about SAOT.enum_for_field(fieldName)? Would you provide that only as a registry function, too? I think the API should be consistent.

@jnak
Copy link
Collaborator

jnak commented Apr 26, 2019

To clarify, I'm in favor of SAOT.enum_for_field(fieldName) because it's actually SAOT that creates those enums in this case. It's a core a feature of SAOT and as such it makes sense to have a helper on the SAOT class itself. Similarly, get_node and get_query are core features of SAOT because the promise of SAOT / graphene-sqlalchemy is to help build Relay-compliant schemas.

I think sort_enums are a different story for 2 reasons:

  • The sort enums are not created by SAOT. Users have to define sorted field manually (see test). SAOT do not need to be aware of sort enums.
  • Sort enums are a major performance foot-gun. Because sort fields can only be defined at the Query level (see SQLAlchemyConnectionField sort works only for top-level query.  #163), each query that sorts on a non-indexed column will trigger a full table scan. I think it would be reasonable to keep this feature under the radar (by not putting it on SOAT directly) until this major problem is fixed.

But I don't feel strongly about it so do whatever you think is reasonable. I just wanted to voice my concerns.

I'm going to be offline for a week. I feel very good about all this so feel free to start working on a PR if you have the bandwidth.
Cheers

@Cito
Copy link
Member Author

Cito commented Apr 27, 2019

I have now created a new PR #210 with the implementation along the lines of what has been discussed here.

When creating sort enums, you can now rescrict fields with an additional only_fields parameter (which is effective in addition to the already existing only_fields option of the object type), and restrict to indexed fields with the parameter only_indexed.

I will be busy with other projects in the next time again and leave if to you to push this forward when you are back. It would be good if the PR could be merged before other big changes are made to the master branch because it is pretty big. We can still make adaptations to it before releasing the next version.

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related topics referencing this issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants