[FEATURE] Elastic 8 Client - Stores Enums as Strings #8194


Open
Tracked by #8356
DR9885 opened this issue May 22, 2024 · 5 comments
Labels
8.x Relates to a 8.x client version Category: Enhancement Usability

Comments

@DR9885

DR9885 commented May 22, 2024

Is your feature request related to a problem? Please describe.
Previously, enums were stored as integers, which allowed us to run range queries on them.

For example, we have a Rating enum, and sometimes we want to search for documents above a certain rating.
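For context on the range-query use case: when the enum is indexed as its integer value, "above a certain rating" is an ordinary numeric range query. Below is a minimal sketch that builds such a query body with `System.Text.Json.Nodes`. The `Rating` members, the `rating` field name, and the `RatingQueries` helper are hypothetical. If the same field is indexed as a keyword instead, a range query compares the names lexicographically (for example "Excellent" sorts before "Good"), so the results no longer follow the rating order.

```csharp
using System.Text.Json.Nodes;

// Hypothetical stand-in for the issue's Rating enum; the members are assumptions.
public enum Rating
{
    Poor = 0,
    Average = 1,
    Good = 2,
    Excellent = 3,
}

public static class RatingQueries
{
    // Builds a query DSL body matching documents whose (hypothetical) "rating"
    // field is strictly above the given value. This only makes sense when the
    // field was indexed as the enum's integer value.
    public static string AboveRating(Rating minimum) =>
        new JsonObject
        {
            ["query"] = new JsonObject
            {
                ["range"] = new JsonObject
                {
                    ["rating"] = new JsonObject { ["gt"] = (int)minimum },
                },
            },
        }.ToJsonString();
}
```

`RatingQueries.AboveRating(Rating.Good)` produces `{"query":{"range":{"rating":{"gt":2}}}}`.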

Describe the solution you'd like
Options:

  1. Revert to serializing enums as integers by default.
  2. Add a way to configure whether an enum field is stored as an integer or a string.
@flobernd flobernd added the 8.x Relates to a 8.x client version label May 27, 2024
@flobernd
Member

Hi @DR9885,

Most of the time you want enums to be stored as strings, because otherwise reordering them or inserting new members in between shifts their numeric values and messes up the complete mapping.

Since we use System.Text.Json as our JSON serializer, you can register a custom converter for this specific enum:

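using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Writes enum values as their underlying number instead of their name.
// Assumes non-negative values (GetUInt64/ToUInt64 reject negative numbers).
public class JsonOrdinalEnumConverter :
	JsonConverter<Enum>
{
	// JsonConverter<Enum> only matches typeof(Enum) by default; accept concrete enum types too.
	public override bool CanConvert(Type typeToConvert) => typeToConvert.IsEnum;

	public override Enum? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
		(Enum)Enum.ToObject(typeToConvert, reader.GetUInt64());

	public override void Write(Utf8JsonWriter writer, Enum value, JsonSerializerOptions options) =>
		writer.WriteNumberValue(Convert.ToUInt64(value));
}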

It's important to apply the converter attribute to the property, like this:

public class Movie
{
    [JsonConverter(typeof(JsonOrdinalEnumConverter))]
    public Rating Rating { get; init; }
}

and not to the enum type itself (unfortunately, the client's default enum converter always takes precedence over a type-level attribute):

[JsonConverter(typeof(JsonOrdinalEnumConverter))]
public enum Rating
{
    ...
}
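The precedence rule above can be checked with plain System.Text.Json, where a property-level `[JsonConverter]` wins over converters registered in `JsonSerializerOptions.Converters`, and those in turn win over a type-level attribute. Below is a minimal sketch that assumes .NET 8 or later. It swaps in the built-in `JsonNumberEnumConverter<TEnum>` in place of the custom converter and uses a globally registered `JsonStringEnumConverter` as a stand-in for the client's default. `Rating` and `Movie` are redefined here for illustration; `CriticRating` and `PrecedenceDemo` are hypothetical.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical enum and document type used only for this illustration.
public enum Rating { Poor, Average, Good, Excellent }

public class Movie
{
    // Property-level attribute: takes precedence over converters in the options.
    [JsonConverter(typeof(JsonNumberEnumConverter<Rating>))]
    public Rating Rating { get; init; }

    // No attribute: falls back to the enum converter registered in the options.
    public Rating CriticRating { get; init; }
}

public static class PrecedenceDemo
{
    public static string Serialize(Movie movie)
    {
        // Stand-in (assumption) for a serializer whose default writes enums as strings.
        var options = new JsonSerializerOptions();
        options.Converters.Add(new JsonStringEnumConverter());
        return JsonSerializer.Serialize(movie, options);
    }
}
```

Serializing `new Movie { Rating = Rating.Good, CriticRating = Rating.Excellent }` this way produces `{"Rating":2,"CriticRating":"Excellent"}`: only the annotated property is written as a number.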

Does that solve your issue?

@DR9885
Author

DR9885 commented May 29, 2024

Good to know that we can leverage System.Text.Json features.

However, this change has other impacts:

  • Range queries are no longer possible (as stated above).
  • For everyone upgrading from NEST to 8.x, all existing term queries on enum fields will stop working and will have to target the ".keyword" subfield instead.
  • Text and keyword fields are much larger than integers, so with a large search database containing many enums, this will increase the index size and slow down indexing.
  • Responses will also be slower, since more text has to be sent over the network; some enum names can be really long.
  • Also, C# has [Flags] enums, which are designed to store a combination of attributes in the most compact form; storing them as strings defeats that purpose.
    https://dev.to/ayodejii/enum-flags-in-c-20a6
  • Most NoSQL stores keep enums in integer format.
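To make the [Flags] point concrete, the sketch below compares System.Text.Json's default output, which is the combined bit pattern as a number, with the output of `JsonStringEnumConverter`, which spells out the member names. The `Permissions` enum and the `FlagsDemo` helper are hypothetical. A new options instance is created per call only to keep the example short.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical [Flags] enum: each member occupies its own bit.
[Flags]
public enum Permissions
{
    None = 0,
    Read = 1,
    Write = 2,
    Delete = 4,
}

public static class FlagsDemo
{
    // System.Text.Json default: the combined bit pattern as a single number.
    public static string AsNumber(Permissions p) => JsonSerializer.Serialize(p);

    // With JsonStringEnumConverter: the combination written out as member names.
    public static string AsString(Permissions p) =>
        JsonSerializer.Serialize(p, new JsonSerializerOptions
        {
            Converters = { new JsonStringEnumConverter() },
        });
}
```

For `Permissions.Read | Permissions.Write`, the default output is `3`. The string form lists every set flag by name, so it grows with each additional flag.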

Here are some of the enum values we use that would be too wordy to store as strings:

        SuccessorAdditionalOrChangeInTrustee = 52,
        UnscheduledDrawCreditEnhancement = 53,
        SubstitutionCreditLiquidityProvider = 54,
        FinancialObligationIncurrence = 55,
        FinObligEventReflectingFinancialDifficulties = 56,
        FinancialObligationIncurrenceDebtObligation = 57,
        FinancialObligIncurrenceGuaranteeDebtOblig = 58,
        FinanObligIncurrenceGuaranteeDerivativeInstr = 59,
        FinancialObligIncurrenceDerivativeInstrument = 60,
        FinanObligEventReflectFinDifficultiesDefault = 61,
        FinObligEventReflectFinDiffEventOfAccel = 62,
        FinObligEventReflectFinDiffTerminationEvent = 63,
        FinanObligEventReflectFinDiffModifOfTerms = 64,
        FinanObligEventReflectFinanDiffOther = 65,
        InitialAssetBackedSecuritiesDisclosure = 66,
        QuarterlyAssetBackedSecuritiesDisclosure = 67,
        AnnualAssetBackedSecuritiesDisclosure = 68,
        OtherAssetBackedSecuritiesDisclosure = 69,
        BankLoanAlternativeFinancingFilings = 70
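To quantify the JSON payload difference for values like these, the sketch below serializes one member both ways and counts the UTF-8 bytes. The enum name `FilingEventType` and the `PayloadSize` helper are hypothetical; the two members and their values are taken from the list above.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical enum name; members and values copied from the list above.
public enum FilingEventType
{
    FinObligEventReflectingFinancialDifficulties = 56,
    BankLoanAlternativeFinancingFilings = 70,
}

public static class PayloadSize
{
    private static readonly JsonSerializerOptions StringOptions = new()
    {
        Converters = { new JsonStringEnumConverter() },
    };

    // UTF-8 byte count with System.Text.Json's default (the integer value).
    public static int OrdinalBytes(FilingEventType value) =>
        JsonSerializer.SerializeToUtf8Bytes(value).Length;

    // UTF-8 byte count when written as a quoted member name.
    public static int NameBytes(FilingEventType value) =>
        JsonSerializer.SerializeToUtf8Bytes(value, StringOptions).Length;
}
```

For `FinObligEventReflectingFinancialDifficulties`, the result is 2 bytes (`56`) versus 46 bytes (the 44-character name plus quotes) for each occurrence in the JSON payload. How much of that difference remains on disk depends on Elasticsearch's mapping and compression, which this sketch does not measure.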

I think it would make more sense for clients in compiled languages like C# and Java to store enums as integers, and for JavaScript to store them as strings.

@flobernd
Member

flobernd commented May 29, 2024

The new default behavior was decided before I joined the team, and there are probably good arguments in favor of this change as well. Maybe @Mpdreamz or @stevejgordon can remember them?

Range Queries are not supported. (as stated above)

Range queries on enum fields are a niche use case and may even indicate that the wrong type was chosen. Enums are not intended to represent a continuous range of values.

For Everyone, Upgrading from Nest to 8, All Term Queries will no longer work, and will have to go to ".keyword"

Your point is completely valid. This is a breaking change and could cause problems with existing data if not handled carefully (e.g. by using the workaround I mentioned above). I have to check whether this point is covered in the migration guide.

The other arguments all boil down to the advantages and disadvantages of the underlying data type. I still think the current default is good, as it minimizes surprises (especially for less experienced developers or new users who don't think about the underlying types). More advanced users still have the option to customize this behavior, for example with a custom converter as mentioned above.

Maybe I could make the default configurable, to make it easier to switch back to ordinal values.

@Mpdreamz
Member

I remember we did this because we got a lot of bug reports where this caught people off guard.

[Flags] enums should definitely be serialized as integers (if they aren't already).

In https://github.com/elastic/elastic-ingest-dotnet we don't default to writing enums as strings, because System.Text.Json doesn't either.

The one argument for sending them as integers is that it allows people to rename enum values, provided they maintain the enum ordering.

We should align https://github.com/elastic/elastic-ingest-dotnet with the client behaviour though.

@DR9885
Author

DR9885 commented May 29, 2024

@Mpdreamz, there are also performance benefits to using integers over strings:

  • less network overhead
  • less serialization overhead
  • faster indexing (text and keyword fields need larger indexes for partial/tokenized matching, whereas integers only need to be indexed for exact terms)

I have a few object types with over 100 different enum fields each, which we save in batches of 10k models. This change will impact performance heavily.

Also, updating each and every field that uses an enum would take a lot of developer time. Can we make this configurable for the client as a whole?
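In plain System.Text.Json terms, a client-wide switch like the one requested could be a converter factory that matches every enum type. The sketch below assumes you can add converters to the `JsonSerializerOptions` that the client uses for your documents; how to reach those options in the Elastic 8 client is not shown and is an assumption. `OrdinalEnumConverterFactory`, `GlobalOrdinalDemo`, and the `Rating` members are hypothetical. Because System.Text.Json uses the first converter in `JsonSerializerOptions.Converters` whose `CanConvert` returns true, inserting the factory at index 0 lets it win over a string enum converter registered earlier.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Writes every enum as its underlying integer value ([Flags] combinations included).
// Assumes all enum values fit in a long.
public sealed class OrdinalEnumConverterFactory : JsonConverterFactory
{
    public override bool CanConvert(Type typeToConvert) => typeToConvert.IsEnum;

    public override JsonConverter CreateConverter(Type typeToConvert, JsonSerializerOptions options) =>
        (JsonConverter)Activator.CreateInstance(
            typeof(OrdinalEnumConverter<>).MakeGenericType(typeToConvert))!;

    private sealed class OrdinalEnumConverter<TEnum> : JsonConverter<TEnum>
        where TEnum : struct, Enum
    {
        public override TEnum Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
            (TEnum)Enum.ToObject(typeof(TEnum), reader.GetInt64());

        public override void Write(Utf8JsonWriter writer, TEnum value, JsonSerializerOptions options) =>
            writer.WriteNumberValue(Convert.ToInt64(value));
    }
}

// Hypothetical enum for the demo below.
public enum Rating { Poor, Average, Good, Excellent }

public static class GlobalOrdinalDemo
{
    public static string Serialize(Rating rating)
    {
        var options = new JsonSerializerOptions();
        // Stand-in (assumption) for a default that writes enums as strings.
        options.Converters.Add(new JsonStringEnumConverter());
        // Converters are tried in order, so index 0 takes priority.
        options.Converters.Insert(0, new OrdinalEnumConverterFactory());
        return JsonSerializer.Serialize(rating, options);
    }
}
```

`GlobalOrdinalDemo.Serialize(Rating.Good)` returns `2`. Dropping the `Insert` line makes the same call return `"Good"`.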

3 participants