Skip to content

Add support for reference-based schemas in OpenAPI document #56175

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 18, 2024

Conversation

captainsafia
Copy link
Member

Note: this PR targets a feature branch.

This pull request adds support for generating reference IDs for JSON schemas that are duplicated across the OpenAPI document. This implementation consists of three components:

  • A collection of IEqualityComparer implementations that allow us to compare whether two OpenApiSchema instances are equivalent
  • An IOpenApiDocumentTransformer implementation that walks the schemas in the OpenAPI document model and replaces inline schemas with reference ID-based schemas where applicable
  • Additional logic to the schema store to support storing OpenAPI schemas alongside their reference IDs
    • Note: this PR generates a random GUID for each schema reference ID. Human-readable schema reference IDs will come in a follow-up PR.

In this implementation, references are generated based on the equality of the generated schemas, not the .NET type they are associated with. With this strategy, we avoid leaking details of .NET type system to the OpenAPI document. For example, types that map to the same JSON schema (like IEnumerable<Todo> and Todo[]) will use the same reference ID. This strategy also allows us to account for transformations made to the OpenAPI schema via transformers in the final document.

@captainsafia captainsafia requested a review from a team as a code owner June 10, 2024 19:59
@dotnet-issue-labeler dotnet-issue-labeler bot added the old-area-web-frameworks-do-not-use *DEPRECATED* This label is deprecated in favor of the area-mvc and area-minimal labels label Jun 10, 2024
@captainsafia captainsafia added area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates feature-openapi area-minimal Includes minimal APIs, endpoint filters, parameter binding, request delegate generator etc and removed old-area-web-frameworks-do-not-use *DEPRECATED* This label is deprecated in favor of the area-mvc and area-minimal labels labels Jun 10, 2024
return false;
}

return GetHashCode(x) == GetHashCode(y);
Copy link
Member

@JamesNK JamesNK Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two objects might return true but not be the same if equals is implemented like this. Won't that be a problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say more? It should function as an equality operator as long as it's symmetric, reflexive, and transitive, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the typical expectation is that types with the same hash codes might be equal whereas types with different hash codes are definitely not equal.

Assuming I've done my implementation correctly, GetHashCode for OpenApiSchema and should return a unique value for each equivalent schema object (e.g. the bucket size is always 1).

Copy link
Member

@JamesNK JamesNK Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When working with hashcodes I don't think you can guarantee the bucket size is 1 given the nature of the objects. The data in the object is being reduced down to a 2^32 number. There are infinite possible ways of defining schemas that are reduced to a hashcode so it seems like at some point there must be hash collisions.

I question what is happening here - using hash code as equals goes against guidance - but on the other hand you're confident that's its ok. I don't feel certain enough in my understanding of hash tables to be sure. Maybe ping an expert in this area?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your questions here are legit. After some rubber ducking, it appears there is a gap here when evaluating OpenApiByte and OpenApiBinary types since they encapsulate a byte area that we were evaluating correctly. I did some hardening on the comparer implementations in 623fa19

@JamesNK
Copy link
Member

JamesNK commented Jun 11, 2024

In this implementation, references are generated based on the equality of the generated schemas, not the .NET type they are associated with. With this strategy, we avoid leaking details of .NET type system to the OpenAPI document. For example, types that map to the same JSON schema (like IEnumerable<Todo> and Todo[]) will use the same reference ID. This strategy also allows us to account for transformations made to the OpenAPI schema via transformers in the final document.

I agree about not duplicating schemas for different collection types. Did you consider having some logic that knows about collections and ensures that JSON array types are shared, e.g. anything that implements IEnumerable<T> is automatically shared, e.g.

  • IEnumerable<Todo>
  • Todo[]
  • List<Todo>
  • ImmutableArray<Todo>
  • etc

That seems like a simple and clean way to share collection type schemas.

@captainsafia
Copy link
Member Author

Did you consider having some logic that knows about collections and ensures that JSON array types are shared, e.g. anything that implements IEnumerable is automatically shared, e.g.

I'm not sure what you mean by "JSON array types" are shared. Some of the logic around generating the actual schema for the array types is hidden behind STJ's JsonSchemaMapper support. For array-based schemas, it keys off whether it's JsonTypeInfoKind.Enumerable. Is using something like that what you're describing here?

return false;
}

return GetHashCode(x) == GetHashCode(y);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say more? It should function as an equality operator as long as it's symmetric, reflexive, and transitive, no?


internal sealed class OpenApiAnyComparer : IEqualityComparer<IOpenApiAny>
{
public static OpenApiAnyComparer Instance { get; } = new OpenApiAnyComparer();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually give such types a private constructor, but I'm guessing you considered that to be too much ceremony.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean providing a private constructor over a static Instance property or both? What's he value of the private constructor in the later case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually provide both so no one fails to notice the Instance member and instantiates it themselves.

@captainsafia captainsafia requested review from wtgodbe, a team, halter73 and mgravell as code owners June 13, 2024 15:23
@captainsafia captainsafia changed the base branch from feature/openapi-refs to main June 13, 2024 15:24
@captainsafia
Copy link
Member Author

Apologies for the review request spam! I've rebased against main and changed the branch target for this PR from a feature branch to main. Support for friendly $ref names is in 9ae6437.

@captainsafia captainsafia removed request for a team, mgravell and wtgodbe June 13, 2024 15:25

internal static class JsonTypeInfoExtensions
{
private static readonly Dictionary<Type, string> _simpleTypeToName = new()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other primitive-y types that should be added? For example I added DateOnly, TimeOnly, (U)Int128 and some others to the latest release of Swashbuckle (admittedly for a different purpose): JSON serializer context

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DateOnly, TimeOnly, and a few others on the referenced link get handled by the JsonSchemaMapper for us. I'll do a few more comparisons but I think the set we have here is good enough.

@@ -17,16 +19,39 @@ internal sealed class OpenApiSchemaStore
private readonly ConcurrentDictionary<OpenApiSchemaKey, JsonObject> _schemas = new()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some similar code here in case that's useful too.

Comment on lines 291 to 219
"DateTime": {
"type": "string",
"format": "date-time"
},
"IFormFile": {
"type": "string",
"format": "binary"
},
"IFormFileCollection": {
"type": "array",
"items": {
"$ref": "#/components/schemas/IFormFile"
}
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this potentially bleed out some ".NET-ness" where more generic names could be used (form-file or whatever)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially, if applicable, I think I might prefer FormFile over form-file since it is a bit more homogenous with remaining conventions and more closely aligns with valid identifier names in most programming languages. There's probably some implementation bleed around disclosing that this is an interface.

@JamesNK
Copy link
Member

JamesNK commented Jun 14, 2024

I'm not sure what you mean by "JSON array types" are shared. Some of the logic around generating the actual schema for the array types is hidden behind STJ's JsonSchemaMapper support. For array-based schemas, it keys off whether it's JsonTypeInfoKind.Enumerable. Is using something like that what you're describing here?

Yes.

IMO when generating schemas for types then I think Type is a good key. In some cases we want to massage the type, such as make int[], IEnumerable<int>, ImmutableArray<int>, etc, all resolve to the same schema since they're all just arrays of the same schema. And primitive types can be special cased, e.g. Int32 -> int, etc.

Why is it better to use the schema as a key than the type?

@captainsafia
Copy link
Member Author

IMO when generating schemas for types then I think Type is a good key. In some cases we want to massage the type, such as make int[], IEnumerable, ImmutableArray, etc, all resolve to the same schema since they're all just arrays of the same schema. And primitive types can be special cased, e.g. Int32 -> int, etc.

The behavior that you described here matches the current behavior of this implementation. I added this snapshot test to more clearly illustrate the behavior of the reference schema.

When it comes to generating the schemas themselves, we're still using the .NET type. It's only when we generate reference IDs that we bring in the uniqueness of schemas. This makes the implementation more tolerant to:

  • Types that share the same schema but are represented with a different .NET type like the collection and key-value pair collections mentioned above.
  • Schemas that have been modified as the result of a users schema transformers. For example, if the schema for a type is modified when it is used in a response object vs. a request object.

@captainsafia
Copy link
Member Author

👈🏽 for reviews -- I think I've addressed a bulk of the feedback around the comparers in 623fa19. Would like to get this in for preview6 if possible. 🙏🏽

{
counter++;
_referenceIdCounter[targetReferenceId] = counter;
_schemasWithReference[schema] = $"{targetReferenceId}{counter}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for referenceId "Foo" and "Foo2" to already be incoming referenceId's? I'm thinking something along the lines of X509Certificate and X509Certificate2 but DTOs. It looks like this could conflict if the type without the 2 got an incremented counter due to special annotations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is a possible (although unlikely) edge case. In the event that this happens, the implementation will throw an exception to the user about the unexpected conflict.

There's a couple of ways I could see around solving this:

  • Throw a more descriptive exception when we try to insert duplicate IDs so the user has a little bit more insight into what is going on here.
  • Provide an options hook that allows users to configure custom schema name rules that override are defaults.

I'm leaning towards the later given it's a generally useful feature that gives users more flexibility around the naming conventions. That way, you can map things appropriately if there are overloaded type names in your application.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #56305 to discuss the second approach further.

throw new InvalidOperationException("The schema reference ID must be set on the schema.");
}

public Dictionary<OpenApiSchema, string?> SchemasByReference => _schemasWithReference;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an ImmutableDictionary? Also, should we just make it an internal property instead of a field+property?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified this to a property. I'm not the biggest fan of ImmutableDictionary given the requirements on using the builder pattern for it, especially because we're building up the list of schemas as the schema service gets called.

@captainsafia captainsafia enabled auto-merge (squash) June 18, 2024 23:06
@captainsafia captainsafia merged commit 874aaf4 into main Jun 18, 2024
26 checks passed
@captainsafia captainsafia deleted the safia/openapi-ref branch June 18, 2024 23:15
@dotnet-policy-service dotnet-policy-service bot added this to the 9.0-preview6 milestone Jun 18, 2024
@JamesNK
Copy link
Member

JamesNK commented Jun 18, 2024

I'll try to make time this week to rebase my gRPC transcoding + OpenAPI PR on top of this and do some more testing. I'm curious to see the results in (semi) real world usage.

@captainsafia
Copy link
Member Author

I'll try to make time this week to rebase my gRPC transcoding + OpenAPI PR on top of this and do some more testing. I'm curious to see the results in (semi) real world usage.

Sounds good! I'm curious to see how this flows with the gRPC integration. Happy to refine/add new APIs if anything shakes out as a result of that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-minimal Includes minimal APIs, endpoint filters, parameter binding, request delegate generator etc area-mvc Includes: MVC, Actions and Controllers, Localization, CORS, most templates feature-openapi
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants