Clarify whether plain name fragment URIs are canonical, and what it means if they are (or aren't) #937
Depends on how the term is going to be used :) I can give one example where a canonical URI that contains a plain-name fragment is not the most useful uri to be slinging around: when generating errors or annotations. I just ran into this today while implementing
So, the interesting thing here is that there are multiple errors generated on the far side of |
The "canonical" link relation type is defined in RFC 6596, and our usage should be roughly compatible with that. It is linked under § 8.2.2 The "id" keyword in the spec. Your example definitely shows that So the answer to the question "what form should be used for consistent reporting of locations, including keyword locations that are not themselves schema objects" is clearly "Use the JSON Pointer from the nearest The other question around how we talk about canonical URIs is in that they MUST be supported, while non-canonical URIs, at least those involving JSON Pointers from other base URIs, MAY be supported, but don't have to be. However, plain name fragments MUST be supported, so maybe it's just a case of cleaning up the language there. And also around the empty fragment JSON Pointer fragment, which also MUST be supported even if the, um... most canonical? URI for schema resource roots is one without a fragment at all. |
@handrews Apologies for the nitpick: did you mean to use "are" and not "or" in the issue title? I'm trying to understand it precisely. |
I'm just ruminating on the example given above. Given these four absolute locations:
Without saying anything about being canonical, I suppose they could also be expressed as:
I'm just trying to wrap my head around @handrews's comments above and comparing these lists is helpful to me. It also makes sense why "use the nearest |
@ssilverman no, your second block of URIs is invalid. You cannot mix plain-name and JSON Pointer syntax. |
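To make the distinction concrete, here is a minimal sketch of how a fragment could be classified. It is an illustration only: `classify_fragment` and `PLAIN_NAME` are hypothetical names, the example URIs are made up, and the plain-name pattern is a simplified assumption rather than the exact `$anchor` grammar of any particular draft.

```python
import re
from urllib.parse import urldefrag

# Simplified plain-name pattern, assumed here for illustration only.
PLAIN_NAME = re.compile(r"[A-Za-z_][-A-Za-z0-9._]*")


def classify_fragment(uri: str) -> str:
    """Classify the fragment of a schema URI (hypothetical helper)."""
    fragment = urldefrag(uri).fragment
    if fragment == "":
        # No fragment, or the empty fragment (the resource root).
        return "none-or-empty"
    if fragment.startswith("/"):
        return "json-pointer"
    if PLAIN_NAME.fullmatch(fragment):
        return "plain-name"
    # For example "foo/properties/bar": a plain name followed by pointer syntax.
    return "invalid"
```

Under these assumptions, a fragment such as `#foo/properties/bar` is neither a JSON Pointer nor a plain name, which is why mixed forms are rejected.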
Ok, thanks for clarifying that. |
So what about “or” vs. “are” in the title? |
see the examples in https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.10.4.2 but also see json-schema-org/json-schema-spec#937 - instance_location and keyword_location will always be json pointers - absolute_keyword_location will always be an absolute URI or URI reference, when defined Luckily the published schema at https://json-schema.org/draft/2019-09/output/schema will still validate our output, because a bare json pointer looks like a uri reference.
"canonical" as opposed to what? If we're talking about a link relation to apply to an $id keyword, I think I brought up an issue where |
Canonical in this case means the fully-resolved uri associated with the current subschema location, derived from the nearest $id (as there could be multiple $ids up the hierarchy that would allow the current location to be referenced from a number of different uris). This SHOULD be an absolute uri, so long as there is an absolute uri provided as the base uri for the entire document, or there is an absolute uri as an $id somewhere up along the way. It will never contain a plain-name fragment (as per discussions above in this thread), but it could contain a json pointer fragment if this exact location doesn't have an $id. Hopefully I didn't get that too wrong. I'm at present in the middle of implementing $id and $anchor logic in my implementation so I'm up to my eyeballs in URI terminology :) |
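The "nearest $id plus JSON Pointer" rule described above can be sketched roughly as follows. This is not taken from any implementation in this thread: `location_uri`, the retrieval URI, and the sample document are all hypothetical, and the walk makes simplifying assumptions (any object with a string `$id` is treated as a schema resource, pointer tokens are not percent-encoded, and `$id` values carry no fragment).

```python
from urllib.parse import urljoin


def location_uri(document: dict, retrieval_uri: str, pointer: str) -> str:
    """Return the nearest-$id URI plus a JSON Pointer fragment for a location.

    `pointer` is a JSON Pointer from the document root, e.g. "/$defs/item".
    """
    base = retrieval_uri
    node = document
    if isinstance(node, dict) and isinstance(node.get("$id"), str):
        base = urljoin(base, node["$id"])
    relative_tokens = []  # pointer tokens since the nearest $id
    tokens = pointer.split("/")[1:] if pointer else []
    for raw in tokens:
        token = raw.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
        relative_tokens.append(raw)
        if isinstance(node, dict) and isinstance(node.get("$id"), str):
            # A new schema resource starts here: reset base and pointer.
            base = urljoin(base, node["$id"])
            relative_tokens = []
    if not relative_tokens:
        return base
    return base + "#/" + "/".join(relative_tokens)


doc = {
    "$id": "https://example.com/schemas/root",
    "$defs": {"item": {"$id": "item", "properties": {"name": {}}}},
}
name_uri = location_uri(doc, "https://cdn.example/root.json", "/$defs/item/properties/name")
```

With this sample document, `name_uri` is built from the embedded `item` resource rather than from the document root, so the pointer part only covers `/properties/name`.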
@awwright Yeah I was trying to remember that discussion. IIRC, I liked "canonical" because there was some situation where "self" and "canonical" diverged and I thought canonical made more sense, but I can't remember what it was, or find where that discussion happened (probably slack). |
maybe we need a glossary :D I've been confused a few times by what "absolute" refers to, since an absolute URI in the RFC is one with a scheme and without a fragment, but we sling around absolute uris with fragments (either plain-name or json pointers) all the time. Oh yeah, "canonical" also refers to the cleaning up of paths like "foo/../bar". |
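Both senses mentioned here can be checked with Python's standard `urllib.parse`. The sketch below is only an approximation: `is_absolute_uri` and `resolve` are hypothetical helper names, the URIs are made up, and the scheme check does not validate the full RFC 3986 grammar.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute_uri(uri: str) -> bool:
    """True if `uri` has a scheme and no fragment (RFC 3986 'absolute-URI')."""
    # urlsplit() cannot tell "no fragment" from an empty one, so check for "#".
    return bool(urlsplit(uri).scheme) and "#" not in uri


def resolve(base: str, reference: str) -> str:
    """Resolve `reference` against `base`; dot segments are removed on the way."""
    return urljoin(base, reference)


cleaned = resolve("https://example.com/schemas/", "foo/../bar")
```

By this check, a URI that carries a JSON Pointer or plain-name fragment is not an "absolute URI" in the RFC sense, even when it has a scheme.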
@karenetheridge we refer to absolute keyword location which may be identified by a URI with a fragment, but we should not refer to absolute URIs with fragments. Any such wording should be fixed. |
@karenetheridge for a more general specification work glossary, I'd recommend using / contributing to http://webconcepts.info/ Most of this stuff is not JSON Schema-specific so we shouldn't make yet another glossary. Stuff that is JSON Schema-specific should go in our specs. |
@ssilverman "are" fixed thanks |
OK, digging into RFC 6596, I believe the mental model was an extension of the situation when you fetch a schema document with one URI, but it provides a different one with Essentially,
I think I went from the author's preferred version of a resource and looking at Extending the request-URI mental model to a multi-resource JSON Schema document, a URI using a JSON Pointer relative to the document root for an embedded resource is the "request URI", and the embedded Of course, as @awwright observed, it's not uncommon for resources to include a "self" link. That link relation type is inherited from the Atom RFC, which initially established the IANA registry of link relation types. As far as I can tell, the entire specification for "self" is:
Which overlaps with but is more general than "canonical". I recall feeling moderately strongly that "canonical" was better, but can no longer remember quite why. Let's do this. Whatever we call these things, we need a decision about how to present plain name fragment URIs. @awwright @karenetheridge and/or @ssilverman, if you want to argue for "self", let's make a separate issue for that, and let this issue focus on the original point. If we need to wait on the other issue, that's fine. |
It's taken me a while to get to this one, and in the meantime, @handrews, you've come to the conclusion about canonical I was going to suggest. If a schema is retrieved from However, that doesn't seem to be the way it is used in the spec. Canonical is all about choosing between valid alternatives, but the term is often used outside of the context of multiple URI options. That's at best confusing. Another issue is that people read the spec and think that absolute URI is part of the definition of canonical, when it's actually just part of the definition of The biggest issue is that the spec sometimes speaks of non-canonical URIs as potentially not supported. That's certainly not what canonical means. Canonical is about choosing between valid options. If they weren't valid, they wouldn't even be considered. Since canonical is about selecting from equally valid URIs, I'm not sure there's a good reason for the spec to decide which URI should be canonical. That can be left up to implementations to decide without compromising interoperability. In my implementation I have a schema data structure and you can ask it what its URI is. Sometimes I need to choose between multiple options, but it shouldn't matter if my choice is different than someone else's if both URIs point to the same thing. For the record, my implementation chooses |
Something on my todo list for when the spec nears completion is to look over all of the uses of absolute URIs, relative URIs, URI references etc and look for inconsistencies or confusing wording. URI resolution is fairly straightforward, but you have to understand what bits refer to what to apply it properly :)
I agree with this as well. |
@jdesrosiers I could have sworn I wrote a long response to you yesterday, but it's not here. I must have not hit the comment button and then closed the window :/ I'll have to remember what I meant to say and will get back to this soon. |
iirc this was to allow a simpler approach to resolving references and provide a standard best practice approach. I believe we wanted to say, when there are multiple possible URIs to a given target, THIS is the one you should use. Canonical may not be the best word to describe that URI, but naming things is hard, and canonical is a familiar term (at least to me). Specifically, for example, like this... json-schema-spec/jsonschema-core.xml Lines 1764 to 1786 in 0003dbb
Now, that's not to say that you cannot fully implement and support both URIs, but in order to make implementation requirements simpler, and to help schema authors (and consumers) avoid ambiguity, the canonical URI is the one you can rely on to always work. For quick reference, here's the section of the spec we're talking about... json-schema-spec/jsonschema-core.xml Lines 1821 to 1840 in 0003dbb
I'm not sure that's accurate...
Followed by... (note line 3301) json-schema-spec/jsonschema-core.xml Lines 3298 to 3308 in 0003dbb
So to me, it looks like we've said there are multiple valid canonical URIs... and that kinda goes against the naming "canonical" in this instance. |
@jdesrosiers regarding:
The mental model here came from when you have something like (in YAML because I'm lazy today):

    $id: https://example.com/schema/foo
    $defs:
      bar:
        $id: https://example.com/schema/bar
        type: object
        properties:
          biz: {}

So some implementation would support addressing the That fits the request vs canonical model just fine. Separately, it is not necessary to support So really, it's "these URIs are not canonical because they might not even work" more than "these URIs might not work because they are non-canonical." Although it is a bit of a chicken-and-egg philosophical problem and maybe there's a better option. |
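As an illustration of the YAML example above, the sketch below evaluates two JSON Pointers that reach the same `biz` subschema: one from the embedded resource's own root, and one from the document root that crosses the `$id` boundary. The URIs in the comments are an assumption about which forms are meant, since the inline examples did not survive in the text; `resolve_pointer` is a hypothetical helper with no error handling.

```python
def resolve_pointer(node, pointer: str):
    """Evaluate an RFC 6901 JSON Pointer against `node` (no error handling)."""
    if pointer == "":
        return node
    for raw in pointer.split("/")[1:]:
        token = raw.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


document = {
    "$id": "https://example.com/schema/foo",
    "$defs": {
        "bar": {
            "$id": "https://example.com/schema/bar",
            "type": "object",
            "properties": {"biz": {}},
        }
    },
}

# Assumed form 1: https://example.com/schema/bar#/properties/biz
# (pointer evaluated from the embedded resource's own root)
bar = resolve_pointer(document, "/$defs/bar")
via_bar = resolve_pointer(bar, "/properties/biz")

# Assumed form 2: https://example.com/schema/foo#/$defs/bar/properties/biz
# (pointer evaluated from the document root, crossing the $id boundary)
via_foo = resolve_pointer(document, "/$defs/bar/properties/biz")
```

Both lookups land on the same object, which is why the second form can work in practice. Whether an implementation is required to accept it is the question under discussion.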
What I would like to end up with is this, and I'm flexible on the terminology, except that if there is no clear consensus it stays as-is:
I am open to suggestion on how to best communicate the above. |
I think we're all on the same page about how to use canonical.
Yep. That's exactly the kind of thing "canonical" should be used for.
Agreed. That sounds wrong.
Yep. Our only difference was that I was thinking of those URIs not as "they might not work", but as "invalid" and I find it awkward to assign a canonical status to something that isn't even a valid alternative. If the URI "might" work, then I agree that it can be considered non-canonical. @handrews I agree with your list in the previous comment. The only thing I'd change is that I wouldn't say anything about preference. That kind of thing seems more the domain of a style guide. URIs with fragments that cross In my implementation's documentation, I use the terms "internal identifier" and "external identifier". You can retrieve the schema using either one, but if you ask the schema for its URI, it will tell you the "internal identifier". So, "internal"/"external" is one way to explain it. Another way to describe it could be as similar to the "base" tag in HTML where all references are resolved against the base rather than the URL used to retrieve the document. |
I hope this is a good place for this question, but what about Another use-case for this is multiple schemas within a single document. Admittedly this use-case is also supported by fragments, as discussed here, but will make later separating them into multiple documents harder. As far as I know, there are at least some implementations which already support something like this. |
Yes, that's the way
If you find an implementation that doesn't work like this for embedded schemas, it's a bug. However, not all implementations allow you to specify an external identifier for a root schema. That's not a bug, just optional behavior that's not supported. If the implementation does allow you to specify an external identifier, but doesn't use it, then it would be a bug. |
I don't believe there are any functional changes to be had as a result of resolving this issue, so I'd like to punt it to next draft in order to enable the publication of 2020 draft sooner. |
As far as the original question posed by this issue, @jdesrosiers and @karenetheridge say they use the JSON Pointer forms as canonical, and not the plain name fragment. That is good enough for me! A clarification PR of some sort is probably in order. If there are other debates that need to continue (e.g. is "canonical" even the term we want?) I think those should move out to their own issues. |
Fixes issue json-schema-org#937, clarifying a number of other things along the way. While it touches a fair number of lines, I'm fairly sure that it doesn't change anything about conformance. After spending more time reading various writings on the concept of the "canonical" URI for a resource, and reviewing our language, I came to the following conclusions:
* canonical URIs only make sense at the whole-resource scope
* A URI with a fragment is neither canonical nor non-canonical
* It makes more sense to talk about fragments w.r.t. canonical URIs
* Our language was sufficiently confusing that going this way seems fine.

As part of this, I fixed an outright incorrect statement that identifier keywords set canonical URIs. Since there is only one canonical URI and a single schema object could contain three ($id, $anchor, $dynamicAnchor) or more identifier keywords, this statement is clearly a bug. These keywords assign URIs, but only $id assigns a canonical one. I revamped a lot of wording in descriptions and examples to hopefully be more precise. I separated the discussion of the empty fragment in $id from the main paragraph of its functionality, and clarified that this is talking about a media-type-specific semantic equivalence, and is not asserting that RFC 3986 normalization applies to fragments (this has been a point of confusion).
I don't believe we have reached consensus.
The part in bold above, we are not communicating clearly enough. In 2020-12, we do detail these facts, even if only in the form of showing multiple canonical URI for a path as part of Appendix A. See: https://json-schema.org/draft/2020-12/json-schema-core.html#rfc.appendix.A
I think we all agree, when we say "canonical URI" we mean ONE URI. I feel like I got lost a little when @karenetheridge (correctly) pointed out every location in a schema has a canonical URI. Remember, the issue was about URIs which crossed resource boundaries in the same document (Schemas in which json-schema-spec/jsonschema-core.xml Lines 1820 to 1823 in 117c05e
What if we were to change this phrasing to something like...
The issue is with addressing locations which have a defined In fact, RFC6596, The Canonical Link Relation, which we reference for the definition of canonical, only describes it in terms of resources. The fragments are attached to the resource's canonical URI to provide in-resource location. It was said in #1104 that...
It has taken me up till now to understand the above and draw the same conclusion. I'm going to focus my efforts on modifying the language to reflect this in a PR. As an aside... Although I feel this may be out of scope for this issue... Picking up regarding #1104 and a thread of comments on a review.
We are talking about the lines quoted previously above, regarding what an implementation may choose not to support. @karenetheridge @jdesrosiers, Would you be happy with something that reworked this to make addressing schema resources with non-canonical URIs something where the "behaviour is undefined"? I would add a CREF to note that the reason it might still work is due to the nature of using JSON Path, it's likely in most cases easy to resolve the resulting location? (Personally, I don't see why implementations have such a hard time with this issue specifically. I haven't implemented JSON Schema myself, but surely you take the JSON at a location, apply the JSON Path, and you have a new location. I'd be interested to hear what I'm missing... but not in this Issue, please.) |
For the patch release, anything is an improvement, but yes I would like to see it described as "undefined". For the future, we need to stop using "canonical" in the spec. It's not the right abstraction. It's confusing, complicated, and I think it misses the point. An embedded schema is its own independent schema distinct from its parent (think iframe). A pointer can only point to a location within a single schema resource (I can't craft an xpath to point to a location within an iframe). That's the whole concept.
It's not necessarily hard, it's just unnecessary complexity. It's much simpler to not have to track base URI and dialect changes depending on where you are in the schema. With strict boundaries, every schema has one base URI and one dialect no matter where you are. I'm lazy, I don't want to write code to track those changes when there's a neat and simple alternative conceptual model that doesn't require additional code and supports everything schema authors need. I just break down compound schemas when they are loaded and then I don't have to worry about anything changing. It's not even extra work because I have to break down the compound schema anyway for validation against the meta-schema. |
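One way to picture "breaking down" a compound schema at load time is sketched below. This is a guess at the general idea, not the commenter's actual code: `split_resources` is a hypothetical function, replacing each embedded resource with a `$ref` stub is an assumption, and the walk treats any object with a string `$id` as a schema resource, which is a simplification.

```python
from urllib.parse import urljoin


def split_resources(document: dict, retrieval_uri: str) -> dict:
    """Split a compound schema document into {base URI: schema resource}."""
    resources = {}

    def extract(schema: dict, base: str) -> str:
        # Register `schema` as its own resource and return its URI.
        uri = urljoin(base, schema.get("$id", ""))
        resources[uri] = {key: walk(value, uri) for key, value in schema.items()}
        return uri

    def walk(node, base: str):
        if isinstance(node, dict):
            if isinstance(node.get("$id"), str):
                # Embedded resource: pull it out and leave a reference behind.
                return {"$ref": extract(node, base)}
            return {key: walk(value, base) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(value, base) for value in node]
        return node

    extract(document, retrieval_uri)
    return resources


compound = {
    "$id": "https://example.com/schema/foo",
    "$defs": {
        "bar": {
            "$id": "https://example.com/schema/bar",
            "type": "object",
            "properties": {"biz": {}},
        }
    },
}
resources = split_resources(compound, "https://example.com/schema/foo")
```

After splitting, each resource has exactly one base URI, so a JSON Pointer is only ever evaluated inside a single resource and never has to cross an `$id` boundary.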
Attempt to resolve json-schema-org#937 Add note and cref in appendix A clarifying that we intended to define a URI phrasing which would avoid the requirement to allow for location shadowing in implementations, as this is tricky. Clarifying that plain name fragments should always be supported, and that they only can work in relation to the base URI of the Schema Resource. Otherwise there could be duplicate plain name fragments and addressing wouldn't work
…cal (#1192) * Clarify that plain name fragments are neither canonical or non-canonical Attempt to resolve #937 Add note and cref in appendix A clarifying that we intended to define a URI phrasing which would avoid the requirement to allow for location shadowing in implementations, as this is tricky. Clarifying that plain name fragments should always be supported, and that they only can work in relation to the base URI of the Schema Resource. Otherwise there could be duplicate plain name fragments and addressing wouldn't work.
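The point about plain name fragments only working relative to the base URI of their schema resource can be illustrated with a small index. This is a sketch under assumptions: `collect_anchors` and the sample document are hypothetical, only `$anchor` is handled, and `$id` values are assumed to carry no fragment.

```python
from urllib.parse import urljoin


def collect_anchors(document: dict, retrieval_uri: str) -> dict:
    """Map each plain-name fragment URI to the subschema it identifies."""
    anchors = {}

    def walk(node, base: str) -> None:
        if isinstance(node, dict):
            if isinstance(node.get("$id"), str):
                # Entering a new schema resource changes the base URI.
                base = urljoin(base, node["$id"])
            if isinstance(node.get("$anchor"), str):
                uri = base + "#" + node["$anchor"]
                if uri in anchors:
                    raise ValueError(f"duplicate anchor: {uri}")
                anchors[uri] = node
            for value in node.values():
                walk(value, base)
        elif isinstance(node, list):
            for value in node:
                walk(value, base)

    walk(document, retrieval_uri)
    return anchors


sample = {
    "$id": "https://example.com/schema/foo",
    "$defs": {
        "a": {"$anchor": "item"},
        "b": {
            "$id": "https://example.com/schema/bar",
            "$defs": {"c": {"$anchor": "item"}},
        },
    },
}
anchors = collect_anchors(sample, "https://example.com/schema/foo")
```

Here the same anchor name appears twice without conflict because each occurrence is resolved against its own resource's base URI. Resolving both against the document's root URI would produce a duplicate.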
Closing this issue as #1192 was merged. |
There's a little confusion and ambiguity on this topic.
Things to figure out: