Clarify whether plain name fragment URIs are canonical, and what it means if they are (or aren't) #937

handrews · 2020-05-23T01:18:44Z

There's a little confusion and ambiguity on this topic.

- Absolute URIs (with no fragment, not even an empty one) are definitely canonical, and identify the complete schema resource.
  - This implies that the same URI with an empty JSON Pointer fragment is non-canonical, but it should work, and it would be bad if an implementation did not support it.
- URIs with non-empty JSON Pointer fragment URIs, relative to the nearest base URI, are considered canonical.
- URIs with non-empty JSON Pointer fragment URIs, relative to a more distant base URI, are considered non-canonical and are not guaranteed to work.
  - This was the main point of the canonical URI stuff in 2019-09.
- URIs with plain name fragments aren't discussed as canonical or non-canonical.
  - However, they only work with the nearest base URI, so there's never any ambiguity.
  - Therefore, plain name fragment URIs should always work.
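To make the forms in the list above concrete, here is a minimal Python sketch that classifies a schema URI by the shape of its fragment. The helper `fragment_kind` and its labels are hypothetical; it only inspects syntax and says nothing about whether a given implementation will resolve the URI.

```python
from urllib.parse import urldefrag


def fragment_kind(uri: str) -> str:
    """Classify a schema URI by the form of its fragment (syntax only)."""
    _, fragment = urldefrag(uri)
    if "#" not in uri:
        return "none"        # absolute URI: identifies the whole schema resource
    if fragment == "":
        return "empty"       # empty JSON Pointer: the resource root again
    if fragment.startswith("/"):
        return "pointer"     # JSON Pointer fragment
    return "plain-name"      # plain name fragment, e.g. one set by $anchor
```

For example, `fragment_kind("https://example.com/root.json#foo")` returns `"plain-name"`, and the same URI with a trailing bare `#` returns `"empty"`.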

Things to figure out:

- Is "canonical" the right term here? IIRC @awwright thought it was not, but I had a vaguely coherent reason as to why it was (which I should look up), and I was burnt out and needed to publish it, so we did that. Probably not the best process.
- Should we be framing the requirement around what URIs MUST be supported and what MAY be supported differently, in order to capture the possibilities better?
- Are there other use cases I'm missing here?

karenetheridge · 2020-05-23T02:19:11Z

Is "canonical" the right term here?

Depends on how the term is going to be used :) I can give one example where a canonical URI that contains a plain-name fragment is not the most useful uri to be slinging around: when generating errors or annotations. I just ran into this today while implementing $anchor. Let me show an example that I'm using in my tests:

data: 1
schema:
{
  "$defs": {
    "foo": {
      "$anchor": "my_foo",
      "const": "foo value"
    },
    "bar": {
      "$anchor": "my_bar",
      "not": true
    }
  },
  "$id": "http://localhost:4242",
  "allOf": [
    { "$ref": "#my_foo" },
    { "$ref": "#my_bar" }
  ]
}

result:
{
  "valid": false,
  "errors": [
    {
      "instanceLocation": "#",
      "keywordLocation": "#/allOf/0/$ref/const",
      "absoluteKeywordLocation": "http://localhost:4242#/$defs/foo/const",
      "error": "value does not match"
    },
    {
      "instanceLocation": "#",
      "keywordLocation": "#/allOf/1/$ref/not",
      "absoluteKeywordLocation": "http://localhost:4242#/$defs/bar/not",
      "error": "subschema is valid"
    },
    {
      "instanceLocation": "#",
      "keywordLocation": "#/allOf",
      "absoluteKeywordLocation": "http://localhost:4242#/allOf",
      "error": "subschemas 0, 1 are not valid"
    }
  ]
}

So, the interesting thing here is that multiple errors are generated on the far side of $refs whose target URIs have plain-name fragments, at least one level below the position where the identifier changes. We need to generate an error at this location, but we can't express anything relative to the plain-name fragment, so we have to fall back to the previous canonical uri to use as the base. So which uri is canonical for these locations? Is it possible to have more than one? Which one(s) should we hang on to as we traverse the schema, for the purpose of generating annotations and errors against that location? If an error is generated right at the location with the $anchor, should we use its uri as the location of the error, even though an error one level below it doesn't use that uri? Would that be confusing?

handrews · 2020-05-23T04:09:51Z

@karenetheridge

The "canonical" link relation type is defined in RFC 6596, and our usage should be roughly compatible with that. It is linked under § 8.2.2 (The "$id" Keyword) in the spec.

Your example definitely shows that absoluteKeywordLocation can't use a plain name fragment, at least not unless we then add a separate relative JSON pointer, and I do not want to go there.

So the answer to the question "what form should be used for consistent reporting of locations, including keyword locations that are not themselves schema objects" is clearly "Use the JSON Pointer from the nearest $id (or appropriate fallback base URI)."

The other question is how we talk about canonical URIs: they MUST be supported, while non-canonical URIs, at least those involving JSON Pointers from other base URIs, MAY be supported but don't have to be.

However, plain name fragments MUST be supported, so maybe it's just a case of cleaning up the language there. And also around the empty JSON Pointer fragment, which also MUST be supported even if the, um... most canonical? URI for schema resource roots is one without a fragment at all.
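A minimal Python sketch of the "JSON Pointer from the nearest $id" rule for karenetheridge's example, assuming an implementation has recorded each `$anchor` as JSON Pointer tokens from its resource root. `ANCHORS`, `absolute_keyword_location`, and `location_via_anchor` are hypothetical names, and percent-encoding of characters that are not allowed in a URI fragment is skipped for brevity.

```python
# Hypothetical anchor table built while loading the example schema: each plain
# name maps to JSON Pointer tokens from the resource root (http://localhost:4242).
ANCHORS = {"my_foo": ["$defs", "foo"], "my_bar": ["$defs", "bar"]}


def escape_token(token: str) -> str:
    # RFC 6901 escaping: "~" must be replaced before "/"
    return token.replace("~", "~0").replace("/", "~1")


def absolute_keyword_location(base_uri: str, tokens: list) -> str:
    """Nearest base URI (no fragment) plus a JSON Pointer fragment."""
    pointer = "".join("/" + escape_token(t) for t in tokens)
    return f"{base_uri}#{pointer}"


def location_via_anchor(base_uri: str, anchor: str, keyword: str) -> str:
    # A plain-name fragment cannot be extended with a path, so translate it
    # back into pointer tokens before appending the failing keyword.
    return absolute_keyword_location(base_uri, ANCHORS[anchor] + [keyword])
```

Called as `location_via_anchor("http://localhost:4242", "my_foo", "const")`, this yields `http://localhost:4242#/$defs/foo/const`, matching the `absoluteKeywordLocation` in the example output.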

ssilverman · 2020-05-23T18:49:20Z

@handrews Apologies for the nitpick: did you mean to use "are" and not "or" in the issue title? I'm trying to understand it precisely.

ssilverman · 2020-05-24T18:24:37Z

So the answer to the question "what form should be used for consistent reporting of locations, including keyword locations that are not themselves schema objects" is clearly "Use the JSON Pointer from the nearest $id (or appropriate fallback base URI)."

I'm just ruminating on the example given above. Given these four absolute locations:

http://localhost:4242#/$defs/foo/const
http://localhost:4242#/$defs/bar/not
http://localhost:4242#/$defs/foo
http://localhost:4242#/$defs/bar

Without saying anything about being canonical, I suppose they could also be expressed as:

http://localhost:4242#my_foo/const
http://localhost:4242#my_bar/not
http://localhost:4242#my_foo
http://localhost:4242#my_bar

I'm just trying to wrap my head around @handrews's comments above and comparing these lists is helpful to me. It also makes sense why "use the nearest $id" might be good language.

handrews · 2020-05-24T19:47:30Z

@ssilverman no, your second block of URIs is invalid. You cannot mix plain-name and JSON Pointer syntax.
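A minimal Python check for that rule: a (percent-decoded) fragment must be empty, a JSON Pointer, or a plain name, never a mix. The plain-name pattern is assumed from the 2020-12 `$anchor` grammar (2019-09 used a slightly different one), and `is_valid_fragment` is a hypothetical helper.

```python
import re

# Plain-name grammar assumed from the 2020-12 $anchor definition.
PLAIN_NAME = re.compile(r"[A-Za-z_][-A-Za-z0-9._]*")
# RFC 6901 JSON Pointer: zero or more "/"-prefixed tokens, where "~" may only
# appear as the escapes "~0" or "~1".
JSON_POINTER = re.compile(r"(/([^~/]|~[01])*)*")


def is_valid_fragment(fragment: str) -> bool:
    """True if the fragment is a JSON Pointer (incl. empty) or a plain name."""
    if fragment == "" or fragment.startswith("/"):
        return JSON_POINTER.fullmatch(fragment) is not None
    return PLAIN_NAME.fullmatch(fragment) is not None
```

Under these assumptions `my_foo` and `/$defs/foo/const` are accepted, while the mixed form `my_foo/const` is rejected.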

ssilverman · 2020-05-24T19:57:45Z

Ok, thanks for clarifying that.

ssilverman · 2020-05-24T19:59:00Z

So what about “or” vs. “are” in the title?

see the examples in https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.10.4.2 but also see json-schema-org/json-schema-spec#937
- instance_location and keyword_location will always be json pointers
- absolute_keyword_location will always be an absolute URI or URI reference, when defined
Luckily the published schema at https://json-schema.org/draft/2019-09/output/schema will still validate our output, because a bare json pointer looks like a uri reference.

awwright · 2020-05-26T22:12:40Z

"canonical" as opposed to what? If we're talking about a link relation to apply to an $id keyword, I think I brought up an issue where rel=self makes more sense than rel=canonical.

karenetheridge · 2020-05-26T22:57:34Z

Canonical in this case means the fully-resolved uri associated with the current subschema location, derived from the nearest $id (as there could be multiple $ids up the hierarchy that would allow the current location to be referenced from a number of different uris).

This SHOULD be an absolute uri, so long as there is an absolute uri provided as the base uri for the entire document, or there is an absolute uri as an $id somewhere up along the way. It will never contain a plain-name fragment (as per discussions above in this thread), but it could contain a json pointer fragment if this exact location doesn't have an $id.

Hopefully I didn't get that too wrong. I'm at present in the middle of implementing $id and $anchor logic in my implementation so I'm up to my eyeballs in URI terminology :)
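A minimal Python sketch of the definition above: walk the document, reset the base URI at every `$id`, and record a JSON Pointer relative to that nearest base. `canonical_locations` is hypothetical, and for brevity it treats any object with a string `$id` as a schema resource, even inside keywords like `const` or `enum` where that would be wrong.

```python
from urllib.parse import urldefrag, urljoin


def escape_token(token: str) -> str:
    # RFC 6901 escaping: "~" first, then "/"
    return token.replace("~", "~0").replace("/", "~1")


def canonical_locations(schema, retrieval_uri: str) -> dict:
    """Map each object's document-root pointer to a URI based on its nearest $id."""
    result = {}

    def walk(node, base, doc_ptr, res_ptr):
        if isinstance(node, dict):
            if isinstance(node.get("$id"), str):
                # New resource: resolve $id against the current base, restart pointer
                base = urldefrag(urljoin(base, node["$id"]))[0]
                res_ptr = ""
            result[doc_ptr] = f"{base}#{res_ptr}" if res_ptr else base
            children = [(escape_token(k), v) for k, v in node.items()]
        elif isinstance(node, list):
            children = [(str(i), v) for i, v in enumerate(node)]
        else:
            return  # scalars are not schema objects
        for token, child in children:
            walk(child, base, f"{doc_ptr}/{token}", f"{res_ptr}/{token}")

    walk(schema, retrieval_uri, "", "")
    return result
```

Note that no plain-name fragment ever appears in the output: anchors are alternative identifiers, while the reported location always uses a pointer from the nearest `$id`.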

handrews · 2020-05-26T23:20:50Z

@awwright Yeah I was trying to remember that discussion.

IIRC, I liked "canonical" because there was some situation where "self" and "canonical" diverged and I thought canonical made more sense, but I can't remember what it was, or find where that discussion happened (probably slack).

karenetheridge · 2020-05-26T23:23:58Z

maybe we need a glossary :D I've been confused a few times by what "absolute" refers to, since an absolute URI in the RFC is one with a scheme and without a fragment, but we sling around absolute uris with fragments (either plain-name or json pointers) all the time.

Oh yeah, "canonical" also refers to the cleaning up of paths like "foo/../bar".

handrews · 2020-05-26T23:47:05Z

@karenetheridge we refer to absolute keyword location which may be identified by a URI with a fragment, but we should not refer to absolute URIs with fragments. Any such wording should be fixed.

handrews · 2020-05-26T23:48:43Z

@karenetheridge for a more general specification work glossary, I'd recommend using / contributing to http://webconcepts.info/

Most of this stuff is not JSON Schema-specific so we shouldn't make yet another glossary. Stuff that is JSON Schema-specific should go in our specs.

handrews · 2020-05-26T23:49:33Z

@ssilverman "are" fixed, thanks

handrews · 2020-05-27T00:10:21Z

OK, digging into RFC 6596, I believe the mental model was an extension of the situation when you fetch a schema document with one URI, but it provides a different one with $id.

Essentially, $id there means "the URI you used is not the ideal one, use this one instead". So in this bit from RFC 6596, the request-URI is the "context resource URI" and the $id URI is the target resource one:

In regard to the link relation type, "canonical" can be described
informally as the author's preferred version of a resource. More
formally, the canonical link relation specifies the preferred IRI
from a set of resources that return the context IRI's content in
duplicated form. Once specified, applications such as search engines
can focus processing on the canonical, and references to the context
(referring) IRI can be updated to reference the target (canonical)
IRI.

I think I started from "the author's preferred version of a resource" and looked at $id as the author's expression of that preference.

Extending the request-URI mental model to a multi-resource JSON Schema document, a URI using a JSON Pointer relative to the document root for an embedded resource is the "request URI", and the embedded $id is the author telling you what the preferred URI is.

Of course, as @awwright observed, it's not uncommon for resources to include a "self" link. That link relation type is inherited from the Atom RFC, which initially established the IANA registry of link relation types. As far as I can tell, the entire specification for "self" is:

The value "self" signifies that the IRI in the value of the href
attribute identifies a resource equivalent to the containing
element.

Which overlaps with but is more general than "canonical".

I recall feeling moderately strongly that "canonical" was better, but can no longer remember quite why.

Let's do this. Whatever we call these things, we need a decision about how to present plain name fragment URIs. @awwright @karenetheridge and/or @ssilverman, if you want to argue for "self", let's make a separate issue for that, and let this issue focus on the original point. If we need to wait on the other issue, that's fine.


jdesrosiers · 2020-05-28T19:00:05Z

It's taken me a while to get to this one, and in the meantime, @handrews, you've come to the conclusion about canonical I was going to suggest. If a schema is retrieved from https://example.com/schema1 but its $id is https://example.com/schema2, the $id is equivalent to rel="canonical", asserting that the $id URI is preferred over the URI used to retrieve the resource.

However, that doesn't seem to be the way it is used in the spec. Canonical is all about choosing between valid alternatives, but the term is often used outside of the context of multiple URI options. That's at best confusing.

Another issue is that people read the spec and think that absolute URI is part of the definition of canonical, when it's actually just part of the definition of $id and orthogonal to canonical.

The biggest issue is that the spec sometimes speaks of non-canonical URIs as potentially not supported. That's certainly not what canonical means. Canonical is about choosing between valid options. If they weren't valid, they wouldn't even be considered.

Since canonical is about selecting from equally valid URIs, I'm not sure there's a good reason for the spec to decide which URI should be canonical. That can be left up to implementations to decide without compromising interoperability.

In my implementation I have a schema data structure and you can ask it what its URI is. Sometimes I need to choose between multiple options, but it shouldn't matter if my choice is different than someone else's if both URIs point to the same thing. For the record, my implementation chooses $id over the retrieving URL as canonical and it chooses JSON Pointer fragments over plain-name fragments as canonical.

karenetheridge · 2020-05-30T04:06:22Z

Another issue is that people read the spec and think that absolute URI is part of the definition of canonical, when it's actually just part of the definition of $id and orthogonal to canonical.

Something on my todo list for when the spec nears completion is to look over all of the uses of absolute URIs, relative URIs, URI references etc and look for inconsistencies or confusing wording. URI resolution is fairly straightforward, but you have to understand what bits refer to what to apply it properly :)

For the record, my implementation chooses $id over the retrieving URL as canonical and it chooses JSON Pointer fragments over plain-name fragments as canonical.

I agree with this as well.

handrews · 2020-05-30T07:34:24Z

@jdesrosiers I could have sworn I wrote a long response to you yesterday, but it's not here. I must have not hit the comment button and then closed the window :/

I'll have to remember what I meant to say and will get back to this soon.

Relequestual · 2020-09-25T21:24:43Z

The biggest issue is that the spec sometimes speaks of non-canonical URIs as potentially not supported. That's certainly not what canonical means. Canonical is about choosing between valid options. If they weren't valid, they wouldn't even be considered. - @jdesrosiers

IIRC this was to allow a simpler approach to resolving references and to provide a standard best-practice approach.

I believe we wanted to say, when there are multiple possible URIs to a given target, THIS is the one you should use. Canonical may not be the best word to describe that URI, but naming things is hard, and canonical is a familiar term (at least to me).

Specifically, for example, like this...

json-schema-spec/jsonschema-core.xml

Lines 1764 to 1786 in 0003dbb

    
                    <figure>
                        <preamble>
                            Consider the following schema document that contains another
                            schema resource embedded within it:
                        </preamble>
                        <artwork>
<![CDATA[
{
    "$id": "https://example.com/foo",
    "items": {
        "$id": "https://example.com/bar",
        "additionalProperties": { }
    }
}
]]>
                        </artwork>
                        <postamble>
                            The URI "https://example.com/foo#/items/additionalProperties"
                            points to the schema of the "additionalProperties" keyword in
                            the embedded resource.  The canonical URI of that schema, however,
                            is "https://example.com/bar#/additionalProperties".
                        </postamble>
                    </figure>

Now, that's not to say that you cannot fully implement and support both URIs, but in order to make implementation requirements simpler, and to help schema authors (and consumers) avoid ambiguity, the canonical URI is the one you can rely on to always work.
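For implementations that do choose to accept a pointer from a more distant base, translating it into the canonical form amounts to walking down the pointer and restarting at each embedded `$id`. This Python sketch assumes the pointer starts at the document root and that `root_uri` is the root's already-resolved identifier; `to_canonical` is a hypothetical helper, and plain-name fragments are out of scope.

```python
from urllib.parse import unquote, urldefrag, urljoin


def unescape(token: str) -> str:
    return token.replace("~1", "/").replace("~0", "~")


def escape(token: str) -> str:
    return token.replace("~", "~0").replace("/", "~1")


def to_canonical(schema, root_uri: str, uri: str) -> str:
    """Rewrite a root-relative pointer URI so it is relative to the nearest $id."""
    base, fragment = urldefrag(uri)
    if base != root_uri or (fragment and not fragment.startswith("/")):
        raise ValueError("sketch only handles JSON Pointers from the document root")
    node, current_base, tokens = schema, root_uri, []
    for raw in unquote(fragment).split("/")[1:]:
        token = unescape(raw)
        node = node[int(token)] if isinstance(node, list) else node[token]
        tokens.append(token)
        if isinstance(node, dict) and isinstance(node.get("$id"), str):
            # Entered an embedded resource: its $id becomes the new base
            current_base = urldefrag(urljoin(current_base, node["$id"]))[0]
            tokens = []
    pointer = "".join("/" + escape(t) for t in tokens)
    return f"{current_base}#{pointer}" if pointer else current_base
```

With the spec example, `to_canonical(schema, "https://example.com/foo", "https://example.com/foo#/items/additionalProperties")` gives `https://example.com/bar#/additionalProperties`, and an empty fragment maps back to the fragmentless resource URI.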

For quick reference, here's the section of the spec we're talking about...

json-schema-spec/jsonschema-core.xml

Lines 1821 to 1840 in 0003dbb

    
        <t>
            An implementation MAY choose not to support addressing schemas
            by non-canonical URIs. As such, it is RECOMMENDED that schema authors only
            use canonical URIs, as using non-canonical URIs may reduce
            schema interoperability.
            <cref>
                This is to avoid requiring implementations to keep track of a whole
                stack of possible base URIs and JSON Pointer fragments for each,
                given that all but one will be fragile if the schema resources
                are reorganized.  Some have argued that this is easy so there is
                no point in forbidding it, while others have argued that it complicates
                schema identification and should be forbidden.  Feedback on this
                topic is encouraged.
            </cref>
        </t>
        <t>
            Further examples of such non-canonical URIs, as well as the appropriate
            canonical URIs to use instead, are provided in appendix
            <xref target="idExamples" format="counter"></xref>.
        </t>

URIs with plain name fragments aren't discussed as canonical or non-canonical - @handrews

I'm not sure that's accurate...
Appendix A:
Schema of...

{
    "$id": "https://example.com/root.json",
    "$defs": {
        "A": { "$anchor": "foo" },

Followed by... (note line 3301)

json-schema-spec/jsonschema-core.xml

Lines 3298 to 3308 in 0003dbb

    
        <t hangText="#/$defs/A">
            <list>
                <t hangText="base URI">https://example.com/root.json</t>
                <t hangText="canonical URI with plain fragment">
                    https://example.com/root.json#foo
                </t>
                <t hangText="canonical URI with pointer fragment">
                    https://example.com/root.json#/$defs/A
                </t>
            </list>
        </t>

So to me, it looks like we've said there are multiple valid canonical URIs... and that kinda goes against the naming "canonical" in this instance.

handrews · 2020-09-26T22:10:45Z

@jdesrosiers regarding:

The biggest issue is that the spec sometimes speaks of non-canonical URIs as potentially not supported. That's certainly not what canonical means. Canonical is about choosing between valid options. If they weren't valid, they wouldn't even be considered.

The mental model here came from when you have something like (in YAML because I'm lazy today):

$id: https://example.com/schema/foo
$defs:
    bar:
        $id: https://example.com/schema/bar
        type: object
        properties:
            biz: {}

So some implementation would support addressing the biz schema object as https://example.com/schema/foo#/$defs/bar/properties/biz, which would function as the request URI. However, because of the $id under bar, the canonical URI would be https://example.com/schema/bar#/properties/biz.

That fits the request vs canonical model just fine.

Separately, it is not necessary to support https://example.com/schema/foo#/$defs/bar/properties/biz at all. This is one reason why it's not the canonical URI (the other being that it's annoying to keep track of all of the different JSON Pointer fragments from different possible base URIs, or at least I thought it was annoying when I implemented it).

So really, it's "these URIs are not canonical because they might not even work" more than "these URIs might not work because they are non-canonical." Although it is a bit of a chicken-and-egg philosophical problem and maybe there's a better option.

handrews · 2020-09-26T22:18:08Z

What I would like to end up with is this, and I'm flexible on the terminology, except that if there is no clear consensus it stays as-is:

- Addressing the root schema object of a resource using an absolute-URI (no fragment at all) is always valid
- Addressing a schema object with a plain name fragment where possible is always valid
- Addressing a schema object with a JSON Pointer relative to the nearest $id is always valid
- Addressing a schema object with a JSON Pointer relative to a more distant $id may or may not work (should not be relied upon, except in closed systems where the implementation is known etc.)

- Using an absolute-URI is preferred when the intention is to address the entire resource
- Using a plain-name fragment is preferred when treating the internal structure of the resource as an encapsulated implementation detail
- Using a JSON Pointer fragment relative to the nearest $id is preferred when the referrer can reasonably expect to have up-to-date knowledge of the internal structure (and may be the only option)
- Using a JSON Pointer fragment relative to a more distant $id is never preferred

I am open to suggestion on how to best communicate the above.

jdesrosiers · 2020-09-27T23:50:27Z

I think we're all on the same page about how to use canonical.

I believe we wanted to say, when there are multiple possible URIs to a given target, THIS is the one you should use.

Yep. That's exactly the kind of thing "canonical" should be used for.

So to me, it looks like we've said there are multiple valid canonical URIs... and that kinda goes against the naming "canonical" in this instance.

Agreed. That sounds wrong.

So really, it's "these URIs are not canonical because they might not even work" more than "these URIs might not work because they are non-canonical."

Yep. Our only difference was that I was thinking of those URIs not as "they might not work", but as "invalid" and I find it awkward to assign a canonical status to something that isn't even a valid alternative. If the URI "might" work, then I agree that it can be considered non-canonical.

@handrews I agree with your list in the previous comment. The only thing I'd change is that I wouldn't say anything about preference. That kind of thing seems more the domain of a style guide. URIs with fragments that cross $id boundaries are specified to be undefined. To me that means "don't do that", not "we prefer you didn't". Since pointers crossing $id boundaries are no longer a thing, the only cases where we have multiple URIs that can point to the same document are (1) $id vs. the URL used to fetch the schema and (2) a plain-name fragment vs. a pointer. They're equally valid and there's no need for the spec to choose a favorite.

In my implementation's documentation, I use the terms "internal identifier" and "external identifier". You can retrieve the schema using either one, but if you ask the schema for its URI, it will tell you the "internal identifier". So, "internal"/"external" is one way to explain it. Another way to describe it could be as similar to the "base" tag in HTML, where all references are resolved against the base rather than the URL used to retrieve the document.

amosonn · 2020-10-27T17:05:38Z

I hope this is a good place for this question, but what about $id which is a partial URI but not a fragment, i.e. "$id": "schemaFoo.json"? Where the assumption is that it is resolved to an absolute URI, using as the base URI the one from which this schema was retrieved. The main use case is to allow grouping several schemas (in separate documents) under the same path, but where the base URI is not known beforehand (or can change). This can be useful for storing them on a file system, or on a server installation which happens on premise and therefore doesn't have a fixed domain name.

Another use case for this is multiple schemas within a single document. Admittedly, this use case is also supported by fragments, as discussed here, but that makes separating them into multiple documents later harder.

As far as I know, there are at least some implementations which already support something like this.

jdesrosiers · 2020-10-27T22:28:00Z

what about $id which is a partial URI but not a fragment, i.e. "$id": "schemaFoo.json"? Where the assumption is that it is resolved to an absolute URI, using as the base URI the one from which this schema was retrieved.

Yes, that's the way $id works. To use the terminology I used in my previous comment, the external identifier is an absolute URI used to retrieve the schema (or the internal identifier of the parent schema if it's an embedded schema) and the internal identifier is the $id resolved against the external identifier. If you don't have an external identifier then the $id can't be a relative URI because the internal URI will not resolve to be absolute.

As far as I know, there are at least some implementations which already support something like this.

If you find an implementation that doesn't work like this for embedded schemas, it's a bug. However, not all implementations allow you to specify an external identifier for a root schema. That's not a bug, just optional behavior that's not supported. If the implementation does allow you to specify an external identifier, but doesn't use it, then it would be a bug.
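In Python, the resolution described here is what `urllib.parse.urljoin` does, assuming the external identifier is known. `internal_identifier` is a hypothetical helper and the example URIs are made up for illustration.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit


def internal_identifier(external: Optional[str], id_value: str) -> str:
    """Resolve $id against the external identifier (the retrieval URI, or the
    parent's internal identifier for an embedded schema)."""
    uri = urljoin(external, id_value) if external else id_value
    if not urlsplit(uri).scheme:
        # Nothing to resolve against: a relative $id stays relative
        raise ValueError(f"cannot resolve relative $id {id_value!r} to an absolute URI")
    return uri
```

Resolving `schemaFoo.json` against `https://example.com/schemas/root.json` gives `https://example.com/schemas/schemaFoo.json`; with no external identifier the same `$id` raises `ValueError`, matching the point that a relative `$id` cannot become absolute on its own.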

Relequestual · 2020-11-27T22:57:21Z

I don't believe there are any functional changes to be had as a result of resolving this issue, so I'd like to punt it to next draft in order to enable the publication of 2020 draft sooner.

handrews · 2021-05-08T03:53:20Z

As far as the original question posed by this issue, @jdesrosiers and @karenetheridge say they use the JSON Pointer forms as canonical, and not the plain name fragment. That is good enough for me! A clarification PR of some sort is probably in order.

If there are other debates that need to continue (e.g. is "canonical" even the term we want?) I think those should move out to their own issues.

Fixes issue json-schema-org#937, clarifying a number of other things along the way.

While it touches a fair number of lines, I'm fairly sure that it doesn't change anything about conformance.

After spending more time reading various writings on the concept of the "canonical" URI for a resource, and reviewing our language, I came to the following conclusions:
* canonical URIs only make sense at the whole-resource scope
* A URI with a fragment is neither canonical nor non-canonical
* It makes more sense to talk about fragments w.r.t. canonical URIs
* Our language was sufficiently confusing that going this way seems fine.

As part of this, I fixed an outright incorrect statement that identifier keywords set canonical URIs. Since there is only one canonical URI and a single schema object could contain three ($id, $anchor, $dynamicAnchor) or more identifier keywords, this statement is clearly a bug. These keywords assign URIs, but only $id assigns a canonical one.

I revamped a lot of wording in descriptions and examples to hopefully be more precise.

I separated the discussion of the empty fragment in $id from the main paragraph of its functionality, and clarified that this is talking about a media-type-specific semantic equivalence, and is not asserting that RFC 3986 normalization applies to fragments (this has been a point of confusion).

Relequestual · 2022-01-19T14:53:39Z

I don't believe we have reached consensus.
We've done a lot of sidetracking and I'm close to opening a new issue for focus.
Let's revisit the opening comment...

URIs with non-empty JSON Pointer fragment URIs, relative to the nearest base URI, are considered canonical.

URIs with non-empty JSON Pointer fragment URIs, relative to a more distant base URI, are considered non-canonical and are not guaranteed to work.

This was the main point of the canonical URI stuff in 2019-09.

URIs with plain name fragments aren't discussed as canonical or non-canonical

However, they only work with the nearest base URI so there's never any ambiguity

Therefore, plain name fragment URIs should always work.

The part in bold above, we are not communicating clearly enough.
We need to make the two sub points fully explained in the spec.

In 2020-12, we do detail these facts, even if only in the form of showing multiple canonical URI for a path as part of Appendix A. See: https://json-schema.org/draft/2020-12/json-schema-core.html#rfc.appendix.A


{
    "$id": "https://example.com/root.json",
    "$defs": {
        "A": { "$anchor": "foo" },
...

#/$defs/A

    base URI
        https://example.com/root.json
    canonical URI with plain fragment
        https://example.com/root.json#foo 
    canonical URI with pointer fragment
        https://example.com/root.json#/$defs/A

I think we all agree, when we say "canonical URI" we mean ONE URI.

I feel like I got lost a little when @karenetheridge (correctly) pointed out every location in a schema has a canonical URI.

Remember, the issue was about URIs which crossed resource boundaries in the same document (Schemas in which $id was used in the non-root schema object).

json-schema-spec/jsonschema-core.xml

Lines 1820 to 1823 in 117c05e

    
           An implementation MAY choose not to support addressing schemas
           by non-canonical URIs. As such, it is RECOMMENDED that schema authors only
           use canonical URIs, as using non-canonical URIs may reduce
           schema interoperability.

What if we were to change this phrasing to something like...

An implementation MAY choose not to support addressing schema resources (or a schema resource's subschemas) by non-canonical URI parts (which excludes any fragment). As such, it is RECOMMENDED that schema authors only use canonical schema resource URIs, as using non-canonical schema resource URIs may reduce schema interoperability.

The issue is with addressing locations which have a defined $id, making it a schema resource.
The canonical URI for a schema resource will always be fragmentless.
Fragments added after the canonical URI of a schema resource should not affect the canonical nature of the URI.

In fact, RFC 6596, The Canonical Link Relation, which we reference for the definition of canonical, describes it only in terms of resources. Fragments are attached to the resource's canonical URI to provide an in-resource location.

It was said in #1104 that...

canonical URIs only make sense at the whole-resource scope

A URI with a fragment is neither canonical nor non-canonical

It makes more sense to talk about fragments w.r.t. canonical URIs

Our language was sufficiently confusing that going this way seems fine.

It has taken me up till now to understand the above and draw the same conclusion.

I'm going to focus my efforts on modifying the language to reflect this as a PR.

As an aside... Although I feel this may be out of scope for this issue...

Picking up regarding #1104 and a thread of comments on a review.

zero or more other URIs that MAY also be used to refer to it (which some implementations may or may not choose to support).

I dislike this phrasing. I think it's backwards. It makes it sound like this is an optional feature that some will choose to support and others will not. It's not a feature. It's always wrong. Just because it MAY happen to work in some implementations doesn't mean it's not always wrong.
@jdesrosiers

We are talking about the lines quoted previously above, regarding what an implementation may choose not to support.

@karenetheridge @jdesrosiers, would you be happy with something that reworked this to make addressing schema resources with non-canonical URIs something where the "behaviour is undefined"? I would add a CREF noting that the reason it might still work is that, given the nature of JSON Pointer, it's likely easy in most cases to resolve the resulting location.

(Personally, I don't see why implementations have such a hard time with this issue specifically. I haven't implemented JSON Schema myself, but surely you take the JSON at a location, apply the JSON Pointer, and you have a new location. I'd be interested to hear what I'm missing... but not in this issue, please.)
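A rough Python sketch of the naive resolution described above: it evaluates an RFC 6901 JSON Pointer against an already-parsed schema. The names `resolve_pointer` and `CrossesResourceBoundary` are hypothetical. The `stop_at_embedded` flag is an illustrative assumption that models the strict resource boundaries discussed in this thread by refusing to step into any subschema that has its own `$id`.

```python
import re
from typing import Any


class CrossesResourceBoundary(Exception):
    """Hypothetical error: the pointer would step into an embedded resource."""


# RFC 6901 array indexes: "0" or a number without leading zeros.
_ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")


def resolve_pointer(document: Any, pointer: str, stop_at_embedded: bool = False) -> Any:
    # The empty pointer refers to the whole document.
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for raw_token in pointer[1:].split("/"):
        # RFC 6901 unescaping: "~1" -> "/" first, then "~0" -> "~".
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict):
            current = current[token]  # KeyError if the member is missing
        elif isinstance(current, list) and _ARRAY_INDEX.fullmatch(token):
            current = current[int(token)]  # IndexError if out of range
        else:
            raise KeyError(token)
        # Strict mode: a subschema with its own "$id" is a separate resource.
        if stop_at_embedded and isinstance(current, dict) and "$id" in current:
            raise CrossesResourceBoundary(f"{pointer!r} enters an embedded resource at {token!r}")
    return current
```

Without the flag, a pointer such as `/$defs/bar/$defs/baz` happily walks into an embedded resource, which is why non-canonical URIs often "just work". With the flag set, the same pointer is rejected because it crosses into `bar`'s own resource.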

jdesrosiers · 2022-01-20T03:57:44Z

Would you be happy with something that reworked this to make addressing schema resources with non-canonical URIs something where the "behaviour is undefined"?

For the patch release, anything is an improvement, but yes, I would like to see it described as "undefined". For the future, we need to stop using "canonical" in the spec. It's not the right abstraction. It's confusing, complicated, and I think it misses the point. An embedded schema is its own independent schema, distinct from its parent (think iframe). A pointer can only point to a location within a single schema resource (I can't craft an XPath to point to a location within an iframe). That's the whole concept.

I don't see why implementations have such a hard time with this issue

It's not necessarily hard, it's just unnecessary complexity. It's much simpler to not have to track base URI and dialect changes depending on where you are in the schema. With strict boundaries, every schema has one base URI and one dialect no matter where you are. I'm lazy; I don't want to write code to track those changes when there's a neat and simple alternative conceptual model that doesn't require additional code and supports everything schema authors need. I just break down compound schemas when they are loaded, and then I don't have to worry about anything changing. It's not even extra work, because I have to break down the compound schema anyway for validation against the meta-schema.
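Breaking down a compound schema at load time might look roughly like the following Python sketch. It is not jdesrosiers' actual code: `split_compound_schema` is hypothetical, and it assumes that replacing each embedded resource with a `$ref` to its resolved `$id` is acceptable for validation purposes. It naively treats any object with a string `$id` as a schema (a real implementation should not, for example inside `const` or `enum` values), and it ignores `$schema` dialect changes.

```python
from typing import Any, Dict
from urllib.parse import urljoin


def split_compound_schema(schema: dict, retrieval_uri: str) -> Dict[str, dict]:
    """Hypothetical: map each canonical (fragmentless) URI to its own resource."""
    resources: Dict[str, dict] = {}

    def register(resource: dict, parent_base: str) -> str:
        # Resolve "$id" against the enclosing base URI and drop any empty
        # fragment, so the key is the fragmentless canonical URI.
        uri = urljoin(parent_base, resource.get("$id", "")).partition("#")[0]
        body = {key: walk(value, uri) for key, value in resource.items()}
        body["$id"] = uri
        resources[uri] = body
        return uri

    def walk(node: Any, base: str) -> Any:
        if isinstance(node, dict):
            if isinstance(node.get("$id"), str):
                # Embedded resource: extract it and leave a reference behind.
                return {"$ref": register(node, base)}
            return {key: walk(value, base) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item, base) for item in node]
        return node

    register(schema, retrieval_uri)
    return resources
```

The input is not mutated; new dicts and lists are built. After the split, each resource has exactly one base URI, so nothing inside it needs to track base URI changes while it is evaluated.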

Attempt to resolve json-schema-org#937. Add note and cref in appendix A clarifying that we intended to define a URI phrasing which would avoid the requirement to allow for location shadowing in implementations, as this is tricky. Clarify that plain name fragments should always be supported, and that they can only work in relation to the base URI of the schema resource; otherwise there could be duplicate plain name fragments and addressing wouldn't work.
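A minimal Python sketch of why plain name fragments only make sense relative to the base URI of their own schema resource. `collect_anchors` is a hypothetical helper: it indexes `$anchor` names under the canonical URI of the resource that contains them and stops at embedded resources (subschemas with their own `$id`), so the same name can appear in two resources without colliding. The sketch simply treats a repeat inside one resource as an error, and `$dynamicAnchor` is not modelled.

```python
from typing import Any, Dict


def collect_anchors(resource: dict, canonical_uri: str) -> Dict[str, Any]:
    """Hypothetical: map 'canonical_uri#name' to the subschema declaring it."""
    anchors: Dict[str, Any] = {}

    def walk(node: Any, is_root: bool) -> None:
        if isinstance(node, list):
            for item in node:
                walk(item, False)
            return
        if not isinstance(node, dict):
            return
        if not is_root and isinstance(node.get("$id"), str):
            return  # embedded resource: its anchors belong to its own base URI
        name = node.get("$anchor")
        if isinstance(name, str):
            uri = f"{canonical_uri}#{name}"
            if uri in anchors:
                raise ValueError(f"duplicate plain name fragment: {uri}")
            anchors[uri] = node
        for value in node.values():
            walk(value, False)

    walk(resource, True)
    return anchors
```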

Relequestual · 2022-02-22T13:30:26Z

Closed #1104 in favour of #1192 in an attempt to resolve this issue.

…cal (#1192) * Clarify that plain name fragments are neither canonical or non-canonical. Attempt to resolve #937. Add note and cref in appendix A clarifying that we intended to define a URI phrasing which would avoid the requirement to allow for location shadowing in implementations, as this is tricky. Clarify that plain name fragments should always be supported, and that they can only work in relation to the base URI of the schema resource; otherwise there could be duplicate plain name fragments and addressing wouldn't work.

Relequestual · 2022-03-14T11:12:03Z

Closing this issue as #1192 was merged.
Please comment if you wish to re-open.

Fixes issue json-schema-org#937, clarifying a number of other things along the way. While it touches a fair number of lines, I'm fairly sure that it doesn't change anything about conformance. After spending more time reading various writings on the concept of the "canonical" URI for a resource, and reviewing our language, I came to the following conclusions:

* canonical URIs only make sense at the whole-resource scope
* A URI with a fragment is neither canonical nor non-canonical
* It makes more sense to talk about fragments w.r.t. canonical URIs
* Our language was sufficiently confusing that going this way seems fine.

As part of this, I fixed an outright incorrect statement that identifier keywords set canonical URIs. Since there is only one canonical URI and a single schema object could contain three ($id, $anchor, $dynamicAnchor) or more identifier keywords, this statement is clearly a bug. These keywords assign URIs, but only $id assigns a canonical one. I revamped a lot of wording in descriptions and examples to hopefully be more precise. I separated the discussion of the empty fragment in $id from the main paragraph of its functionality, and clarified that this is talking about a media-type-specific semantic equivalence, and is not asserting that RFC 3986 normalization applies to fragments (this has been a point of confusion).
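The distinction drawn here (identifier keywords assign URIs, but only $id assigns a canonical one) can be sketched in Python as follows. `identifier_uris` is a hypothetical helper. It assumes `base_uri` is already fragmentless, treats a trailing empty fragment in `$id` as equivalent to none (the media-type-specific equivalence mentioned above, not RFC 3986 normalization), and rejects a non-empty fragment in `$id`.

```python
from typing import Dict, List, Optional
from urllib.parse import urljoin


def identifier_uris(schema: dict, base_uri: str) -> Dict[str, object]:
    """Hypothetical: report the URIs a schema object's identifier keywords assign."""
    canonical: Optional[str] = None
    base = base_uri
    if isinstance(schema.get("$id"), str):
        resolved, _, fragment = urljoin(base_uri, schema["$id"]).partition("#")
        if fragment:
            raise ValueError("$id must not contain a non-empty fragment")
        canonical = resolved  # "…/foo#" and "…/foo" are treated as the same URI
        base = resolved
    # $anchor and $dynamicAnchor assign plain-name fragment URIs relative to
    # the base; these are neither canonical nor non-canonical.
    fragment_uris: List[str] = []
    for keyword in ("$anchor", "$dynamicAnchor"):
        name = schema.get(keyword)
        if isinstance(name, str):
            fragment_uris.append(f"{base}#{name}")
    return {"canonical": canonical, "fragment_uris": fragment_uris}
```

A schema object with three identifier keywords therefore yields one canonical URI plus two fragment URIs, rather than three canonical URIs.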


handrews added the core label May 23, 2020

handrews added this to the draft-08-patch1 milestone May 23, 2020

handrews assigned karenetheridge, awwright and ssilverman May 23, 2020

handrews changed the title ~~Clarify whether plain name fragment URIs or canonical, and what it means if they are (or aren't)~~ Clarify whether plain name fragment URIs are canonical, and what it means if they are (or aren't) May 26, 2020

Relequestual added the Priority: High label Aug 8, 2020

Relequestual modified the milestones: draft-08-patch1 (draft 2020-NN), draft-next Nov 27, 2020

handrews added pr-available Priority: Medium and removed Priority: High labels May 8, 2021

handrews mentioned this issue May 14, 2021

Clarify various things about canonical URIs #1104 (Closed)

handrews modified the milestones: draft-next, draft-patch May 14, 2021

Relequestual mentioned this issue Feb 16, 2022

Remove the notion of "canonical URIs" in favour of boundaried schema resources #1183 (Open)

Relequestual mentioned this issue Feb 22, 2022

Clarify that plain name fragments are neither canonical or non-canonical #1192 (Merged)

Relequestual closed this as completed Mar 14, 2022

Relequestual mentioned this issue Mar 14, 2022

Clarify various things about canonical URIs #1196 (Merged)

wrrrg24 mentioned this issue Mar 28, 2024

further specify the format of iri-references throughout #1085 (Draft)

