Clarify whether plain name fragment URIs are canonical, and what it means if they are (or aren't) #937

Closed
handrews opened this issue May 23, 2020 · 30 comments · Fixed by #1196

Comments

@handrews
Contributor

There's a little confusion and ambiguity on this topic.

  • Absolute URIs (with no fragment, not even an empty one) are definitely canonical, and identify the complete schema resource.
    • This implies that the same URI with an empty JSON Pointer fragment is non-canonical, but it should work, and it would be bad if an implementation did not support it.
  • URIs with non-empty JSON Pointer fragments, relative to the nearest base URI, are considered canonical.
  • URIs with non-empty JSON Pointer fragments, relative to a more distant base URI, are considered non-canonical and are not guaranteed to work.
    • This was the main point of the canonical URI stuff in 2019-09.
  • URIs with plain name fragments aren't discussed as canonical or non-canonical
    • However, they only work with the nearest base URI so there's never any ambiguity
    • Therefore, plain name fragment URIs should always work.
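
The URI forms listed above can be told apart mechanically. A minimal Python sketch, assuming nothing beyond string handling; the function name `fragment_kind` and its labels are illustrative and are not spec terminology:

```python
def fragment_kind(uri: str) -> str:
    """Label a URI by the kind of fragment it carries."""
    if "#" not in uri:
        return "no fragment"        # absolute-URI: identifies the whole resource
    fragment = uri.split("#", 1)[1]
    if fragment == "":
        return "empty pointer"      # same target as the fragment-less URI
    if fragment.startswith("/"):
        return "json pointer"       # structural location inside a resource
    return "plain name"             # looked up among the resource's $anchor values


for uri in (
    "https://example.com/root.json",
    "https://example.com/root.json#",
    "https://example.com/root.json#/$defs/A",
    "https://example.com/root.json#foo",
):
    print(fragment_kind(uri), uri)
```

The check is done on the raw string because common URI parsers do not distinguish an empty fragment from a missing one.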

Things to figure out:

  • Is "canonical" the right term here? IIRC @awwright thought it was not, but I had a vaguely coherent reason as to why it was (which I should look up) and I was burnt out and needed to publish it so we did that. Probably not the best process.
  • Should we be framing the requirement around what URIs MUST be supported and what MAY be supported differently, in order to capture the possibilities better?
  • Are there other use cases I'm missing here?
@karenetheridge
Member

karenetheridge commented May 23, 2020

Is "canonical" the right term here?

Depends on how the term is going to be used :) I can give one example where a canonical URI that contains a plain-name fragment is not the most useful uri to be slinging around: when generating errors or annotations. I just ran into this today while implementing $anchor. Let me show an example that I'm using in my tests:

data: 1
schema:
{
  "$defs": {
    "foo": {
      "$anchor": "my_foo",
      "const": "foo value"
    },
    "bar": {
      "$anchor": "my_bar",
      "not": true
    }
  },
  "$id": "http://localhost:4242",
  "allOf": [
    { "$ref": "#my_foo" },
    { "$ref": "#my_bar" }
  ]
}

result:
{
  "valid": false,
  "errors": [
    {
      "instanceLocation": "#",
      "keywordLocation": "#/allOf/0/$ref/const",
      "absoluteKeywordLocation": "http://localhost:4242#/$defs/foo/const",
      "error": "value does not match"
    },
    {
      "instanceLocation": "#",
      "keywordLocation": "#/allOf/1/$ref/not",
      "absoluteKeywordLocation": "http://localhost:4242#/$defs/bar/not",
      "error": "subschema is valid"
    },
    {
      "instanceLocation": "#",
      "keywordLocation": "#/allOf",
      "absoluteKeywordLocation": "http://localhost:4242#/allOf",
      "error": "subschemas 0, 1 are not valid"
    }
  ]
}

So, the interesting thing here is that there are multiple errors generated on the far side of $refs where the URI being referenced has a plain-name fragment, and at least one level below the position where the identifier changes. We need to generate an error at this location, but we can't specify anything relative to the plain-name fragment, so we have to revert to the previous canonical uri to use as the base. So which uri is canonical for these locations? Is it possible to have more than one? Which one(s) should we hang on to as we traverse the schema, for the purpose of generating annotations and errors against that location? If an error is generated right at the location with the $anchor, should we use its uri as the location of the error, even though an error one level below it doesn't use that uri? Would that be confusing?
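
One way to produce the pointer-based locations shown in the result above is to index each $anchor by its JSON Pointer when the schema is loaded. The following Python sketch assumes a single-resource document (no embedded $id) and, as a simplification, treats every nested object as a possible subschema; `index_anchors` and `absolute_keyword_location` are hypothetical helper names, not part of any implementation mentioned here.

```python
def escape(token: str) -> str:
    """Escape a key for use as a JSON Pointer token (RFC 6901)."""
    return token.replace("~", "~0").replace("/", "~1")


def index_anchors(node, pointer=""):
    """Yield (anchor name, JSON Pointer) pairs found in a schema document."""
    if isinstance(node, dict):
        if isinstance(node.get("$anchor"), str):
            yield node["$anchor"], pointer
        for key, value in node.items():
            yield from index_anchors(value, f"{pointer}/{escape(key)}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from index_anchors(item, f"{pointer}/{i}")


schema = {
    "$defs": {
        "foo": {"$anchor": "my_foo", "const": "foo value"},
        "bar": {"$anchor": "my_bar", "not": True},
    },
    "$id": "http://localhost:4242",
    "allOf": [{"$ref": "#my_foo"}, {"$ref": "#my_bar"}],
}
anchors = dict(index_anchors(schema))


def absolute_keyword_location(ref: str, keyword: str) -> str:
    """Report a keyword below a plain-name $ref using the pointer form."""
    name = ref[1:]  # drop the leading "#"
    return f'{schema["$id"]}#{anchors[name]}/{keyword}'


print(absolute_keyword_location("#my_foo", "const"))
# http://localhost:4242#/$defs/foo/const
```

With such an index, the plain name is only needed to find the target; every reported location can then be expressed as a JSON Pointer from the resource's base URI.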

@handrews
Contributor Author

@karenetheridge

The "canonical" link relation type is defined in RFC 6596, and our usage should be roughly compatible with that. It is linked under § 8.2.2 The "id" keyword in the spec.

Your example definitely shows that absoluteKeywordLocation can't use a plain name fragment, at least not unless we then add a separate relative JSON pointer, and I do not want to go there.

So the answer to the question "what form should be used for consistent reporting of locations, including keyword locations that are not themselves schema objects" is clearly "Use the JSON Pointer from the nearest $id (or appropriate fallback base URI)".

The other question is how we talk about canonical URIs: they MUST be supported, while non-canonical URIs, at least those involving JSON Pointers from other base URIs, MAY be supported, but don't have to be.

However, plain name fragments MUST be supported, so maybe it's just a case of cleaning up the language there. And also around the empty JSON Pointer fragment, which also MUST be supported even if the, um... most canonical? URI for schema resource roots is one without a fragment at all.

@ssilverman
Member

ssilverman commented May 23, 2020

@handrews Apologies for the nitpick: did you mean to use "are" and not "or" in the issue title? I'm trying to understand it precisely.

@ssilverman
Member

ssilverman commented May 24, 2020

So the answer to the question "what form should be used for consistent reporting of locations, including keyword locations that are not themselves schema objects" is clearly "Use the JSON Pointer from the nearest $id (or appropriate fallback base URI)".

I'm just ruminating on the example given above. Given these four absolute locations:

  1. http://localhost:4242#/$defs/foo/const
  2. http://localhost:4242#/$defs/bar/not
  3. http://localhost:4242#/$defs/foo
  4. http://localhost:4242#/$defs/bar

Without saying anything about being canonical, I suppose they could also be expressed as:

  1. http://localhost:4242#my_foo/const
  2. http://localhost:4242#my_bar/not
  3. http://localhost:4242#my_foo
  4. http://localhost:4242#my_bar

I'm just trying to wrap my head around @handrews's comments above and comparing these lists is helpful to me. It also makes sense why "use the nearest $id" might be good language.

@handrews
Contributor Author

@ssilverman no, your second block of URIs is invalid. You cannot mix plain-name and JSON Pointer syntax.
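
A syntax check makes the distinction concrete. This Python sketch assumes the plain-name pattern used for $anchor in the 2020-12 core meta-schema (2019-09 allowed a slightly different character set, so treat the exact regex as an assumption) and the JSON Pointer grammar from RFC 6901; percent-encoding of fragments is ignored for brevity.

```python
import re

# Assumed $anchor syntax (2020-12 meta-schema pattern).
PLAIN_NAME = re.compile(r"^[A-Za-z_][-A-Za-z0-9._]*$")
# RFC 6901: zero or more "/"-prefixed tokens; "~" may only appear as ~0 or ~1.
JSON_POINTER = re.compile(r"^(/([^/~]|~[01])*)*$")


def is_valid_fragment(fragment: str) -> bool:
    """True if the fragment is entirely a plain name or entirely a JSON Pointer."""
    return bool(PLAIN_NAME.match(fragment) or JSON_POINTER.match(fragment))


print(is_valid_fragment("my_foo"))        # True
print(is_valid_fragment("/$defs/foo"))    # True
print(is_valid_fragment("my_foo/const"))  # False
```

The last fragment fails both patterns: it contains "/" so it is not a plain name, and it does not start with "/" so it is not a JSON Pointer.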

@ssilverman
Member

Ok, thanks for clarifying that.

@ssilverman
Member

So what about “or” vs. “are” in the title?

karenetheridge added a commit to karenetheridge/JSON-Schema-Modern that referenced this issue May 26, 2020
see the examples in https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.10.4.2
but also see json-schema-org/json-schema-spec#937

- instance_location and keyword_location will always be json pointers
- absolute_keyword_location will always be an absolute URI or URI reference, when defined

Luckily the published schema at https://json-schema.org/draft/2019-09/output/schema
will still validate our output, because a bare json pointer looks like a uri
reference.
@awwright
Member

"canonical" as opposed to what? If we're talking about a link relation to apply to an $id keyword, I think I brought up an issue where rel=self makes more sense than rel=canonical.

@karenetheridge
Member

Canonical in this case means the fully-resolved uri associated with the current subschema location, derived from the nearest $id (as there could be multiple $ids up the hierarchy that would allow the current location to be referenced from a number of different uris).

This SHOULD be an absolute uri, so long as there is an absolute uri provided as the base uri for the entire document, or there is an absolute uri as an $id somewhere up along the way. It will never contain a plain-name fragment (as per discussions above in this thread), but it could contain a json pointer fragment if this exact location doesn't have an $id.

Hopefully I didn't get that too wrong. I'm at present in the middle of implementing $id and $anchor logic in my implementation so I'm up to my eyeballs in URI terminology :)

@handrews
Contributor Author

@awwright Yeah I was trying to remember that discussion.

IIRC, I liked "canonical" because there was some situation where "self" and "canonical" diverged and I thought canonical made more sense, but I can't remember what it was, or find where that discussion happened (probably slack).

@karenetheridge
Member

maybe we need a glossary :D I've been confused a few times by what "absolute" refers to, since an absolute URI in the RFC is one with a scheme and without a fragment, but we sling around absolute uris with fragments (either plain-name or json pointers) all the time.

Oh yeah, "canonical" also refers to the cleaning up of paths like "foo/../bar".
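
Both senses can be illustrated with Python's standard `urllib.parse`. This is a rough sketch: `is_absolute_uri` is a hypothetical helper that checks only the two properties named above (a scheme is present, no fragment is present) and is not a full RFC 3986 validator.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute_uri(uri: str) -> bool:
    """RFC 3986 section 4.3: a scheme is required and no fragment is allowed."""
    return bool(urlsplit(uri).scheme) and "#" not in uri


print(is_absolute_uri("http://localhost:4242"))         # True
print(is_absolute_uri("http://localhost:4242#/allOf"))  # False

# Reference resolution also removes dot segments such as "foo/../bar".
print(urljoin("http://example.com/", "foo/../bar"))     # http://example.com/bar
```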

@handrews
Contributor Author

@karenetheridge we refer to absolute keyword location which may be identified by a URI with a fragment, but we should not refer to absolute URIs with fragments. Any such wording should be fixed.

@handrews
Contributor Author

@karenetheridge for a more general specification work glossary, I'd recommend using / contributing to http://webconcepts.info/

Most of this stuff is not JSON Schema-specific so we shouldn't make yet another glossary. Stuff that is JSON Schema-specific should go in our specs.

@handrews handrews changed the title Clarify whether plain name fragment URIs or canonical, and what it means if they are (or aren't) Clarify whether plain name fragment URIs are canonical, and what it means if they are (or aren't) May 26, 2020
@handrews
Contributor Author

@ssilverman "are" fixed thanks

@handrews
Contributor Author

OK, digging into RFC 6596, I believe the mental model was an extension of the situation when you fetch a schema document with one URI, but it provides a different one with $id.

Essentially, $id there means "the URI you used is not the ideal one, use this one instead". So in this bit from RFC 6596, the request-URI is the "context resource URI" and the $id URI is the target resource one:

In regard to the link relation type, "canonical" can be described
informally as the author's preferred version of a resource. More
formally, the canonical link relation specifies the preferred IRI
from a set of resources that return the context IRI's content in
duplicated form. Once specified, applications such as search engines
can focus processing on the canonical, and references to the context
(referring) IRI can be updated to reference the target (canonical)
IRI.

I think I went from the author's preferred version of a resource and looking at $id as the author's expression of that.

Extending the request-URI mental model to a multi-resource JSON Schema document, a URI using a JSON Pointer relative to the document root for an embedded resource is the "request URI", and the embedded $id is the author telling you what the preferred URI is.

Of course, as @awwright observed, it's not uncommon for resources to include a "self" link. That link relation type is inherited from the Atom RFC, which initially established the IANA registry of link relation types. As far as I can tell, the entire specification for "self" is:

The value "self" signifies that the IRI in the value of the href
attribute identifies a resource equivalent to the containing
element.

Which overlaps with but is more general than "canonical".

I recall feeling moderately strongly that "canonical" was better, but can no longer remember quite why.


Let's do this. Whatever we call these things, we need a decision about how to present plain name fragment URIs. @awwright @karenetheridge and/or @ssilverman, if you want to argue for "self", let's make a separate issue for that, and let this issue focus on the original point. If we need to wait on the other issue, that's fine.

karenetheridge added a commit to karenetheridge/JSON-Schema-Modern that referenced this issue May 27, 2020
see the examples in https://json-schema.org/draft/2019-09/json-schema-core.html#rfc.section.10.4.2
but also see json-schema-org/json-schema-spec#937

- instance_location and keyword_location will always be json pointers
- absolute_keyword_location will always be an absolute URI or URI reference, when defined

Luckily the published schema at https://json-schema.org/draft/2019-09/output/schema
will still validate our output, because a bare json pointer looks like a uri
reference.
@jdesrosiers
Member

It's taken me a while to get to this one, and in the meantime, @handrews, you've come to the conclusion about canonical I was going to suggest. If a schema is retrieved from https://example.com/schema1 but its $id is https://example.com/schema2, the $id is equivalent to rel="canonical" asserting that the $id URI is preferred over the URI used to retrieve the resource.

However, that doesn't seem to be the way it is used in the spec. Canonical is all about choosing between valid alternatives, but the term is often used outside of the context of multiple URI options. That's at best confusing.

Another issue is that people read the spec and think that absolute URI is part of the definition of canonical, when it's actually just part of the definition of $id and orthogonal to canonical.

The biggest issue is that the spec sometimes speaks of non-canonical URIs as potentially not supported. That's certainly not what canonical means. Canonical is about choosing between valid options. If they weren't valid, they wouldn't even be considered.

Since canonical is about selecting from equally valid URIs, I'm not sure there's a good reason for the spec to decide which URI should be canonical. That can be left up to implementations to decide without compromising interoperability.

In my implementation I have a schema data structure and you can ask it what its URI is. Sometimes I need to choose between multiple options, but it shouldn't matter if my choice is different than someone else's if both URIs point to the same thing. For the record, my implementation chooses $id over the retrieving URL as canonical and it chooses JSON Pointer fragments over plain-name fragments as canonical.
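
The "retrieved from one URI, identified by another" case can be sketched as a small registry. This is an illustrative Python sketch, not the implementation described in this comment; the class `SchemaRegistry` and its method names are assumptions made for the example.

```python
from urllib.parse import urldefrag, urljoin


class SchemaRegistry:
    """Hypothetical registry: one schema, reachable by retrieval URI or by $id."""

    def __init__(self):
        self._schemas = {}  # fragment-less URI -> (canonical URI, schema)

    def add(self, retrieval_uri, schema):
        # $id, resolved against the retrieval URI, wins as the canonical URI.
        resolved = urljoin(retrieval_uri, schema.get("$id", ""))
        canonical = urldefrag(resolved).url
        entry = (canonical, schema)
        self._schemas[retrieval_uri] = entry
        self._schemas[canonical] = entry
        return canonical

    def get(self, uri):
        return self._schemas[uri][1]

    def canonical_uri(self, uri):
        return self._schemas[uri][0]


registry = SchemaRegistry()
registry.add("https://example.com/schema1", {"$id": "https://example.com/schema2"})

same = registry.get("https://example.com/schema1") is registry.get("https://example.com/schema2")
print(same)                                                   # True
print(registry.canonical_uri("https://example.com/schema1"))  # https://example.com/schema2
```

Both URIs retrieve the same object, so a different choice of canonical URI by another implementation would not change which schema is found.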

@karenetheridge
Member

karenetheridge commented May 30, 2020

Another issue is that people read the spec and think that absolute URI is part of the definition of canonical, when it's actually just part of the definition of $id and orthogonal to canonical.

Something on my todo list for when the spec nears completion is to look over all of the uses of absolute URIs, relative URIs, URI references etc and look for inconsistencies or confusing wording. URI resolution is fairly straightforward, but you have to understand what bits refer to what to apply it properly :)

For the record, my implementation chooses $id over the retrieving URL as canonical and it chooses JSON Pointer fragments over plain-name fragments as canonical.

I agree with this as well.

@handrews
Contributor Author

@jdesrosiers I could have sworn I wrote a long response to you yesterday, but it's not here. I must have not hit the comment button and then closed the window :/

I'll have to remember what I meant to say and will get back to this soon.

@Relequestual
Member

The biggest issue is that the spec sometimes speaks of non-canonical URIs as potentially not supported. That's certainly not what canonical means. Canonical is about choosing between valid options. If they weren't valid, they wouldn't even be considered. - @jdesrosiers

iirc this was to allow a simpler approach to resolving references and provide a standard best practice approach.

I believe we wanted to say, when there are multiple possible URIs to a given target, THIS is the one you should use. Canonical may not be the best word to describe that URI, but naming things is hard, and canonical is a familiar term (at least to me).

Specifically, for example, like this...

<figure>
<preamble>
Consider the following schema document that contains another
schema resource embedded within it:
</preamble>
<artwork>
<![CDATA[
{
"$id": "https://example.com/foo",
"items": {
"$id": "https://example.com/bar",
"additionalProperties": { }
}
}
]]>
</artwork>
<postamble>
The URI "https://example.com/foo#/items/additionalProperties"
points to the schema of the "additionalProperties" keyword in
the embedded resource. The canonical URI of that schema, however,
is "https://example.com/bar#/additionalProperties".
</postamble>
</figure>

Now, that's not to say that you cannot fully implement and support both URIs, but in order to make implementation requirements simpler, and to help schema authors (and consumers) avoid ambiguity, the canonical URI is the one you can rely on to always work.

For quick reference, here's the section of the spec we're talking about...

<t>
An implementation MAY choose not to support addressing schemas
by non-canonical URIs. As such, it is RECOMMENDED that schema authors only
use canonical URIs, as using non-canonical URIs may reduce
schema interoperability.
<cref>
This is to avoid requiring implementations to keep track of a whole
stack of possible base URIs and JSON Pointer fragments for each,
given that all but one will be fragile if the schema resources
are reorganized. Some have argued that this is easy so there is
no point in forbidding it, while others have argued that it complicates
schema identification and should be forbidden. Feedback on this
topic is encouraged.
</cref>
</t>
<t>
Further examples of such non-canonical URIs, as well as the appropriate
canonical URIs to use instead, are provided in appendix
<xref target="idExamples" format="counter"></xref>.
</t>


URIs with plain name fragments aren't discussed as canonical or non-canonical - @handrews

I'm not sure that's accurate...
Appendix A:
Schema of...

{
    "$id": "https://example.com/root.json",
    "$defs": {
        "A": { "$anchor": "foo" },

Followed by... (note line 3301)

<t hangText="#/$defs/A">
<list>
<t hangText="base URI">https://example.com/root.json</t>
<t hangText="canonical URI with plain fragment">
https://example.com/root.json#foo
</t>
<t hangText="canonical URI with pointer fragment">
https://example.com/root.json#/$defs/A
</t>
</list>
</t>

So to me, it looks like we've said there are multiple valid canonical URIs... and that kinda goes against the naming "canonical" in this instance.

@handrews
Contributor Author

@jdesrosiers regarding:

The biggest issue is that the spec sometimes speaks of non-canonical URIs as potentially not supported. That's certainly not what canonical means. Canonical is about choosing between valid options. If they weren't valid, they wouldn't even be considered.

The mental model here came from when you have something like (in YAML because I'm lazy today):

$id: https://example.com/schema/foo
$defs:
    bar:
        $id: https://example.com/schema/bar
        type: object
        properties:
            biz: {}

So some implementation would support addressing the biz schema object as https://example.com/schema/foo#/$defs/bar/properties/biz, which would function as the request URI. However, because of the $id under bar, the canonical URI would be https://example.com/schema/bar#/properties/biz.

That fits the request vs canonical model just fine.

Separately, it is not necessary to support https://example.com/schema/foo#/$defs/bar/properties/biz at all. This is one reason why it's not the canonical URI (the other being that it's annoying to keep track of all of the different JSON Pointer fragments from different possible base URIs, or at least I thought it was annoying when I implemented it).

So really, it's "these URIs are not canonical because they might not even work" more than "these URIs might not work because they are non-canonical." Although it is a bit of a chicken-and-egg philosophical problem and maybe there's a better option.
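
The mapping from a document-root pointer to the URI based on the nearest $id can be sketched with a recursive walk. This Python sketch is a simplification under stated assumptions: it treats any object with a string-valued "$id" key as a schema resource, visits every nested object rather than only true subschema positions, and `walk` is a hypothetical helper name.

```python
from urllib.parse import urljoin


def escape(token: str) -> str:
    """Escape a key for use as a JSON Pointer token (RFC 6901)."""
    return token.replace("~", "~0").replace("/", "~1")


def walk(node, base, doc_ptr="", res_ptr=""):
    """Yield (pointer from document root, URI relative to the nearest $id)."""
    if isinstance(node, dict):
        if isinstance(node.get("$id"), str):
            # A new resource starts here: new base URI, pointer restarts.
            base, res_ptr = urljoin(base, node["$id"]), ""
        yield doc_ptr, base + ("#" + res_ptr if res_ptr else "")
        for key, value in node.items():
            step = "/" + escape(key)
            yield from walk(value, base, doc_ptr + step, res_ptr + step)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from walk(item, base, f"{doc_ptr}/{i}", f"{res_ptr}/{i}")


schema = {
    "$id": "https://example.com/schema/foo",
    "$defs": {
        "bar": {
            "$id": "https://example.com/schema/bar",
            "type": "object",
            "properties": {"biz": {}},
        }
    },
}
table = dict(walk(schema, "https://example.com/schema/foo"))
print(table["/$defs/bar/properties/biz"])
# https://example.com/schema/bar#/properties/biz
```

An implementation that wants to accept the longer form would need a table like this for every enclosing base URI, which is the bookkeeping described above.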

@handrews
Contributor Author

What I would like to end up with is this, and I'm flexible on the terminology, except that if there is no clear consensus it stays as-is:

  • Addressing the root schema object of a resource using an absolute-URI (no fragment at all) is always valid
  • Addressing a schema object with a plain name fragment where possible is always valid
  • Addressing a schema object with a JSON Pointer relative to the nearest $id is always valid
  • Addressing a schema object with a JSON Pointer relative to a more distant $id may or may not work (should not be relied upon, except in closed systems where the implementation is known etc.)
    .
  • Using an absolute-URI is preferred when the intention is to address the entire resource
  • Using a plain-name fragment is preferred when treating the internal structure of the resource as an encapsulated implementation detail
  • Using a JSON Pointer fragment relative to the nearest $id is preferred when the referrer can reasonably expect to have up-to-date knowledge of the internal structure (and may be the only option)
  • Using a JSON Pointer fragment relative to a more distant $id is never preferred

I am open to suggestion on how to best communicate the above.

@jdesrosiers
Member

I think we're all on the same page about how to use canonical.

I believe we wanted to say, when there are multiple possible URIs to a given target, THIS is the one you should use.

Yep. That's exactly the kind of thing "canonical" should be used for.

So to me, it looks like we've said there are multiple valid canonical URIs... and that kinda goes against the naming "canonical" in this instance.

Agreed. That sounds wrong.

So really, it's "these URIs are not canonical because they might not even work" more than "these URIs might not work because they are non-canonical."

Yep. Our only difference was that I was thinking of those URIs not as "they might not work", but as "invalid" and I find it awkward to assign a canonical status to something that isn't even a valid alternative. If the URI "might" work, then I agree that it can be considered non-canonical.

@handrews I agree with your list in the previous comment. The only thing I'd change is that I wouldn't say anything about preference. That kind of thing seems more the domain of a style guide. URIs with fragments that cross $id boundaries are specified to be undefined. To me that means "don't do that", not "we prefer you didn't". Since pointers crossing $id boundaries is no longer a thing, the only cases where we have multiple URIs that can point to the same document are (1) $id vs the URL used to fetch the schema and (2) a plain-name fragment vs pointer. They're equally valid and there's no need for the spec to choose a favorite.

In my implementation's documentation, I use the terms "internal identifier" and "external identifier". You can retrieve the schema using either one, but if you ask the schema for its URI, it will tell you the "internal identifier". So, "internal"/"external" is one way to explain it. Another way to describe it could be as similar to the "base" tag in HTML where all references are resolved against the base rather than the URL used to retrieve the document.

@amosonn

amosonn commented Oct 27, 2020

I hope this is a good place for this question, but what about $id which is a partial URI but not a fragment, i.e. "$id": "schemaFoo.json"? Where the assumption is that it is resolved to an absolute URI, using as the base URI the one on which this schema was retrieved. The main use case is to allow grouping several schemas (in separate documents) under the same path, but where the base URI is not known beforehand (or can change). This can be useful for storing them on a file system, or on a server installation which happens on premise and therefore doesn't have a fixed domain name.

Another use-case for this is multiple schemas within a single document. Admittedly this use-case is also supported by fragments, as discussed here, but will make later separating them into multiple documents harder.

As far as I know, there are at least some implementations which already support something like this.

@jdesrosiers
Member

what about $id which is a partial URI but not a fragment, i.e. "$id": "schemaFoo.json"? Where the assumption is that it is resolved to an absolute URI, using as the base URI the one on which this schema was retrieved.

Yes, that's the way $id works. To use the terminology I used in my previous comment, the external identifier is an absolute URI used to retrieve the schema (or the internal identifier of the parent schema if it's an embedded schema) and the internal identifier is the $id resolved against the external identifier. If you don't have an external identifier then the $id can't be a relative URI because the internal URI will not resolve to be absolute.
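
The resolution described here is ordinary RFC 3986 reference resolution. A minimal Python sketch using the standard library; the function name `internal_identifier` borrows the terminology above, and the host `schemas.example.com` is made up for illustration.

```python
from urllib.parse import urljoin, urlsplit


def internal_identifier(external, schema_id):
    """Resolve $id against the external identifier (retrieval or parent URI)."""
    resolved = urljoin(external or "", schema_id)
    if not urlsplit(resolved).scheme:
        # Without an external identifier a relative $id stays relative.
        raise ValueError(f"cannot resolve {schema_id!r} to an absolute URI")
    return resolved


print(internal_identifier("https://schemas.example.com/v1/root.json", "schemaFoo.json"))
# https://schemas.example.com/v1/schemaFoo.json

try:
    internal_identifier(None, "schemaFoo.json")
except ValueError as error:
    print(error)
```

Because only the last path segment is replaced, a group of schemas with relative $id values can be moved to a different host or directory without editing them.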

As far as I know, there are at least some implementations which already support something like this.

If you find an implementation that doesn't work like this for embedded schemas, it's a bug. However, not all implementations allow you to specify an external identifier for a root schema. That's not a bug, just optional behavior that's not supported. If the implementation does allow you to specify an external identifier, but doesn't use it, then it would be a bug.

@Relequestual
Member

I don't believe there are any functional changes to be had as a result of resolving this issue, so I'd like to punt it to next draft in order to enable the publication of 2020 draft sooner.

@handrews
Contributor Author

handrews commented May 8, 2021

As far as the original question posed by this issue, @jdesrosiers and @karenetheridge say they use the JSON Pointer forms as canonical, and not the plain name fragment. That is good enough for me! A clarification PR of some sort is probably in order.

If there are other debates that need to continue (e.g. is "canonical" even the term we want?) I think those should move out to their own issues.

handrews added a commit to handrews/json-schema-spec that referenced this issue May 14, 2021
Fixes issue json-schema-org#937, clarifying a number of other things along the way.
While it touches a fair number of lines, I'm fairly sure that it
doesn't change anything about conformance.

After spending more time reading various writings on the concept
of the "canonical" URI for a resource, and reviewing our language,
I came to the following conclusions:

* canonical URIs only make sense at the whole-resource scope
* A URI with a fragment is neither canonical nor non-canonical
* It makes more sense to talk about fragments w.r.t. canonical URIs
* Our language was sufficiently confusing that going this way seems fine.

As part of this, I fixed an outright incorrect statement that
identifier keywords set canonical URIs.  Since there is only
one canonical URI and a single schema object could contain
three ($id, $anchor, $dynamicAnchor) or more identifier keywords,
this statement is clearly a bug.  These keywords assign URIs,
but only $id assigns a canonical one.

I revamped a lot of wording in descriptions and examples to
hopefully be more precise.  I separated the discussion of
the empty fragment in $id from the main paragraph of its
functionality, and clarified that this is talking about a
media-type-specific semantic equivalence, and is not asserting
that RFC 3986 normalization applies to fragments (this has
been a point of confusion).
@handrews handrews modified the milestones: draft-next, draft-patch May 14, 2021
@Relequestual
Member

I don't believe we have reached consensus.
We've done a lot of sidetracking and I'm close to opening a new issue for focus.
Let's revisit the opening comment...

  • URIs with non-empty JSON Pointer fragments, relative to the nearest base URI, are considered canonical.
  • URIs with non-empty JSON Pointer fragments, relative to a more distant base URI, are considered non-canonical and are not guaranteed to work.
    • This was the main point of the canonical URI stuff in 2019-09.
  • URIs with plain name fragments aren't discussed as canonical or non-canonical
    • However, they only work with the nearest base URI so there's never any ambiguity
    • Therefore, plain name fragment URIs should always work.

We are not communicating the last point above, about plain name fragments, clearly enough.
Its two sub-points need to be fully explained in the spec.

In 2020-12, we do detail these facts, even if only in the form of showing multiple canonical URIs for a path as part of Appendix A. See: https://json-schema.org/draft/2020-12/json-schema-core.html#rfc.appendix.A


{
    "$id": "https://example.com/root.json",
    "$defs": {
        "A": { "$anchor": "foo" },
...
#/$defs/A

    base URI
        https://example.com/root.json
    canonical URI with plain fragment
        https://example.com/root.json#foo 
    canonical URI with pointer fragment
        https://example.com/root.json#/$defs/A 

I think we all agree, when we say "canonical URI" we mean ONE URI.

I feel like I got lost a little when @karenetheridge (correctly) pointed out every location in a schema has a canonical URI.

Remember, the issue was about URIs which crossed resource boundaries in the same document (Schemas in which $id was used in the non-root schema object).

An implementation MAY choose not to support addressing schemas
by non-canonical URIs. As such, it is RECOMMENDED that schema authors only
use canonical URIs, as using non-canonical URIs may reduce
schema interoperability.

What if we were to change this phrasing to something like...

An implementation MAY choose not to support addressing schema resources (or a schema resource's subschemas) by non-canonical URI parts (which excludes any fragment). As such, it is RECOMMENDED that schema authors only use canonical schema resource URIs, as using non-canonical schema resource URIs may reduce schema interoperability.

The issue is with addressing locations which have a defined $id, making it a schema resource.
The canonical URI for a schema resource will always be fragmentless.
Fragments added after the canonical URI of a schema resource should not affect the canonical nature of the URI.

In fact, RFC 6596, The Canonical Link Relation, which we reference for the definition of canonical, only describes it in terms of resources. The fragments are attached to the resource's canonical URI to provide in-resource location.

It was said in #1104 that...

  • canonical URIs only make sense at the whole-resource scope
  • A URI with a fragment is neither canonical nor non-canonical
  • It makes more sense to talk about fragments w.r.t. canonical URIs
  • Our language was sufficiently confusing that going this way seems fine.

It has taken me up till now to understand the above and draw the same conclusion.

I'm going to focus my efforts to modify language to reflect this as a PR.


As an aside... Although I feel this may be out of scope for this issue...

Picking up regarding #1104 and a thread of comments on a review.

zero or more other URIs that MAY also be used to refer to it (which some implementations may or may not choose to support).

I dislike this phrasing. I think it's backwards. It makes it sound like this is an optional feature that some will choose to support and others will not. It's not a feature. It's always wrong. Just because it MAY happen to work in some implementations doesn't mean it's not always wrong.
@jdesrosiers

We are talking about the lines quoted previously above, regarding what an implementation may choose not to support.

@karenetheridge @jdesrosiers, would you be happy with something that reworked this to make addressing schema resources with non-canonical URIs something where the "behaviour is undefined"? I would add a CREF to note that the reason it might still work is the nature of JSON Pointer: in most cases the resulting location is likely easy to resolve.

(Personally, I don't see why implementations have such a hard time with this issue specifically. I haven't implemented JSON Schema myself, but surely you take the JSON at a location, apply the JSON Pointer, and you have a new location. I'd be interested to hear what I'm missing... but not in this issue, please.)
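For reference, the mechanical step described here can be sketched as below. This is a minimal RFC 6901 evaluator written for illustration, not code from any implementation; `resolve_pointer` and the example schema are assumptions, and the pointer is assumed to be already percent-decoded from the URI fragment.

```python
def resolve_pointer(document, pointer: str):
    """Evaluate an RFC 6901 JSON Pointer against an in-memory JSON value."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError(f"not a JSON Pointer: {pointer!r}")
    current = document
    for raw in pointer[1:].split("/"):
        # RFC 6901 unescaping order: "~1" first, then "~0".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Illustrative compound schema: the embedded "$id" starts a new schema resource.
schema = {
    "$id": "https://example.com/root.json",
    "$defs": {
        "item": {
            "$id": "https://example.com/item.json",
            "properties": {"name": {"type": "string"}},
        }
    },
}

target = resolve_pointer(schema, "/$defs/item/properties/name")
print(target)  # {'type': 'string'}
```

The lookup succeeds, but the evaluator never notices that the path crossed the embedded `$id`, so it has no idea that the base URI changed along the way. That bookkeeping is the part this sketch leaves out.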

@jdesrosiers

Would you be happy with something that reworked this to make addressing schema resources with non-canonical URIs something where the "behaviour is undefined"?

For the patch release, anything is an improvement, but yes, I would like to see it described as "undefined". For the future, we need to stop using "canonical" in the spec. It's not the right abstraction. It's confusing, complicated, and I think it misses the point. An embedded schema is its own independent schema, distinct from its parent (think iframe). A pointer can only point to a location within a single schema resource (I can't craft an XPath to point to a location within an iframe). That's the whole concept.

I don't see why implementations have such a hard time with this issue

It's not necessarily hard, it's just unnecessary complexity. It's much simpler to not have to track base URI and dialect changes depending on where you are in the schema. With strict boundaries, every schema has one base URI and one dialect no matter where you are. I'm lazy, I don't want to write code to track those changes when there's a neat and simple alternative conceptual model that doesn't require additional code and supports everything schema authors need. I just break down compound schemas when they are loaded and then I don't have to worry about anything changing. It's not even extra work because I have to break down the compound schema anyway for validation against the meta-schema.
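A rough sketch of that "break down compound schemas when they are loaded" model might look like the following. It is an illustration under stated assumptions, not the code of any real implementation: `split_resources` is a hypothetical helper, the `example.com` URIs are placeholders, and for brevity it treats every nested object as a possible schema (a real implementation would not descend into `const`, `enum`, and similar values).

```python
from urllib.parse import urljoin


def split_resources(schema: dict, retrieval_uri: str) -> dict:
    """Split a compound schema into standalone resources keyed by absolute URI.

    Each embedded resource is replaced in its parent by a "$ref" to its URI,
    so every stored resource has exactly one base URI.
    """
    registry = {}

    def walk(node, base_uri):
        if isinstance(node, dict):
            if isinstance(node.get("$id"), str):
                # An embedded "$id" starts a new, independent resource.
                uri = urljoin(base_uri, node["$id"])
                registry[uri] = walk_children(node, uri)
                return {"$ref": uri}
            return walk_children(node, base_uri)
        if isinstance(node, list):
            return [walk(item, base_uri) for item in node]
        return node

    def walk_children(node, base_uri):
        return {key: walk(value, base_uri) for key, value in node.items()}

    root_uri = urljoin(retrieval_uri, schema.get("$id", ""))
    registry[root_uri] = walk_children(schema, root_uri)
    return registry


compound = {
    "$id": "https://example.com/root.json",
    "$defs": {
        "item": {
            "$id": "item.json",
            "properties": {"name": {"type": "string"}},
        }
    },
}

registry = split_resources(compound, "https://example.com/root.json")
print(sorted(registry))
# ['https://example.com/item.json', 'https://example.com/root.json']
print(registry["https://example.com/root.json"]["$defs"]["item"])
# {'$ref': 'https://example.com/item.json'}
```

After the split, a pointer such as `/$defs/item/properties/name` evaluated against the root resource simply has nothing to land on, which is the strict boundary being described.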

Relequestual added a commit to Relequestual/json-schema-spec that referenced this issue Feb 22, 2022
Attempt to resolve json-schema-org#937
Add note and cref in appendix A clarifying that we intended to define a URI phrasing which would avoid the requirement to allow for location shadowing in implementations, as this is tricky. Clarifying that plain name fragments should always be supported, and that they only can work in relation to the base URI of the Schema Resource. Otherwise there could be duplicate plain name fragments and addressing wouldn't work
@Relequestual

Closed #1104 in favour of #1192, in an attempt to resolve this issue.

Relequestual added a commit that referenced this issue Mar 14, 2022
…cal (#1192)

* Clarify that plain name fragments are neither canonical nor non-canonical
Attempt to resolve #937
Add note and cref in appendix A clarifying that we intended to define a URI phrasing which would avoid the requirement to allow for location shadowing in implementations, as this is tricky. Clarifying that plain name fragments should always be supported, and that they only can work in relation to the base URI of the Schema Resource. Otherwise there could be duplicate plain name fragments and addressing wouldn't work.
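As an illustration of why plain name fragments only work relative to the base URI of their own schema resource, here is a small sketch that collects the URI each `$anchor` produces. The helper `anchor_uris`, the example schema, and the `example.com` URIs are assumptions made for this example; like any simple walker, it treats every nested object as a possible schema.

```python
from urllib.parse import urldefrag, urljoin


def anchor_uris(schema: dict, retrieval_uri: str) -> dict:
    """Map each plain name fragment URI to the subschema that declares it."""
    found = {}

    def walk(node, base_uri):
        if isinstance(node, dict):
            if isinstance(node.get("$id"), str):
                # A nested "$id" changes the base URI for everything below it.
                base_uri = urldefrag(urljoin(base_uri, node["$id"])).url
            if isinstance(node.get("$anchor"), str):
                uri = f"{base_uri}#{node['$anchor']}"
                if uri in found:
                    raise ValueError(f"duplicate plain name fragment: {uri}")
                found[uri] = node
            for value in node.values():
                walk(value, base_uri)
        elif isinstance(node, list):
            for value in node:
                walk(value, base_uri)

    walk(schema, retrieval_uri)
    return found


schema_with_anchors = {
    "$id": "https://example.com/root.json",
    "$defs": {
        "a": {"$anchor": "item", "type": "string"},
        "b": {
            "$id": "other.json",
            "$defs": {"c": {"$anchor": "item", "type": "number"}},
        },
    },
}

anchors = anchor_uris(schema_with_anchors, "https://example.com/root.json")
print(sorted(anchors))
# ['https://example.com/other.json#item', 'https://example.com/root.json#item']
```

The same name `item` is unambiguous only because each anchor is attached to the base URI of its own resource. If anchors were resolved against a more distant base URI, both would collide on `https://example.com/root.json#item`.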
@Relequestual

Closing this issue as #1192 was merged.
Please comment if you wish to re-open.

Relequestual pushed a commit to Relequestual/json-schema-spec that referenced this issue Mar 14, 2022
Fixes issue json-schema-org#937, clarifying a number of other things along the way.
While it touches a fair number of lines, I'm fairly sure that it
doesn't change anything about conformance.

After spending more time reading various writings on the concept
of the "canonical" URI for a resource, and reviewing our language,
I came to the following conclusions:

* canonical URIs only make sense at the whole-resource scope
* A URI with a fragment is neither canonical nor non-canonical
* It makes more sense to talk about fragments w.r.t. canonical URIs
* Our language was sufficiently confusing that going this way seems fine.

As part of this, I fixed an outright incorrect statement that
identifier keywords set canonical URIs.  Since there is only
one canonical URI and a single schema object could contain
three ($id, $anchor, $dynamicAnchor) or more identifier keywords,
this statement is clearly a bug.  These keywords assign URIs,
but only $id assigns a canonical one.

I revamped a lot of wording in descriptions and examples to
hopefully be more precise.  I separated the discussion of
the empty fragment in $id from the main paragraph of its
functionality, and clarified that this is talking about a
media-type-specific semantic equivalence, and is not asserting
that RFC 3986 normalization applies to fragments (this has
been a point of confusion).
Relequestual pushed a commit that referenced this issue Mar 14, 2022
Fixes issue #937, clarifying a number of other things along the way.
Relequestual added a commit that referenced this issue Jun 16, 2022
…cal (#1192)

* Clarify that plain name fragments are neither canonical nor non-canonical
Relequestual pushed a commit that referenced this issue Jun 16, 2022
Fixes issue #937, clarifying a number of other things along the way.