Commit ee2a5f4

handrews authored and Relequestual committed
Clarify various things about canonical URIs
Fixes issue json-schema-org#937, clarifying a number of other things along the way. While it touches a fair number of lines, I'm fairly sure that it doesn't change anything about conformance.

After spending more time reading various writings on the concept of the "canonical" URI for a resource, and reviewing our language, I came to the following conclusions:

* canonical URIs only make sense at the whole-resource scope
* A URI with a fragment is neither canonical nor non-canonical
* It makes more sense to talk about fragments w.r.t. canonical URIs
* Our language was sufficiently confusing that going this way seems fine.

As part of this, I fixed an outright incorrect statement that identifier keywords set canonical URIs. Since there is only one canonical URI and a single schema object could contain three ($id, $anchor, $dynamicAnchor) or more identifier keywords, this statement is clearly a bug. These keywords assign URIs, but only $id assigns a canonical one.

I revamped a lot of wording in descriptions and examples to hopefully be more precise.

I separated the discussion of the empty fragment in $id from the main paragraph of its functionality, and clarified that this is talking about a media-type-specific semantic equivalence, and is not asserting that RFC 3986 normalization applies to fragments (this has been a point of confusion).
1 parent 371dae1 commit ee2a5f4

1 file changed

+77
-64
lines changed

jsonschema-core.xml

@@ -315,8 +315,8 @@
 of five categories:
 <list style="hanging">
 <t hangText="identifiers:">
-control schema identification through setting the schema's
-canonical URI and/or changing how the base URI is determined
+control schema identification through setting a URI
+for the schema and/or changing how the base URI is determined
 </t>
 <t hangText="assertions:">
 produce a boolean result when applied to an instance
@@ -426,7 +426,9 @@
 <t>
 A JSON Schema resource is a schema which is
 <xref target="RFC6596">canonically</xref> identified by an
-<xref target="RFC3986">absolute URI</xref>.
+<xref target="RFC3986">absolute URI</xref>. Schema resources MAY
+also be identified by URIs including fragments. Any such URIs
+are considered to be non-canonical.
 </t>
 <t>
 The root schema is the schema that comprises the entire JSON document
@@ -730,9 +732,9 @@
 be able to support those keywords or vocabularies that contain them.
 </t>
 </section>
-<section title="Identifiers" anchor="identifiers">
+<section title="Identifiers">
 <t>
-Identifiers set the canonical URI of a schema, or affect how such URIs are
+Identifiers define URIs for a schema, or affect how such URIs are
 resolved in <xref target="references">references</xref>, or both.
 The Core vocabulary defined in this document defines several
 identifying keywords, most notably "$id".
@@ -1340,26 +1342,31 @@
 <t>
 If present, the value for this keyword MUST be a string, and MUST represent a
 valid <xref target="RFC3986">URI-reference</xref>. This URI-reference
-SHOULD be normalized, and MUST resolve to an
-<xref target="RFC3986">absolute-URI</xref> (without a fragment). Therefore,
-"$id" MUST NOT contain a non-empty fragment, and SHOULD NOT contain an
-empty fragment.
+SHOULD be normalized, and MUST be semantically equivalent to an
+<xref target="RFC3986">absolute-URI</xref> (without a fragment).
 </t>
 <t>
-Since an empty fragment in the context of the application/schema+json media
-type refers to the same resource as the base URI without a fragment,
-an implementation MAY normalize a URI ending with an empty fragment by removing
-the fragment. However, schema authors SHOULD NOT rely on this behavior
-across implementations.
+The application/schema+json media type defines that an absolute-URI
+identifying a resource and the same URI with an empty fragment
+appended (which identifies the resource's root schema object) are
+semantically equivalent. Since this semantic equivalence is not part
+of the <xref target="RFC3986">RFC 3986 normalization process</xref>,
+implementors and schema authors cannot rely on generic URI libraries
+understanding the equivalence.
+</t>
+<t>
+Therefore, "$id" MUST NOT contain a non-empty fragment, and SHOULD NOT
+contain an empty fragment. The absolute-URI form MUST be considered
+the canonical URI, regardless of the presence or absence of an empty fragment.
 <cref>
-This is primarily allowed because older meta-schemas have an empty
-fragment in their $id (or previously, id). A future draft may outright
-forbid even empty fragments in "$id".
+An empty fragment is currently allowed because older meta-schemas have
+an empty fragment in their $id (or previously, id).
+A future draft may outright forbid even empty fragments in "$id".
 </cref>
 </t>
 <t>
-This URI also serves as the base URI for relative URI-references in keywords
-within the schema resource, in accordance with
+The absolute-URI also serves as the base URI for relative URI-references
+in keywords within the schema resource, in accordance with
 <xref target="RFC3986">RFC 3986 section 5.1.1</xref> regarding base URIs
 embedded in content.
 </t>
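
Illustration (not part of the commit): the "$id" rules in the hunk above can be sketched in a few lines of Python. The helper name `canonical_uri_from_id` is hypothetical, and the sketch assumes only the standard `urllib.parse` module. It rejects a non-empty fragment and maps an empty fragment to the absolute-URI form. That mapping is the media-type-level equivalence, which a plain string comparison of the two URIs does not know about.

```python
from urllib.parse import urldefrag, urljoin


def canonical_uri_from_id(id_value: str, base_uri: str = "") -> str:
    """Resolve an "$id" value and return its absolute-URI form (no fragment).

    Hypothetical helper for illustration only.
    """
    # Resolve the URI-reference against the current base URI.
    resolved = urljoin(base_uri, id_value)
    # Split off the fragment; an empty fragment yields "".
    absolute, fragment = urldefrag(resolved)
    if fragment:
        raise ValueError('"$id" MUST NOT contain a non-empty fragment')
    # With or without a trailing "#", the absolute-URI form is canonical.
    return absolute


with_empty_fragment = "https://example.com/root.json#"
without_fragment = "https://example.com/root.json"

# As plain strings the two differ, so the equivalence has to be applied explicitly.
strings_differ = with_empty_fragment != without_fragment
same_canonical = (
    canonical_uri_from_id(with_empty_fragment)
    == canonical_uri_from_id(without_fragment)
)
```

Here `strings_differ` and `same_canonical` are both true, which is why a schema author cannot count on a generic comparison treating the two forms alike.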
@@ -1623,7 +1630,7 @@
 media type.
 </t>
 <t>
-Unless the "$id" keyword described in the next section is present in the
+Unless the "$id" keyword described in an earlier section is present in the
 root schema, this base URI SHOULD be considered the canonical URI of the
 schema document's root schema resource.
 </t>
@@ -1750,7 +1757,7 @@
 Since JSON Pointer URI fragments are constructed based on the structure
 of the schema document, an embedded schema resource and its subschemas
 can be identified by JSON Pointer fragments relative to either its own
-canonical URI, or relative to the containing resource's URI.
+canonical URI, or relative to a containing resource's URI.
 </t>
 <t>
 Conceptually, a set of linked schema resources should behave
@@ -1782,13 +1789,18 @@
 }
 ]]>
 </artwork>
-<postamble>
-The URI "https://example.com/foo#/items/additionalProperties"
-points to the schema of the "additionalProperties" keyword in
-the embedded resource. The canonical URI of that schema, however,
-is "https://example.com/bar#/additionalProperties".
-</postamble>
 </figure>
+<t>
+The URI "https://example.com/foo#/items" points to the "items" schema,
+which is an embedded resource. The canonical URI of that schema
+resource, however, is "https://example.com/bar".
+</t>
+<t>
+For the "additionalProperties" schema within that embedded resource,
+the URI "https://example.com/foo#/items/additionalProperties" points
+to the correct object, but that object's URI relative to its resource's
+canonical URI is "https://example.com/bar#/additionalProperties".
+</t>
 <figure>
 <preamble>
 Now consider the following two schema resources linked by reference
@@ -1810,24 +1822,25 @@
 ]]>
 </artwork>
 <postamble>
-Here we see that the canonical URI for that "additionalProperties"
-subschema is still valid, while the non-canonical URI with the fragment
-beginning with "#/items/$ref" now resolves to nothing.
+Here we see that the URI for the "additionalProperties" schema object
+that is relative to its resource's canonical URI is still valid,
+while the URI relative to the "items" schema object's URI no longer
+resolves to anything.
 </postamble>
 </figure>
 <t>
 Note also that "https://example.com/foo#/items" is valid in both
 arrangements, but resolves to a different value. This URI ends up
-functioning similarly to a retrieval URI for a resource. While valid,
-examining the resolved value and either using the "$id" (if the value
-is a subschema), or resolving the reference and using the "$id" of the
-reference target, is preferable.
+functioning similarly to a retrieval URI for a resource. While this URI
+is valid, it is more robust to use the "$id" of the embedded or referenced
+resource unless it is specifically desired to identify the object containing
+the "$ref" in the second (non-embedded) arrangement.
 </t>
 <t>
-An implementation MAY choose not to support addressing schemas
-by non-canonical URIs. As such, it is RECOMMENDED that schema authors only
-use canonical URIs, as using non-canonical URIs may reduce
-schema interoperability.
+An implementation MAY choose not to support addressing schema resource
+contents by URIs using a base other than the resource's canonical URI,
+plus a JSON Pointer fragment relative to that base. Therefore, schema
+authors SHOULD NOT rely on such URIs, as using them may reduce interoperability.
 <cref>
 This is to avoid requiring implementations to keep track of a whole
 stack of possible base URIs and JSON Pointer fragments for each,
@@ -1839,9 +1852,9 @@
 </cref>
 </t>
 <t>
-Further examples of such non-canonical URIs, as well as the appropriate
-canonical URIs to use instead, are provided in appendix
-<xref target="idExamples" format="counter"></xref>.
+Further examples of such non-canonical URI construction, as well as
+the appropriate canonical URI-based fragments to use instead,
+are provided in appendix <xref target="idExamples" format="counter"></xref>.
 </t>
 </section>
 </section>
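
Illustration (not part of the commit): the difference between the embedded and the referenced arrangement can be checked with a small JSON Pointer evaluator. The diff context does not show the artwork bodies, so the three schema objects below are assumptions reconstructed from the prose ("foo" with an "items" schema that either embeds "bar" or refers to it with "$ref"). The function names `resolve_pointer` and `resolves` are hypothetical.

```python
from urllib.parse import unquote


def resolve_pointer(document, pointer):
    """Evaluate a JSON Pointer (RFC 6901) taken from a URI fragment."""
    node = document
    for token in pointer.split("/")[1:]:
        # Undo percent-encoding first, then the "~1" and "~0" escapes.
        token = unquote(token).replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def resolves(document, pointer):
    """Report whether the pointer identifies anything in the document."""
    try:
        resolve_pointer(document, pointer)
        return True
    except (KeyError, IndexError, ValueError, TypeError):
        return False


# Assumed arrangement 1: the "bar" resource is embedded in "foo".
EMBEDDED = {
    "$id": "https://example.com/foo",
    "items": {"$id": "https://example.com/bar", "additionalProperties": {}},
}

# Assumed arrangement 2: two separate resources linked by reference.
FOO = {"$id": "https://example.com/foo", "items": {"$ref": "bar"}}
BAR = {"$id": "https://example.com/bar", "additionalProperties": {}}

# Pointer relative to the enclosing resource "foo": works only when embedded.
via_foo_embedded = resolves(EMBEDDED, "/items/additionalProperties")
via_foo_referenced = resolves(FOO, "/items/additionalProperties")

# Pointer relative to the canonical URI of "bar": works in both arrangements.
via_bar_embedded = resolves(EMBEDDED["items"], "/additionalProperties")
via_bar_referenced = resolves(BAR, "/additionalProperties")
```

Under these assumptions only `via_foo_referenced` is false, and "/items" evaluated against "foo" yields the embedded resource in one arrangement and the object containing "$ref" in the other.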
@@ -2704,8 +2717,8 @@
 <section title="Keyword Absolute Location">
 <t>
 The absolute, dereferenced location of the validating keyword. The value MUST
-be expressed as a full URI using the canonical URI of the relevant
-schema object, and it MUST NOT include by-reference applicators
+be expressed as a full URI using the canonical URI of the relevant schema resource
+with a JSON Pointer fragment, and it MUST NOT include by-reference applicators
 such as "$ref" or "$dynamicRef" as non-terminal path components.
 It MAY end in such keywords if the error or annotation is for that
 keyword, such as an unresolvable reference.
@@ -3314,76 +3327,76 @@ https://example.com/schemas/common#/$defs/count/minimum
 <list style="hanging">
 <t hangText="# (document root)">
 <list style="hanging">
-<t hangText="canonical absolute-URI (and also base URI)">
+<t hangText="canonical (and base) URI">
 https://example.com/root.json
 </t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical resource URI plus pointer fragment">
 https://example.com/root.json#
 </t>
 </list>
 </t>
 <t hangText="#/$defs/A">
 <list>
 <t hangText="base URI">https://example.com/root.json</t>
-<t hangText="canonical URI with plain fragment">
+<t hangText="canonical resource URI plus plain fragment">
 https://example.com/root.json#foo
 </t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical resource URI plus pointer fragment">
 https://example.com/root.json#/$defs/A
 </t>
 </list>
 </t>
 <t hangText="#/$defs/B">
 <list style="hanging">
-<t hangText="base URI">https://example.com/other.json</t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical (and base) URI">https://example.com/other.json</t>
+<t hangText="canonical resource URI plus pointer fragment">
 https://example.com/other.json#
 </t>
-<t hangText="non-canonical URI with fragment relative to root.json">
+<t hangText="base URI of enclosing (root.json) resource plus fragment">
 https://example.com/root.json#/$defs/B
 </t>
 </list>
 </t>
 <t hangText="#/$defs/B/$defs/X">
 <list style="hanging">
 <t hangText="base URI">https://example.com/other.json</t>
-<t hangText="canonical URI with plain fragment">
+<t hangText="canonical resource URI plus plain fragment">
 https://example.com/other.json#bar
 </t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical resource URI plus pointer fragment">
 https://example.com/other.json#/$defs/X
 </t>
-<t hangText="non-canonical URI with fragment relative to root.json">
+<t hangText="base URI of enclosing (root.json) resource plus fragment">
 https://example.com/root.json#/$defs/B/$defs/X
 </t>
 </list>
 </t>
 <t hangText="#/$defs/B/$defs/Y">
 <list style="hanging">
-<t hangText="base URI">https://example.com/t/inner.json</t>
-<t hangText="canonical URI with plain fragment">
+<t hangText="canonical (and base) URI">https://example.com/t/inner.json</t>
+<t hangText="canonical URI plus plain fragment">
 https://example.com/t/inner.json#bar
 </t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical URI plus pointer fragment">
 https://example.com/t/inner.json#
 </t>
-<t hangText="non-canonical URI with fragment relative to other.json">
+<t hangText="base URI of enclosing (other.json) resource plus fragment">
 https://example.com/other.json#/$defs/Y
 </t>
-<t hangText="non-canonical URI with fragment relative to root.json">
+<t hangText="base URI of enclosing (root.json) resource plus fragment">
 https://example.com/root.json#/$defs/B/$defs/Y
 </t>
 </list>
 </t>
 <t hangText="#/$defs/C">
 <list style="hanging">
-<t hangText="base URI">
+<t hangText="canonical (and base) URI">
 urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f
 </t>
-<t hangText="canonical URI with pointer fragment">
+<t hangText="canonical URI plus pointer fragment">
 urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f#
 </t>
-<t hangText="non-canonical URI with fragment relative to root.json">
+<t hangText="base URI of enclosing (root.json) resource plus fragment">
 https://example.com/root.json#/$defs/C
 </t>
 </list>
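
Illustration (not part of the commit): the "canonical resource URI plus pointer fragment" entries in the listing above can be derived mechanically. The sketch below is a simplification under stated assumptions: the schema object is reconstructed from the listing (the diff does not show it), the hypothetical function `canonical_locations` descends only through "$defs", and the "$defs" names are assumed to need no JSON Pointer escaping.

```python
from urllib.parse import urldefrag, urljoin

# Assumed schema, reconstructed from the URIs in the listing.
SCHEMA = {
    "$id": "https://example.com/root.json",
    "$defs": {
        "A": {"$anchor": "foo"},
        "B": {
            "$id": "other.json",
            "$defs": {
                "X": {"$anchor": "bar"},
                "Y": {"$id": "t/inner.json", "$anchor": "bar"},
            },
        },
        "C": {"$id": "urn:uuid:ee564b8a-7a87-4125-8c96-e9f123d6766f"},
    },
}


def canonical_locations(schema, base="", doc_ptr="", res_ptr="", found=None):
    """Map each document location to its canonical resource URI plus pointer."""
    found = {} if found is None else found
    if "$id" in schema:
        # "$id" starts a new resource: new base URI, and pointers restart.
        base = urldefrag(urljoin(base, schema["$id"])).url
        res_ptr = ""
    found["#" + doc_ptr] = base + "#" + res_ptr
    for name, subschema in schema.get("$defs", {}).items():
        step = "/$defs/" + name
        canonical_locations(subschema, base, doc_ptr + step, res_ptr + step, found)
    return found


locations = canonical_locations(SCHEMA)
```

With the assumed schema, `locations["#/$defs/B/$defs/X"]` is "https://example.com/other.json#/$defs/X", while the URI built from the enclosing root.json resource would be "https://example.com/root.json#/$defs/B/$defs/X".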
@@ -3415,16 +3428,16 @@ https://example.com/schemas/common#/$defs/count/minimum
 <t>
 This transformation can be safely and reversibly done as long as
 all static references (e.g. "$ref") use URI-references that resolve
-to canonical URIs, and all schema resources have an absolute-URI
-as the "$id" in their root schema.
+to URIs using the canonical resource URI as the base, and all schema
+resources have an absolute-URI as the "$id" in their root schema.
 </t>
 <t>
 With these conditions met, each external resource can be copied
 under "$defs", without breaking any references among the resources'
 schema objects, and without changing any aspect of validation or
 annotation results. The names of the schemas under "$defs" do
 not affect behavior, assuming they are each unique, as they
-do not appear in canonical URIs for the embedded resources.
+do not appear in the canonical URIs for the embedded resources.
 </t>
 </section>
 <section title="Reference removal is not always safe">
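
Illustration (not part of the commit): the bundling conditions in the last hunk can be sketched as follows. The function `bundle` and the two sample schemas are hypothetical. The sketch checks only that each external resource carries an absolute-URI "$id" and does not verify that every "$ref" resolves against a canonical resource URI. Reusing the "$id" as the "$defs" name is one arbitrary choice, since per the text the names only need to be unique.

```python
import copy
from urllib.parse import urlsplit


def bundle(root, externals):
    """Return a copy of `root` with each external resource embedded under "$defs"."""
    bundled = copy.deepcopy(root)
    defs = bundled.setdefault("$defs", {})
    for resource in externals:
        parts = urlsplit(resource.get("$id", ""))
        # An absolute-URI has a scheme and no (non-empty) fragment.
        if not parts.scheme or parts.fragment:
            raise ValueError('each resource needs an absolute-URI "$id"')
        # The "$defs" name is arbitrary as long as it is unique.
        name = resource["$id"]
        if name in defs:
            raise ValueError(f"duplicate $defs name: {name}")
        defs[name] = copy.deepcopy(resource)
    return bundled


# Assumed sample resources, linked by a reference to "bar".
FOO = {"$id": "https://example.com/foo", "items": {"$ref": "bar"}}
BAR = {"$id": "https://example.com/bar", "additionalProperties": {}}

BUNDLED = bundle(FOO, [BAR])
```

The "$ref" in `BUNDLED["items"]` is left untouched: it still resolves to "https://example.com/bar", which is now the canonical URI of the embedded copy.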
