"unevaluatedProperties" to facilitate re-use and better schema organization

## TL;DR

`unevaluatedProperties` is like `additionalProperties`, except that it can "see through" `$ref` and "see inside" `allOf`, `anyOf`, `oneOf`, `if`, `then`, `else`, and whatever else I'm forgetting right now.

The problem this solves matters most at large scale.  In small examples, you can easily refactor your way out of it, but that becomes very cumbersome as schemas grow.  This is why there's a lot of discussion about OpenAPI's schema for their spec.

Figuring out how to make this work turns out to be complicated, which accounts for the length of the rest of the issue.

-----

_This is a distillation of the results of 230+ comments on #515, not to mention the 300+ comments spread across several other older issues that fed into that one.  I know it's long.  Please don't complain unless you can offer a shorter write-up. :-)_

-----

_**CRITICALLY IMPORTANT NOTE:** This issue is for discussing the use case given in the next section, and the `unevaluatedProperties` proposal to solve it.  If you want to discuss a different use case or a different proposal, you **MUST file your own issue** to do so.  Any comments attempting to revive other lines of discussion from #515, introduce new problems or solutions, or otherwise derail this discussion will be deleted to keep the focus clear.  Please file a new issue and link back to this one instead._

## The use case

Many JSON Schema authors want validation to fail if an instance contains any property that is not explicitly matched by a successfully validating schema object.

An example is the [OpenAPI Specification (OAS)](https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.1.md), which uses a regular expression, `^x-`, for all extension keywords.  Any keyword that is neither a standard keyword nor begins with "x-" is in violation of the specification and should cause validation of the OpenAPI document to fail.

OAS has many keywords and allowed values that are conditional on other fields.  The most complex scenario is the [Parameter Object](https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.1.md#parameterObject), which describes parameters using four different values for "in" (path, query, header, or cookie).  Each value works with a different subset of another enumerated field, "style", and "path" also restricts the behavior of "required".  Additionally, the object can use one of several mutually exclusive schema and example approaches to describe the parameter.

This results in 10 separate schemas in their [currently proposed schema for OAS 3.0](https://github.com/OAI/OpenAPI-Specification/pull/1270/files#diff-ae03581069a5fdad057783eae9b1ab88R715) to cover all of the variations, because each variation needs to specify `"additionalProperties": false` to correctly validate the specification.

Some of these variations differ only by an `enum` subset, so they could be reduced somewhat.  But that is not the most intuitive approach.

More importantly, many of the variations, as well as variations of numerous other objects in the OAS 3.0 spec, *cannot* be substantially reduced because they vary in terms of which properties are allowed to be present.

In the cases where the variance is in allowed properties (`example` and `examples` are mutually exclusive, as are `content` and `schema`, and different sets of other properties are allowed for `content` vs `schema`), the set of properties to forbid can only be determined dynamically at runtime.

In their current proposed schema, this is accomplished by `oneOf`-ing all of the variations together, so that they each use `"additionalProperties": false` independently.

This does work, but as can be seen in the PR, it raised serious concerns over the maintainability of the resulting schema due to all of the duplication.

I'll come back to how this proposal solves the problems at the end of this write-up.

## Rationale for `additionalProperties`

It's worth re-stating the intended use case of `additionalProperties`, which is _to validate objects with uniform property structures_, similar to a map of strings to a specific type or class.  Its behavior with respect to `patternProperties` and `properties` allows for carving out exceptions and special cases within such a map.

It meets this use case very well.  New schema authors are often confused not because this is an unimportant use case (it appears repeatedly in the meta-schema, for example), but because the use case described in the previous section is not addressed, and `additionalProperties` "feels" the closest.

We do not want to change or remove `additionalProperties`.  Providing a clear solution for the above use case will dramatically reduce or eliminate the misunderstandings around `additionalProperties`.
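
As a rough illustration of why this behavior is static, the following Python sketch computes which instance properties `additionalProperties` would apply to, looking only at `properties` and `patternProperties` in the same schema object.  The function name `additional_property_names` and both sample schemas are hypothetical, and Python's `re` module only approximates the regular expression dialect that JSON Schema recommends.

```python
import re

def additional_property_names(schema: dict, instance: dict) -> list:
    """Names in `instance` that `additionalProperties` in `schema` would apply to.

    Only keywords in the same schema object are consulted; `$ref`, `allOf`,
    and similar keywords are deliberately ignored, which is the point.
    """
    declared = schema.get("properties", {})
    patterns = schema.get("patternProperties", {})
    return [
        name
        for name in instance
        if name not in declared
        and not any(re.search(pattern, name) for pattern in patterns)
    ]

# A map of strings with two carved-out exceptions (hypothetical schema).
string_map = {
    "properties": {"id": {"type": "integer"}},
    "patternProperties": {"^x-": {}},
    "additionalProperties": {"type": "string"},
}
instance = {"id": 7, "x-note": 1, "en": "hello", "fr": "bonjour"}
map_names = additional_property_names(string_map, instance)  # ["en", "fr"]

# The same "id" declaration moved under allOf is invisible to additionalProperties.
split = {"allOf": [{"properties": {"id": {}}}], "additionalProperties": False}
split_names = additional_property_names(split, {"id": 7})  # ["id"]
```

In the second case, `"additionalProperties": false` would reject `id` even though a sibling `allOf` branch describes it, which is the situation the use case above runs into.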


## The proposal: `unevaluatedProperties`

`unevaluatedProperties` is similar to `additionalProperties` in that it has a single subschema, and it applies that subschema to instance properties that are **not** a member of some set.

However, `unevaluatedProperties` has dynamic behavior, meaning that the set of properties to which it applies cannot be determined from static analysis of the schema (either the immediate schema object or any subschemas of that object).

Instead, it uses the results, specifically annotation results, of adjacent keywords to compute the set of properties to which it applies.  This relies on annotation collection mechanisms examined in detail in #530.  _Please **do not** debate the annotation collection mechanism here in this issue.  Please **do** debate it in #530._

Note that `unevaluatedProperties` can specify any subschema.  The examples here will focus on the `false` boolean subschema as that is the most common use case.

### The "evaluatedProperties" annotation

The `unevaluatedProperties` keyword defines an annotation named **"evaluatedProperties"**, to which `properties`, `patternProperties`, and `additionalProperties` contribute.

Whenever an instance property is successfully evaluated against any subschema of any of these keywords, it is added to the "evaluatedProperties" annotation.  As with all annotations, if at any point validation of a subschema fails, all annotations are dropped, and only the validation failure is propagated up to the parent schema.

The instance location with which "evaluatedProperties" is associated is the object containing the properties being evaluated, not the individual property locations.  For an instance:

```JSON
{
    "foo": 1,
    "bar": "hi",
    "baz": true
}
```

being validated against the schema:

```JSON
{
    "properties": {
        "foo": {"type": "integer"}
    },
    "patternProperties": {
        "r$": {"type": "string"}
    }
}
```

The set of evaluated properties associated with location **"#"** (the root of the instance) would be `{"evaluatedProperties": ["foo", "bar"]}`.  No evaluated properties would be associated with the locations **"#/foo"** or **"#/bar"**.
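
The collection step for this example can be sketched in Python as follows.  This is only an illustration under simplifying assumptions: the helper names `passes` and `evaluated_properties` are made up, subschemas may only contain `type`, and `additionalProperties` is left out.

```python
import re

# Minimal type checks, just enough for this example.
TYPE_CHECKS = {
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}

def passes(subschema: dict, value) -> bool:
    """Validate `value` against a subschema that may only contain `type`."""
    expected = subschema.get("type")
    return expected is None or TYPE_CHECKS[expected](value)

def evaluated_properties(schema: dict, instance: dict):
    """Return (valid, names) for one schema object and one object instance."""
    names = []
    for name, value in instance.items():
        applicable = []
        if name in schema.get("properties", {}):
            applicable.append(schema["properties"][name])
        for pattern, subschema in schema.get("patternProperties", {}).items():
            if re.search(pattern, name):
                applicable.append(subschema)
        if not all(passes(subschema, value) for subschema in applicable):
            return False, []  # a failure drops every annotation
        if applicable:
            names.append(name)
    return True, names

schema = {
    "properties": {"foo": {"type": "integer"}},
    "patternProperties": {"r$": {"type": "string"}},
}
result = evaluated_properties(schema, {"foo": 1, "bar": "hi", "baz": True})
# result == (True, ["foo", "bar"]); "baz" matched no keyword, so it is absent

failed = evaluated_properties(schema, {"foo": "not an integer", "bar": "hi"})
# failed == (False, [])
```

The names are returned as one list for the whole object, mirroring the point that the annotation belongs to the containing object rather than to each property location.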

#### Interaction with `additionalProperties`

Note that `additionalProperties` also contributes to the "evaluatedProperties" annotation.  This is to properly handle a situation such as the following:

```JSON
{
    "oneOf": [
        {
            "required": ["special"],
            "properties": {
                "special": {...},
                ...
            }
        },
        {
            "properties": {"special": false},
            "additionalProperties": {...}
        }
    ],
    "unevaluatedProperties": false
}
```

This validates instances that either meet the special case by including a property named "special", in which case any properties not present in that subschema's `properties` should cause validation to fail, _or_ are a uniform map that cannot contain the key "special", in which case no property should be considered unevaluated.

Furthermore, this means that `"additionalProperties": true` can be used to exempt a branch of a conditional from `unevaluatedProperties`, which may sometimes be useful when one branch is extensible and the others are not.
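
A small Python sketch of that exemption, under stated assumptions: the helper names and both branch schemas are hypothetical, only `properties` and `additionalProperties` are modelled, and each branch is assumed to have already validated successfully.

```python
def evaluated_names(branch: dict, instance: dict) -> set:
    """Names that one *passing* branch reports as evaluated.

    Only `properties` and `additionalProperties` are modelled; validation of
    the property values is assumed to have succeeded already.
    """
    if "additionalProperties" in branch:
        # additionalProperties applies to every name that properties did not
        # claim, so together they cover the whole instance.
        return set(instance)
    return {name for name in branch.get("properties", {}) if name in instance}

def unevaluated_names(branch: dict, instance: dict) -> set:
    """What a parent `unevaluatedProperties` would still have to examine."""
    return set(instance) - evaluated_names(branch, instance)

closed_branch = {"properties": {"kind": {}}}
extensible_branch = {"properties": {"kind": {}}, "additionalProperties": True}
instance = {"kind": "widget", "extra": 1}

left_for_closed = unevaluated_names(closed_branch, instance)          # {"extra"}
left_for_extensible = unevaluated_names(extensible_branch, instance)  # set()
```

With the extensible branch nothing is left over, so a parent `"unevaluatedProperties": false` would have nothing to reject.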

#### Interaction with itself

_**Open question:** Should `unevaluatedProperties` itself contribute to "evaluatedProperties"?  This would mean that if `unevaluatedProperties` appears in both a parent schema and a subschema, only the one in the subschema will ever be used.  This, of course, is only relevant if the `unevaluatedProperties` in the subschema has a non-`false` subschema of its own._

### Depending on adjacent keyword results

With only `unevaluatedProperties` depending on adjacent results, the order of operations is quite simple: process all other keywords, and then process `unevaluatedProperties`.  However, if more keywords with similarly dynamic behavior are defined, this becomes less clear.

To keep things simple, all keywords with dynamic results dependencies are processed after all keywords that do **not** have such dependencies, and only the results from non-dynamic keywords are used.

Therefore, adjacent dynamic keywords do not affect each other.  If interaction is desired, `allOf` may be used: put the static keywords and the dynamic keyword that should be evaluated first into a subschema beneath an `allOf`, and place the dynamic keyword that should be evaluated second adjacent to the `allOf`.  Here is an example with two hypothetical dynamic keywords:

```JSON
{
    "processThisSecond": {...},
    "allOf": [
        {
            "oneOf": [...],
            "processThisFirst": {...}
        }
    ]
}
```
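
The two-phase ordering for adjacent keywords can be sketched in Python like this.  Everything here is an assumption made for illustration: `dynamicA` and `dynamicB` are invented keywords that share one invented handler, and the static `properties` handler skips subschema validation entirely.

```python
from typing import Dict, FrozenSet, Set, Tuple

def properties_kw(value: dict, instance: dict) -> Set[str]:
    """Static keyword: report declared names present in the instance.
    Subschema validation is omitted to keep the sketch short."""
    return {name for name in value if name in instance}

def forbid_unseen(value, instance: dict, seen: FrozenSet[str]) -> Tuple[bool, Set[str]]:
    """Hypothetical dynamic keyword.  It looks at names no static keyword
    reported; a value of False rejects them, anything else accepts them."""
    unseen = set(instance) - seen
    return (value is not False or not unseen), unseen

STATIC = {"properties": properties_kw}
DYNAMIC = {"dynamicA": forbid_unseen, "dynamicB": forbid_unseen}

def evaluate(schema: dict, instance: dict) -> Tuple[bool, Dict[str, Set[str]]]:
    # Phase 1: static keywords, in any order.
    seen: Set[str] = set()
    for keyword, handler in STATIC.items():
        if keyword in schema:
            seen |= handler(schema[keyword], instance)
    # Phase 2: every dynamic keyword gets the same frozen snapshot, so the
    # result of one dynamic keyword never reaches another.
    snapshot = frozenset(seen)
    valid, examined = True, {}
    for keyword, handler in DYNAMIC.items():
        if keyword in schema:
            ok, unseen = handler(schema[keyword], instance, snapshot)
            valid = valid and ok
            examined[keyword] = unseen
    return valid, examined

schema = {"properties": {"a": {}}, "dynamicA": True, "dynamicB": False}
valid, examined = evaluate(schema, {"a": 1, "b": 2})
# valid is False: dynamicA accepted "b", but dynamicB still saw it as unseen
```

If the two dynamic keywords were meant to interact, the `allOf` arrangement shown above would be needed so that the first one's result arrives as part of a subschema result.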

## A simple example

Consider the following schema.  First, some caveats, because there has been some confusion over the purpose of this example.

* This example illustrates the _mechanism_ of `unevaluatedProperties`
* It is intentionally very simple and self-contained, to make it easy to follow
* This means that it is trivial to refactor to remove the need for `unevaluatedProperties`
* For a non-trivial example to show why simply refactoring is not sufficient, see the OpenAPI example linked in the next section
* Alternatively, consider a situation where the branches of the `oneOf` are separate schemas owned by other entities (and therefore impossible to refactor without forking), which are intended to provide an opaque validation interface (and therefore may change internal details without warning, but without changing the desired validation outcome) and are included by `$ref`

```JSON
{
    "title": "Vehicle",
    "type": "object",
    "oneOf": [
        {
            "title": "Car",
            "required": ["wheels", "headlights"],
            "properties": {
                "wheels": {},
                "headlights": {}
            }
        },
        {
            "title": "Boat",
            "required": ["pontoons"],
            "properties": {
                "pontoons": {}
            }
        },
        {
            "title": "Plane",
            "required": ["wings"],
            "properties": {
                "wings": {}
            }
        }
    ],
    "unevaluatedProperties": false
}
```

Given the above schema, this instance:

`{"pontoons": ...}`

Is a boat, but not a car or a plane. It only has the one property defined by the boat schema, so `unevaluatedProperties` does not come into play.

This instance:

`{"pontoons": ..., "wheels": ...}`

however, fails validation because of `unevaluatedProperties`.

It is a valid boat according to the Boat schema. It has pontoons, and the Boat schema doesn't forbid anything else on its own. It is still not a valid Car (because it lacks headlights), and it is not a valid Plane (because it lacks wings). So it is valid against the `oneOf` (it's a boat but not a car or plane).


Since only the Boat branch of the `oneOf` passed validation, only "pontoons" is considered to have been evaluated.  While "wheels" was successfully validated against the relevant `properties` subschema in the Car branch, the instance failed to validate against the Car schema as a whole, so "wheels" was dropped from the set of evaluated properties.

This means that when considering the `"unevaluatedProperties": false` in the root schema, "wheels" has not been evaluated, so `unevaluatedProperties` applies to it, and therefore validation fails because the `false` subschema fails by definition against any instance.
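
For readers who prefer to trace this in code, here is a minimal Python sketch of an evaluator that supports only the keywords used in the Vehicle schema.  It is not taken from any implementation: the `evaluate` function is hypothetical, the property values `2` and `4` are arbitrary stand-ins for the elided values above, and the sketch does not add names handled by `unevaluatedProperties` to the evaluated set, since that question is left open above.

```python
from typing import Any, Set, Tuple

def evaluate(schema: Any, instance: Any) -> Tuple[bool, Set[str]]:
    """Return (valid, evaluated property names); the set is empty on failure."""
    if isinstance(schema, bool):  # boolean schemas: true passes, false fails
        return schema, set()
    is_object = isinstance(instance, dict)
    if schema.get("type") == "object" and not is_object:
        return False, set()
    evaluated: Set[str] = set()
    if is_object:
        if any(name not in instance for name in schema.get("required", [])):
            return False, set()
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                ok, _ = evaluate(subschema, instance[name])
                if not ok:
                    return False, set()
                evaluated.add(name)
    if "oneOf" in schema:
        results = [evaluate(branch, instance) for branch in schema["oneOf"]]
        passing = [names for ok, names in results if ok]
        if len(passing) != 1:
            return False, set()
        evaluated |= passing[0]  # failed branches contribute nothing
    # Dynamic keyword: processed last, using the names collected above.
    if is_object and "unevaluatedProperties" in schema:
        for name in instance:
            if name not in evaluated:
                ok, _ = evaluate(schema["unevaluatedProperties"], instance[name])
                if not ok:
                    return False, set()
    return True, evaluated

VEHICLE = {
    "title": "Vehicle",
    "type": "object",
    "oneOf": [
        {"title": "Car", "required": ["wheels", "headlights"],
         "properties": {"wheels": {}, "headlights": {}}},
        {"title": "Boat", "required": ["pontoons"],
         "properties": {"pontoons": {}}},
        {"title": "Plane", "required": ["wings"],
         "properties": {"wings": {}}},
    ],
    "unevaluatedProperties": False,
}

boat = evaluate(VEHICLE, {"pontoons": 2})                     # (True, {"pontoons"})
boat_on_wheels = evaluate(VEHICLE, {"pontoons": 2, "wheels": 4})  # (False, set())
```

Note that this sketch rejects the Car branch as soon as `required` fails, without first visiting `properties`.  The outcome is the same as described above, because a failed branch contributes no evaluated names either way.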

## The OAS specification as an example

The [refactored OAS 3.0 schema](https://gist.github.com/handrews/6dfebd56ef97328f9e4dc7a47a1e8bc7) shows how `unevaluatedProperties` enabled cutting the schema length by more than 40% (1500 lines to 850).

In particular, it allowed for organizing common traits (such as extensibility, or different ways of showing examples) as schemas that can be mixed in to the main object definitions.

Additionally, it allows leveraging the `*Of` keywords fully to eliminate duplication in the [Parameter Object schema](https://gist.github.com/handrews/6dfebd56ef97328f9e4dc7a47a1e8bc7#file-oas3-draft-08-schema-yaml-L452).  The Parameter object uses up to four levels of `*Of` alongside of `unevaluatedProperties`.

I presented this refactor to the OpenAPI Technical Steering Council on March 2, and it was well-received.
 
## Writing PRs for this proposal

Obviously this is a complex addition to the spec.  I do not expect to write it all up as the description of the `unevaluatedProperties` keyword.  Rather, the core spec will define the appropriate mechanisms in a general way (as they should be available for other keywords).  PRs will introduce various mechanisms step by step.  Some of these have issues already.  A possible breakdown could be:

* Annotation collection using instance values (`links` also does this)
* Defining annotations to which multiple keywords contribute (this is new, see #530)
* Defining subschema and keyword processing results to include annotations
* Processing sequence for keywords that dynamically rely on the results of static keywords
* The actual definition of `unevaluatedProperties`
* An example of `unevaluatedProperties`

A fair amount of text in this issue explains various use cases and interactions.  Such material will not be included directly in the specification, although one concise and clear example should be included.  Use case material may be moved to the web site.

Note also that several other issues, such as #513, #514, and #523, will be addressed before working on this, to give time for feedback on this proposal.
