
Integers need further clarification #898

Closed
ssilverman opened this issue Apr 25, 2020 · 28 comments · Fixed by #1437

@ssilverman
Member

ssilverman commented Apr 25, 2020

There's still an ambiguity when defining an integer. (Note: This issue isn't here to debate how one defines an "integer" for JSON schemas.)

It says in core/section 6.3:

"For consistency, integer JSON numbers SHOULD NOT be encoded with a fractional part."

TLDR: Does "encoded with a fractional part" mean "encoded internally after processing the JSON" or "a floating point number having a zero fractional part encoded as the JSON 1.0"? Another way of putting it: "Is the written JSON number 1.0 considered to be encoded with a fractional part since it has the .0 there?"

There's an ambiguity with how this is phrased. Immediately above this sentence, it mentions that numbers may have "different internal representations." This implies "a number encoded to an internal representation." However, when we speak of JSON documents, we can think of a number "encoded into its JSON format."

For the first case, the int 1 is encoded internally as 1 and has no fractional part, and the float 1.0 is encoded internally as 1.0, which has a fractional part of zero.

For the second case, depending on the parser, the "JSON-encoded" number could end up as either an "internally-encoded" float with a zero fractional part in the case of JavaScript, or as either an "internally-encoded" int or float or even "arbitrary-precision decimal or integer" in other languages, depending on the parser.

So my question is: What does it mean to "not be encoded with a fractional part"? Does this mean: "no empty fractional part and no zero-valued fractional part", or does it just mean "no empty fractional part"? i.e. how does a "zero-valued fractional part" fit in here?

Also see: #79 and json-schema-org/JSON-Schema-Test-Suite#132

@ssilverman
Member Author

ssilverman commented Apr 25, 2020

Let me clarify with some cases. Which of the following "encode a fractional part"?

  1. "a_number": 1.0: Ambiguous, this is JSON-encoded to have a .0 suffix and hence a zero-valued fractional part.
  2. "a_number": 1.1: Yes
  3. "a_number": 1: Ambiguous, depends on internal storage. For example, JavaScript would store as a float with a zero-valued fractional part. It might be internally-encoded to contain a fractional part.

Note that I'm assuming that all floating-point internal representations necessarily have a fractional part, by virtue of the fact that they're floating-point numbers.

The bottom line: Do we consider the existence of a fractional part only after mathematical interpretation and after dropping any zero-valued fractional part?

@handrews
Contributor

@Julian knows more than he would like to on this topic, I believe.

@handrews handrews added the core label Apr 25, 2020
@karenetheridge
Member

karenetheridge commented Apr 25, 2020

~~I thought the 2019-09 spec made it clear that what mattered was how the value was encoded as a json string. That is, "a number": 1.0 is not an integer, because it has a fractional part in its json encoding.

If a particular language loses some data when decoding the json string into its internal representation, then it may not be able to accurately distinguish integers from numbers, and JSON Schema evaluators in that language should either inspect the json string directly, or treat numbers and integers as equivalent (and probably advise json schema authors using that language to only use "number" in all cases, not "integer").

(Some languages even share the same internal data type for characters, strings, and numbers! So they have the same problem but even more so.)~~

I was wrong

@handrews
Contributor

What section 6.3 is saying (and a bit of this is context from other parts of section 6) is that since not all languages can distinguish between 1.0 and 1, even if the JSON Parser being used provides some way to make that distinction, that the JSON Schema data model does not make a distinction. They are both integers, and equivalent.

Specifically, if your programming language has both int and float (or similar), parse 1.0 into an int and not a float. Trailing post-decimal zeros in the JSON text cannot be used to change the type behavior.
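As an illustration of that advice in a language with separate int and float types, here is a minimal Python sketch. It assumes the standard-library `json` module; the helper name `parse_json_number` is hypothetical and does not come from the spec or from any validator.

```python
import json
from decimal import Decimal


def parse_json_number(text: str):
    """Hypothetical parse_float hook: return an int when the value is integral."""
    value = Decimal(text)
    if value == value.to_integral_value():
        return int(value)   # 1.0, 1.00 and 1e2 all become ints
    return float(value)     # 1.1 stays a float


# json.loads calls parse_float only for numbers written with a fraction or exponent.
doc = json.loads('{"a": 1.0, "b": 1.1, "c": 1, "d": 1e2}',
                 parse_float=parse_json_number)
```

With this hook, trailing post-decimal zeros in the JSON text no longer affect which Python type the value ends up in.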

@awwright
Member

awwright commented Apr 25, 2020

I think "a number encoded to an internal representation" is a contradiction of terms, here.

Iirc, I wrote that phrasing as it's been used in draft-05 and beyond.

Does "encoded with a fractional part" mean "encoded internally after processing the JSON" or "a floating point number having a zero fractional part encoded as the JSON 1.0"?

"encoding" always refers to the written series of bytes. We are taking a value (a position on a number line), and turning it into code, hence, encoding.
decoding is the opposite, taking code (a JSON document, in our case) and parsing it into the value space.

I'd like to introduce two terms here—
The lexical space is the series of bytes in the document, the JSON code that travels over the network.
The value space is the value that the application (JSON Schema) cares about.

In JSON Schema things like whitespace, indenting, backslash escapes, and so on, are only part of the lexical space; they do not exist in the value space. If you decode two documents with different whitespace, you will get identical values in the value space.

Likewise, trailing zeros in the decimal part of a number do not change its value; they only change its lexical representation. Some languages (like C) do make a distinction, and 1.0 and 1 will produce different values; but JSON does not.

※ While 1.0 and 1 are different values in C, 1.00 and 1.0 are the same value (they're both floats equal to 1). 1.0 and 1 can still be "equal" though, because the default == operator performs a mathematical comparison, and C is smart enough to convert the values to the same type first. JSON Schema is significantly less complicated: the value space of numbers is the mathematical value, full stop.

"For consistency, integer JSON numbers SHOULD NOT be encoded with a fractional part."

So what this is saying is:

Since 1.0 is the same value as 1, prefer the shorter encoding of that value.
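The lexical space versus value space distinction can be sketched in Python. This assumes the standard-library `json` and `decimal` modules; `decode` is a hypothetical helper that keeps every number as an exact `Decimal`, so the comparison is on mathematical value rather than on a language-specific int or float type.

```python
import json
from decimal import Decimal


def decode(text: str):
    """Decode JSON text, keeping every number as an exact Decimal value."""
    return json.loads(text, parse_int=Decimal, parse_float=Decimal)


# Three different byte sequences (lexical space) ...
compact = decode('{"n":1}')
spaced = decode('{ "n" :\n  1.0 }')
padded = decode('{"n": 1.00}')

# ... that decode to one and the same value (value space).
same_value = compact == spaced == padded
```

Whitespace and trailing zeros differ between the three documents, yet the decoded values compare equal.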

@ssilverman
Member Author

I thought the 2019-09 spec made it clear that what mattered was how the value was encoded as a json string. That is, "a number": 1.0 is not an integer, because it has a fractional part in its json encoding.

This is how I read the current spec, and why I'm asking for clarification on what "encoded with a fractional part" means. I'm with @karenetheridge on this.

@awwright
Member

@ssilverman Did my response make sense? There's no such thing as "encoded internally" so I'm having trouble making out what the question here is (do you mean decoded?)

Maybe I can explain the purpose of that section... The paragraph does not relate to decoding, only encoding. First, it points out that JSON numbers do not work the way integers & floats work in most programming languages. And the programming language should not impact how a JSON document is encoded (written), of course.

So, it's suggesting that if (for example) a C encoder is writing a 64-bit floating point number with the value 1 into JSON, it should omit the fractional part: 1 and not 1.0.

Or to refer to the ABNF: numbers in JSON should avoid using the frac production.

Is that a better explanation?
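For the encoding direction described above, a rough Python sketch follows. Python's `json.dumps` writes the float 1.0 as `1.0`, so the hypothetical `normalize` helper (an assumption for illustration, not part of any library) converts integral floats to ints before encoding.

```python
import json


def normalize(value):
    """Hypothetical pre-encoding pass: turn integral floats into ints."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize(item) for item in value]
    return value


default_text = json.dumps({"a_number": 1.0})
short_text = json.dumps(normalize({"a_number": 1.0, "other": [2.5, 3.0]}))
```

Here `default_text` keeps the fractional part, while `short_text` omits it for every integral value and leaves 2.5 untouched.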

@handrews
Contributor

handrews commented Apr 26, 2020

@awwright thanks- to clear up possible confusion, I think I focused on the wrong direction (reading in numbers). This clarifies that it's about writing them out, as this ensures that a C float 1.0, a C int 1 (different types in C), and a JavaScript number 1 or 1.0 (same type in JavaScript) produce the same representation in JSON.

My one question is: Is JSON encoding even within the scope of our spec? I definitely see your point, I had just thought of our spec as focusing on the data model, and the conversion to and from JSON text as the province of the JSON RFC. But I haven't thought about it much so I'm open to being wrong here 😄

@awwright
Member

awwright commented Apr 26, 2020

Is JSON encoding even within the scope of our spec?

You're right, it's not. I no longer think a capitals SHOULD is appropriate here.

We could probably take out the entire section, or significantly rewrite it.

@Julian
Member

Julian commented Apr 26, 2020

(I can't tell if there's still something to respond to here), so will just list a direct response to the first comment -- there's 2 considerations, whether something is an integer or not, and then the second piece which it looks like @handrews and @awwright responded to, so I hope I'm just repeating:

"a_number": 1.0: Ambiguous, this is JSON-encoded to have a .0 suffix and hence a zero-valued fractional part.

In predraft 7, this was potentially not {"type": "integer"} in languages that have floats and whose JSON parsers will parse the JSON text 1.0 into a float.

"a_number": 1.1: Yes

not an integer

"a_number": 1: Ambiguous, depends on internal storage. For example, JavaScript would store as a float with a zero-valued fractional part. It might be internally-encoded to contain a fractional part.

an integer, even pre draft 7.

What that paragraph is saying (clear or otherwise, obviously we can tidy up things) -- is, if you intend to write/encode the integer 1, prefer to encode it as JSON text as 1, rather than 1.0. It's a recommendation for JSON encoding, in order to fit with the post-draft-7 behavior where 1.0 is an integer in the JSON Schema data model, so writing it as such, rather than writing it as 1, is just an encoder being unnecessarily "confusing".

Hopefully that makes some sense and isn't wrong :)

You're right, it's not. I no longer think a capitals SHOULD is appropriate here.

Agree.

@ssilverman
Member Author

ssilverman commented Apr 26, 2020

TLDR: I point to this: "A fraction part is a decimal point followed by one or more digits." here: https://tools.ietf.org/html/rfc8259#section-6. ".0" is very explicitly a "fractional part". To say otherwise literally disagrees with the JSON spec.
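The purely lexical reading of that RFC sentence can be sketched with a regular expression that follows the number grammar in RFC 8259 section 6. The helper `has_frac` is hypothetical, and it only answers whether the written text uses a fraction part, not whether JSON Schema treats the value as an integer.

```python
import re

# Number grammar from RFC 8259 section 6: [minus] int [frac] [exp]
JSON_NUMBER = re.compile(
    r"-?(?:0|[1-9][0-9]*)(?P<frac>\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
)


def has_frac(number_text: str) -> bool:
    """True when the JSON number text uses the frac production."""
    match = JSON_NUMBER.fullmatch(number_text)
    if match is None:
        raise ValueError(f"not a JSON number: {number_text!r}")
    return match.group("frac") is not None
```

Under this lexical test, `1.0` and `1.1` both have a fraction part, while `1` and `1e2` do not.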

"a_number": 1.0: Ambiguous, this is JSON-encoded to have a .0 suffix and hence a zero-valued fractional part.

In predraft 7, this was potentially not {"type": "integer"} in languages that have floats and whose JSON parsers will parse the JSON text 1.0 into a float.

So in draft-2019-09, is "a_number": 1.0 considered an integer? If so, why is the zeroTerminatedFloats.json test in the "optional" section? The test returns "valid" if "data": 1.0 matches an integer. And what does draft-07 have to say? I'm still confused by "encoded with a fractional part" language.

@awwright to answer your response, "encoding" and "decoding" mean something different to me. "encoding" doesn't necessarily mean "encode to bytes". "encode" and "decode" are relative concepts, and opposite to each other. For example, the mathematical number "1" is "internally encoded" as the 32-bit floating-point value 0x3f800000. As well, the concept of the mathematical number "1" can be "encoded as JSON" with "a_number": 1.0. It can also be encoded as "a_number": 1.00. And this: "a_number": 1. Languages encoding as an int will have 0x00000001, and languages encoding as a float will have 0x3f800000.
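The two bit patterns mentioned here can be reproduced with Python's standard-library `struct` module, shown only as a quick check of the hex values (big-endian byte order is chosen so the hex reads the same way as written above).

```python
import struct

float_bits = struct.pack(">f", 1.0).hex()  # IEEE 754 binary32, big-endian
int_bits = struct.pack(">i", 1).hex()      # 32-bit two's complement, big-endian
```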

"decoding" is just the opposite (direction) to "encoding", with the "what" being completely arbitrary. [Note: I don't want to hijack my own thread with a philosophical "what does encoding mean" discussion. I do hear your points but I disagree with some of them; we can hammer that out elsewhere? I agree with the lexical and value distinction. Lexical encodes the concept of the value, etc.]

The first two "JSON encodings of 1" both have a fractional part, ".0" and ".00" respectively. The spec says that a number is an integer if it's "not encoded with a fractional part". I'm just suggesting language cleanup. AND make that zeroTerminatedFloats.json not optional if 1.0 is truly considered to "not have a fractional part." JSON parsers are allowed to make the distinction.

Related are these questions:

  1. Can we JSON-encode the float 1.0 as "a_number": 1?
  2. Can we JSON-encode the int 1 as "a_number": 1.0?
  3. How do the answers to these questions differ between the JSON spec and the JSON Schema spec?
  4. Should all round-trip encoding be bijective? i.e. is there an unambiguous way to move back and forth between a variety of encodings?

I point to this: "A fraction part is a decimal point followed by one or more digits." here: https://tools.ietf.org/html/rfc8259#section-6. ".0" is very explicitly a "fractional part". To say otherwise literally disagrees with the JSON spec.

I'll reiterate: I just want the language in core/section 6.3 to be clarified, whatever the decision. While I believe "1.0" has a fractional part, I'm not pushing for that specific solution here; that's for a different thread or issue. I just want clarification so I can help update tests and get a better understanding of the consensus decision regarding the JSON Schema spec.

@Julian
Member

Julian commented Apr 26, 2020

So in draft-2019-09, is "a_number": 1.0 considered an integer?

Yes, as per 6.1.1's '"integer" which matches any number with a zero fractional part.', and I misspoke, it's since draft 6, not since draft 7.
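A minimal Python sketch of that rule, applied to a value that has already been parsed: the function name `is_json_integer` is hypothetical, and this is one possible reading of "any number with a zero fractional part", not the code of any particular validator.

```python
def is_json_integer(value) -> bool:
    """Hypothetical check for {"type": "integer"} on an already-parsed value."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but JSON booleans are not numbers.
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, float):
        return value.is_integer()  # zero fractional part
    return False
```

With this check, both 1 and 1.0 count as integers and 1.1 does not.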

If so, why is the zeroTerminatedFloats.json test in the "optional" section?

A mistake :) so we should fix that -- the mistake is quite simple, it used to be optional behavior (in pre-draft6) to call 1.0 a float (i.e. not an integer) in languages that could distinguish between them. It is now no longer optional, it must be considered an integer, so that test should be moved out of optional and into type.json. Probably we simply forgot to move it after changing "invalid" to "valid" (see the diff between draft 6 and earlier on that file). PR definitely welcome for that, it applies to draft{6,7,2019-09}.

Going to ignore the middle of your comment not because I don't have opinions but just because I don't see the relevant parts of them yet :), and skip directly to question number 2 of yours:

  2. Can we JSON-encode the int 1 as "a_number": 1.0?

This is the case the spec is trying to discourage.

  3. How do the answers to these questions differ between the JSON spec and the JSON Schema spec?

Probably the JSON spec doesn't care and doesn't say anything about this but I didn't look -- but the JSON spec allows parsers to use floats if it feels like distinguishing (a la Python), and numbers for everything if it doesn't feel like distinguishing (a la JS).
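The Python side of that comparison can be checked directly with the standard-library `json` module, which picks a different type depending on how the number is written (in JavaScript, `JSON.parse` returns the same Number type for both texts).

```python
import json

as_int = json.loads("1")      # written without a fraction part
as_float = json.loads("1.0")  # written with a fraction part

same_value = as_int == as_float
```

The two results have different Python types but still compare equal by value.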

@ssilverman
Member Author

Ah, so it's the validator spec that describes this better. If the validation vocabulary is optional, that would explain why the zeroTerminatedFloats.json test is optional.

@ssilverman
Member Author

  2. Can we JSON-encode the int 1 as "a_number": 1.0?

This is the case the spec is trying to discourage.

This is the problematic thing for me: I'd much rather see a prohibition instead of a discouragement. If it's not disallowed, then that's where things become muddy and "undefined behaviour".

  3. How do the answers to these questions differ between the JSON spec and the JSON Schema spec?

Probably the JSON spec doesn't care and doesn't say anything about this but I didn't look -- but the JSON spec allows parsers to use floats if it feels like distinguishing (a la Python), and numbers for everything if it doesn't feel like distinguishing (a la JS).

I'm surmising that there are more common languages that differentiate between integers and floating-point than those that don't, so why not cater to those instead? Just a thought.

@Julian
Member

Julian commented Apr 27, 2020

Most of the point of JSON Schema is portability :)

If it didn't acknowledge the existence of languages like JS it's just excluding its ability to be used.

@notEthan
Contributor

notEthan commented Apr 27, 2020

Ah, so it's the validator spec that describes this better. If the validation vocabulary is optional, that would explain why the zeroTerminatedFloats.json test is optional.

the validation vocabulary isn't optional - it is used by the core (e.g. oneOf requires validation of each subschema)

@ssilverman
Member Author

Most of the point of JSON Schema is portability :)

If it didn't acknowledge the existence of languages like JS it's just excluding its ability to be used.

JSON itself doesn't restrict to 53-bit (significant digits) numbers, but that's what it seems like you're suggesting here. Just because some languages (JavaScript) restrict numbers to 53-bits, doesn't mean that should hobble the schema spec. That's also why I think making "1.0" an "integer" should be optional. It should be acknowledged, but I don't think it's a good thing to always restrict to lowest common feature set.
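The 53-bit limit can be illustrated with a small Python sketch, assuming the standard-library `json` module. Python keeps JSON integers at arbitrary precision, while the same digits written with a fraction part go through a binary64 float and get rounded.

```python
import json

big = 2**53 + 1                              # 9007199254740993
exact = json.loads("9007199254740993")       # parsed as an arbitrary-precision int
rounded = json.loads("9007199254740993.0")   # parsed as a binary64 float
```

Here `exact` preserves the value, while `rounded` lands on 2**53 because binary64 cannot represent 2**53 + 1.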

"Acknowledge" isn't the same thing as "restrict".

This discussion brings to mind the question, since you bring up "acknowledge the existence of JS": Is the whole reason that "1.0" is considered an "integer" because of JavaScript's number limitations? What's the original intent? Even JSON doesn't restrict this, why hobble for most other languages?

@ssilverman
Member Author

ssilverman commented Apr 27, 2020

Ah, so it's the validator spec that describes this better. If the validation vocabulary is optional, that would explain why the zeroTerminatedFloats.json test is optional.

the validation vocabulary isn't optional - it is used by the core (e.g. oneOf requires validation of each subschema)

Ah, for some reason, I thought only the core vocabulary was not optional. I see in the main schema definition that only "format" is optional.

@Julian
Member

Julian commented Apr 27, 2020

JSON itself doesn't restrict to 53-bit (significant digits) numbers, but that's what it seems like you're suggesting here. Just because some languages (JavaScript) restrict numbers to 53-bits, doesn't mean that should hobble the schema spec. That's also why I think making "1.0" an "integer" should be optional. It should be acknowledged, but I don't think it's a good thing to always restrict to lowest common feature set.

I'll be honest, I can't tell anymore what if anything we're arguing about :)

Do you have a specific suggestion that I lost track of for what you think should change and to what? That may make it easier to know if we are, and apologies if you do and said it and it's me that can't concentrate for long enough.

(Beyond of course "move zeroTerminatedFloats.json to required", which yes I definitely agree with)

@ssilverman
Member Author

You're right, this is off into the weeds... (But to summarize and explain: I was giving an illustrative example of JavaScript's restrictions influencing specifications. I also don't agree that "1.0" should be correctly validated as an "integer" because "1.1" does not, and they're JSON-encoded the same way, with a valid "fractional part" according to the JSON specification. But, this isn't about changing the schema spec; I can disagree but still go along with it.)

Anyways... here's what I'd like to see:

  1. Make that test for "1.0" being an "integer" be non-optional.
  2. Clarify the language in the Core spec to say something equivalent to "integers don't contain a non-zero fractional part but can have a zero-valued fractional part." I know it's also in the validation spec, but since the core spec touches on it, may as well be complete.
  3. Add a caution, since the phrasing is currently "SHOULD NOT", that if someone chooses to write an integer with a zero-valued fractional part, that the result may be unexpected. For example, many programmers are used to the code "1.0" being a non-integer.

@Julian
Member

Julian commented Apr 27, 2020

Make that test for "1.0" being an "integer" be non-optional.

Agreed, PR welcome, or issue ticket certainly.

Clarify the language in the Core spec to say something equivalent to "integers don't contain a non-zero fractional part but can have a zero-valued fractional part." I know it's also in the validation spec, but since the core spec touches on it, may as well be complete.

No personal preference. I agree the current language can be confusing but I don't have a suggestion on how to improve it, so it seems like this issue ticket should track any suggested better language; sounds like you have a suggested one.

Add a caution, since the phrasing is currently "SHOULD NOT", that if someone chooses to write an integer with a zero-valued fractional part, that the result may be unexpected. For example, many programmers are used to the code "1.0" being a non-integer.

This I disagree with. The spec is unambiguous here. It may be programmers are used to things -- but the spec unambiguously says 1.0 is an integer -- it cannot also say you may expect something "unexpected" about that. You may not like it (I don't either!) but I don't want the spec saying it's unexpected behavior, it's the behavior the spec says is prescribed.

@Julian
Member

Julian commented Apr 27, 2020

Oh, maybe you mean "unexpected" just in a layman's sense though? As a callout that this may be different from behavior elsewhere (e.g. in said programming languages)? That I don't disagree with, but I don't think is appropriate for the text of a specification, it's more something for external notes AIUI, but I'm no expert on the norms of what goes in a spec, so maybe if this is what you mean I also have no opinion about it :)

@handrews
Contributor

handrews commented Apr 27, 2020

JSON does restrict interoperability to IEEE binary64 (double precision) numbers.

If it didn't acknowledge the existence of languages like JS it's just excluding its ability to be used.

Yeah, one of the things I dislike most about working on this spec is when the strict static typed language people show up and demand that JSON Schema do things that only benefit them and completely sabotage the system's usefulness for all other styles of languages. I pretty much always disregard those arguments now b/c people who make them never seem to understand any other perspective and I've wasted enough time trying to get them to look beyond their own needs.

@handrews
Contributor

Ah, so it's the validator spec that describes this better. If the validation vocabulary is optional, that would explain why the zeroTerminatedFloats.json test is optional.

the validation vocabulary isn't optional - it is used by the core (e.g. oneOf requires validation of each subschema)

Ah, for some reason, I thought only the core vocabulary was not optional. I see in the main schema definition that only "format" is optional.

@notEthan The validator vocabulary is optional in the sense that you can build a JSON Schema system without it (there's an issue somewhere where such systems were discussed). But that doesn't mean you can run the test suite without it.

Regarding oneOf, validation is necessary, but the validation vocabulary is not. This schema uses core only:

{
  "oneOf": [
    {"oneOf": [false, {"title": "foo"}, false]},
    {"oneOf": [true, false, {"title": "bar"}]}
  ]
}

this schema will always annotate the instance with the title "foo" and never the title "bar".
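To trace why only "foo" is collected, here is a toy Python evaluator. It is a sketch under strong assumptions: it understands only boolean schemas, "oneOf" and "title", ignores the instance entirely (none of these keywords depend on it), and the function name `evaluate` is hypothetical rather than taken from any JSON Schema implementation.

```python
def evaluate(schema):
    """Toy evaluator for booleans, "oneOf" and "title" only.

    Returns (valid, titles), where titles are the collected annotations.
    """
    if isinstance(schema, bool):
        return schema, []
    titles = [schema["title"]] if "title" in schema else []
    if "oneOf" in schema:
        passing = [ann for ok, ann in map(evaluate, schema["oneOf"]) if ok]
        if len(passing) != 1:      # oneOf needs exactly one valid subschema
            return False, []       # annotations from a failed schema are dropped
        titles += passing[0]
    return True, titles


schema = {
    "oneOf": [
        {"oneOf": [False, {"title": "foo"}, False]},
        {"oneOf": [True, False, {"title": "bar"}]},
    ]
}
valid, titles = evaluate(schema)
```

The second inner "oneOf" has two passing subschemas (true and the "bar" schema), so it fails and its annotation is discarded; the first has exactly one, so the outer "oneOf" passes with "foo" alone.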

@notEthan
Contributor

interesting. thank you for the clarification.

@awwright
Member

Probably we simply forgot to move it after changing "invalid" to "valid" (see the diff between draft 6 and earlier on that file).

This is exactly what happened, yeah. I think I requested that change, but didn't realize the file should move too.

@handrews
Contributor

Side note: I'm really thrilled over how many people are digging into this and helping improve the spec.

Julian added a commit to json-schema-org/JSON-Schema-Test-Suite that referenced this issue Apr 29, 2020
Move the tests from zeroTerminatedFloat.json to type, since
as of draft 6, this behavior is required, and required to
consider such numbers to be integers.

See json-schema-org/json-schema-spec#898 for some discussion,
not of the underlying requirement, simply of this file being
in the wrong place.
@gregsdennis
Member

Not sure why section 4.2.2 hasn't been brought up yet. It seems to very clearly define two numbers as equal if their values are equal. This means 1.0 is defined to be equal to 1, so 1.0 is an integer, even if it's not written that way.

(Reading through the conversation, it seems to have landed on that conclusion anyway.)

I'm writing up a PR to just remove the "Mathematical Integers" section. That should resolve this issue.


7 participants