Integers need further clarification #898
Comments
Let me clarify with some cases. Which of the following "encode a fractional part"?
Note that I'm assuming that all floating-point internal representations necessarily have a fractional part, by virtue of the fact that they're floating-point numbers. The bottom line: Do we consider the existence of a fractional part only after mathematical interpretation and after dropping any zero-valued fractional part? |
@Julian knows more than he would like to on this topic, I believe. |
~~I thought the 2019-09 spec made it clear that what mattered was how the value was encoded as a JSON string. That is, if a particular language loses some data when decoding the JSON string into its internal representation, then it may not be able to accurately distinguish integers from numbers, and JSON Schema evaluators in that language should either inspect the JSON string directly, or treat numbers and integers as equivalent (and probably advise JSON Schema authors using that language to only use "number" in all cases, not "integer"). (Some languages even share the same internal data type for characters, strings, and numbers! So they have the same problem but even more so.)~~ I was wrong |
What section 6.3 is saying (and a bit of this is context from other parts of section 6) is that since not all languages can distinguish between Specifically, if your programming language has both |
I think "a number encoded to an internal representation" is a contradiction in terms here. IIRC, I wrote that phrasing as it's been used in draft-05 and beyond.
"encoding" always refers to the written series of bytes. We are taking a value (a position on a number line), and turning it into code, hence, encoding. I'd like to introduce two terms here— In JSON Schema things like whitespace, indenting, backslash escapes, and so on, are only part of the lexical space; they do not exist in the value space. If you decode two documents with different whitespace, you will get identical values in the value space. Likewise, trailing zeros in the decimal part of a number does not change its value—it only changes its lexical representation. Some languages (like C) do make a distinction, and ※ While
So what this is saying is:
|
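The lexical-space versus value-space distinction described above can be seen with any JSON parser. The following is a minimal sketch using Python's standard `json` module; the variable names are illustrative only, and Python's choice to decode `1` as an `int` and `1.0` as a `float` is a property of that parser, not of JSON.

```python
import json

# Three lexical representations of the same mathematical value.
documents = ["1", "1.0", "1.00"]

# Python's json module decodes "1" as int and "1.0" as float,
# but the decoded values still compare equal.
values = [json.loads(text) for text in documents]
all_equal = all(v == values[0] for v in values)

# Whitespace is likewise part of the lexical space only.
same_after_whitespace = json.loads('{"a":1}') == json.loads('{ "a" : 1 }')

print(values, all_equal, same_after_whitespace)
```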
This is how I read the current spec, and why I'm asking for clarification on what "encoded with a fractional part" means. I'm with @karenetheridge on this. |
@ssilverman Did my response make sense? There's no such thing as "encoded internally" so I'm having trouble making out what the question here is (do you mean decoded?) Maybe I can explain the purpose of that section... The paragraph does not relate to decoding, only encoding. First, it points out that JSON numbers do not work the way integers & floats work in most programming languages. And the programming language should not impact how a JSON document is encoded (written), of course. So, it's suggesting that if (for example) a C encoder is writing a 64-bit floating point number with the value 1 into JSON, it should omit the fractional part: Or to refer to the ABNF: numbers in JSON should avoid using the Is that a better explanation? |
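As a rough illustration of the encoding advice in the comment above, here is a sketch of an encoder that writes an integral floating-point value without a fractional part. `encode_number` is a hypothetical helper, not something defined by the spec or by any library; Python's own `json.dumps(1.0)` produces `1.0`, which is the case the paragraph discourages.

```python
import json

def encode_number(value):
    """Hypothetical helper: write integral floats without a fractional part."""
    # float.is_integer() is False for NaN and infinities, so those fall through.
    if isinstance(value, float) and value.is_integer():
        return json.dumps(int(value))
    return json.dumps(value)

print(json.dumps(1.0))      # default encoder keeps the fractional part
print(encode_number(1.0))   # sketch drops it
print(encode_number(1.5))   # non-integral values are unchanged
```

With this approach a C-style float 1.0, an int 1, and a JavaScript-style number 1 would all be written the same way.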
@awwright thanks. To clear up possible confusion, I think I focused on the wrong direction (reading in numbers). This clarifies that it's about writing them out, as this ensures that a C float 1.0, a C int 1 (different types in C), and a JavaScript number 1 or 1.0 (same type in JavaScript) produce the same representation in JSON. My one question is: Is JSON encoding even within the scope of our spec? I definitely see your point, I had just thought of our spec as focusing on the data model, and the conversion to and from JSON text as the province of the JSON RFC. But I haven't thought about it much so I'm open to being wrong here 😄 |
You're right, it's not. I no longer think a capitalized SHOULD is appropriate here. We could probably take out the entire section, or significantly rewrite it. |
(I can't tell if there's still something to respond to here), so will just list a direct response to the first comment -- there's 2 considerations, whether something is an integer or not, and then the second piece which it looks like @handrews and @awwright responded to, so I hope I'm just repeating:
In predraft 7, this was potentially not
not an integer
an integer, even pre draft 7. What that paragraph is saying (clear or otherwise, obviously we can tidy up things) -- is, if you intend to write/encode the integer 1, prefer to encode it as JSON text as Hopefully that makes some sense and isn't wrong :)
Agree. |
TLDR: I point to this: "A fraction part is a decimal point followed by one or more digits." here: https://tools.ietf.org/html/rfc8259#section-6. ".0" is very explicitly a "fractional part". To have otherwise literally disagrees with the JSON spec.
So in draft-2019-09, is @awwright to answer your response, "encoding" and "decoding" mean something different to me. "encoding" doesn't necessarily mean "encode to bytes". "encode" and "decode" are relative concepts, and opposite to each other. For example, the mathematical number "1" is "internally encoded" as the 32-bit floating-point value 0x3f800000. As well, the concept of the mathematical number "1" can be "encoded as JSON" with "decoding" is just the opposite (direction) to "encoding", with the "what" being completely arbitrary. [Note: I don't want to hijack my own thread with a philosophical "what does encoding mean" discussion. I do hear your points but I disagree with some of them; we can hammer that out elsewhere? I agree with the lexical and value distinction. Lexical encodes the concept of the value, etc.] The first two "JSON encodings of 1" both have a fractional part, ".0" and ".00" respectively. The spec says that a number is an integer if it's "not encoded with a fractional part". I'm just suggesting language cleanup. AND make that Related are these questions:
I point to this: "A fraction part is a decimal point followed by one or more digits." here: https://tools.ietf.org/html/rfc8259#section-6. ".0" is very explicitly a "fractional part". To have otherwise literally disagrees with the JSON spec. I'll reiterate: I just want the language in core/section 6.3 to be clarified, whatever the decision. While I believe "1.0" has a fractional part, I'm not pushing for that specific solution here; that's for a different thread or issue. I just want clarification so I can help update tests and get a better understanding of the consensus decision regarding the JSON Schema spec. |
Yes, as per 6.1.1's '"integer" which matches any number with a zero fractional part.', and I misspoke, it's since draft 6, not since draft 7.
A mistake :) so we should fix that -- the mistake is quite simple, it used to be optional behavior (in pre-draft6) to call Going to ignore the middle of your comment not because I don't have opinions but just because I don't see the relevant parts of them yet :), and skip directly to question number 2 of yours:
This is the case the spec is trying to discourage.
Probably the JSON spec doesn't care and doesn't say anything about this but I didn't look -- but the JSON spec allows parsers to use floats if it feels like distinguishing (a la Python), and numbers for everything if it doesn't feel like distinguishing (a la JS). |
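The rule quoted above from 6.1.1 ("integer" matches any number with a zero fractional part) can be sketched as a type check over decoded values. `is_json_integer` is a hypothetical name, and this assumes values decoded by Python's standard `json` module.

```python
import json

def is_json_integer(value):
    """Hypothetical check for "type": "integer" under the draft 6+ rule."""
    # bool is a subclass of int in Python, but JSON booleans are not numbers.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    # A float counts as an integer when its fractional part is zero.
    return isinstance(value, float) and value.is_integer()

for text in ["1", "1.0", "1e2", "1.1", "true"]:
    print(text, is_json_integer(json.loads(text)))
```

A parser that decodes every number to a float (as in JavaScript) and one that distinguishes ints from floats (as in Python) would give the same answers under this check.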
Ah, so it's the validator spec that describes this better. If the validation vocabulary is optional, that would explain why the |
This is the problematic thing for me: I'd much rather see a prohibition instead of a discouragement. If it's not disallowed, then that's where things become muddy and "undefined behaviour".
I'm surmising that there are more common languages that differentiate between integers and floating-point than those that don't, so why not cater to those instead? Just a thought. |
Most of the point of JSON Schema is portability :) If it didn't acknowledge the existence of languages like JS it's just excluding its ability to be used. |
the validation vocabulary isn't optional - it is used by the core (e.g. |
JSON itself doesn't restrict to 53-bit (significant digits) numbers, but that's what it seems like you're suggesting here. Just because some languages (JavaScript) restrict numbers to 53-bits, doesn't mean that should hobble the schema spec. That's also why I think making "1.0" an "integer" should be optional. It should be acknowledged, but I don't think it's a good thing to always restrict to lowest common feature set. "Acknowledge" isn't the same thing as "restrict". This discussion brings to mind the question, since you bring up "acknowledge the existence of JS": Is the whole reason that "1.0" is considered an "integer" because of JavaScript's number limitations? What's the original intent? Even JSON doesn't restrict this, why hobble for most other languages? |
Ah, for some reason, I thought only the core vocabulary was not optional. I see in the main schema definition that only "format" is optional. |
I'll be honest, I can't tell anymore what if anything we're arguing about :) Do you have a specific suggestion that I lost track of for what you think should change and to what? That may make it easier to know if we are, and apologies if you do and said it and it's me that can't concentrate for long enough. (Beyond of course "move zeroTerminatedFloats.json to required", which yes I definitely agree with) |
You're right, this is off into the weeds... (But to summarize and explain: I was giving an illustrative example of JavaScript's restrictions influencing specifications. I also don't agree that "1.0" should be correctly validated as an "integer" because "1.1" does not, and they're JSON-encoded the same way, with a valid "fractional part" according to the JSON specification. But, this isn't about changing the schema spec; I can disagree but still go along with it.) Anyways... here's what I'd like to see:
|
Agreed, PR welcome, or issue ticket certainly.
No personal preference. I agree the current language can be confusing but I don't have a suggestion on how to improve it, so it seems like this issue ticket should track any suggested better language; it sounds like you have a suggestion.
This I disagree with. The spec is unambiguous here. It may be programmers are used to things -- but the spec unambiguously says |
Oh, maybe you mean "unexpected" just in a layman's sense though? As a callout that this may be different from behavior elsewhere (e.g. in said programming languages)? That I don't disagree with, but I don't think is appropriate for the text of a specification, it's more something for external notes AIUI, but I'm no expert on the norms of what goes in a spec, so maybe if this is what you mean I also have no opinion about it :) |
JSON does restrict interoperability to IEEE binary64 (double precision) numbers.
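To make the binary64 point concrete: RFC 8259 notes that good interoperability can be expected only within the precision and range of IEEE 754 double precision. The sketch below, assuming Python's standard `json` module, decodes the same text once with exact integers and once through floats only (mimicking a parser that has a single double-based number type).

```python
import json

big = 2**53 + 1          # 9007199254740993 has no exact binary64 representation
text = str(big)

# Python's default parser keeps integers exact.
as_int = json.loads(text)

# Forcing every number through float mimics a double-only parser.
as_double = json.loads(text, parse_int=float, parse_float=float)

print(as_int == big)          # exact value preserved
print(as_double == big)       # precision lost
print(as_double == 2**53)     # rounded to the nearest representable double
```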
Yeah, one of the things I dislike most about working on this spec is when the strict static typed language people show up and demand that JSON Schema do things that only benefit them and completely sabotage the system's usefulness for all other styles of languages. I pretty much always disregard those arguments now b/c people who make them never seem to understand any other perspective and I've wasted enough time trying to get them to look beyond their own needs. |
@notEthan The validator vocabulary is optional in the sense that you can build a JSON Schema system without it (there's an issue somewhere where such systems were discussed). But that doesn't mean you can run the test suite without it. Regarding
{
  "oneOf": [
    {"oneOf": [false, {"title": "foo"}, false]},
    {"oneOf": [true, false, {"title": "bar"}]}
  ]
}
this schema will always annotate the instance with the title "foo" and never the title "bar". |
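The behaviour claimed for the schema above can be checked with a toy evaluator. This is only a sketch under strong assumptions: `evaluate` is a hypothetical function that understands nothing but boolean schemas, `oneOf`, and `title`, and it discards annotations from any subschema that fails. It is not how any particular implementation works.

```python
def evaluate(schema, instance):
    """Toy evaluator: returns (valid, collected_titles)."""
    if schema is True:
        return True, []
    if schema is False:
        return False, []
    titles = [schema["title"]] if "title" in schema else []
    if "oneOf" in schema:
        results = [evaluate(sub, instance) for sub in schema["oneOf"]]
        passing = [r for r in results if r[0]]
        # oneOf requires exactly one passing subschema.
        if len(passing) != 1:
            return False, []
        titles.extend(passing[0][1])
    return True, titles

schema = {
    "oneOf": [
        {"oneOf": [False, {"title": "foo"}, False]},
        {"oneOf": [True, False, {"title": "bar"}]},
    ]
}

result = evaluate(schema, 42)
print(result)
```

The second inner `oneOf` fails because both `true` and `{"title": "bar"}` accept the instance, so its annotation is never collected.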
interesting. thank you for the clarification. |
This is exactly what happened, yeah. I think I requested that change, but didn't realize the file should move too. |
Side note: I'm really thrilled over how many people are digging into this and helping improve the spec. |
Move the tests from zeroTerminatedFloat.json to type, since as of draft 6, this behavior is required, and required to consider such numbers to be integers. See json-schema-org/json-schema-spec#898 for some discussion, not of the underlying requirement, simply of this file being in the wrong place.
Not sure why section 4.2.2 hasn't been brought up yet. Seems to very clearly define two numbers as equal if their values are equal. This means (Reading through the conversation, it seems to have landed on that conclusion anyway.) I'm writing up a PR to just remove the "Mathematical Integers" section. That should resolve this issue. |
There's still an ambiguity when defining an integer. (Note: This issue isn't here to debate how one defines an "integer" for JSON schemas.)
It says in core/section 6.3:
TLDR: Does "encoded with a fractional part" mean "encoded internally after processing the JSON" or "a floating point number having a zero fractional part encoded as the JSON `1.0`"? Another way of putting it: "Is the written JSON number `1.0` considered to be encoded with a fractional part since it has the `.0` there?"
There's an ambiguity with how this is phrased. Immediately above this sentence, it mentions that numbers may have "different internal representations." This implies "a number encoded to an internal representation." However, when we speak of JSON documents, we can think of a number "encoded into its JSON format."
For the first case, the `int` 1 is encoded internally as `1` and has no fractional part, and the `float` 1.0 is encoded internally as `1.0`, which has a fractional part of zero.
For the second case, depending on the parser, the "JSON-encoded" number could end up as either an "internally-encoded" float with a zero fractional part in the case of JavaScript, or as either an "internally-encoded" `int` or `float` or even "arbitrary-precision decimal or integer" in other languages, depending on the parser.
So my question is: What does it mean to "not be encoded with a fractional part"? Does this mean: "no empty fractional part and no zero-valued fractional part", or does it just mean "no empty fractional part"? i.e. how does a "zero-valued fractional part" fit in here?
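The parser dependence described above is easy to reproduce. The sketch below assumes Python's standard `json` and `decimal` modules and shows the same JSON text landing in three different internal representations, one of which (the arbitrary-precision decimal) even retains the written `.0`.

```python
import json
from decimal import Decimal

default_value = json.loads("1.0")                        # binary float
decimal_value = json.loads("1.0", parse_float=Decimal)   # arbitrary-precision decimal
integer_value = json.loads("1")                          # integer

print(type(default_value).__name__, type(decimal_value).__name__, type(integer_value).__name__)

# The Decimal keeps the trailing zero from the JSON text...
print(str(decimal_value))

# ...yet all three compare equal as mathematical values.
print(default_value == decimal_value == integer_value)
```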
Also see: #79 and json-schema-org/JSON-Schema-Test-Suite#132