Skip to content

RUST-882 Remove lossy From<u64> impl for Bson #280

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 23, 2021

Conversation

patrickfreed
Copy link
Contributor

@patrickfreed patrickfreed commented Jul 22, 2021

RUST-882

This PR removes the From<u64> impl for Bson that could be lossy in the event that the u64 > i64::MAX. I opted to not replace it with a TryFrom implementation as I would have to specify an error, and I wasn't sure if we wanted to create yet another error type just for this case. An alternative could be to use <i64 as TryFrom<u64>>::Error, but I wasn't sure if that would be weird. Let me know what you guys think.

The lossiness in From<u32> was fixed in #276.

Copy link
Contributor

@abr-egn abr-egn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

In the lack of concrete requests, I think it's fine to not have a TryFrom implementation; if we do need to add one, I'd lean towards reusing <i64 as TryFrom<u64>>::Error for it.

Copy link
Contributor

@isabelatkinson isabelatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, agree w/ Abraham on holding off until someone requests the TryFrom functionality

@patrickfreed patrickfreed merged commit 88ae843 into mongodb:master Jul 23, 2021
lukewestby added a commit to lukewestby/bson-rust that referenced this pull request Jul 31, 2021
@bugadani
Copy link
Contributor

bugadani commented Aug 9, 2021

As this broke my code, I have a question: why? What data loss happened with the From<u64> impl? As I understand, i64 <-> u64 conversions in rust should be a no-op (as per the Rustonomicon), which for me suggests it doesn't change the binary representation, or, in other words, wraps the numeric value.

@patrickfreed
Copy link
Contributor Author

What the rustnomicon means by no-op is that the bits themselves aren't changed as part of the cast since they all still fit in the new integer type; however, when switching from a signed integer to an unsigned one, the interpretation of those bits does change now that one of them is reserved for the sign bit.

The issue is that there are values of u64 which cannot be represented as i64s, since the max value of u64 is greater than the max positive value of an i64. Prior to this change, conversions of such values would overflow and end up being negative, since what used to be interpreted as a numeric bit is now interpreted as a negative sign.

e.g. in 2.0.0-beta.2, we could have the following:

let myval: u64 = i64::MAX as u64 + 10;
println!("{}", myval);
let doc = doc! {                     
    "u64": myval                     
};                                   
println!("{}", doc);                 

And it would print:

9223372036854775817
{ "u64": -9223372036854775799 }

Note that for this same reason, the standard library only implements From<u32> for i64 and not From<u64>, and we've elected to do this too. This forces users to explicitly handle the conversions and any possible overflows, rather than silently converting to the wrong value.

Hope this helps, and please let us know if you have any further questions!

@bugadani
Copy link
Contributor

bugadani commented Aug 9, 2021

I understand the theory behind the decision, but I'm approaching from the UX side: this change just makes using unsigned ints annoying. All I want to do is serialize a u64, I don't really care about the underlying representation. Normally I'd assume the u2i feature handles anything required.

The readme states the following: "Any types that implement Serialize and Deserialize can be used in this way." This is unfortunately not true due to restrictions like in this PR.

A lossy operation means loss of information. In case of unsigned - signed conversion, if the types are of the same size, there is no loss of information, only a change in interpretation. This means, that even if the intermediate bson representation wraps my big_num around to a negative value, serializing and then deserializing my data would return the original value.

@patrickfreed
Copy link
Contributor Author

The readme states the following: "Any types that implement Serialize and Deserialize can be used in this way." This is unfortunately not true due to restrictions like in this PR.

Good point; I filed RUST-968 to discuss with the team the possibility of enabling u2i by default going forward, which will make this statement more true.

A lossy operation means loss of information. In case of unsigned - signed conversion, if the types are of the same size, there is no loss of information, only a change in interpretation. This means, that even if the intermediate bson representation wraps my big_num around to a negative value, serializing and then deserializing my data would return the original value.

The bits aren't being lost sure, but the type information certainly is. The database, the shell, other tools, etc. can't know the value is unsigned-but-stored-as-signed while the original Rust application did. For example, this means queries based on ordering (e.g. $lt, $gt, etc) will not function properly and could very easily lead to bugs. While this restriction may be frustrating for your use case, it is a much needed safeguard for the majority of our users to avoid such bugs, and it's also in line with the restrictions included in std as I mentioned before. If you would like to opt-in to using signed integers to store unsigned integers--even in the event of overflow--you're able to, but we don't want to add explicit support for this behavior due to the risks involved with it.

@bugadani
Copy link
Contributor

bugadani commented Aug 9, 2021

For example, this means queries based on ordering (e.g. $lt, $gt, etc) will not function properly

This is an excellent point and an oversight on my end, for sure.

If you would like to opt-in to using signed integers to store unsigned integers--even in the event of overflow--you're able to, but we don't want to add explicit support for this behavior due to the risks involved with it.

I guess this is fair enough. A bit uncomfortable, but not impossible.

Thanks!

@patrickfreed
Copy link
Contributor Author

No problem, and don't hesitate to reach out if you have any further questions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants