Skip to content

Any reason serializer.is_human_readable() == true? #297

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
fjarri opened this issue Aug 31, 2021 · 7 comments
Closed

Any reason serializer.is_human_readable() == true? #297

fjarri opened this issue Aug 31, 2021 · 7 comments
Labels
tracked-in-jira Ticket filed in Mongo's Jira system

Comments

@fjarri
Copy link

fjarri commented Aug 31, 2021

I have some types naturally representable as bytestrings, and serializers for them that make use of is_human_readable(), encoding bytes in b64 if it is true and using raw bytes otherwise. I noticed that in this crate is_human_readable() for Serializer is not overridden, so it returns true, which seems strange for a binary format. Is that intentional?

@patrickfreed
Copy link
Contributor

Hi @fjarri, thanks for reporting this issue! For the raw BSON serializer, this was certainly an oversight and something we need to fix.

For the Bson enum based serializer, it's a little less clear. Yes the format that Bson models is binary based, but the serializer produces the model not actual BSON. On the other hand, there still is the Bson::Binary variant which can accept plain-old-bytes, so there could still be savings in treating it as a non-human-readable format. All that being said, given that the Bson based serializer has existed with is_human_readable() == true for a long time now, I'm inclined to leave it as-is in case it breaks any existing users' code.

I filed RUST-1009 to cover the work for updating the raw BSON serializer. Thanks again for reporting this!

@patrickfreed
Copy link
Contributor

This was fixed in #298 and will be included in the 2.0.0 release.

@dgrijalva
Copy link

Thank you for this helpful clarification. I'd like to raise a use case where this solution is challenging: UUIDs. Specifically, I would like to be able to encode one type with multiple serializers (eg. BSON and JSON).

The UUID type itself uses the is_human_readable flag to switch between binary and string output. Their implementation is not great for BSON specifically because it doesn't contemplate subtypes, but I can overcome this by providing my own implementation.

The current recommended solution seems to be using the #[serde(with=... helper. This does work, but forgetting this attribute introduces a pretty subtle bug that can be hard to unwind, especially once you have millions of records in mongo that use an unexpected identifier type. Additionally, it means that any type I annotate this way can only be encoded as BSON.

A related issue with this solution is that to_vec and to_bson will now produce materially different values for any type that switches on the is_human_readable flag. This is likely to produce some difficult to track down bugs for any user who's not anticipating this behavior. At a minimum, this behavior should be called out in the documentation. Further, it's not currently possible to opt-in to the behavior of to_vec in other contexts. The mongodb library expects Document or Into<Document>, which means I can't use to_vec when building a query. This could be resolved in a backward compatible way by adding something like to_bson_binary, which would also help users understand that this behavior difference exists.

One of the things that's great about serde is that it (usually) allows me to solve specific encoding challenges once, directly on the type, then have it transparently behave correctly wherever it's used. This proves to be incredibly valuable as a team and codebase grows.

This challenge isn't unique to BSON. UUIDs are, in general, difficult to use well with serde because so many serialization formats treat UUIDs as a special case, but serde's intermediary format doesn't carry enough information.

Thank you for this very useful library.

@patrickfreed
Copy link
Contributor

Thanks for the detailed comment @dgrijalva, you bring up some very good points. The mismatch between to_bson and to_vec does seem concerning--I've filed RUST-1022 to discuss it with the team. In all likelihood, we're going to need to introduce something like the to_bson_binary you suggest. Returning false from is_human_readable from both serializers seems like it may have been the correct path, but unfortunately now that 2.0.0 has been released, I don't think we'll be able to fix it that way without breaking compatibility.

As a workaround in the meantime, you can first serialize to a Vec and then read it into a Document like so:

let bytes = bson::to_vec(my_query)?;
let doc = Document::from_reader(bytes.as_slice())?

This isn't the most performant way to do this, but it at least avoids having to write any special serialization code.

Regarding the concerns of forgetting #[serde(with = "...")], unfortunately there isn't much we can do right now. Maybe once specialization becomes stabilized, we can do something like what was suggested in #301, which could make this situation a lot better.

@dgrijalva
Copy link

That all makes sense. Thanks for the update. Sounds like I just missed the cutoff. I'm glad you're sticking to the semver rules. They feel painful for scenarios like this, but it's better for the community in the long run. We put a somewhat clumsy workaround in place for now. I'll keep an eye out for to_bson_binary, or whatever solution you decide on. Thank you!

@patrickfreed
Copy link
Contributor

Quick update on this: we just released v2.1.0-beta, which includes support for configuring the Serializer/Deserializer used in functions like to_bson and from_bson to be human readable or not (by default it still is).

e.g.

#[derive(Debug, Deserialize, Serialize)]
struct MyData {
    a: String,
}

let data = MyData { a: "ok".to_string() };
let options = SerializerOptions::builder().human_readable(false).build();
let bson = bson::to_bson_with_options(&data, options)?;

let options = DeserializerOptions::builder().human_readable(false).build();
let rt_data: MyData = bson::from_bson_with_options(bson, options)?;

@dgrijalva
Copy link

That seems great! We'll have a look at it. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tracked-in-jira Ticket filed in Mongo's Jira system
Projects
None yet
Development

No branches or pull requests

3 participants