Can't serialize/deserialize uuid properly #311

Closed
nanoqsh opened this issue Sep 27, 2021 · 5 comments
Labels
tracked-in-jira Ticket filed in Mongo's Jira system

Comments

@nanoqsh

nanoqsh commented Sep 27, 2021

I want to store my objects with a UUID identifier. One of my types looks like this:

#[derive(Deserialize, Serialize)]
pub struct Account<S: State> {
    #[serde(rename = "_id")]
    id: S::Id,
    name: String,
    balance: u64,
}

I use a State trait to express two states: New and Saved. This lets me conveniently create an Account<New> anywhere in the application without a UUID, in which case id is the unit type. Because of that, I can't use the uuid_as_binary attribute directly; it causes a build error. That wouldn't be a problem if I could serialize a UUID as a string. The docs say it serializes as a string by default:

#[derive(Serialize, Deserialize)]
struct Foo {
    // serializes as a String.
    uuid: uuid::Uuid,
    // ..
}

But when I save an object to the database and then fetch everything (for debugging), I see that it is stored in binary format (with BinarySubtype::Generic):

{ "_id": Binary(0x0, Nz6CfTaSS82T3q/pqjrhXA==), "name": "nano", "balance": 0 }

OK, I can use binary as well, but then I can't find an object by _id, because when I try something like:

let uuid = Uuid::parse_str(id)?;
// `doc!` converts the Uuid through `From<Uuid> for Bson`.
let filter = doc! { "_id": uuid };
let res = collection::<Account<Saved>>()
    .find_one(filter, None)
    .await?;

The filter document serializes as:

{ "_id": Binary(0x4, Nz6CfTaSS82T3q/pqjrhXA==) }

(with BinarySubtype::Uuid)

How can these different serializations be made to work together? Of course, I could use the uuid_as_binary attribute, but that would break my application's flexibility and invariant checking. Is there a more suitable solution?
In addition, this inconsistent serialization behavior seems unintuitive to me and looks more like an implementation flaw. It would be more convenient if a UUID were simply (de)serialized as a string by default.
I also think it would be great to implement this with type wrappers. For example, Uuid would be (de)serialized as a string and Binary(Uuid) as binary data. That would make them more convenient to use in generic code.
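
As a rough illustration of the wrapper idea combined with the New/Saved states described in this comment, here is a minimal std-only sketch. Everything in it is hypothetical: the author's actual State trait is not shown in the issue, and `Encoded`, `EncodeId`, `TextUuid`, `BinaryUuid`, and `encode_id` are made-up names that stand in for real BSON values and serde impls.

```rust
/// Toy encoding target; a real application would produce BSON values instead.
#[derive(Debug, PartialEq)]
enum Encoded {
    Null,
    Text(String),
    Binary { subtype: u8, bytes: Vec<u8> },
}

/// Hypothetical trait: each id type decides its own encoding.
trait EncodeId {
    fn encode(&self) -> Encoded;
}

/// Wrapper that always encodes as (unhyphenated) hex text.
struct TextUuid([u8; 16]);

/// Wrapper that always encodes as binary with subtype 4.
struct BinaryUuid([u8; 16]);

impl EncodeId for () {
    fn encode(&self) -> Encoded {
        Encoded::Null
    }
}

impl EncodeId for TextUuid {
    fn encode(&self) -> Encoded {
        Encoded::Text(self.0.iter().map(|b| format!("{:02x}", b)).collect())
    }
}

impl EncodeId for BinaryUuid {
    fn encode(&self) -> Encoded {
        Encoded::Binary { subtype: 4, bytes: self.0.to_vec() }
    }
}

/// Assumed shape of the state trait: the state picks the id type.
trait State {
    type Id: EncodeId;
}

struct New;
struct Saved;

impl State for New {
    type Id = (); // no id before the account is saved
}

impl State for Saved {
    type Id = BinaryUuid; // swap in TextUuid to store strings instead
}

struct Account<S: State> {
    id: S::Id,
    name: String,
    balance: u64,
}

/// Generic code never needs to know which encoding was chosen.
fn encode_id<S: State>(account: &Account<S>) -> Encoded {
    account.id.encode()
}
```

With this shape, the choice between string and binary storage lives in one place (the `Id` type of `Saved`), and an `Account<New>` can still be built with `id: ()`.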

Project's Cargo.toml dependencies
[dependencies]
rocket = { version = "0.5.0-rc.1", features = ["tls", "json", "uuid"] }
mongodb = { version = "2.0.0", features = ["bson-uuid-0_8"] }
uuid = { version = "0.8.2", features = ["v4"] }
serde = "1.0"

pooooodles added the tracked-in-jira label Sep 27, 2021

@dizda

dizda commented Sep 30, 2021

I have the same problem.
It used to save UUIDs as strings, which I was happy with, but now it saves them as Binary, which I don't want. I'd prefer to keep the previous, human-readable behavior.

@dizda

dizda commented Sep 30, 2021

I also have the same issue with the IpAddr type, which gets converted into an Object.
@nanoqsh I've temporarily fixed both issues with the serde-strz crate: https://crates.io/crates/serde-strz

@patrickfreed
Contributor

Hi @nanoqsh and @dizda, thanks for reporting this issue. I had a response drafted last week but forgot to actually post it here, sorry for the delay! There indeed are some issues with using Uuid with bson right now that we hope to address soon.

The current situation

The problems stem from the fact that the Uuid type is provided by another crate, so we can't control how it gets serialized in BSON. For reference, Uuid's Serialize implementation serializes it as a String if the Serializer is human readable and as bytes if it isn't.

Due to what was potentially an oversight (see #297), the raw serializer (used in bson::to_vec) is not human readable whereas the one used in bson::to_bson and bson::to_document is, which means a Uuid will be serialized as a string in one and as a Binary with subtype 0 in the other. What further complicates things is that UUIDs in BSON are meant to be Binary values with subtype 4 (not 0), but the only way for our serializers to know to do that is via something like uuid_as_binary, since as mentioned before, we don't control Uuid's Serialize implementation.
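
To make the human-readable distinction concrete, here is a minimal std-only sketch. It is a toy model, not the actual uuid, serde, or bson code: `Encoded`, `hyphenated`, and `encode_uuid` are hypothetical names, and the boolean parameter stands in for the flag a serde serializer reports through `is_human_readable`.

```rust
/// Toy stand-in for the two shapes a UUID can take when serialized.
#[derive(Debug, PartialEq)]
enum Encoded {
    Text(String),
    Bytes([u8; 16]),
}

/// Formats 16 bytes in the hyphenated 8-4-4-4-12 hex form.
fn hyphenated(bytes: &[u8; 16]) -> String {
    let mut out = String::with_capacity(36);
    for (i, b) in bytes.iter().enumerate() {
        // Hyphens go before bytes 4, 6, 8 and 10.
        if matches!(i, 4 | 6 | 8 | 10) {
            out.push('-');
        }
        out.push_str(&format!("{:02x}", b));
    }
    out
}

/// Mimics a Serialize impl that branches on the serializer's
/// human-readable flag: text when true, raw bytes otherwise.
fn encode_uuid(bytes: [u8; 16], human_readable: bool) -> Encoded {
    if human_readable {
        Encoded::Text(hyphenated(&bytes))
    } else {
        Encoded::Bytes(bytes)
    }
}
```

In this model, `encode_uuid(id, true)` corresponds to the bson::to_bson / bson::to_document path and `encode_uuid(id, false)` to the bson::to_vec path. A bytes value carries no hint that it is a UUID, which is why the serializer can only label it as generic binary.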

Since we do own the Bson type, we can control the From<Uuid> impl for Bson, and so we've implemented it to do the correct thing and convert to a Binary with subtype 4. The doc! macro calls through to this implementation, which is why you're seeing that result.

So in summary: a Uuid can be serialized in the following ways depending on context:

  • A string, via bson::to_bson or bson::to_document
  • A binary value with subtype 0, via bson::to_vec (used in the driver)
  • A binary value with subtype 4, via uuid_as_binary or the From<Uuid> impl for Bson
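
The failed lookup from the original report follows from this list: the driver stored the id with subtype 0, while doc! built the filter with subtype 4. The sketch below is a toy model (std only; the `Binary` struct and `find_matching` helper are hypothetical, not the bson or mongodb API). It assumes that equality on binary values takes the subtype into account, so identical bytes with different subtypes do not match.

```rust
/// Toy model of a BSON binary value: a subtype tag plus payload bytes.
#[derive(Debug, Clone, PartialEq)]
struct Binary {
    subtype: u8,
    bytes: Vec<u8>,
}

const SUBTYPE_GENERIC: u8 = 0x00;
const SUBTYPE_UUID: u8 = 0x04;

/// Returns the stored ids that an equality filter would match.
/// The derived PartialEq compares both subtype and bytes.
fn find_matching<'a>(stored: &'a [Binary], filter: &Binary) -> Vec<&'a Binary> {
    stored.iter().filter(|b| *b == filter).collect()
}
```

Filtering a collection that holds `Binary { subtype: SUBTYPE_GENERIC, .. }` with a `SUBTYPE_UUID` filter returns nothing here, even when the bytes are identical. Both sides have to be produced the same way for the query to find the document.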

Next steps

The documentation is currently limited and incorrect, as you point out. We should update it to include some of the info in this comment, as well as fix the incorrect "serializes as a string" comment.

Unfortunately, directly serializing Uuid without any sort of configuration will continue to produce imperfect results regardless of what we do, since again we do not control its Serialize implementation. However, we can introduce our own bson::Uuid type that wraps a uuid::Uuid and does behave properly, as you suggest (RUST-465). This will make using Uuid relatively straightforward with BSON, but users will still have to remember to use it instead of Uuid directly.

We can also introduce integration with serde_with (RUST-1024), which will allow users to do something like the following:

#[serde_as]
#[derive(Debug, Serialize, Deserialize)]
struct MyData {
    #[serde_as(as = "Option<bson::Binary>")]
    date: Option<uuid::Uuid>,
}

Lastly, we can introduce some mechanism that makes the to_bson and to_vec serializers agree on whether they're human readable or not (RUST-1022). Unfortunately, this won't really improve the situation for Uuids, since if both serializers aren't human readable, they'll produce subtype 0 Binary values which are still incorrect.


Thanks again for reporting this issue. Be sure to follow the RUST tickets linked above to track progress towards improving this situation. In the meantime, you'll need to implement custom (de)serialization logic to ensure everything is consistent. Sorry for any inconvenience this may cause, and we hope to have some improvements out soon!

@patrickfreed
Contributor

We just released version 2.1.0-beta of the bson crate, which includes a new bson::Uuid type, serde_with integration, and a way to set Serializer/Deserializer as not human readable. Going to close this issue out now, but please let us know if you run into further issues or have any other suggestions!

@jonasbb
Contributor

jonasbb commented Nov 17, 2021

@patrickfreed It would be better to include #[serde(default)] in your example, like so. Otherwise the date field becomes mandatory and no longer defaults to None if missing.

#[serde_as]
#[derive(Debug, Serialize, Deserialize)]
struct MyData {
    // Without this, a missing `date` field is an error instead of None.
    #[serde(default)]
    #[serde_as(as = "Option<bson::Uuid>")]
    date: Option<uuid::Uuid>,
}
