-
Notifications
You must be signed in to change notification settings - Fork 144
RUST-1082 Implement Serialize
and Deserialize
for raw BSON types
#318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUST-1082 Implement Serialize
and Deserialize
for raw BSON types
#318
Conversation
} | ||
|
||
#[test] | ||
fn all_types() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is the same, but I moved the struct definition and initialization out of the body to reuse it in other tests.
"max_key": { "$maxKey": 1 }, | ||
}); | ||
|
||
assert_eq!(serde_json::to_value(&v).unwrap(), json); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test only verifies that serialization works, since deserializing binary values from JSON is very limited (serde_json treats byte slices as arrays of integers, which Binary
can't round trip from). I filed RUST-1091 to cover the work for adding more robust tests in this area though.
/// Hint provided to the deserializer via `deserialize_newtype_struct` as to the type of thing | ||
/// being deserialized. | ||
#[derive(Debug, Clone, Copy)] | ||
enum DeserializerHint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced this hint type as a way of tracking whether or not we're deserializing raw BSON types. If we are, we use slightly different visiting patterns. For example, a document will be visited as a slice of bytes rather than as a map. It's also used for the UUID serialization hinting behavior introduced in #314. The hint communicated for raw BSON the same way it is for UUIDs, i.e. via a specifically named newtype struct. See deserialize_newtype_struct
for where that happens.
) -> Result<V::Value> | ||
/// Deserialize the next element in the BSON, using the type of the element along with the | ||
/// provided hint to determine how to visit the data. | ||
fn deserialize_next<V>(&mut self, visitor: V, hint: DeserializerHint) -> Result<V::Value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to borrow from the underling BSON data, we had to introduce new MapAccess
implementations for the various types that didn't get them in the initial implementation (e.g. Regex
).
} | ||
|
||
#[derive(Debug, Clone, Copy)] | ||
enum SerializerHint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to deserialization, we use a hint here to indicate what the next call to serialize_bytes
is serializing.
@@ -59,6 +65,34 @@ struct ParseError { | |||
string: String, | |||
} | |||
|
|||
struct FieldVisitor<'a, T>(&'a str, PhantomData<T>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type is used to deserialize the value provided by a valid corpus test. We need to do some runtime hackery since the field name isn't the same per test and is instead provided in the field_name
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Extremely minor comments only.
On a larger scope, it's difficult for me to keep all the cogs in my head to get a sense of the overall machine operation - a walkthrough would be very welcome at some point.
src/de/raw.rs
Outdated
/// Whether the first key has been deserialized yet or not. | ||
deserialized_first: bool, | ||
|
||
/// Whether or not this document being deserialized is for anarray or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: "anarray" => "an array"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/raw/bson.rs
Outdated
where | ||
E: serde::de::Error, | ||
{ | ||
crate::de::convert_unsigned_to_signed_raw(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: might be worth just use
ing convert_unsigned_to_signed_raw
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree, this code is pretty hard to read due to the sheer amount of boilerplate involved, especially across multiple serializer/deserializer implementations. I filed RUST-1098 to cover the work for writing up a doc that explains it all at a high level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with one typo fix
src/raw/document.rs
Outdated
RawBson::Document(d) => Ok(d), | ||
|
||
// For non-BSON formats, RawDocument gets serialized as bytes, so we need to deserialize | ||
// from them here too. For BSON, the deserialzier will return an error if it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// from them here too. For BSON, the deserialzier will return an error if it | |
// from them here too. For BSON, the deserializer will return an error if it |
RUST-1082
This PR adds
Serialize
andDeserialize
implementations for the raw BSON types.e.g. the following is now possible:
The implementations behave as follows:
&RawDocument
:{ "$rawBsonDocument": <bytes> }
for binary, non-BSON formats? This will allowRawBson
to round trip, but that's about it.RawDocumentBuf
: same asRawDocument
. In a future PR it will be able to deserialize from arbitrary maps, but that requires the ability to append.RawBson
:Bson
.Bson
&RawArray
&RawDocument
, but uses a different newtype name to signal an array.RawJavaScriptCodeWithScope
{ "$code": <borrowed string>, "$scope": <borrowed string> }
RawDbPointer
{ "$ref": <borrowed string>, "$id": <objectid> }
ObjectId
{ "$oid": <12 byte slice> }
RawBinary
{ "$binary": { "bytes": <borrowed bytes>, subType: <u8> } }
in the serde data modelRawBinary
to round trip on non-BSON formats, the separate bytes model (rather than base64 string model) was required. It is also much faster.Binary
to exhibit this behavior, as it would technically be a breaking change.RawRegex
:DateTime
: updated to also deserialize from{ "$date": <i64> }
to avoid a string allocation during deserialization.