-
Notifications
You must be signed in to change notification settings - Fork 144
Add raw bson types for zero-copy handling of bson wire data. #136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I've filed https://jira.mongodb.org/browse/RUST-284 to track this as well as the corresponding issue |
…map for better validation and flexibility.
Breaks the test to deserialize to a HashMap<String, String>, but why is that a valid bson representation? I think we need a stricter separation between bson and extended json format. Serde shouldn't try to paper over that.
Thanks! As requested, I'll find some time this week to separate this PR into multiple parts, to ease the burden on reviewers. I'm envisioning three parts:
I think those also go in order roughly from least invasive to most invasive, so the first one should be relatively safe to merge and experiment with, before merging the others. |
I had a similar but slightly different idea of how to split it up:
I agree that we can do most of these changes without needing to mess with the existing code too much. In particular, I think we still want to keep the existing Document and Bson APIs intact, but just offer raw versions as well for improved performance, as well as conversions between the raw and non-raw types. Assuming everything goes well and the raw types do prove to be faster (which I expect will be the case), we could change the driver to receive/return raw types so that the user gets the faster versions by default but can still opt-into the traditional Document type by just calling a method to convert what they send over and receive from the driver. |
Apologies for the CI failures; we just updated the repository to check for clippy lints and merged a change to fix/whitelist all of them. I think if you rebase with master, everything should turn green. |
I pulled back all the serde related changes. One thing I'd like to clean up a little more is the error handling. It would be nice to unify the return types of the field accessors ( So we could:
The reason for having the extra Error Variants for raw documents are as follows: First, the raw types can reveal errors with malformed bson at query time, whereas that cost is paid at instantiation time with the parsed types, so the Second, and to my mind more importantly, is that the raw type can return a Utf8 error, if mongo returns a string that is not valid UTF-8. This is something I've seen in production mongo databases, and often there is no way to recover (that I've found) with other drivers. IIRC, the go driver crashes during fetching, and the python driver also crashes unless you pass a poorly documented Unfortunately, either Option 1 or 2 would be a semver incompatible change. It would be fairly trivial in the case of Option 2, but Option 1 would leave the library feeling overall more consistent in the end. |
One benefit of unifying the return types is that the following expressions would be interchangeable, which seems desireable to me. let oid1 = rawdoc.get("_id").and_then(RawBson::as_object_id);
let oid2 = rawdoc.get_object_id("_id");
let oid3 = OrderedDocument::from(rawdoc).get("_id").and_then(Bson::as_object_id);
let oid4 = OrderedDocument::from(rawdoc).get_object_id("_id");
let oid5 = rawdoc.get("_id").map(Bson::from).and_then(Bson::as_object_id); |
If you run the included benchmarks you should see that repeated random access on the raw type eventually gets slower than the parsed type, as you would expect (because access to a given element takes linear time), but in-order iteration and direct access to a single element are significantly faster.
Note that one important use of the parsed type is constructing bson documents from scratch. The raw type isn't well suited to this, so methods like |
Oops. I just saw that the |
I've published this work as a standalone crate (https://github.com/jcdyer/rawbson / https://crates.io/crates/rawbson). Whenever you get ready to incorporate this into bson-rust, let me know, and we can figure out the best way to integrate it. For now, I'll close this PR. |
Fixes #133
RawBsonDocBuf
to hold aVec<u8>
of bson data, and a RawBson<'a> for a borrowed version of the same, along with various other types for supporting individual values and specific types within a bson type.Bson
type, and for deserializing to custom structs.Caveats:
Looking forward to your feedback.