-
Notifications
You must be signed in to change notification settings - Fork 144
RUST-284: Incorporate rawbson code from rawbson = "0.2.1" in mod raw #229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the pull request @jcdyer! Is it all right with you if I start pushing commits to this branch? We're interested in getting this out to users as soon as possible (specifically in our 2.0 release, which we just released an alpha for), so it will go more quickly if we can write the code for changes we want to make ourselves. We definitely would love your input as a reviewer for the code we write! |
by all means! Use it as you see fit. I'm happy to answer questions, discuss implementation ideas, review code, or whatever helps. |
src/raw/mod.rs
Outdated
DocBuf { | ||
data: data.into_boxed_slice(), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't actually do anything unsafe. I marked it that way out of a suspicion that it might someday in the future, so it's marked to prevent backwards compatibility semver issues, but everything interacts with the underlying slice in a completely safe manner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting; I was wondering why it was marked as unsafe. I think if it ever does need to become unsafe, we would probably prefer to just add a separate method and deprecate this one rather than marking it as unsafe even though it currently doesn't need to be, so I've removed that annotation in my follow-up commit.
src/raw/mod.rs
Outdated
/// Error to indicate that either a value was empty or it contained an unexpected | ||
/// type, for use with the direct getters. | ||
#[derive(Debug, PartialEq)] | ||
pub enum RawError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to merge this with the primary crate Error type. The main difference is needing to be able to signal malformed bson when accessing elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we don't actually have a single error type in the crate, but each submodule defines their own (e.g. document::ValueAccessError
, oid::Error
, etc.). Given the current situation, it makes sense to leave this as-is for now, and maybe in a 2.0 release we can considering unifying the crate under a single error type.
b3888e6
to
2c30181
Compare
Thanks so much for this PR! We're going to hold off for a couple of weeks before merging the serde implementation in order to focus on the API that we need to be able to update the driver for the 2.0 release, but we definitely plan on using it once we're ready, and we'll make sure that you get proper credit for it. I've pushed some small API and documentation changes as a separate commit and added the rest of the team as reviewers to the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jcdyer, thanks again for the PR on this--it looks great so far! To avoid possibly duplicating any work though, @saghm will handle making any code changes @isabelatkinson and I request going forward, but feel free to chime in during the review as you see fit.
To summarize some of my comments from the other day, I'd caution against thinking about (or advertising) In some cases, as the benchmarks show, Where there is clear similarity between the two types, it makes sense to keep a similar API, but document mutation is not one of those situations. Raw document can support some mutations fairly simply, but others are quite complex, and should be discouraged: RawDocument appendEasy. O(1) performance (assuming a reasonably sized element)
Requires an O(n) read to guarantee non-duplicate keys, because you have to iterate over the document to compare with the existing keys. RawArray appendNote: there currently isn't an owning RawArray type, since that's not part of the bson spec. If such a type were created, this could be implemented. Easy, O(n) performance:
Non-size-changing element modification(e.g., changing the value of a fixed size element, without changing its type)
General element modification / element replacementSlow, moderately complicated
Modifying a nested document:Slow and complicated (O(N * depth of nesting)). For a nested key x.y.z:
Note, this algorithmic complexity is exacerbated by the fact that the data is owned by the top-level document, so inner documents aren't even capable of resizing themselves to accomodate modified elements. SummaryI would recommend supporting the following operations:
Then document that other modifications should be made either by iterating over the existing document and collecting to a fresh RawDocument, or by deserializing to a more appropriate data structure to perform mutations (a Document or custom type), then serializing back to RawDocument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed overview @jcdyer ! I definitely agree that RawDocument
is not a strict improvement performance-wise for all use-cases. I think the lion's share of the perf benefits will come from within the driver actually, since it should allow us to skip an intermediate serialization step for insertions and potentially reduce a lot of heap allocations in other places. It's unclear if deserialization will improve significantly though, since that requires mostly key lookups.
I also agree with the summary / suggested operations. One additional operation that may be worth including is entry deletion, since that should be pretty straightforward as well (will be O(n) as with all other key-based operations though). This would make it easier for a user to do a replace should they have to, though again we should try to encourage them to use a different type if they plan to do mutations of that sort. I guess they could also trivially implement it via a filter + collect + non-key-checking append, though, so maybe its not necessary.
src/raw/error.rs
Outdated
@@ -0,0 +1,27 @@ | |||
/// An error that occurs when attempting to parse raw BSON bytes. | |||
#[derive(Debug, PartialEq)] | |||
pub enum Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As part of refactoring this into a separate submodule, I removed the Raw
prefix from the Error
and Result
types defined here. This is consistent with how we define errors elsewhere in the library (e.g. bson::de::Error
and bson::ser::Error
), and users will need to import it with bson::raw::{Error, Result}
anyhow.
As for RawDocument and friends, I do not think we should remove the prefix here. raw::Error
and raw::Result
are essentially semantically equivalent to other Error
and Result
types both within this library and externally, Document
and RawDocument
are semantically different in significant ways. Put another way, I don't think it would be confusing for readers of code using this library to gloss over what the exact Result or Error type is given that their expectations of how they work will be correct, but glossing over whether a regular Document or a RawDocument is used seems much more dangerous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the tests aren't compiling after the module reorganization.
src/raw/array.rs
Outdated
} | ||
} | ||
|
||
/// An iterator over borrwed raw BSON array values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo here (should be borrowed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/raw/doc.rs
Outdated
/// # use bson::raw::{RawDocument, Error}; | ||
/// let doc = RawDocument::new(b"\x13\x00\x00\x00\x02hi\x00\x06\x00\x00\x00y'all\x00\x00".to_vec())?; | ||
/// let mut iter = doc.iter(); | ||
/// let (key, value) = iter.next().unwrap()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it necessary to have both unwrap
and ?
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The return type is Option<Result<,>>. It could also be .unwrap().unwrap().
Tangential to this, are there any plans to derive(Copy) for ObjectId? Its current impl is stack allocated, fixed size, and smaller than the pointer for a &[u8] on 64 bit platforms, and would clean up a lot of code. |
Yeah, I think this probably is worth doing. Feel free to open a separate PR with this change! |
Where does this stand? From what I can see, it seems to have stalled out again. Is there anything I can do to help get it moving again? |
We had to put this on the backburner for a little while to focus on getting the 2.0 release of the driver out as well as a few other internal priorities. We'll be picking this up again in the near term future after the 2.0 stable release is out. Thanks again for your continued patience and all the work you've put into this so far, and sorry about the delay! |
d6f1b7b
to
b552bb8
Compare
Hey @jcdyer, just wanted to let you know that we've picked this up again now that the 2.0 has been released. As a first step, I'll be tidying things up and addressing any of the outstanding comments from the last round of reviews. After that's done, I'll open it up to the team for another review. As with last time, feel free to chime in with any comments as you see fit. Thanks again for getting the ball rolling on this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recent batch of commits made a number of API and implementation changes, most notably:
- Rewrote
RawBson
as an enum. If we want to introduce an API that matches the oldRawBson
(i.e. a struct that wraps a reference to a slice of bytes), then I think anEntry
API might be a good opportunity for that. - Renamed
RawDocumentRef
toRawDoc
. This name matches the convention ofString
/str
, and it is also slightly more correct, since the type isn't a reference itself but has to be referred to as part of one, being dynamically sized.- Something like
RawDocumentBuf
/RawDocument
would also work fine IMO (matchesPathBuf
/Path
)
- Something like
- Renamed
RawArray
toRawArr
. This was done for similar reasons as theRawDoc
rename, and it also makes room for an owned version ofRawArray
should we need it. - Baked in a bunch of bounds checking into the code that parses the raw BSON into Rust types.
- Introduced a
ValueAccessError
type for the typed getters. Originally, we were planning on moving away from this paradigm, but given that we decided to keep it for the typed getters onDocument
, I think it made sense to reintroduce it here. - Broke off error enums into
ErrorKind
andValueAccessErrorKind
to enable storing of the key / other metadata on the actual error structs. - Implemented
Copy
forDecimal128
- Introduced a
RawDbPointer
type - Removed the
RawTimestamp
type - we can just rely on the regular one since it'sCopy
- Hooked up the raw document API to the corpus tests
- Touched up the documentation
- Moved code into separate files (
document.rs
anditer.rs
are new files containing mostly code that already existing in the PR,props.rs
was simply moved to the test directory) - Marked
RawDoc
andRawArr
as#[repr(transparent)]
, which provides the necessary safety guarantees for doing the casts we do innew_unchecked
. This means we'll need to commit to the byte layout of these types too in addition to their APIs.
I filed a ticket to cover the work for introducing an append
method to RawDocument
, which can be done separately from this PR.
I think this PR is good to be reviewed again, let me know if you have any questions about the above changes.
src/raw/array.rs
Outdated
} | ||
} | ||
|
||
/// An iterator over borrwed raw BSON array values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
src/raw/doc.rs
Outdated
/// Note that the internal structure of the bytes representing the | ||
/// BSON elements is _not_ validated at all by this method. If the | ||
/// bytes do not conform to the BSON spec, then method calls on | ||
/// the RawDocument will return Errors where appropriate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// the RawDocument will return Errors where appropriate. | |
/// the RawDoc will return Errors where appropriate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed this and a few of the other non-linkified references to types in this file.
src/raw/array.rs
Outdated
inner: Iter<'a>, | ||
} | ||
|
||
impl<'a> Iterator for RawArrIter<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One validation we're missing here is checking to make sure that the keys in the array are well-formed (i.e. incrementing integers starting with 0). This could lead to some confusing behavior in which malformed BSON bytes that include an "\x04" tag followed by a regular document are successfully interpreted as an array. I think we should add some checks here to ensure that the document we're dealing with is a proper array-style document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprisingly, accepting array documents with messed up indexes is actually required by the bson corpus tests. e.g. the "Single Element array with index set incorrectly to ab" test is included under "valid": https://github.com/mongodb/specifications/blob/master/source/bson-corpus/tests/array.json#L23
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, that's interesting.
Cargo.toml
Outdated
uuid = { version = "0.8.1", features = ["serde", "v4"] } | ||
proptest = "0.10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this crate is now at 1.0.0 -- should we bump the dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
src/raw/array.rs
Outdated
inner: Iter<'a>, | ||
} | ||
|
||
impl<'a> Iterator for RawArrIter<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, that's interesting.
007137f
to
eecedc8
Compare
Sounds good, went ahead and renamed |
Thanks again for spearheading this @jcdyer; we're super happy to have this finally merged! |
@patrickfreed That's fantastic! Looking forward to seeing the new speediness in action. |
Implements #133