Fix multi-op support and add support for oplog refs #1
base: tokumx-compatibility
Conversation
…e that when we encounter an oplog.refs entry, the river doesn't bomb out on an NPE.
* Fix oplog.refs oid entry handling
* Fix UPDATE_ROW op
* Validate that we got an item before processing it
* Add explicit log message for NPE on namespace fetch
* Fix findAndModify support for TokuMX
If the oplog.refs query isn't using indexOnly mode, I would consider … I'm also surprised that you think continuing to use the same cursor will …

On 04/18/2014 03:35 PM, Chris Heald wrote:
I haven't tested it explicitly, but I was noticing, when watching the TRACE info in the log, that the river was continually re-playing ops that had already been handled; my guess was that this was due to reopening new cursors (in fact, after all writes to our tokumx instance had stopped, the river continued to replicate the last chunk of ops over and over). Changing the loop so that it loops around hasNext solves the issue cleanly (and anecdotally, has made the whole shebang feel substantially faster, though that's completely unquantified). The cursor that gets opened is done so with … If you run the Slurper with debug logging, you'll notice that the 500ms timeout debug message is never logged - this is because hasNext never returns false for a tailable cursor! If the cursor does get closed, then attempting to read …
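For reference, a minimal sketch of the hasNext-style tailing loop described above, assuming the legacy 2.x MongoDB Java driver; the class and method names are illustrative, not the river's actual code:

```java
import com.mongodb.Bytes;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

public class OplogTailer {
    // Loop around hasNext() on a tailable, await-data cursor: with these
    // options hasNext() blocks (with a server-side wait) instead of returning
    // false, so a single cursor stays open rather than being reopened for
    // every batch of ops.
    static void tailOplog(DBCollection oplog, DBObject filter) {
        DBCursor cursor = oplog.find(filter)
                .addOption(Bytes.QUERYOPTION_TAILABLE)
                .addOption(Bytes.QUERYOPTION_AWAITDATA);
        try {
            while (cursor.hasNext()) {
                DBObject entry = cursor.next();
                System.out.println(entry); // placeholder for real op handling
            }
        } finally {
            cursor.close(); // reached only if the cursor is invalidated
        }
    }
}
```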
Ah, cool. In that case I would suggest creating and submitting a pull request …

On Fri 18 Apr 2014 04:10:47 PM PDT, Chris Heald wrote:
So it looks like my updated unit tests all still pass after a merge of this pull request. I'm still a bit concerned, though, about the 'a' flag on oplog entries when you're reading from a secondary. I think I need to figure out how to write a Toku-specific unit test for that. Probably using a mocked cursor.
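A hedged sketch of what such a mocked-cursor test might look like with Mockito; the entry shapes, test name, and expected skipping behaviour are assumptions for illustration, not the project's actual tests:

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import com.mongodb.BasicDBObject;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import org.junit.Test;

public class TokuSecondaryReadTest {
    @Test
    public void skipsEntriesThatAreNotYetApplied() {
        // Two canned oplog entries: one not yet applied on the secondary
        // ("a": false) and one already applied ("a": true).
        DBObject unapplied = new BasicDBObject("op", "i").append("a", false);
        DBObject applied   = new BasicDBObject("op", "i").append("a", true);

        DBCursor cursor = mock(DBCursor.class);
        when(cursor.hasNext()).thenReturn(true, true, false);
        when(cursor.next()).thenReturn(unapplied, applied);

        // Hand `cursor` to the code under test and assert that the unapplied
        // entry is skipped (or flagged) while the applied one is indexed.
    }
}
```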
See this post, if you haven't already, for more info on the issue: http://www.tokutek.com/2014/04/on-tokumx-oplog-tailable-cursors-and-concurrency/

On second thought, that post implies that a cursor with the right flags set may never return an uncommitted entry. Maybe we can just add a verification of that, so we'll see if it's ever violated?
I do think it would be a good idea to at least add that assert for truthiness of the "a" flag, even though we expect the cursor to never return an uncommitted entry.
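As an illustration (not the project's actual code), such a guard might look roughly like this, with the failure mode left as an open choice:

```java
import com.mongodb.DBObject;

class AppliedFlagGuard {
    // Illustrative guard: refuse to process an oplog entry read from a TokuMX
    // secondary until it has been applied ("a": true). Whether to skip, retry,
    // or fail loudly on a=false is an open question.
    void processOplogEntry(DBObject entry) {
        if (Boolean.FALSE.equals(entry.get("a"))) {
            throw new IllegalStateException(
                    "Cursor returned an unapplied oplog entry: " + entry.get("_id"));
        }
        // ... normal op handling continues here ...
    }
}
```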
FWIW, we have this reading from our master and replicating collections with tens of millions of documents just fine. We've run into an issue when …
Your Float issue sounds like a serialization issue that should be raised on the upstream project. Java's float datatype does in fact include infinity and NaN values, per the IEEE spec.
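A tiny self-contained illustration of that point; the JSON remark is general background, not specific to this project:

```java
public class FloatEdgeCases {
    public static void main(String[] args) {
        // IEEE-754 special values are perfectly legal Java floats...
        float inf = Float.POSITIVE_INFINITY;
        float nan = Float.NaN;
        System.out.println(Float.isInfinite(inf)); // true
        System.out.println(Float.isNaN(nan));      // true
        // ...but strict JSON has no literal for Infinity or NaN, so a
        // serializer has to fail or substitute something, which is why this
        // reads as an upstream serialization issue rather than a TokuMX one.
    }
}
```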
Oh yeah, it definitely is. I just mean that as a "We aren't having any TokuMX-specific issues". |
Sadly, when I inserted a check for a=false at the beginning of processOplogEntry, it caused most of the TokuMX tests to fail. So apparently TokuMX does not block the tailable cursor until a=true when you're reading from a secondary. And it's possible that's why Richard reported that the tests failed for him. The fix I've come up with has now been made into a new pull request:
I've expanded on my previous fixes and spent a fair amount of time with my head down in tokumx's oplog. Rather than flattening oplog entries, we treat all oplog entries as potential multi-op entries. When a ref op is encountered, the appropriate refs are found and iterated (using the same technique as the tokumx server's replication code). This results in full support for tokumx transactions and multi-ops, while still maintaining compatibility with the vanilla Mongo oplog format.
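A hedged sketch of the dispatch being described; the `ops` and `ref` field names follow the TokuMX oplog format as described here, and the helper methods are placeholders rather than the river's actual code:

```java
import java.util.List;

import com.mongodb.DBObject;
import org.bson.types.ObjectId;

class MultiOpDispatcher {
    // Treat every oplog entry as a potential multi-op: inline "ops" arrays are
    // applied directly, "ref" entries are expanded from oplog.refs, and
    // anything else is handled as a vanilla single-op Mongo oplog entry.
    void handleOplogEntry(DBObject entry) {
        Object ops = entry.get("ops");
        Object ref = entry.get("ref");
        if (ops instanceof List) {
            for (Object op : (List<?>) ops) {
                applyOp((DBObject) op);        // ops carried inline in the entry
            }
        } else if (ref instanceof ObjectId) {
            iterateRefs((ObjectId) ref);       // ops spilled out to oplog.refs
        } else {
            applyOp(entry);                    // plain vanilla-style entry
        }
    }

    void applyOp(DBObject op) { /* hypothetical single-op handler */ }

    void iterateRefs(ObjectId refOid) { /* see the oplog.refs sketch further down */ }
}
```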
Additionally, the inner loop issue you previously mentioned has been resolved; checking the river status in the inner loop allows for dashboard interrupts of the river without requiring continual creation of new cursors.
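A minimal sketch of that status check, with a hypothetical status flag standing in for however the river actually exposes its state:

```java
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

class InterruptibleTailer {
    enum Status { RUNNING, STOPPED }

    volatile Status status = Status.RUNNING; // flipped by the dashboard/stop handler

    // Consult the status on every pass of the inner loop so a stop request
    // interrupts processing promptly, without tearing down and reopening the
    // cursor on each pass.
    void drain(DBCursor cursor) {
        while (status == Status.RUNNING && cursor.hasNext()) {
            DBObject entry = cursor.next();
            // handle the entry here
        }
    }
}
```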
The biggest outstanding issue is the oplog.refs lookup. The semantics are kinda fuzzy, but the general idea is that we will have one or more refs with monotonically-increasing seq numbers. The oplog.refs _id format is `{oid: ObjectId(...), seq: num}`, but an index only exists on `_id`. The way the server finds entries is via: …

The only thing it does here that I haven't been able to replicate is the fourth parameter in findOne, which is a `requireIndex` boolean, which appears to cause the query to be executed in index-only mode. I haven't found a way to replicate that with the driver yet, and explain() shows that the query is not executed in indexOnly mode. Improving that should substantially improve performance, since the _id index is clustered.

I've not yet written tests, but since you're working on that, I figured this code might be useful in the course of testing full compatibility.
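A hedged illustration of one way the refs for a single transaction could be walked with the 2.x Java driver; the `ops` field name and the `$gte`-over-`_id` range trick are assumptions for this sketch, not the server's actual lookup or the river's code:

```java
import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import org.bson.types.ObjectId;

class OplogRefsReader {
    private final DBCollection refs;

    OplogRefsReader(DB localDb) {
        this.refs = localDb.getCollection("oplog.refs");
    }

    // Fetch the spilled ops for one ref oid in ascending seq order by ranging
    // over the compound _id ({oid, seq}); the oid check inside the loop stops
    // iteration once we run past this transaction's entries.
    void forEachRefOp(ObjectId refOid, OpHandler handler) {
        DBObject lowerBound = new BasicDBObject("oid", refOid).append("seq", 0L);
        DBObject query = new BasicDBObject("_id", new BasicDBObject("$gte", lowerBound));
        DBCursor cursor = refs.find(query).sort(new BasicDBObject("_id", 1));
        try {
            while (cursor.hasNext()) {
                DBObject entry = cursor.next();
                DBObject id = (DBObject) entry.get("_id");
                if (!refOid.equals(id.get("oid"))) {
                    break; // past the refs belonging to this transaction
                }
                for (Object op : (List<?>) entry.get("ops")) {
                    handler.handle((DBObject) op);
                }
            }
        } finally {
            cursor.close();
        }
    }

    interface OpHandler {
        void handle(DBObject op);
    }
}
```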