Add a version clock/identifier to documents #490
Comments
A system-wide timestamp is also an appropriate answer here, although time is easier to mismanage than an ever-increasing integer for some of the intended usages. See #491 for the timestamp request; I still think a vector-clock type answer is useful in different ways.
Did I close this by accident? Please re-open.
It's certainly possible; the question is then what level the version works on. It can be on a "resource" level (a document with type and id), index level, or cluster-wide level. None of these is simple to implement. The resource level means that a real-time mapping of id to version needs to exist (either in memory, or in the index, which then requires a real-time aspect), and concurrency then has to be handled. An index-level or cluster-wide incremental version means that the versioning needs to be maintained across the cluster...
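To make the resource-level option concrete, here is a minimal single-process sketch of an in-memory map from (type, id) to version with a lock for concurrency. The `ResourceVersions` class and its method names are made up for illustration and are not Elasticsearch code; a real implementation would also have to deal with persistence and replication.

```python
import threading


class ResourceVersions:
    """Hypothetical in-memory map from (type, id) to a version counter."""

    def __init__(self):
        self._versions = {}
        self._lock = threading.Lock()

    def bump(self, doc_type, doc_id):
        """Atomically increment and return the version of one resource."""
        key = (doc_type, doc_id)
        with self._lock:
            version = self._versions.get(key, 0) + 1
            self._versions[key] = version
            return version

    def current(self, doc_type, doc_id):
        """Return the latest version, or 0 if the resource was never written."""
        with self._lock:
            return self._versions.get((doc_type, doc_id), 0)
```

A single lock keeps the sketch short; it says nothing about how the mapping would be kept consistent across nodes, which is the hard part described above.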
Yep, it is hard to do, and very hard for the user to implement. It is "clustering magic" that only the system can do really well... after the developers go through pain and suffering to build it. It's the cluster-wide vector-clock style problem, which is well discussed elsewhere (I think Cassandra just did it, people do it on ZooKeeper, and so on), though I'm not saying their approaches are right, wrong, or desirable, since I don't know what they did. You have the choice of "perfect accuracy" or getting it somewhat close. A system-wide timestamp is another approach where you just have to be close. You can state the guarantee of the version number or of the timestamp, and people will work around the level you can provide: a bit loose at first, tighter later. Index level might be fine, or system-wide would be fine; I can't think of a reason (for my use cases) why either would be a problem. A bit of loose accuracy isn't always a problem either, since from outside the cluster you can ask for a "loose idea of what the current number is" or a "sync them all up and give me an accurate number for sure". Inside the cluster it can be a reasonable approximation (for example, handing out blocks of numbers to each node to consume; on sync-up nodes may discard their blocks to get roughly back in order again, similar to how I think Oracle does sequence numbers, which aren't always contiguous but are basically in order). So in short, timestamps might be easier (let the sysadmin maintain similar clocks on all the servers; they seem to be good at that), and a version number can come later, which might help with synchronization, balancing nodes, and other fun things if those ever come about.
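The block hand-out idea could look roughly like the sketch below. `BlockCoordinator` and `Node` are hypothetical names, the coordinator is a plain in-process object standing in for whatever cluster-wide authority would exist, and `Node` is not thread-safe; this only illustrates the "not contiguous but basically in order" behaviour, not how any real system does it.

```python
import threading


class BlockCoordinator:
    """Hypothetical cluster-wide authority that hands out ranges of numbers."""

    def __init__(self, block_size=100):
        self.block_size = block_size
        self._next_start = 1
        self._lock = threading.Lock()

    def allocate(self):
        """Reserve the next block and return it as an inclusive (start, end) pair."""
        with self._lock:
            start = self._next_start
            self._next_start += self.block_size
            return start, start + self.block_size - 1

    def high_water_mark(self):
        """Loose upper bound: no number above this has been handed out yet."""
        with self._lock:
            return self._next_start - 1


class Node:
    """Hypothetical node that consumes version numbers from its current block."""

    def __init__(self, coordinator):
        self._coordinator = coordinator
        self._next, self._end = coordinator.allocate()

    def next_version(self):
        # Fetch a fresh block once the current one is used up.
        if self._next > self._end:
            self._next, self._end = self._coordinator.allocate()
        version = self._next
        self._next += 1
        return version

    def resync(self):
        """Discard the rest of the block; this leaves a gap but restores rough order."""
        self._next, self._end = self._coordinator.allocate()
```

Here `high_water_mark()` plays the role of the "loose idea of what the current number is"; an accurate answer would require asking every node how far into its block it has got.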
FYI, as ES should not be your primary data store, it makes a good amount of sense to have this value propagated in from whatever your primary data store is. Granted, if your primary store doesn't have this feature, you're out of luck, but then again, if your primary data store doesn't have it, you probably don't need it in your ES setup. (Disclosure: I am not an ES developer and these are just my personal thoughts.) Thanks,
Can this be closed as of #594?
Right, it can be closed. Also, @ppearcy's request is solved with
The system could manage a version clock/identifier for documents, so that every new insert increases the version by 1 across the entire index (or system). This is useful for reindexing and for soft purge-reloads (process the adds/updates, then delete everything with a version < N).
It must be possible to query the index for the current value of the version number.
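As a rough illustration of the soft purge-reload use case, the sketch below uses a toy in-memory index with an index-wide counter. `VersionedIndex`, `soft_purge_reload`, and every method name are invented for this example; none of them is an existing Elasticsearch API.

```python
class VersionedIndex:
    """Toy in-memory stand-in for an index with an index-wide version counter."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)
        self._version = 0

    def index(self, doc_id, source):
        """Insert or update a document and stamp it with the next version."""
        self._version += 1
        self._docs[doc_id] = (self._version, source)
        return self._version

    def current_version(self):
        """The queryable current value of the version number."""
        return self._version

    def delete_older_than(self, version):
        """Delete every document whose version is < version; return the count."""
        stale = [d for d, (v, _) in self._docs.items() if v < version]
        for doc_id in stale:
            del self._docs[doc_id]
        return len(stale)

    def ids(self):
        return sorted(self._docs)


def soft_purge_reload(index, fresh_docs):
    """Reload fresh_docs, then drop whatever the reload did not touch."""
    # N is the first version the reload will assign.
    n = index.current_version() + 1
    for doc_id, source in fresh_docs.items():
        index.index(doc_id, source)
    return index.delete_older_than(n)
```

Documents stay searchable during the reload, and only the ones missing from the fresh data set are removed at the end, which is the point of doing a soft purge instead of deleting everything up front.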