Add a version clock/identifier to documents #490

apatrida · 2010-11-08T03:16:47Z

The system could manage a version clock/identifier for documents, so that every new insert increments the version by 1 across the entire index (or system). This is useful for reindexing and for soft purge-reloads (process adds/updates, then delete everything with a version < N).

You must be able to query the index for the current value of the version number.
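A minimal sketch of the requested behaviour, assuming a single in-memory index. The `VersionedIndex` class and its methods are hypothetical, not an Elasticsearch API. Every write takes the next value of one shared counter, the counter can be read back, and a soft purge-reload deletes whatever was not rewritten since a given mark.

```python
import threading


class VersionedIndex:
    """Hypothetical in-memory index that stamps every write with an
    index-wide, ever-increasing version number (not a real ES API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._clock = 0   # last version handed out
        self._docs = {}   # doc_id -> (version, body)

    def put(self, doc_id, body):
        # Every insert or update bumps the single index-wide clock by 1.
        with self._lock:
            self._clock += 1
            self._docs[doc_id] = (self._clock, body)
            return self._clock

    def current_version(self):
        # Query the index for the current value of the version number.
        with self._lock:
            return self._clock

    def purge_older_than(self, n):
        # Soft purge-reload: delete every document whose version is < n.
        with self._lock:
            stale = [d for d, (v, _) in self._docs.items() if v < n]
            for d in stale:
                del self._docs[d]
            return len(stale)


idx = VersionedIndex()
idx.put("a", {"x": 1})
idx.put("b", {"x": 2})
mark = idx.current_version() + 1  # the reload starts at this version
idx.put("a", {"x": 10})           # "a" is reloaded, "b" is not
idx.purge_older_than(mark)        # removes "b", keeps the reloaded "a"
```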

apatrida · 2010-11-08T03:18:58Z

A system-wide timestamp is also an appropriate answer here, although time can be mismanaged more easily than an ever-increasing integer for some of the intended uses. See #491 for the timestamp approach, but I still think a vector-clock-style answer is useful in different ways.

apatrida · 2010-11-08T03:19:27Z

Did I close this by accident? Please reopen.

kimchy · 2010-11-09T12:21:10Z

It's certainly possible; the question then is at what level the version works. It can be at the "resource" level (a document with type and id), the index level, or the cluster-wide level.

None of them is simple to implement. The resource level means that a real-time mapping of id to version needs to exist (either in memory, or in the index, which then requires the real-time aspect), and concurrency then has to be handled.
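To make the resource-level cost concrete, here is a hypothetical in-memory sketch (`ResourceVersions` and `VersionConflict` are illustrative names) of the id-to-version map plus a lock and an optimistic `expected_version` check. It ignores persistence and replication, which is where the real difficulty lies.

```python
import threading


class VersionConflict(Exception):
    """Raised when a writer's expected version is no longer current."""


class ResourceVersions:
    """Hypothetical per-document (resource-level) versioning: a real-time
    (type, id) -> version map kept in memory and guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._versions = {}  # (type, id) -> current version

    def index(self, key, expected_version=None):
        with self._lock:
            current = self._versions.get(key, 0)
            # Optimistic concurrency: reject the write if another writer
            # has bumped the version since the caller last read it.
            if expected_version is not None and expected_version != current:
                raise VersionConflict(
                    f"{key}: expected {expected_version}, found {current}")
            self._versions[key] = current + 1
            return current + 1

    def get(self, key):
        with self._lock:
            return self._versions.get(key, 0)


rv = ResourceVersions()
doc = ("tweet", "1")                    # a (type, id) pair
rv.index(doc)                           # first write gets version 1
rv.index(doc, expected_version=1)       # caller saw v1, write gets version 2
try:
    rv.index(doc, expected_version=1)   # stale caller: v1 is no longer current
except VersionConflict as e:
    print("rejected:", e)
```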

Index-level and cluster-wide incremental versions mean that the versioning needs to be maintained across the cluster...

apatrida · 2010-11-09T12:54:08Z

Yep, it is hard to do, and very hard for the user to implement. It is "clustering magic" that only the system can do really well... after the developers go through pain and suffering to build it. It's the cluster-wide vector-clock-style problem, but it is well discussed in the world (I think Cassandra just did it, people do it on ZooKeeper, and so on).

You have the choice of "perfect accuracy" or getting somewhat close. A system-wide timestamp is another approach where you just have to be close. You can state the guarantee of the version number or of the timestamp, and people will work around the level you can provide: a bit loose at first, but tighter later. Cassandra just went through this (I'm not sure what their approach was), and the ZooKeeper folks of course do it. But I'm not saying their approaches are right/wrong or desirable (not knowing what they did).

Either the index level or system-wide would be fine. I can't think of a reason (for my use cases) why either would be a problem. And a bit of loose accuracy isn't always a problem, as you can always ask for a "loose idea of what the current number is" or a "sync them all up and give me an accurate number for sure" when you request it from outside the cluster. Inside the cluster it can basically be a reasonable approximation (i.e. handing out blocks to each node that consumes them, but on sync-up they may discard their blocks to get somewhat back in order again; similar to how I think Oracle does sequence numbers, in that they aren't always contiguous but are basically in order).
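The block-allocation idea can be sketched as follows, assuming one central allocator (for example a master node) and nodes that fetch a new block only when theirs runs out or when they sync up. All names here are hypothetical. Numbers come out unique and roughly ordered, but not contiguous.

```python
import threading


class BlockAllocator:
    """Hypothetical central allocator (e.g. a master node) that hands out
    contiguous blocks of version numbers, like cached Oracle sequences."""

    def __init__(self, block_size=100):
        self._lock = threading.Lock()
        self._next_start = 1
        self.block_size = block_size

    def next_block(self):
        with self._lock:
            start = self._next_start
            self._next_start += self.block_size
            return start, start + self.block_size  # half-open [start, end)


class Node:
    """Hypothetical node that consumes its block locally without
    coordinating with the allocator on every write."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._next = self._end = 0  # empty block; first call fetches one

    def next_version(self):
        if self._next >= self._end:
            self._next, self._end = self._allocator.next_block()
        v = self._next
        self._next += 1
        return v

    def sync(self):
        # On sync-up, discard the rest of the block, so the next number this
        # node issues is higher than every number issued by any node so far.
        self._next = self._end = 0


alloc = BlockAllocator(block_size=10)
a, b = Node(alloc), Node(alloc)
a.next_version()  # 1: node a holds [1, 11)
b.next_version()  # 11: node b holds [11, 21)
a.next_version()  # 2: unique, but out of order relative to b's 11
b.sync()
b.next_version()  # 21: fresh block; 12..20 are skipped (a gap)
```

The trade-off is the one described in the comment: larger blocks mean fewer round trips to the allocator, at the cost of bigger gaps and looser ordering between nodes.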

So in short, timestamps might be easier (let the sysadmins keep the clocks on all the servers in sync; they seem to be good at that), and a version number can come later, which might help with synchronization, balancing nodes, and other fun things if those ever come about.

ppearcy · 2010-11-12T18:45:00Z

FYI, as ES should not be your primary data store, it makes a good amount of sense to have this value propagated in from whatever your primary data store is.

Granted, if your primary store doesn't have this feature, you're out of luck; but then again, if your primary data store doesn't have it, you probably don't need it in your ES setup either.

(Disclosure: I am not an ES developer, and these are just my personal thoughts.)

Thanks,
Paul

karussell · 2011-05-21T20:04:33Z

Can this be closed as of #594?

kimchy · 2011-05-21T20:25:44Z

Right, it can be closed. Also, @ppearcy's request is solved by setting version_type to external.
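The external versioning mentioned above can be illustrated with a small sketch of its semantics, assuming the version is supplied by the primary data store (for example a modification counter or timestamp column). `ExternalVersionStore` is hypothetical. As far as I understand the documented `version_type=external` behaviour, a write is accepted only when the document is new or the incoming version is strictly higher than the stored one; Elasticsearch reports a version conflict in that case rather than returning `False` as this sketch does.

```python
class ExternalVersionStore:
    """Hypothetical store illustrating external versioning: the version
    comes from the primary data store, not from this index."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def index(self, doc_id, body, version):
        current = self._docs.get(doc_id)
        # Accept only if the doc is new or the incoming version is strictly
        # higher; stale or replayed updates from the primary are dropped.
        if current is not None and version <= current[0]:
            return False
        self._docs[doc_id] = (version, body)
        return True

    def version_of(self, doc_id):
        entry = self._docs.get(doc_id)
        return None if entry is None else entry[0]


store = ExternalVersionStore()
store.index("42", {"name": "new"}, version=5)  # accepted: first write
store.index("42", {"name": "old"}, version=3)  # rejected: out-of-order update
```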

kimchy closed this as completed May 21, 2011

apatrida mentioned this issue Apr 25, 2014: Reindex from _source by document ID or Query #492 (Closed)
