Description
Today, mapping updates are performed in place. This adds complexity for several reasons:
Synchronization
We have two places which are allowed to write to mappings:
- dynamic mapping updates,
- mapping updates coming from the API.
Essentially every search request needs to read the mappings, for instance to figure out which search analyzer to use, and volatile fields are used to make sure that these reads do not block.
The fact that writes can come from two different places makes synchronization subtle. We need to make sure to use the same locks (for correctness) and in the same order (to avoid deadlocks).
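As a minimal sketch of the scheme described above (all class and field names here are illustrative, not the actual Elasticsearch classes): both writer paths must take the same lock, while search-time reads go through a volatile field so they never block.

```java
// Hypothetical sketch of the current in-place design. Two writer paths share
// one lock; readers rely on a volatile field and never block.
class FieldMapperSketch {
    // Read by every search request via a lock-free volatile read.
    volatile String searchAnalyzer = "standard";
}

class MapperServiceSketch {
    private final Object mappingLock = new Object(); // must be shared by BOTH writer paths
    final FieldMapperSketch fieldA = new FieldMapperSketch();

    // Writer path 1: dynamic mapping update discovered while parsing a document.
    void applyDynamicUpdate(String analyzer) {
        synchronized (mappingLock) {
            fieldA.searchAnalyzer = analyzer; // in-place write
        }
    }

    // Writer path 2: update coming from the put-mapping API.
    void applyApiUpdate(String analyzer) {
        synchronized (mappingLock) {
            fieldA.searchAnalyzer = analyzer;
        }
    }

    // Read path used by search requests: no lock taken.
    String searchAnalyzerForFieldA() {
        return fieldA.searchAnalyzer;
    }
}
```

The subtlety is exactly that both `applyDynamicUpdate` and `applyApiUpdate` must agree on `mappingLock` (and on lock ordering when several locks are involved); nothing in the type system enforces this.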
No atomicity
Let's say that you perform a mapping update that modifies the mapping of two fields A and B. It's possible that a search request on these two fields will see the old mapping of field A and the new mapping of field B.
We also have a hack to prevent conflicting mappings from being half applied. Imagine that you want to change A and B, yet B would introduce a conflict (for instance by changing the index-time analyzer). In order not to apply the mapping update to A, we run the mapping update twice: first with a `simulate` flag set, so that the mappings are not applied and only inconsistencies are detected, and then, if the first run succeeded, we apply the mapping update for real. But this is quite fragile, as you need to make sure that all field mappers are implemented correctly and:
- do NOT modify anything if `simulate` is `true`,
- detect ALL inconsistencies if `simulate` is `true` (otherwise we are going to end up with a half-applied mapping).
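The simulate-then-apply dance can be sketched as follows (again with hypothetical names; the real merge logic lives in the field mappers). Note how correctness depends on `merge` honouring both contracts of the `simulate` flag:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the simulate-then-apply workaround: run the update
// once with simulate=true to detect conflicts without mutating anything, then
// again with simulate=false to actually apply it.
class SimulateMergeSketch {
    final Map<String, String> indexAnalyzers = new HashMap<>(); // field -> index-time analyzer

    // Returns an error message, or null if the merge is (or would be) fine.
    String merge(Map<String, String> update, boolean simulate) {
        for (Map.Entry<String, String> e : update.entrySet()) {
            String existing = indexAnalyzers.get(e.getKey());
            // Changing the index-time analyzer of an existing field is a conflict.
            if (existing != null && !existing.equals(e.getValue())) {
                return "conflict on field [" + e.getKey() + "]";
            }
        }
        if (!simulate) {
            indexAnalyzers.putAll(update); // only mutate on the real run
        }
        return null;
    }

    // The two-phase dance: simulate first, apply only if the dry run passed.
    String mergeTwoPhase(Map<String, String> update) {
        String error = merge(update, true);
        if (error != null) {
            return error; // nothing was modified
        }
        return merge(update, false);
    }
}
```

If any mapper mutated state during the simulated run, or failed to report a conflict there, the guarantee silently breaks.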
Proposal
I think the following design would be more resilient:
- make mappings immutable
- document parsing generates mapping updates (see #9364: Mappings: dynamic mapping updates should use the same path as mapping updates from the API),
- mapping updates create a new copy of the mappings (copy-on-write), which is atomically swapped with the current mappings if the update succeeded and the resulting mapping is consistent,
- at the beginning of search requests, we take a snapshot of the mappings and make sure to use it for the whole execution of the search request (to make sure everything inside this search request sees the same mappings)
NOTE: copy-on-write does not mean that mapping updates would be costly; we just need to use appropriate data structures such as http://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/common/collect/CopyOnWriteHashMap.java
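The whole proposal can be sketched in a few lines (hypothetical names again; `isConsistent` is a placeholder for the real conflict checks): an immutable snapshot behind an `AtomicReference`, updated by copy-validate-swap, with search requests reading the reference exactly once.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the proposed design: mappings are an immutable
// snapshot. An update builds a new snapshot, validates it, and publishes it
// atomically; a search request takes one snapshot and uses it throughout.
class ImmutableMappingsSketch {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Collections.emptyMap());

    // Applies an update atomically: either every field change becomes visible
    // at once, or (on inconsistency) none of them do.
    boolean update(Map<String, String> changes) {
        while (true) {
            Map<String, String> before = current.get();
            // Full copy here for simplicity; a structurally-sharing map such as
            // the CopyOnWriteHashMap linked above makes this step cheap.
            Map<String, String> after = new HashMap<>(before);
            after.putAll(changes);
            if (!isConsistent(after)) {
                return false; // reject the whole update, nothing was published
            }
            if (current.compareAndSet(before, Collections.unmodifiableMap(after))) {
                return true; // readers now atomically see the new snapshot
            }
            // Another writer won the race; retry against the fresh snapshot.
        }
    }

    // Placeholder for the real consistency checks (analyzer conflicts, etc.).
    private boolean isConsistent(Map<String, String> mappings) {
        return !mappings.containsValue("conflicting");
    }

    // A search request calls this once and uses the result for its whole
    // execution, so it sees one coherent view of all fields.
    Map<String, String> snapshot() {
        return current.get();
    }
}
```

This removes both problems at once: there is no shared lock protocol for readers to care about, and the old-A/new-B torn read cannot happen because a request never mixes two snapshots.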