
Commit 4943bc0

HybridDirectory should mmap postings. (#52641)
Since Lucene 8.4, `MMapDirectory` has an optimization that reads long[] arrays directly in little-endian order, and postings decoding takes advantage of it, so opening postings with `MMapDirectory` is more efficient. I also refactored the existing logic a bit to better explain why each listed file extension is opened with `mmap`.
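To illustrate why a memory-mapped file makes little-endian bulk reads cheap, here is a minimal sketch using plain `java.nio`. It is not Lucene's actual `MMapDirectory` code; the class `LittleEndianMmapSketch` and method `readLongsLE` are hypothetical names. The idea is that a mapped buffer can be viewed as a `LongBuffer` in little-endian order, so a whole `long[]` is copied in one call instead of being assembled byte by byte.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class LittleEndianMmapSketch {

    // Hypothetical helper: maps `count` longs starting at byte `offset` and
    // decodes them as little-endian with a single bulk LongBuffer.get call.
    static long[] readLongsLE(Path file, long offset, int count) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, offset, (long) count * Long.BYTES);
            long[] dst = new long[count];
            // View the mapped bytes as longs in little-endian order and copy them in bulk.
            map.order(ByteOrder.LITTLE_ENDIAN).asLongBuffer().get(dst);
            return dst;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("postings", ".doc");
        // Mapped files may not be deletable immediately on some platforms, so defer deletion.
        tmp.toFile().deleteOnExit();

        // Write three longs in little-endian order to simulate an encoded block.
        ByteBuffer buf = ByteBuffer.allocate(3 * Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(1L).putLong(42L).putLong(-7L);
        Files.write(tmp, buf.array());

        long[] values = readLongsLE(tmp, 0, 3);
        System.out.println(Arrays.toString(values)); // prints [1, 42, -7]
    }
}
```

The bulk view only pays off when the bytes are already addressable in memory, which is one plausible reading of why the commit prefers `mmap` for the `.doc` postings file.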
1 parent a3a98c7 commit 4943bc0


1 file changed (+15, -3 lines)


server/src/main/java/org/elasticsearch/index/store/FsDirectoryFactory.java

@@ -152,15 +152,27 @@ public void close() throws IOException {
     boolean useDelegate(String name) {
         String extension = FileSwitchDirectory.getExtension(name);
         switch(extension) {
-            // We are mmapping norms, docvalues as well as term dictionaries, all other files are served through NIOFS
-            // this provides good random access performance and does not lead to page cache thrashing.
+            // Norms, doc values and term dictionaries are typically performance-sensitive and hot in the page
+            // cache, so we use mmap, which provides better performance.
             case "nvd":
             case "dvd":
             case "tim":
+            // We want to open the terms index and KD-tree index off-heap to save memory, but this only performs
+            // well if using mmap.
             case "tip":
-            case "cfs":
             case "dim":
+            // Compound files are tricky because they store all the information for the segment. Benchmarks
+            // suggested that not mapping them hurts performance.
+            case "cfs":
+            // MMapDirectory has special logic to read long[] arrays in little-endian order that helps speed
+            // up the decoding of postings. The same logic applies to positions (.pos) or offsets (.pay) but we
+            // are not mmapping them as queries that leverage positions are more costly and the decoding of postings
+            // tends to be less of a bottleneck.
+            case "doc":
                 return true;
+            // Other files are either less performance-sensitive (e.g. stored field index, norms metadata)
+            // or are large and have a random access pattern and mmap leads to page cache thrashing
+            // (e.g. stored fields and term vectors).
             default:
                 return false;
         }
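The routing decision in `useDelegate` boils down to a lookup on the file extension. The following standalone sketch reproduces that policy outside Elasticsearch for experimentation; `MmapPolicySketch`, `extension`, and `useMmap` are hypothetical names, and `extension` is written to behave the way we understand Lucene's `FileSwitchDirectory.getExtension` to behave (text after the last `.`, or `""` if there is none). The extension set is copied from the diff above; everything outside it falls through to the NIOFS side.

```java
import java.util.Set;

public class MmapPolicySketch {

    // Extensions the commit above routes to mmap; any other file falls through to NIOFS.
    private static final Set<String> MMAP_EXTENSIONS =
        Set.of("nvd", "dvd", "tim", "tip", "dim", "cfs", "doc");

    // Assumed to mirror FileSwitchDirectory.getExtension: the text after the
    // last '.', or "" when the file name contains no dot.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot == -1 ? "" : fileName.substring(dot + 1);
    }

    // True when the file would be served by the mmap delegate.
    static boolean useMmap(String fileName) {
        return MMAP_EXTENSIONS.contains(extension(fileName));
    }

    public static void main(String[] args) {
        String[] files = {"_0.doc", "_0.pos", "_0.fdt", "_0.cfs", "segments_1"};
        for (String f : files) {
            System.out.println(f + " -> " + (useMmap(f) ? "mmap" : "NIOFS"));
        }
    }
}
```

Running it prints `mmap` for `_0.doc` and `_0.cfs` and `NIOFS` for `_0.pos`, `_0.fdt` and `segments_1`, matching the reasoning in the comments: postings (`.doc`) are mapped, while positions (`.pos`) and stored fields (`.fdt`) are not.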
