Commit 718db1a

Expand docs on force-merge and global ordinals
Some small clarifications about force-merging and global ordinals, particularly that global ordinals are cheap on a single-segment index and how this relates to frozen indices. Fixes elastic#41687
1 parent 1c384fb commit 718db1a

3 files changed: +65 -23 lines changed

docs/reference/how-to/search-speed.asciidoc

Lines changed: 13 additions & 8 deletions
@@ -299,13 +299,17 @@ leveraging the query cache.
 [float]
 === Force-merge read-only indices
 
-Indices that are read-only would benefit from being
-<<indices-forcemerge,merged down to a single segment>>. This is typically the
-case with time-based indices: only the index for the current time frame is
-getting new documents while older indices are read-only.
-
-IMPORTANT: Don't force-merge indices that are still being written to -- leave
-merging to the background merge process.
+Indices that are read-only may benefit from being <<indices-forcemerge,merged
+down to a single segment>>. This is typically the case with time-based indices:
+only the index for the current time frame is getting new documents while older
+indices are read-only. Shards that have been force-merged into a single segment
+can use simpler and more efficient data structures to perform searches.
+
+IMPORTANT: Do not force-merge indices to which you are still writing, or to
+which you will write again in the future. Instead, rely on the automatic
+background merge process to perform merges as needed to keep the index running
+smoothly. If you continue to write to a force-merged index then its performance
+may become much worse.
 
 [float]
 === Warm up global ordinals
@@ -315,7 +319,8 @@ Global ordinals are a data-structure that is used in order to run
 <<keyword,`keyword`>> fields. They are loaded lazily in memory because
 Elasticsearch does not know which fields will be used in `terms` aggregations
 and which fields won't. You can tell Elasticsearch to load global ordinals
-eagerly at refresh-time by configuring mappings as described below:
+eagerly when starting or refreshing a shard by configuring mappings as
+described below:
 
 [source,js]
 --------------------------------------------------

docs/reference/indices/forcemerge.asciidoc

Lines changed: 32 additions & 11 deletions
@@ -1,19 +1,24 @@
 [[indices-forcemerge]]
 === Force Merge
 
-The force merge API allows to force merging of one or more indices through an
-API. The merge relates to the number of segments a Lucene index holds within
-each shard. The force merge operation allows to reduce the number of segments by
-merging them.
+The force merge API allows you to force a <<index-modules-merge,merge>> on the
+shards of one or more indices. Merging reduces the number of segments in each
+shard by merging some of them together, and also frees up the space used by
+deleted documents. Merging normally happens automatically, but sometimes it is
+useful to trigger a merge manually.
 
-This call will block until the merge is complete. If the http connection is
-lost, the request will continue in the background, and any new requests will
-block until the previous force merge is complete.
+WARNING: **Force merge should only be called against an index after you have
+finished writing to it.** Force merge can cause very large (>5GB) segments to
+be produced, and if you continue to write to such an index then the automatic
+merge policy will never consider these segments for future merges until they
+mostly consist of deleted documents. This can cause very large segments to
+remain in the index which can result in increased disk usage and worse search
+performance.
 
-WARNING: Force merge should only be called against *read-only indices*. Running
-force merge against a read-write index can cause very large segments to be produced
-(>5Gb per segment), and the merge policy will never consider it for merging again until
-it mostly consists of deleted docs. This can cause very large segments to remain in the shards.
+Calls to this API block until the merge is complete. If the client connection
+is lost before completion then the force merge process will continue in the
+background. Any new requests to force merge the same indices will also block
+until the ongoing force merge is complete.
 
 [source,js]
 --------------------------------------------------
@@ -22,6 +27,22 @@ POST /twitter/_forcemerge
 // CONSOLE
 // TEST[setup:twitter]
 
+Force-merging can be useful with time-based indices and when using
+<<indices-rollover-index,rollover>>. In these cases each index only receives
+indexing traffic for a certain period of time, and once an index will receive
+no more writes its shards can be force-merged down to a single segment:
+
+[source,js]
+--------------------------------------------------
+POST /logs-000001/_forcemerge?max_num_segments=1
+--------------------------------------------------
+// CONSOLE
+// TEST[setup:twitter]
+// TEST[s/logs-000001/twitter/]
+
+This can be a good idea because single-segment shards can sometimes use simpler
+and more efficient data structures to perform searches.
+
 [float]
 [[forcemerge-parameters]]
 ==== Request Parameters
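
As a rough illustration of the `POST /logs-000001/_forcemerge?max_num_segments=1` call added in this file, the Python sketch below builds the equivalent HTTP request without sending it. The base URL `http://localhost:9200` and the helper name `build_forcemerge_request` are assumptions made for illustration and are not part of the commit.

```python
import urllib.request


def build_forcemerge_request(index, max_num_segments=1,
                             base_url="http://localhost:9200"):
    """Build (but do not send) a POST /<index>/_forcemerge request.

    base_url is an assumed local cluster address; adjust it for a real cluster.
    """
    url = f"{base_url}/{index}/_forcemerge?max_num_segments={max_num_segments}"
    return urllib.request.Request(url, method="POST")


# Index name taken from the example in the diff above.
req = build_forcemerge_request("logs-000001")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it. According to the text in this diff, that call blocks until the merge completes, so it should only be issued against an index that will receive no further writes.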

docs/reference/mapping/params/eager-global-ordinals.asciidoc

Lines changed: 20 additions & 4 deletions
@@ -30,7 +30,7 @@ efficiently compressed.
 
 By default, global ordinals are loaded at search-time, which is the right
 trade-off if you are optimizing for indexing speed. However, if you are more
-interested in search speed, it could be interesting to set
+interested in search speed, it could be beneficial to set
 `eager_global_ordinals: true` on fields that you plan to use in terms
 aggregations:
 
@@ -49,9 +49,25 @@ PUT my_index/_mapping
 // CONSOLE
 // TEST[s/^/PUT my_index\n/]
 
-This will shift the cost from search-time to refresh-time. Elasticsearch will
-make sure that global ordinals are built before publishing updates to the
-content of the index.
+This will shift the cost of building the global ordinals from search-time to
+refresh-time. Elasticsearch will make sure that global ordinals are built
+before exposing to searches any changes to the content of the index.
+Elasticsearch will also eagerly build global ordinals when starting a new copy
+of a shard, such as when increasing the number of replicas or when relocating a
+shard onto a new node.
+
+If a shard has been <<indices-forcemerge,force-merged>> down to a single
+segment then its global ordinals are identical to the ordinals for its unique
+segment, which means there is no extra cost for using global ordinals on such a
+shard. Note that for performance reasons you should only force-merge an index
+to which you will never write again.
+
+On a <<frozen-indices,frozen index>>, global ordinals are discarded after each
+search and rebuilt again on the next search if needed or if
+`eager_global_ordinals` is set. This means `eager_global_ordinals` should not
+be used on frozen indices. Instead, force-merge an index to a single segment
+before freezing it so that global ordinals need not be built separately on each
+search.
 
 If you ever decide that you do not need to run `terms` aggregations on this
 field anymore, then you can disable eager loading of global ordinals at any
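
The body of the `PUT my_index/_mapping` request falls outside the hunks shown here. As a hedged sketch, the Python below builds a mapping body that sets `eager_global_ordinals` on a `keyword` field. The field name `tags`, the helper name `eager_ordinals_mapping`, and the typeless body shape (as used by Elasticsearch 7.x) are assumptions for illustration, not content taken from this commit.

```python
import json


def eager_ordinals_mapping(field, eager=True):
    """Return a mapping body toggling eager_global_ordinals on a keyword field.

    The typeless {"properties": ...} shape is an assumption (Elasticsearch 7.x).
    """
    return {
        "properties": {
            field: {"type": "keyword", "eager_global_ordinals": eager}
        }
    }


# "tags" is a hypothetical field name.
body = json.dumps(eager_ordinals_mapping("tags"), indent=2)
print(body)
```

Calling `eager_ordinals_mapping("tags", eager=False)` produces the body for turning eager loading off again. For an index that is about to be frozen, the text above recommends force-merging it to a single segment instead of setting `eager_global_ordinals`.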
