Commit bd01f71

Adam Locke committed
[DOCS] Add documentation for near real-time search (elastic#57560)
* Adding documentation for near real-time search.
* Adding link to NRT topic and clarifying some text.
* Adding diagrams and incorporating changes from David T.
1 parent 3d95675 commit bd01f71

5 files changed, +35 -11 lines changed

docs/reference/docs/refresh.asciidoc

Lines changed: 7 additions & 7 deletions
@@ -31,15 +31,15 @@ visible at some point after the request returns.
 
 [float]
 ==== Choosing which setting to use
-
-Unless you have a good reason to wait for the change to become visible always
-use `refresh=false`, or, because that is the default, just leave the `refresh`
-parameter out of the URL. That is the simplest and fastest choice.
+// tag::refresh-default[]
+Unless you have a good reason to wait for the change to become visible, always
+use `refresh=false` (the default setting). The simplest and fastest choice is to omit the `refresh` parameter from the URL.
 
 If you absolutely must have the changes made by a request visible synchronously
-with the request then you must pick between putting more load on
-Elasticsearch (`true`) and waiting longer for the response (`wait_for`). Here
-are a few points that should inform that decision:
+with the request, you must choose between putting more load on
+Elasticsearch (`true`) and waiting longer for the response (`wait_for`).
+// end::refresh-default[]
+Here are a few points that should inform that decision:
 
 * The more changes being made to the index the more work `wait_for` saves
 compared to `true`. In the case that the index is only changed once every

docs/reference/intro.asciidoc

Lines changed: 3 additions & 4 deletions
@@ -9,9 +9,9 @@ the {stack}. {ls} and {beats} facilitate collecting, aggregating, and
 enriching your data and storing it in {es}. {kib} enables you to
 interactively explore, visualize, and share insights into your data and manage
 and monitor the stack. {es} is where the indexing, search, and analysis
-magic happen.
+magic happens.
 
-{es} provides real-time search and analytics for all types of data. Whether you
+{es} provides near real-time search and analytics for all types of data. Whether you
 have structured or unstructured text, numerical data, or geospatial data,
 {es} can efficiently store and index it in a way that supports fast searches.
 You can go far beyond simple data retrieval and aggregate information to discover
@@ -46,8 +46,7 @@ as JSON documents. When you have multiple {es} nodes in a cluster, stored
 documents are distributed across the cluster and can be accessed immediately
 from any node.
 
-When a document is stored, it is indexed and fully searchable in near
-real-time--within 1 second. {es} uses a data structure called an
+When a document is stored, it is indexed and fully searchable in <<near-real-time,near real-time>>--within 1 second. {es} uses a data structure called an
 inverted index that supports very fast full-text searches. An inverted index
 lists every unique word that appears in any document and identifies all of the
 documents each word occurs in.
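
The inverted index described in this hunk can be pictured with a small Python sketch. It is an illustration only, not how {es} or Lucene build their indices: the function name `build_inverted_index`, the sample documents, and the whitespace-and-lowercase tokenization are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique word to the set of document IDs that contain it.

    Hypothetical helper: tokenization here is a plain lowercase
    whitespace split, which is far simpler than real text analysis.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample documents keyed by document ID.
docs = {
    1: "fast full-text search",
    2: "near real-time search",
}
inverted = build_inverted_index(docs)

# Looking up a word returns every document it occurs in.
matches = sorted(inverted["search"])
```

A lookup is a single dictionary access per word, which is the property that makes this structure a good fit for full-text search.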
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+[[near-real-time]]
+== Near real-time search
+The overview of <<documents-indices,documents and indices>> indicates that when a document is stored in {es}, it is indexed and fully searchable in _near real-time_--within 1 second. What defines near real-time search?
+
+Lucene, the Java libraries on which {es} is based, introduced the concept of per-segment search. A _segment_ is similar to an inverted index, but the word _index_ in Lucene means "a collection of segments plus a commit point". After a commit, a new segment is added to the commit point and the buffer is cleared.
+
+Sitting between {es} and the disk is the filesystem cache. Documents in the in-memory indexing buffer (<<img-pre-refresh,Figure 1>>) are written to a new segment (<<img-post-refresh,Figure 2>>). The new segment is written to the filesystem cache first (which is cheap) and only later is it flushed to disk (which is expensive). However, after a file is in the cache, it can be opened and read just like any other file.
+
+[[img-pre-refresh]]
+.A Lucene index with new documents in the in-memory buffer
+image::images/lucene-in-memory-buffer.png["A Lucene index with new documents in the in-memory buffer"]
+
+Lucene allows new segments to be written and opened, making the documents they contain visible to search without performing a full commit. This is a much lighter process than a commit to disk, and can be done frequently without degrading performance.
+
+[[img-post-refresh]]
+.The buffer contents are written to a segment, which is searchable, but is not yet committed
+image::images/lucene-written-not-committed.png["The buffer contents are written to a segment, which is searchable, but is not yet committed"]
+
+In {es}, this process of writing and opening a new segment is called a _refresh_. A refresh makes all operations performed on an index since the last refresh available for search. You can control refreshes through the following means:
+
+* Waiting for the refresh interval
+* Setting the <<docs-refresh,?refresh>> option
+* Using the <<indices-refresh,Refresh API>> to explicitly complete a refresh (`POST _refresh`)
+
+By default, {es} periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds. This is why we say that {es} has _near_ real-time search: document changes are not visible to search immediately, but will become visible within this timeframe.
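
The buffer-then-refresh behavior described in this new file can be modeled with a toy Python class. This is a sketch under stated assumptions, not the {es} or Lucene implementation: `ToyIndex` and its methods are hypothetical names, segments are plain lists, and the periodic refresh interval, `wait_for`, commits, and the filesystem cache are not modeled.

```python
class ToyIndex:
    """Toy model: an in-memory buffer plus a list of searchable segments."""

    def __init__(self):
        self.buffer = []    # documents indexed but not yet visible to search
        self.segments = []  # each segment is a list of searchable documents

    def index(self, doc, refresh=False):
        """Add a document; refresh=True mimics forcing a refresh right away."""
        self.buffer.append(doc)
        if refresh:
            self.refresh()

    def refresh(self):
        """Write the buffer to a new segment and clear the buffer."""
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def search(self, word):
        """Search only the segments; buffered documents are not visible."""
        return [
            doc
            for segment in self.segments
            for doc in segment
            if word in doc.split()
        ]


idx = ToyIndex()
idx.index("near real-time search")
before = idx.search("search")  # the document is still in the buffer
idx.refresh()                  # stands in for the periodic or explicit refresh
after = idx.search("search")   # the document is now in a searchable segment
```

In this model a document indexed without a refresh stays invisible until the next `refresh()` call, which mirrors why search is described as _near_ real-time rather than real-time.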
