Documented the query cache module

clintongormley · areek · commit c38252dec3d4 · 2014-09-07T22:02:06.000-04:00
Related to #7161 and #7167
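
Note on the "Cache key" section: because the whole JSON body is used as the cache key, a client that emits the same request with keys in a different order will miss the cache. A minimal sketch of one way to avoid that on the client side, assuming the application builds the request body as a Python dict. `canonical_body` is a hypothetical helper name for illustration and is not part of any Elasticsearch client.

```python
import json


def canonical_body(body):
    """Serialize a request body deterministically.

    sort_keys=True orders object keys at every nesting level, and the
    fixed separators remove whitespace differences between callers.
    List order is left untouched, since it can be significant.
    """
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


# Two logically identical requests whose keys were inserted in a different order.
a = {
    "aggs": {"popular_colors": {"terms": {"field": "colors"}}},
    "query": {"match_all": {}},
}
b = {
    "query": {"match_all": {}},
    "aggs": {"popular_colors": {"terms": {"field": "colors"}}},
}

# Both serialize to the same string, so they would map to the same cache key.
same = canonical_body(a) == canonical_body(b)
print(same)
```

Sending the canonical string as the request body keeps repeated requests byte-identical, which is what the cache key comparison described in the new page relies on.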
diff --git a/docs/reference/index-modules.asciidoc b/docs/reference/index-modules.asciidoc
@@ -72,6 +72,8 @@ include::index-modules/translog.asciidoc[]
 
 include::index-modules/cache.asciidoc[]
 
+include::index-modules/query-cache.asciidoc[]
+
 include::index-modules/fielddata.asciidoc[]
 
 include::index-modules/codec.asciidoc[]
diff --git a/docs/reference/index-modules/query-cache.asciidoc b/docs/reference/index-modules/query-cache.asciidoc
@@ -0,0 +1,145 @@
+[[index-modules-shard-query-cache]]
+== Shard query cache
+
+coming[1.4.0]
+
+When a search request is run against an index or against many indices, each
+involved shard executes the search locally and returns its local results to
+the _coordinating node_, which combines these shard-level results into a
+``global'' result set.
+
+The shard-level query cache module caches the local results on each shard.
+This allows frequently used (and potentially heavy) search requests to return
+results almost instantly. The query cache is a very good fit for the logging
+use case, where only the most recent index is being actively updated --
+results from older indices will be served directly from the cache.
+
+[IMPORTANT]
+==================================
+
+For now, the query cache will only cache the results of search requests
+where <<count,`?search_type=count`>>, so it will not cache `hits`,
+but it will cache `hits.total`,  <<search-aggregations,aggregations>>, and
+<<search-suggesters,suggestions>>.
+
+Queries that use `now` (see <<date-math>>) cannot be cached.
+==================================
+
+[float]
+=== Cache invalidation
+
+The cache is smart -- it keeps the same _near real-time_ promise as uncached
+search.
+
+Cached results are invalidated automatically whenever the shard refreshes, but
+only if the data in the shard has actually changed.  In other words, you will
+always get the same results from the cache as you would for an uncached search
+request.
+
+The longer the refresh interval, the longer that cached entries will remain
+valid. If the cache is full, the least recently used cache keys will be
+evicted.
+
+The cache can be expired manually with the <<indices-clearcache,`clear-cache` API>>:
+
+[source,json]
+------------------------
+curl -XPOST 'localhost:9200/kimchy,elasticsearch/_cache/clear?query_cache=true'
+------------------------
+
+[float]
+=== Enabling caching by default
+
+The cache is not enabled by default, but can be enabled when creating a new
+index as follows:
+
+[source,json]
+-----------------------------
+curl -XPUT localhost:9200/my_index -d'
+{
+  "settings": {
+    "index.cache.query.enable": true
+  }
+}
+'
+-----------------------------
+
+It can also be enabled or disabled dynamically on an existing index with the
+<<indices-update-settings,`update-settings`>> API:
+
+[source,json]
+-----------------------------
+curl -XPUT localhost:9200/my_index/_settings -d'
+{ "index.cache.query.enable": true }
+'
+-----------------------------
+
+[float]
+=== Enabling caching per request
+
+The `query_cache` query-string parameter can be used to enable or disable
+caching on a *per-query* basis.  If set, it overrides the index-level setting:
+
+[source,json]
+-----------------------------
+curl 'localhost:9200/my_index/_search?search_type=count&query_cache=true' -d'
+{
+  "aggs": {
+    "popular_colors": {
+      "terms": {
+        "field": "colors"
+      }
+    }
+  }
+}
+'
+-----------------------------
+
+IMPORTANT: If your query uses a script whose result is not deterministic (e.g.
+it uses a random function or references the current time) you should set the
+`query_cache` flag to `false` to disable caching for that request.
+
+[float]
+=== Cache key
+
+The whole JSON body is used as the cache key.  This means that if the JSON
+changes -- for instance if keys are output in a different order -- then the
+cache key will not be recognised.
+
+TIP: Most JSON libraries support a _canonical_ mode which ensures that JSON
+keys are always emitted in the same order. This canonical mode can be used in
+the application to ensure that a request is always serialized in the same way.
+
+[float]
+=== Cache settings
+
+The cache is managed at the node level, and has a default maximum size of `1%`
+of the heap.  This can be changed in the `config/elasticsearch.yml` file with:
+
+[source,yaml]
+--------------------------------
+indices.cache.query.size: 2%
+--------------------------------
+
+Also, you can use the +indices.cache.query.expire+ setting to specify a TTL
+for cached results, but there should be no reason to do so.  Remember that
+stale results are automatically invalidated when the index is refreshed. This
+setting is provided for completeness' sake only.
+
+[float]
+=== Monitoring cache usage
+
+The size of the cache (in bytes) and the number of evictions can be viewed
+by index, with the <<indices-stats,`indices-stats`>> API:
+
+[source,json]
+------------------------
+curl -XGET 'localhost:9200/_stats/query_cache?pretty&human'
+------------------------
+
+or by node with the <<cluster-nodes-stats,`nodes-stats`>> API:
+
+[source,json]
+------------------------
+curl -XGET 'localhost:9200/_nodes/stats/indices/query_cache?pretty&human'
+------------------------
diff --git a/docs/reference/indices/clearcache.asciidoc b/docs/reference/indices/clearcache.asciidoc
@@ -9,9 +9,9 @@ associated with one ore more indices.
 $ curl -XPOST 'http://localhost:9200/twitter/_cache/clear'
 --------------------------------------------------
 
-The API, by default, will clear all caches. Specific caches can be
-cleaned explicitly by setting `filter`, `field_data` or `id_cache` to
-`true`.
+The API, by default, will clear all caches. Specific caches can be cleaned
+explicitly by setting `filter`, `field_data`, `query_cache` coming[1.4.0],
+or `id_cache` to `true`.
 
 All caches relating to a specific field(s) can also be cleared by
 specifying `fields` parameter with a comma delimited list of the
diff --git a/docs/reference/indices/stats.asciidoc b/docs/reference/indices/stats.asciidoc
@@ -39,20 +39,32 @@ specified as well in the URI. Those stats can be any of:
                 groups). The `groups` parameter accepts a comma separated list of group names.
                 Use `_all` to return statistics for all groups.
 
-`warmer`:: 		Warmer statistics.
-`merge`:: 		Merge statistics.
-`fielddata`:: 		Fielddata statistics.
-`flush`:: 		Flush statistics.
-`completion`:: 		Completion suggest statistics.
-`refresh`:: 	Refresh statistics.
-`suggest`:: 	Suggest statistics.
-
-Some statistics allow per field granularity which accepts a list comma-separated list of included fields. By default all fields are included:
+`completion`::  Completion suggest statistics.
+`fielddata`::   Fielddata statistics.
+`flush`::       Flush statistics.
+`merge`::       Merge statistics.
+`query_cache`:: <<index-modules-shard-query-cache,Shard query cache>> statistics. coming[1.4.0]
+`refresh`::     Refresh statistics.
+`suggest`::     Suggest statistics.
+`warmer`::      Warmer statistics.
+
+Some statistics allow per-field granularity and accept a comma-separated
+list of included fields. By default all fields are included:
 
 [horizontal]
-`fields`::	List of fields to be included in the statistics. This is used as the default list unless a more specific field list is provided (see below).
-`completion_fields`::	List of fields to be included in the Completion Suggest statistics
-`fielddata_fields`:: 	List of fields to be included in the Fielddata statistics
+`fields`::
+
+    List of fields to be included in the statistics. This is used as the
+    default list unless a more specific field list is provided (see below).
+
+`completion_fields`::
+
+    List of fields to be included in the Completion Suggest statistics.
+
+`fielddata_fields`::
+
+    List of fields to be included in the Fielddata statistics.
+
 
 Here are some samples:
 
diff --git a/docs/reference/search/aggregations.asciidoc b/docs/reference/search/aggregations.asciidoc
@@ -104,9 +104,9 @@ are being aggregated. The values are typically extracted from the fields of the
 can also be generated using scripts.
 
 Numeric metrics aggregations are a special type of metrics aggregation which output numeric values. Some aggregations output
-a single numeric metric (e.g. `avg`) and are called `single-value numeric metrics aggregation`, others generate multiple 
-metrics (e.g. `stats`) and are called `multi-value numeric metrics aggregation`. The distinction between single-value and 
-multi-value numeric metrics aggregations plays a role when these aggregations serve as direct sub-aggregations of some 
+a single numeric metric (e.g. `avg`) and are called `single-value numeric metrics aggregation`, others generate multiple
+metrics (e.g. `stats`) and are called `multi-value numeric metrics aggregation`. The distinction between single-value and
+multi-value numeric metrics aggregations plays a role when these aggregations serve as direct sub-aggregations of some
 bucket aggregations (some bucket aggregations enable you to sort the returned buckets based on the numeric metrics in each bucket).
 
 
@@ -125,6 +125,18 @@ aggregated for the buckets created by their "parent" bucket aggregation.
 There are different bucket aggregators, each with a different "bucketing" strategy. Some define a single bucket, some
 define fixed number of multiple buckets, and others dynamically create the buckets during the aggregation process.
 
+[float]
+=== Caching heavy aggregations
+
+coming[1.4.0]
+
+Frequently used aggregations (e.g. for display on the home page of a website)
+can be cached for faster responses. These cached results are the same results
+that would be returned by an uncached aggregation -- you will never get stale
+results.
+
+See <<index-modules-shard-query-cache>> for more details.
+
 
 include::aggregations/metrics.asciidoc[]
 
diff --git a/docs/reference/search/request-body.asciidoc b/docs/reference/search/request-body.asciidoc
@@ -46,39 +46,50 @@ And here is a sample response:
 [float]
 === Parameters
 
-[cols="<,<",options="header",]
-|=======================================================================
-|Name |Description
-|`timeout` |A search timeout, bounding the search request to be executed
-within the specified time value and bail with the hits accumulated up to
-that point when expired. Defaults to no timeout. See <<time-units>>.
-
-|`from` |The starting from index of the hits to return. Defaults to `0`.
-
-|`size` |The number of hits to return. Defaults to `10`.
-
-|`search_type` |The type of the search operation to perform. Can be
-`dfs_query_then_fetch`, `dfs_query_and_fetch`, `query_then_fetch`,
-`query_and_fetch`. Defaults to `query_then_fetch`. See
-<<search-request-search-type,_Search Type_>> for
-more details on the different types of search that can be performed.
-
-|coming[1.4.0] `terminate_after` |The maximum number of documents to collect for
-each shard, upon reaching which the query execution will terminate early.
-If set, the response will have a boolean field `terminated_early` to
-indicate whether the query execution has actually terminated_early.
-Defaults to no terminate_after.
-|=======================================================================
-
-Out of the above, the `search_type` is the one that can not be passed
-within the search request body, and in order to set it, it must be
-passed as a request REST parameter.
-
-The rest of the search request should be passed within the body itself.
-The body content can also be passed as a REST parameter named `source`.
-
-Both HTTP GET and HTTP POST can be used to execute search with body.
-Since not all clients support GET with body, POST is allowed as well.
+[horizontal]
+`timeout`::
+
+    A search timeout, bounding the search request to be executed within the
+    specified time value and bailing with the hits accumulated up to that point
+    when expired. Defaults to no timeout. See <<time-units>>.
+
+`from`::
+
+    The starting from index of the hits to return. Defaults to `0`.
+
+`size`::
+
+    The number of hits to return. Defaults to `10`.
+
+`search_type`::
+
+    The type of the search operation to perform. Can be
+    `dfs_query_then_fetch`, `dfs_query_and_fetch`, `query_then_fetch`,
+    `query_and_fetch`. Defaults to `query_then_fetch`. See
+    <<search-request-search-type,_Search Type_>> for more.
+
+`query_cache`::
+
+    coming[1.4.0] Set to `true` or `false` to enable or disable the caching
+    of search results for requests where `?search_type=count`, i.e.
+    aggregations and suggestions.  See <<index-modules-shard-query-cache>>.
+
+`terminate_after`::
+
+    coming[1.4.0] The maximum number of documents to collect for each shard,
+    upon reaching which the query execution will terminate early. If set, the
+    response will have a boolean field `terminated_early` to indicate whether
+    the query execution has actually terminated early. Defaults to no
+    limit.
+
+
+Out of the above, the `search_type` and the `query_cache` must be passed as
+query-string parameters. The rest of the search request should be passed
+within the body itself. The body content can also be passed as a REST
+parameter named `source`.
+
+Both HTTP GET and HTTP POST can be used to execute search with body. Since not
+all clients support GET with body, POST is allowed as well.
 
 
 include::request/query.asciidoc[]