
Commit 9a11f75

Introduce a constant_keyword field. (elastic#49713)
This field is a specialization of the `keyword` field for the case when all documents have the same value. It typically performs more efficiently than keywords at query time by figuring out whether all or none of the documents match at rewrite time, like `term` queries on `_index`.

The name is up for discussion. I liked including `keyword` in it, so that we still have room for a `singleton_numeric` in the future. However I'm unsure whether to call it `singleton`, `constant` or something else, any opinions?

For this field there is a choice between
1. accepting values in `_source` when they are equal to the value configured in mappings, but rejecting mapping updates
2. rejecting values in `_source` but then allowing updates to the value that is configured in the mapping

This commit implements option 1, so that it is possible to reindex from/to an index that has the field mapped as a keyword with no changes to the source.

Backport of elastic#49713
1 parent a267849 commit 9a11f75


16 files changed

+1184
-1
lines changed


docs/reference/how-to/search-speed.asciidoc

+112
@@ -418,3 +418,115 @@ The <<text,`text`>> field has an <<index-prefixes,`index_prefixes`>> option that
418418
indexes prefixes of all terms and is automatically leveraged by query parsers to
419419
run prefix queries. If your use-case involves running lots of prefix queries,
420420
this can speed up queries significantly.
421+
422+
[[faster-filtering-with-constant-keyword]]
423+
=== Use <<constant-keyword,`constant_keyword`>> to speed up filtering
424+
425+
There is a general rule that the cost of a filter is mostly a function of the
426+
number of matched documents. Imagine that you have an index containing cycles.
427+
There are a large number of bicycles and many searches perform a filter on
428+
`cycle_type: bicycle`. This very common filter is unfortunately also very costly
429+
since it matches most documents. There is a simple way to avoid running this
430+
filter: move bicycles to their own index and filter bicycles by searching this
431+
index instead of adding a filter to the query.
432+
433+
Unfortunately this can make client-side logic tricky, which is where
434+
`constant_keyword` helps. By mapping `cycle_type` as a `constant_keyword` with
435+
value `bicycle` on the index that contains bicycles, clients can keep running
436+
the exact same queries as they used to run on the monolithic index and
437+
Elasticsearch will do the right thing on the bicycles index by ignoring filters
438+
on `cycle_type` if the value is `bicycle` and returning no hits otherwise.
439+
440+
Here is what mappings could look like:
441+
442+
[source,console]
443+
--------------------------------------------------
444+
PUT bicycles
445+
{
446+
"mappings": {
447+
"properties": {
448+
"cycle_type": {
449+
"type": "constant_keyword",
450+
"value": "bicycle"
451+
},
452+
"name": {
453+
"type": "text"
454+
}
455+
}
456+
}
457+
}
458+
459+
PUT other_cycles
460+
{
461+
"mappings": {
462+
"properties": {
463+
"cycle_type": {
464+
"type": "keyword"
465+
},
466+
"name": {
467+
"type": "text"
468+
}
469+
}
470+
}
471+
}
472+
--------------------------------------------------
473+
474+
We are splitting our index in two: one that will contain only bicycles, and
475+
another one that contains other cycles: unicycles, tricycles, etc. Then at
476+
search time, we need to search both indices, but we don't need to modify
477+
queries.
478+
479+
480+
[source,console]
481+
--------------------------------------------------
482+
GET bicycles,other_cycles/_search
483+
{
484+
"query": {
485+
"bool": {
486+
"must": {
487+
"match": {
488+
"description": "dutch"
489+
}
490+
},
491+
"filter": {
492+
"term": {
493+
"cycle_type": "bicycle"
494+
}
495+
}
496+
}
497+
}
498+
}
499+
--------------------------------------------------
500+
// TEST[continued]
501+
502+
On the `bicycles` index, Elasticsearch will simply ignore the `cycle_type`
503+
filter and rewrite the search request to the one below:
504+
505+
[source,console]
506+
--------------------------------------------------
507+
GET bicycles,other_cycles/_search
508+
{
509+
"query": {
510+
"match": {
511+
"description": "dutch"
512+
}
513+
}
514+
}
515+
--------------------------------------------------
516+
// TEST[continued]
517+
518+
On the `other_cycles` index, Elasticsearch will quickly figure out that
519+
`bicycle` doesn't exist in the terms dictionary of the `cycle_type` field and
520+
return a search response with no hits.
521+
522+
This is a powerful way of making queries cheaper by putting common values in a
523+
dedicated index. This idea can also be combined across multiple fields: for
524+
instance if you track the color of each cycle and your `bicycles` index ends up
525+
having a majority of black bikes, you could split it into a `bicycles-black`
526+
and a `bicycles-other-colors` index.
527+
528+
The `constant_keyword` field type is not strictly required for this optimization: it is
529+
also possible to update the client-side logic in order to route queries to the
530+
relevant indices based on filters. However, `constant_keyword` does this
531+
transparently and decouples search requests from the index topology in
532+
exchange for very little overhead.
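
To make the rewrite described in the docs above more concrete, here is a minimal Java sketch of the idea: on an index whose field holds a single constant value, a `term` filter can be resolved without looking at any document, because it matches either everything or nothing. This is an illustration only and not the mapper code added by this commit. The class, enum and method names are hypothetical, and treating an unset value as "matches nothing" is an assumption.

```java
/** Hypothetical sketch; not the field mapper introduced by this commit. */
public class ConstantFieldRewriteSketch {

    /** The only two outcomes a term filter on a constant field can have for one index. */
    public enum Rewritten { MATCH_ALL, MATCH_NONE }

    /**
     * Resolves a term filter against the single value configured for an index.
     * A null configuredValue models an index where no value has been set yet;
     * this sketch assumes that no document can match in that case.
     */
    public static Rewritten rewriteTermFilter(String configuredValue, String queriedValue) {
        boolean matches = configuredValue != null && configuredValue.equals(queriedValue);
        return matches ? Rewritten.MATCH_ALL : Rewritten.MATCH_NONE;
    }

    public static void main(String[] args) {
        // On the bicycles index the filter is redundant and can be dropped.
        System.out.println(rewriteTermFilter("bicycle", "bicycle"));
        // A filter on any other value cannot match a single document.
        System.out.println(rewriteTermFilter("bicycle", "tricycle"));
    }
}
```

The decision depends only on the mapping, never on the number of documents, which is why the filter costs almost nothing on the dedicated index.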

docs/reference/mapping/types.asciidoc

+5-1
@@ -59,6 +59,8 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>
5959

6060
<<histogram>>:: `histogram` for pre-aggregated numerical values for percentiles aggregations.
6161

62+
<<constant-keyword>>:: Specialization of `keyword` for the case when all documents have the same value.
63+
6264
[float]
6365
[[types-array-handling]]
6466
=== Arrays
@@ -130,4 +132,6 @@ include::types/text.asciidoc[]
130132

131133
include::types/token-count.asciidoc[]
132134

133-
include::types/shape.asciidoc[]
135+
include::types/shape.asciidoc[]
136+
137+
include::types/constant-keyword.asciidoc[]
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,85 @@
1+
[role="xpack"]
2+
[testenv="basic"]
3+
4+
[[constant-keyword]]
5+
=== Constant keyword datatype
6+
++++
7+
<titleabbrev>Constant keyword</titleabbrev>
8+
++++
9+
10+
Constant keyword is a specialization of the <<keyword,`keyword`>> field for
11+
the case that all documents in the index have the same value.
12+
13+
[source,console]
14+
--------------------------------
15+
PUT logs-debug
16+
{
17+
"mappings": {
18+
"properties": {
19+
"@timestamp": {
20+
"type": "date"
21+
},
22+
"message": {
23+
"type": "text"
24+
},
25+
"level": {
26+
"type": "constant_keyword",
27+
"value": "debug"
28+
}
29+
}
30+
}
31+
}
32+
--------------------------------
33+
34+
`constant_keyword` supports the same queries and aggregations as `keyword`
35+
fields do, but takes advantage of the fact that all documents have the same
36+
value per index to execute queries more efficiently.
37+
38+
It is allowed to submit documents that either don't have a value for the field or
39+
have a value equal to the one configured in the mappings. The two indexing
40+
requests below are equivalent:
41+
42+
[source,console]
43+
--------------------------------
44+
POST logs-debug/_doc
45+
{
46+
"date": "2019-12-12",
47+
"message": "Starting up Elasticsearch",
48+
"level": "debug"
49+
}
50+
51+
POST logs-debug/_doc
52+
{
53+
"date": "2019-12-12",
54+
"message": "Starting up Elasticsearch"
55+
}
56+
--------------------------------
57+
//TEST[continued]
58+
59+
However providing a value that is different from the one configured in the
60+
mapping is disallowed.
61+
62+
In case no `value` is provided in the mappings, the field will automatically
63+
configure itself based on the value contained in the first indexed document.
64+
While this behavior can be convenient, note that it means that a single
65+
poisonous document can cause all other documents to be rejected if it has a
66+
wrong value.
67+
68+
The `value` of the field cannot be changed after it has been set.
69+
70+
[[constant-keyword-params]]
71+
==== Parameters for constant keyword fields
72+
73+
The following mapping parameters are accepted:
74+
75+
[horizontal]
76+
77+
<<mapping-field-meta,`meta`>>::
78+
79+
Metadata about the field.
80+
81+
`value`::
82+
83+
The value to associate with all documents in the index. If this parameter
84+
is not provided, it is set based on the first document that gets indexed.
85+
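
The acceptance rules documented above (option 1 in the commit message) can be summarized with a small Java sketch. It is a simplified model under stated assumptions, not the actual mapper: the class name, method names and exception message are hypothetical.

```java
/** Hypothetical model of the documented acceptance rules; not the actual mapper. */
public class ConstantValueSketch {

    // null until configured in the mapping or by the first indexed value
    private String value;

    public ConstantValueSketch(String mappedValue) {
        this.value = mappedValue;
    }

    /** Accepts a document's value for the field, or throws if it conflicts. */
    public void index(String docValue) {
        if (docValue == null) {
            return; // documents may omit the field entirely
        }
        if (value == null) {
            value = docValue; // the first indexed value configures the field
            return;
        }
        if (!value.equals(docValue)) {
            // once set, the value cannot change, so other values are rejected
            throw new IllegalArgumentException(
                "field only accepts [" + value + "] but got [" + docValue + "]");
        }
    }

    public String value() {
        return value;
    }

    public static void main(String[] args) {
        ConstantValueSketch level = new ConstantValueSketch(null);
        level.index(null);     // accepted, field stays unconfigured
        level.index("debug");  // configures the field
        level.index("debug");  // accepted, equal to the configured value
        try {
            level.index("info");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The sketch also shows the risk called out in the docs: if the first value that reaches an unconfigured field is wrong, every later document carrying the intended value is rejected.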

docs/reference/rest-api/info.asciidoc

+4
@@ -71,6 +71,10 @@ Example response:
7171
"available" : true,
7272
"enabled" : true
7373
},
74+
"constant_keyword" : {
75+
"available" : true,
76+
"enabled" : true
77+
},
7478
"enrich" : {
7579
"available" : true,
7680
"enabled" : true

server/src/main/java/org/elasticsearch/index/fielddata/plain/ConstantIndexFieldData.java

+3
@@ -91,6 +91,9 @@ public Collection<Accountable> getChildResources() {
9191

9292
@Override
9393
public SortedSetDocValues getOrdinalsValues() {
94+
if (value == null) {
95+
return DocValues.emptySortedSet();
96+
}
9497
final BytesRef term = new BytesRef(value);
9598
final SortedDocValues sortedValues = new AbstractSortedDocValues() {
9699

x-pack/plugin/core/src/main/java/org/elasticsearch/license/XPackLicenseState.java

+4
@@ -626,6 +626,10 @@ public boolean isAnalyticsAllowed() {
626626
return allowForAllLicenses();
627627
}
628628

629+
public boolean isConstantKeywordAllowed() {
630+
return allowForAllLicenses();
631+
}
632+
629633
/**
630634
* @return true if security is available to be used with the current license type
631635
*/
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,38 @@
1+
/*
2+
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
3+
* or more contributor license agreements. Licensed under the Elastic License;
4+
* you may not use this file except in compliance with the Elastic License.
5+
*/
6+
7+
package org.elasticsearch.xpack.core;
8+
9+
import org.elasticsearch.common.io.stream.StreamInput;
10+
import org.elasticsearch.xpack.core.flattened.FlattenedFeatureSetUsage;
11+
12+
import java.io.IOException;
13+
import java.util.Objects;
14+
15+
public class ConstantKeywordFeatureSetUsage extends XPackFeatureSet.Usage {
16+
17+
public ConstantKeywordFeatureSetUsage(StreamInput input) throws IOException {
18+
super(input);
19+
}
20+
21+
public ConstantKeywordFeatureSetUsage(boolean available, boolean enabled) {
22+
super(XPackField.CONSTANT_KEYWORD, available, enabled);
23+
}
24+
25+
@Override
26+
public boolean equals(Object o) {
27+
if (this == o) return true;
28+
if (o == null || getClass() != o.getClass()) return false;
29+
ConstantKeywordFeatureSetUsage that = (ConstantKeywordFeatureSetUsage) o;
30+
return available == that.available && enabled == that.enabled;
31+
}
32+
33+
@Override
34+
public int hashCode() {
35+
return Objects.hash(available, enabled);
36+
}
37+
38+
}

x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/XPackField.java

+2
@@ -55,6 +55,8 @@ public final class XPackField {
5555
public static final String ANALYTICS = "analytics";
5656
/** Name constant for the enrich plugin. */
5757
public static final String ENRICH = "enrich";
58+
/** Name constant for the constant-keyword plugin. */
59+
public static final String CONSTANT_KEYWORD = "constant_keyword";
5860

5961
private XPackField() {}
6062

Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
/*
2+
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
3+
* or more contributor license agreements. Licensed under the Elastic License;
4+
* you may not use this file except in compliance with the Elastic License.
5+
*/
6+
7+
evaluationDependsOn(xpackModule('core'))
8+
9+
apply plugin: 'elasticsearch.esplugin'
10+
11+
esplugin {
12+
name 'constant-keyword'
13+
description 'Module for the constant-keyword field type, which is a specialization of keyword for the case when all documents have the same value.'
14+
classname 'org.elasticsearch.xpack.constantkeyword.ConstantKeywordMapperPlugin'
15+
extendedPlugins = ['x-pack-core']
16+
}
17+
archivesBaseName = 'x-pack-constant-keyword'
18+
19+
dependencies {
20+
compileOnly project(path: xpackModule('core'), configuration: 'default')
21+
testCompile project(path: xpackModule('core'), configuration: 'testArtifacts')
22+
}
23+
24+
integTest.enabled = false
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,52 @@
1+
/*
2+
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
3+
* or more contributor license agreements. Licensed under the Elastic License;
4+
* you may not use this file except in compliance with the Elastic License.
5+
*/
6+
7+
package org.elasticsearch.xpack.constantkeyword;
8+
9+
import org.elasticsearch.action.ActionListener;
10+
import org.elasticsearch.common.inject.Inject;
11+
import org.elasticsearch.license.XPackLicenseState;
12+
import org.elasticsearch.xpack.core.ConstantKeywordFeatureSetUsage;
13+
import org.elasticsearch.xpack.core.XPackFeatureSet;
14+
import org.elasticsearch.xpack.core.XPackField;
15+
16+
import java.util.Map;
17+
18+
public class ConstantKeywordFeatureSet implements XPackFeatureSet {
19+
20+
private final XPackLicenseState licenseState;
21+
22+
@Inject
23+
public ConstantKeywordFeatureSet(XPackLicenseState licenseState) {
24+
this.licenseState = licenseState;
25+
}
26+
27+
@Override
28+
public String name() {
29+
return XPackField.CONSTANT_KEYWORD;
30+
}
31+
32+
@Override
33+
public boolean available() {
34+
return licenseState.isConstantKeywordAllowed();
35+
}
36+
37+
@Override
38+
public boolean enabled() {
39+
return true;
40+
}
41+
42+
@Override
43+
public Map<String, Object> nativeCodeInfo() {
44+
return null;
45+
}
46+
47+
@Override
48+
public void usage(ActionListener<Usage> listener) {
49+
listener.onResponse(new ConstantKeywordFeatureSetUsage(available(), enabled()));
50+
}
51+
52+
}
