
Commit f3317eb

Add support for 'flattened object' fields. (#42541)
This commit merges the `object-fields` feature branch. The new 'flattened object' field type allows an entire JSON object to be indexed into a field, and provides limited search functionality over the field's contents.
1 parent b92de28 commit f3317eb


47 files changed, +3891 -44 lines changed


docs/reference/mapping/types.asciidoc

Lines changed: 7 additions & 2 deletions
@@ -42,8 +42,6 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>
 
 <<parent-join>>:: Defines parent/child relation for documents within the same index
 
-<<alias>>:: Defines an alias to an existing field.
-
 <<rank-feature>>:: Record numeric feature to boost hits at query time.
 
 <<rank-features>>:: Record numeric features to boost hits at query time.
@@ -54,6 +52,11 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>
 
 <<search-as-you-type>>:: A text-like field optimized for queries to implement as-you-type completion
 
+<<alias>>:: Defines an alias to an existing field.
+
+<<flattened>>:: Allows an entire JSON object to be indexed as a single field.
+
+
 [float]
 === Multi-fields
 
@@ -82,6 +85,8 @@ include::types/date.asciidoc[]
 
 include::types/date_nanos.asciidoc[]
 
+include::types/flattened.asciidoc[]
+
 include::types/geo-point.asciidoc[]
 
 include::types/geo-shape.asciidoc[]
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
+[role="xpack"]
+[testenv="basic"]
+
+[[flattened]]
+=== Flattened datatype
+
+By default, each subfield in an object is mapped and indexed separately. If
+the names or types of the subfields are not known in advance, then they are
+<<dynamic-mapping, mapped dynamically>>.
+
+The `flattened` type provides an alternative approach, where the entire
+object is mapped as a single field. Given an object, the `flattened`
+mapping will parse out its leaf values and index them into one field as
+keywords. The object's contents can then be searched through simple queries
+and aggregations.
+
+This data type can be useful for indexing objects with a large or unknown
+number of unique keys. Only one field mapping is created for the whole JSON
+object, which can help prevent a <<mapping-limit-settings, mappings explosion>>
+caused by too many distinct field mappings.
+
+On the other hand, flattened object fields present a trade-off in terms of
+search functionality. Only basic queries are allowed, with no support for
+numeric range queries or highlighting. Further information on the limitations
+can be found in the <<supported-operations, Supported operations>> section.
+
+NOTE: The `flattened` mapping type should **not** be used for indexing all
+document content, as it treats all values as keywords and does not provide full
+search functionality. The default approach, where each subfield has its own
+entry in the mappings, works well in the majority of cases.
+
+A flattened object field can be created as follows:
+[source,js]
+--------------------------------
+PUT bug_reports
+{
+  "mappings": {
+    "properties": {
+      "title": {
+        "type": "text"
+      },
+      "labels": {
+        "type": "flattened"
+      }
+    }
+  }
+}
+
+POST bug_reports/_doc/1
+{
+  "title": "Results are not sorted correctly.",
+  "labels": {
+    "priority": "urgent",
+    "release": ["v1.2.5", "v1.3.0"],
+    "timestamp": {
+      "created": 1541458026,
+      "closed": 1541457010
+    }
+  }
+}
+--------------------------------
+// CONSOLE
+// TESTSETUP
+
+During indexing, tokens are created for each leaf value in the JSON object. The
+values are indexed as string keywords, without analysis or special handling for
+numbers or dates.
+
+Querying the top-level `flattened` field searches all leaf values in the
+object:
+
+[source,js]
+--------------------------------
+POST bug_reports/_search
+{
+  "query": {
+    "term": {"labels": "urgent"}
+  }
+}
+--------------------------------
+// CONSOLE
+
+To query on a specific key in the flattened object, use object dot notation:
+[source,js]
+--------------------------------
+POST bug_reports/_search
+{
+  "query": {
+    "term": {"labels.release": "v1.3.0"}
+  }
+}
+--------------------------------
+// CONSOLE
+
+[[supported-operations]]
+==== Supported operations
+
+Because of the similarities in the way values are indexed, `flattened`
+fields share much of the same mapping and search functionality as
+<<keyword, `keyword`>> fields.
+
+Currently, flattened object fields can be used with the following query types:
+
+- `term`, `terms`, and `terms_set`
+- `prefix`
+- `range`
+- `match` and `multi_match`
+- `query_string` and `simple_query_string`
+- `exists`
+
+When querying, it is not possible to refer to field keys using wildcards, as in
+`{ "term": {"labels.time*": 1541457010}}`. Note that all queries, including
+`range`, treat the values as string keywords. Highlighting is not supported on
+`flattened` fields.
+
+It is possible to sort on a flattened object field, as well as perform simple
+keyword-style aggregations such as `terms`. As with queries, there is no
+special support for numerics -- all values in the JSON object are treated as
+keywords. When sorting, this implies that values are compared
+lexicographically.
+
+Flattened object fields currently cannot be stored. It is not possible to
+specify the <<mapping-store, `store`>> parameter in the mapping.
+
+[[flattened-params]]
+==== Parameters for flattened object fields
+
+The following mapping parameters are accepted:
+
+[horizontal]
+
+<<mapping-boost,`boost`>>::
+
+Mapping field-level query time boosting. Accepts a floating point number,
+defaults to `1.0`.
+
+`depth_limit`::
+
+The maximum allowed depth of the flattened object field, in terms of nested
+inner objects. If a flattened object field exceeds this limit, then an
+error will be thrown. Defaults to `20`.
+
+<<doc-values,`doc_values`>>::
+
+Should the field be stored on disk in a column-stride fashion, so that it
+can later be used for sorting, aggregations, or scripting? Accepts `true`
+(default) or `false`.
+
+<<eager-global-ordinals,`eager_global_ordinals`>>::
+
+Should global ordinals be loaded eagerly on refresh? Accepts `true` or
+`false` (default). Enabling this is a good idea on fields that are
+frequently used for terms aggregations.
+
+<<ignore-above,`ignore_above`>>::
+
+Leaf values longer than this limit will not be indexed. By default, there
+is no limit and all values will be indexed. Note that this limit applies
+to the leaf values within the flattened object field, and not the length of
+the entire field.
+
+<<mapping-index,`index`>>::
+
+Determines if the field should be searchable. Accepts `true` (default) or
+`false`.
+
+<<index-options,`index_options`>>::
+
+What information should be stored in the index for scoring purposes.
+Defaults to `docs` but can also be set to `freqs` to take term frequency
+into account when computing scores.
+
+<<null-value,`null_value`>>::
+
+A string value which is substituted for any explicit `null` values within
+the flattened object field. Defaults to `null`, which means null fields are
+treated as if they were missing.
+
+<<similarity,`similarity`>>::
+
+Which scoring algorithm or _similarity_ should be used. Defaults
+to `BM25`.
+
+`split_queries_on_whitespace`::
+
+Whether <<full-text-queries,full text queries>> should split the input on
+whitespace when building a query for this field. Accepts `true` or `false`
+(default).
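The indexing behavior documented above, where leaf values are extracted and indexed as untyped keywords addressable by a dotted key and `depth_limit`, `ignore_above` and `null_value` are applied, can be illustrated with a small standalone sketch. This is not the parser from this commit, whose source is not shown here. The class name `FlattenSketch`, the nested `Map`/`List` input representation, the `(dotted key, value)` pair output, and the way depth is counted (the root object at depth 0, each inner object adding one level) are all assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not the commit's parser: flattens a parsed JSON object
 * (nested Maps and Lists) into (dotted key, keyword value) pairs, applying
 * depth_limit, ignore_above and null_value as described in flattened.asciidoc.
 */
public class FlattenSketch {

    public static List<Map.Entry<String, String>> flatten(Map<String, Object> object,
                                                          int depthLimit,
                                                          int ignoreAbove,
                                                          String nullValue) {
        List<Map.Entry<String, String>> leaves = new ArrayList<>();
        // Assumption: the root object is at depth 0 and each inner object adds one level.
        walk("", object, 0, depthLimit, ignoreAbove, nullValue, leaves);
        return leaves;
    }

    private static void walk(String path, Object value, int depth, int depthLimit,
                             int ignoreAbove, String nullValue,
                             List<Map.Entry<String, String>> out) {
        if (value instanceof Map) {
            // An inner object nested deeper than depth_limit is rejected with an error.
            if (depth > depthLimit) {
                throw new IllegalArgumentException(
                    "depth limit [" + depthLimit + "] exceeded at key [" + path + "]");
            }
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                String key = path.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : path + "." + entry.getKey();
                walk(key, entry.getValue(), depth + 1, depthLimit, ignoreAbove, nullValue, out);
            }
        } else if (value instanceof List) {
            // Array elements do not add depth; each one is indexed under the same key.
            for (Object element : (List<?>) value) {
                walk(path, element, depth, depthLimit, ignoreAbove, nullValue, out);
            }
        } else if (value == null) {
            // An explicit null is replaced by null_value if one is configured;
            // otherwise it is treated as missing and produces no token.
            if (nullValue != null) {
                out.add(new AbstractMap.SimpleImmutableEntry<>(path, nullValue));
            }
        } else {
            // Numbers, booleans and strings all become plain string keywords.
            String keyword = value.toString();
            // ignore_above applies to each leaf value, not to the whole object.
            if (keyword.length() <= ignoreAbove) {
                out.add(new AbstractMap.SimpleImmutableEntry<>(path, keyword));
            }
        }
    }
}
```

Applied to the `labels` object from the `bug_reports` example, this sketch yields pairs such as `priority`/`urgent`, two `release` entries, and `timestamp.closed`/`1541457010`, where the number has become a string. Because every leaf is stored as a string, sorting and `range` comparisons are lexicographic. In Java terms, `"10".compareTo("9")` is negative, so `10` sorts before `9`.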

docs/reference/rest-api/info.asciidoc

Lines changed: 4 additions & 0 deletions
@@ -71,6 +71,10 @@ Example response:
       "available" : true,
       "enabled" : true
     },
+    "flattened" : {
+      "available" : true,
+      "enabled" : true
+    },
     "graph" : {
       "available" : true,
       "enabled" : true

server/src/main/java/org/elasticsearch/index/fielddata/IndexOrdinalsFieldData.java

Lines changed: 7 additions & 0 deletions
@@ -47,4 +47,11 @@ public interface IndexOrdinalsFieldData extends IndexFieldData.Global<AtomicOrdi
      * or null if global ordinals are not needed (constant value or single segment).
      */
     OrdinalMap getOrdinalMap();
+
+    /**
+     * Whether this field data is able to provide a mapping between global and segment ordinals,
+     * by returning the underlying {@link OrdinalMap}. If this method returns false, then calling
+     * {@link #getOrdinalMap} will result in an {@link UnsupportedOperationException}.
+     */
+    boolean supportsGlobalOrdinalsMapping();
 }
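The new `supportsGlobalOrdinalsMapping()` method lets a caller check whether `getOrdinalMap()` may be called at all, instead of triggering an `UnsupportedOperationException`. The sketch below models only that calling contract. It uses a hypothetical, simplified interface: `OrdinalsSource` stands in for `IndexOrdinalsFieldData`, and a `LongUnaryOperator` stands in for Lucene's `OrdinalMap`. A `null` map is treated as an identity mapping, following the existing javadoc's note that `null` means global ordinals are not needed (for example, a single segment).

```java
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch of the supportsGlobalOrdinalsMapping() contract; not Elasticsearch code. */
public class OrdinalMappingSketch {

    /** Simplified stand-in for IndexOrdinalsFieldData. */
    public interface OrdinalsSource {
        /** Maps segment ordinals to global ordinals, or null when no mapping is needed. */
        LongUnaryOperator getOrdinalMap();

        /** If this returns false, getOrdinalMap() is expected to throw. */
        boolean supportsGlobalOrdinalsMapping();
    }

    /** Resolves a segment ordinal to a global ordinal, honouring the contract above. */
    public static long toGlobalOrdinal(OrdinalsSource source, long segmentOrd) {
        if (!source.supportsGlobalOrdinalsMapping()) {
            // Calling getOrdinalMap() here would throw, so the caller must choose
            // another strategy; this sketch simply reports the situation.
            throw new IllegalStateException("field data does not expose an ordinal mapping");
        }
        LongUnaryOperator map = source.getOrdinalMap();
        // A null map means global ordinals are not needed, so the segment
        // ordinal is used unchanged.
        return map == null ? segmentOrd : map.applyAsLong(segmentOrd);
    }
}
```

The check-then-call shape matches the implementations in this commit. `GlobalOrdinalsIndexFieldData` and `SortedSetDVOrdinalsIndexFieldData` return `true` (the latter still returns a `null` map), while `AbstractIndexOrdinalsFieldData` returns `false`.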

server/src/main/java/org/elasticsearch/index/fielddata/ordinals/GlobalOrdinalsIndexFieldData.java

Lines changed: 11 additions & 0 deletions
@@ -126,6 +126,11 @@ public OrdinalMap getOrdinalMap() {
         return ordinalMap;
     }
 
+    @Override
+    public boolean supportsGlobalOrdinalsMapping() {
+        return true;
+    }
+
     /**
      * A non-thread safe {@link IndexOrdinalsFieldData} for global ordinals that creates the {@link TermsEnum} of each
      * segment once and use them to provide a single lookup per segment.
@@ -225,9 +230,15 @@ public void close() {}
             };
         }
 
+        @Override
+        public boolean supportsGlobalOrdinalsMapping() {
+            return true;
+        }
+
         @Override
         public OrdinalMap getOrdinalMap() {
             return ordinalMap;
         }
+
     }
 }

server/src/main/java/org/elasticsearch/index/fielddata/plain/AbstractIndexOrdinalsFieldData.java

Lines changed: 5 additions & 0 deletions
@@ -138,6 +138,11 @@ protected TermsEnum filter(Terms terms, TermsEnum iterator, LeafReader reader) t
         return iterator;
     }
 
+    @Override
+    public boolean supportsGlobalOrdinalsMapping() {
+        return false;
+    }
+
     private static final class FrequencyFilter extends FilteredTermsEnum {
 
         private int minFreq;

server/src/main/java/org/elasticsearch/index/fielddata/plain/SortedSetDVOrdinalsIndexFieldData.java

Lines changed: 5 additions & 0 deletions
@@ -146,4 +146,9 @@ public IndexOrdinalsFieldData localGlobalDirect(DirectoryReader indexReader) thr
     public OrdinalMap getOrdinalMap() {
         return null;
     }
+
+    @Override
+    public boolean supportsGlobalOrdinalsMapping() {
+        return true;
+    }
 }

server/src/main/java/org/elasticsearch/index/mapper/ContentPath.java

Lines changed: 4 additions & 0 deletions
@@ -66,4 +66,8 @@ public String pathAsText(String name) {
         sb.append(name);
         return sb.toString();
     }
+
+    public int length() {
+        return index;
+    }
 }
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+/*
+ * Licensed to Elasticsearch under one or more contributor
+ * license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright
+ * ownership. Elasticsearch licenses this file to you under
+ * the Apache License, Version 2.0 (the "License"); you may
+ * not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.elasticsearch.index.mapper;
+
+import org.elasticsearch.common.settings.Settings;
+
+/**
+ * A field mapper that supports lookup of dynamic sub-keys. If the field mapper is named 'my_field',
+ * then a user is able to search on the field in both of the following ways:
+ * - Using the field name 'my_field', which will delegate to the field type
+ *   {@link DynamicKeyFieldMapper#fieldType()} as usual.
+ * - Using any sub-key, for example 'my_field.some_key'. In this case, the search is delegated
+ *   to {@link DynamicKeyFieldMapper#keyedFieldType(String)}, with 'some_key' passed as the
+ *   argument. The field mapper is allowed to create a new field type dynamically in order
+ *   to handle the search.
+ *
+ * To prevent conflicts between these dynamic sub-keys and multi-fields, any field mappers
+ * implementing this interface should explicitly disallow multi-fields. The constructor makes
+ * sure to pass an empty multi-fields list to help prevent conflicting sub-keys from being
+ * registered.
+ *
+ * Note: we anticipate that 'flattened' fields will be the only implementation of this
+ * interface. Flattened object fields live in the 'mapper-flattened' module.
+ */
+public abstract class DynamicKeyFieldMapper extends FieldMapper {
+
+    public DynamicKeyFieldMapper(String simpleName,
+                                 MappedFieldType fieldType,
+                                 MappedFieldType defaultFieldType,
+                                 Settings indexSettings,
+                                 CopyTo copyTo) {
+        super(simpleName, fieldType, defaultFieldType, indexSettings, MultiFields.empty(), copyTo);
+    }
+
+    public abstract MappedFieldType keyedFieldType(String key);
+
+}
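The class javadoc above describes two lookup paths: the mapper's own name (`my_field`), and a dotted sub-key (`my_field.some_key`) whose remainder is passed to `keyedFieldType(String)`. The standalone sketch below shows one way a dotted field name could be split into a mapper name and a key. The registry (a plain `Set` of dynamic-key field names), the class `DynamicKeyLookupSketch`, and its `resolve` method are hypothetical. The actual resolution logic in Elasticsearch is not part of this diff.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of resolving 'my_field.some_key' lookups; not Elasticsearch code. */
public class DynamicKeyLookupSketch {

    /** The full name of the dynamic-key mapper plus the requested sub-key, if any. */
    public static final class Resolved {
        public final String mapperName;
        public final String key; // null when the field itself was requested

        Resolved(String mapperName, String key) {
            this.mapperName = mapperName;
            this.key = key;
        }
    }

    public static Optional<Resolved> resolve(Set<String> dynamicKeyFields, String fieldName) {
        // Exact match: the lookup goes to the mapper's ordinary fieldType().
        if (dynamicKeyFields.contains(fieldName)) {
            return Optional.of(new Resolved(fieldName, null));
        }
        // Otherwise try each dotted prefix; the remainder (which may itself contain
        // dots, e.g. 'timestamp.created') becomes the argument to keyedFieldType(key).
        int dot = fieldName.indexOf('.');
        while (dot > 0) {
            String prefix = fieldName.substring(0, dot);
            if (dynamicKeyFields.contains(prefix)) {
                return Optional.of(new Resolved(prefix, fieldName.substring(dot + 1)));
            }
            dot = fieldName.indexOf('.', dot + 1);
        }
        return Optional.empty();
    }
}
```

With `labels` registered as a dynamic-key field, `labels.timestamp.created` resolves to mapper `labels` with key `timestamp.created`. The key may contain dots, which is one reason the javadoc asks implementations to disallow multi-fields: a multi-field named `labels.raw` would be ambiguous with the sub-key `raw`.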

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

Lines changed: 1 addition & 1 deletion
@@ -193,7 +193,7 @@ public Builder nullValue(Object nullValue) {
             return this;
         }
 
-        public T addMultiField(Mapper.Builder mapperBuilder) {
+        public T addMultiField(Mapper.Builder<?, ?> mapperBuilder) {
             multiFieldsBuilder.add(mapperBuilder);
             return builder;
         }
