Add support for 'flattened object' fields. #42541

Merged: 33 commits, Jun 28, 2019

Commits
93997ab
Add a simple JSON field mapper. (#33923)
jtibshirani Sep 28, 2018
09b68e7
When parsing JSON fields, also create tokens prefixed with the field …
jtibshirani Oct 4, 2018
9eb4bd1
Add support for querying JSON fields based on key. (#34621)
jtibshirani Oct 29, 2018
7daf406
Add support for storing JSON fields. (#34942)
jtibshirani Oct 30, 2018
624d9ca
Enforce a limit on the depth of the JSON object. (#35063)
jtibshirani Oct 31, 2018
133f554
Disallow doc_values in the JSON field mapping. (#35282)
jtibshirani Nov 7, 2018
9183cbd
Make sure stored JSON fields are properly decoded. (#35279)
jtibshirani Nov 7, 2018
4507b85
Add tests for the supported query types. (#35319)
jtibshirani Nov 8, 2018
fbadb62
Add documentation for JSON fields. (#35281)
jtibshirani Nov 9, 2018
d0ec58a
Add a test around JSON fields and source filtering. (#35399)
jtibshirani Nov 10, 2018
400bf0c
Prevent slow field lookups when JSON fields are present. (#39872)
jtibshirani Mar 12, 2019
f4d7929
Add doc values support for JSON fields. (#40069)
jtibshirani Apr 1, 2019
42f6af6
Rename 'json' to 'embedded_json'. (#40712)
jtibshirani Apr 2, 2019
78da6c5
Rename embedded_json.asciidoc for consistency.
jtibshirani Apr 2, 2019
9126b04
Rebase keyed JSON ordinals to start from zero. (#41282)
jtibshirani Apr 17, 2019
2b71ed8
Make sure to return keyed JSON field types in FieldTypeLookup#iterato…
jtibshirani Apr 18, 2019
bf5c457
Allow non-exact query types on the root JSON field. (#41290)
jtibshirani Apr 17, 2019
7237b2b
Remove the 'experimental' tag.
jtibshirani May 29, 2019
31d3a58
Address code review feedback.
jtibshirani May 29, 2019
c9420a3
Remove support for stored fields.
jtibshirani May 29, 2019
74ebc19
Improve the field type documentation.
jtibshirani May 29, 2019
9cefddf
Rename 'embedded JSON' to 'flattened object'.
jtibshirani Jun 7, 2019
58533cf
Merge remote-tracking branch 'upstream/master' into object-fields
jtibshirani Jun 7, 2019
ad35339
Fix full text queries, which were broken after the merge.
jtibshirani Jun 7, 2019
8241627
Address code review feedback and remove leftover references to 'json'.
jtibshirani Jun 11, 2019
e47467b
Merge remote-tracking branch 'upstream/master' into object-fields
jtibshirani Jun 11, 2019
daa200b
Ensure all JSON field types are available for lookup. (#41914)
jtibshirani May 7, 2019
c995f68
Merge remote-tracking branch 'upstream/master' into object-fields
jtibshirani Jun 14, 2019
232f494
Pull the flattened field type into a mapper plugin. (#43250)
jtibshirani Jun 27, 2019
2c18194
Merge remote-tracking branch 'upstream/master' into object-fields
jtibshirani Jun 27, 2019
6f33b65
License the flattened mapper as basic. (#43690)
jtibshirani Jun 28, 2019
ac0a80b
Merge remote-tracking branch 'upstream/master' into object-fields
jtibshirani Jun 28, 2019
c344200
Merge remote-tracking branch 'upstream/master' into object-fields
jtibshirani Jun 28, 2019
9 changes: 7 additions & 2 deletions docs/reference/mapping/types.asciidoc
@@ -42,8 +42,6 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>

<<parent-join>>:: Defines parent/child relation for documents within the same index

<<alias>>:: Defines an alias to an existing field.

<<rank-feature>>:: Record numeric feature to boost hits at query time.

<<rank-features>>:: Record numeric features to boost hits at query time.
@@ -54,6 +52,11 @@ string:: <<text,`text`>> and <<keyword,`keyword`>>

<<search-as-you-type>>:: A text-like field optimized for queries to implement as-you-type completion

<<alias>>:: Defines an alias to an existing field.

<<flattened>>:: Allows an entire JSON object to be indexed as a single field.


[float]
=== Multi-fields

@@ -82,6 +85,8 @@ include::types/date.asciidoc[]

include::types/date_nanos.asciidoc[]

include::types/flattened.asciidoc[]

Contributor:

nit: could we move it next to the object and keyword fields that it relates to?

Contributor Author:

I think these includes are just alphabetized. For the actual links to individual field types, I put it under "Specialised datatypes" to encourage users to think through whether it's appropriate for their data.

include::types/geo-point.asciidoc[]

include::types/geo-shape.asciidoc[]
188 changes: 188 additions & 0 deletions docs/reference/mapping/types/flattened.asciidoc
@@ -0,0 +1,188 @@
[role="xpack"]
[testenv="basic"]

[[flattened]]
=== Flattened datatype

By default, each subfield in an object is mapped and indexed separately. If
the names or types of the subfields are not known in advance, then they are
<<dynamic-mapping, mapped dynamically>>.

The `flattened` type provides an alternative approach, where the entire
object is mapped as a single field. Given an object, the `flattened`
mapping will parse out its leaf values and index them into one field as
keywords. The object's contents can then be searched through simple queries
and aggregations.

This data type can be useful for indexing objects with a large or unknown
number of unique keys. Only one field mapping is created for the whole JSON
object, which can help prevent a <<mapping-limit-settings, mappings explosion>>
caused by too many distinct field mappings.

On the other hand, flattened object fields present a trade-off in terms of
search functionality. Only basic queries are allowed, with no support for
numeric range queries or highlighting. Further information on the limitations
can be found in the <<supported-operations, Supported operations>> section.

NOTE: The `flattened` mapping type should **not** be used for indexing all
document content, as it treats all values as keywords and does not provide full
search functionality. The default approach, where each subfield has its own
entry in the mappings, works well in the majority of cases.

A flattened object field can be created as follows:
[source,js]
--------------------------------
PUT bug_reports
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "labels": {
        "type": "flattened"
      }
    }
  }
}

POST bug_reports/_doc/1
{
  "title": "Results are not sorted correctly.",
  "labels": {
    "priority": "urgent",
    "release": ["v1.2.5", "v1.3.0"],
    "timestamp": {
      "created": 1541458026,
      "closed": 1541457010
    }
  }
}
--------------------------------
// CONSOLE
// TESTSETUP

During indexing, tokens are created for each leaf value in the JSON object. The
values are indexed as string keywords, without analysis or special handling for
numbers or dates.
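
The leaf-value extraction described above can be sketched roughly as follows. This is an illustration only, not the parser added in this PR: the class `FlattenSketch`, its method names, and the separator placed between a key path and its value are all assumptions, and `null` values are simply skipped here rather than replaced by a `null_value`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of leaf-value extraction; not the parser added in this PR.
class FlattenSketch {
    // Assumed separator between the key path and the value in keyed tokens.
    static final String SEPARATOR = "\0";

    /** Leaf values only, as matched by queries on the top-level field. */
    static List<String> rootTokens(Map<String, Object> object) {
        List<String> tokens = new ArrayList<>();
        collect("", object, false, tokens);
        return tokens;
    }

    /** Leaf values prefixed with their dotted key path, as matched by queries on 'field.key'. */
    static List<String> keyedTokens(Map<String, Object> object) {
        List<String> tokens = new ArrayList<>();
        collect("", object, true, tokens);
        return tokens;
    }

    private static void collect(String path, Object node, boolean keyed, List<String> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(entry.getKey());
                collect(path.isEmpty() ? key : path + "." + key, entry.getValue(), keyed, out);
            }
        } else if (node instanceof List) {
            // Arrays contribute one token per element under the same key.
            for (Object element : (List<?>) node) {
                collect(path, element, keyed, out);
            }
        } else if (node != null) {
            // Numbers and dates get no special handling: every leaf becomes a string.
            String value = String.valueOf(node);
            out.add(keyed ? path + SEPARATOR + value : value);
        }
    }

    /** The 'labels' object from the bug_reports example. */
    static Map<String, Object> sampleLabels() {
        Map<String, Object> timestamp = new LinkedHashMap<>();
        timestamp.put("created", 1541458026);
        timestamp.put("closed", 1541457010);
        Map<String, Object> labels = new LinkedHashMap<>();
        labels.put("priority", "urgent");
        labels.put("release", List.of("v1.2.5", "v1.3.0"));
        labels.put("timestamp", timestamp);
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(rootTokens(sampleLabels()));
        for (String token : keyedTokens(sampleLabels())) {
            // Show the assumed separator as '|' so it is visible on a terminal.
            System.out.println(token.replace(SEPARATOR, "|"));
        }
    }
}
```

In this sketch a `term` query on `labels` would be matched against the root tokens, while a query on `labels.release` would be matched against the keyed tokens for that path.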

Querying the top-level `flattened` field searches all leaf values in the
object:

[source,js]
--------------------------------
POST bug_reports/_search
{
  "query": {
    "term": {"labels": "urgent"}
  }
}
--------------------------------
// CONSOLE

To query on a specific key in the flattened object, object dot notation is used:
[source,js]
--------------------------------
POST bug_reports/_search
{
  "query": {
    "term": {"labels.release": "v1.3.0"}
  }
}
--------------------------------
// CONSOLE

[[supported-operations]]
==== Supported operations

Because of the similarities in the way values are indexed, `flattened`
fields share much of the same mapping and search functionality as
<<keyword, `keyword`>> fields.

Currently, flattened object fields can be used with the following query types:

- `term`, `terms`, and `terms_set`
- `prefix`
- `range`
- `match` and `multi_match`
- `query_string` and `simple_query_string`
- `exists`

When querying, it is not possible to refer to field keys using wildcards, as in
`{ "term": {"labels.time*": 1541457010}}`. Note that all queries, including
`range`, treat the values as string keywords. Highlighting is not supported on
`flattened` fields.

It is possible to sort on a flattened object field, as well as perform simple
keyword-style aggregations such as `terms`. As with queries, there is no
special support for numerics -- all values in the JSON object are treated as
keywords. When sorting, this implies that values are compared
lexicographically.
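
A small sketch of what keyword-style comparison means for numeric-looking values. The class `KeywordOrderSketch` and its methods are hypothetical, and Java's `String.compareTo` is used as a stand-in for the index's term ordering, which should agree for the ASCII values shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of keyword (string) ordering; not code from this PR.
class KeywordOrderSketch {
    /** Sorts values the way keywords compare: character by character, not numerically. */
    static List<String> sortAsKeywords(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted;
    }

    /** Inclusive range check on string order, analogous to a range query over keywords. */
    static boolean inKeywordRange(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // "9" sorts after "100" because '9' is greater than '1'.
        System.out.println(sortAsKeywords(List.of("9", "10", "100")));
        // Numerically 50 lies between 10 and 100, but as a string it does not.
        System.out.println(inKeywordRange("50", "10", "100"));
        // Equal-length values, such as the epoch seconds in the example, compare as expected.
        System.out.println(inKeywordRange("1541457010", "1541457000", "1541458000"));
    }
}
```

Values of equal length, such as the second-resolution timestamps in the example document, still order correctly; values of differing lengths do not.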

Flattened object fields currently cannot be stored. It is not possible to
specify the <<mapping-store, `store`>> parameter in the mapping.

[[flattened-params]]
==== Parameters for flattened object fields

The following mapping parameters are accepted:

[horizontal]

<<mapping-boost,`boost`>>::

Mapping field-level query time boosting. Accepts a floating point number,
defaults to `1.0`.

`depth_limit`::

The maximum allowed depth of the flattened object field, in terms of nested
inner objects. If a flattened object field exceeds this limit, then an
error will be thrown. Defaults to `20`.

<<doc-values,`doc_values`>>::

Should the field be stored on disk in a column-stride fashion, so that it
can later be used for sorting, aggregations, or scripting? Accepts `true`
(default) or `false`.

<<eager-global-ordinals,`eager_global_ordinals`>>::

Should global ordinals be loaded eagerly on refresh? Accepts `true` or
`false` (default). Enabling this is a good idea on fields that are
frequently used for terms aggregations.

<<ignore-above,`ignore_above`>>::

Leaf values longer than this limit will not be indexed. By default, there
is no limit and all values will be indexed. Note that this limit applies
to the leaf values within the flattened object field, and not the length of
the entire field.

<<mapping-index,`index`>>::

Determines if the field should be searchable. Accepts `true` (default) or
`false`.

<<index-options,`index_options`>>::

What information should be stored in the index for scoring purposes.
Defaults to `docs` but can also be set to `freqs` to take term frequency
into account when computing scores.

<<null-value,`null_value`>>::

A string value which is substituted for any explicit `null` values within
the flattened object field. Defaults to `null`, which means null fields are
treated as if they were missing.

<<similarity,`similarity`>>::

Which scoring algorithm or _similarity_ should be used. Defaults
to `BM25`.

`split_queries_on_whitespace`::

Whether <<full-text-queries,full text queries>> should split the input on
whitespace when building a query for this field. Accepts `true` or `false`
(default).
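
The `depth_limit` and `ignore_above` parameters above both act on the parsed object rather than on the field as a whole. A rough sketch of the two checks follows. The class `LeafLimitsSketch` is hypothetical, and both the way depth is counted (top-level object counts as 1, arrays add no level) and the use of `String.length()` for value length are assumptions of this sketch, not confirmed details of the mapper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the depth_limit and ignore_above checks; not code from this PR.
class LeafLimitsSketch {
    /** Nesting depth, counting the top-level object as 1 (an assumption of this sketch). */
    static int depth(Object node) {
        int deepest = 0;
        if (node instanceof Map) {
            for (Object child : ((Map<?, ?>) node).values()) {
                deepest = Math.max(deepest, depth(child));
            }
            return 1 + deepest;
        }
        if (node instanceof List) {
            // Arrays do not add a level in this sketch.
            for (Object element : (List<?>) node) {
                deepest = Math.max(deepest, depth(element));
            }
        }
        return deepest;
    }

    /** Rejects objects nested more deeply than the limit, mirroring the documented error. */
    static void checkDepth(Map<String, Object> object, int depthLimit) {
        int depth = depth(object);
        if (depth > depthLimit) {
            throw new IllegalArgumentException(
                "object depth " + depth + " exceeds depth_limit " + depthLimit);
        }
    }

    /** Drops leaf values longer than the limit; the limit applies per value, not per field. */
    static List<String> applyIgnoreAbove(List<String> leafValues, int ignoreAbove) {
        List<String> kept = new ArrayList<>();
        for (String value : leafValues) {
            if (value.length() <= ignoreAbove) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Object> timestamp = new LinkedHashMap<>();
        timestamp.put("created", 1541458026);
        Map<String, Object> labels = new LinkedHashMap<>();
        labels.put("priority", "urgent");
        labels.put("timestamp", timestamp);

        checkDepth(labels, 20); // passes: the object is two levels deep
        System.out.println(depth(labels));
        System.out.println(applyIgnoreAbove(List.of("urgent", "a-much-longer-label-value"), 10));
    }
}
```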
4 changes: 4 additions & 0 deletions docs/reference/rest-api/info.asciidoc
@@ -71,6 +71,10 @@ Example response:
"available" : true,
"enabled" : true
},
"flattened" : {
"available" : true,
"enabled" : true
},
"graph" : {
"available" : true,
"enabled" : true
@@ -47,4 +47,11 @@ public interface IndexOrdinalsFieldData extends IndexFieldData.Global<AtomicOrdi
* or null if global ordinals are not needed (constant value or single segment).
*/
OrdinalMap getOrdinalMap();

/**
* Whether this field data is able to provide a mapping between global and segment ordinals,
* by returning the underlying {@link OrdinalMap}. If this method returns false, then calling
* {@link #getOrdinalMap} will result in an {@link UnsupportedOperationException}.
*/
boolean supportsGlobalOrdinalsMapping();
}
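
A caller that wants the ordinal map might guard the call as in the sketch below. Everything here is a simplified stand-in: `OrdinalsFieldData` is a hypothetical two-method interface mirroring the methods shown in the hunk above, and a plain `long[]` takes the place of Lucene's `OrdinalMap`.

```java
import java.util.Optional;

// Hypothetical sketch of guarding getOrdinalMap(); the types are stand-ins, not Elasticsearch classes.
class OrdinalsGuardSketch {
    /** Simplified stand-in for IndexOrdinalsFieldData; a long[] replaces the real OrdinalMap. */
    interface OrdinalsFieldData {
        long[] getOrdinalMap();

        boolean supportsGlobalOrdinalsMapping();
    }

    /**
     * Returns the ordinal map only when the field data can provide one. An empty result covers
     * both "mapping unsupported" and "no map needed" (getOrdinalMap() returned null).
     */
    static Optional<long[]> ordinalMapIfSupported(OrdinalsFieldData fieldData) {
        if (!fieldData.supportsGlobalOrdinalsMapping()) {
            return Optional.empty();
        }
        return Optional.ofNullable(fieldData.getOrdinalMap());
    }

    static OrdinalsFieldData supporting(long[] ordinalMap) {
        return new OrdinalsFieldData() {
            @Override
            public long[] getOrdinalMap() {
                return ordinalMap;
            }

            @Override
            public boolean supportsGlobalOrdinalsMapping() {
                return true;
            }
        };
    }

    static OrdinalsFieldData unsupported() {
        return new OrdinalsFieldData() {
            @Override
            public long[] getOrdinalMap() {
                throw new UnsupportedOperationException("no global ordinals mapping");
            }

            @Override
            public boolean supportsGlobalOrdinalsMapping() {
                return false;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(ordinalMapIfSupported(supporting(new long[] {0L, 1L})).isPresent());
        System.out.println(ordinalMapIfSupported(unsupported()).isPresent());
    }
}
```

Checking the flag first means the `UnsupportedOperationException` described in the Javadoc is never triggered by the caller.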
@@ -126,6 +126,11 @@ public OrdinalMap getOrdinalMap() {
return ordinalMap;
}

@Override
public boolean supportsGlobalOrdinalsMapping() {
return true;
}

/**
* A non-thread safe {@link IndexOrdinalsFieldData} for global ordinals that creates the {@link TermsEnum} of each
* segment once and use them to provide a single lookup per segment.
@@ -225,9 +230,15 @@ public void close() {}
};
}

@Override
public boolean supportsGlobalOrdinalsMapping() {
return true;
}

@Override
public OrdinalMap getOrdinalMap() {
return ordinalMap;
}

}
}
@@ -138,6 +138,11 @@ protected TermsEnum filter(Terms terms, TermsEnum iterator, LeafReader reader) t
return iterator;
}

@Override
public boolean supportsGlobalOrdinalsMapping() {
return false;
}

private static final class FrequencyFilter extends FilteredTermsEnum {

private int minFreq;
@@ -146,4 +146,9 @@ public IndexOrdinalsFieldData localGlobalDirect(DirectoryReader indexReader) thr
public OrdinalMap getOrdinalMap() {
return null;
}

@Override
public boolean supportsGlobalOrdinalsMapping() {
return true;
}
}
@@ -66,4 +66,8 @@ public String pathAsText(String name) {
sb.append(name);
return sb.toString();
}

public int length() {
return index;
}
}
@@ -0,0 +1,54 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.index.mapper;

import org.elasticsearch.common.settings.Settings;

/**
* A field mapper that supports lookup of dynamic sub-keys. If the field mapper is named 'my_field',
* then a user is able to search on the field in both of the following ways:
* - Using the field name 'my_field', which will delegate to the field type
* {@link DynamicKeyFieldMapper#fieldType()} as usual.
* - Using any sub-key, for example 'my_field.some_key'. In this case, the search is delegated
* to {@link DynamicKeyFieldMapper#keyedFieldType(String)}, with 'some_key' passed as the
* argument. The field mapper is allowed to create a new field type dynamically in order
* to handle the search.
*
* To prevent conflicts between these dynamic sub-keys and multi-fields, any field mappers
* implementing this interface should explicitly disallow multi-fields. The constructor makes
* sure to pass an empty multi-fields list to help prevent conflicting sub-keys from being
* registered.
*
* Note: we anticipate that 'flattened' fields will be the only implementation of this
* interface. Flattened object fields live in the 'mapper-flattened' module.
*/
public abstract class DynamicKeyFieldMapper extends FieldMapper {

    public DynamicKeyFieldMapper(String simpleName,
                                 MappedFieldType fieldType,
                                 MappedFieldType defaultFieldType,
                                 Settings indexSettings,
                                 CopyTo copyTo) {
        super(simpleName, fieldType, defaultFieldType, indexSettings, MultiFields.empty(), copyTo);
    }

    public abstract MappedFieldType keyedFieldType(String key);

}
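
The Javadoc above describes two lookup paths: the plain field name, and a name with a sub-key. One way the second path could be resolved is sketched below. This is not the lookup code from the PR: the class `DynamicKeyLookupSketch`, the `resolve` method, and the use of a plain set of mapper names are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of resolving 'my_field.some_key' to a mapper name and a key.
class DynamicKeyLookupSketch {
    /**
     * Walks the dots in the requested field name from left to right and returns the first
     * prefix that names a dynamic-key mapper, paired with the remainder as the key.
     * A name without a sub-key (for example 'my_field' itself) yields an empty result,
     * since it would be handled by the mapper's regular field type.
     */
    static Optional<Map.Entry<String, String>> resolve(Set<String> dynamicKeyMappers, String field) {
        int dot = -1;
        while ((dot = field.indexOf('.', dot + 1)) != -1) {
            String parent = field.substring(0, dot);
            if (dynamicKeyMappers.contains(parent)) {
                return Optional.of(Map.entry(parent, field.substring(dot + 1)));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> mappers = Set.of("labels");
        // The key may itself contain dots when the flattened object is nested.
        System.out.println(resolve(mappers, "labels.timestamp.created"));
        System.out.println(resolve(mappers, "title"));
    }
}
```

In the real class, the resolved key would then be handed to `keyedFieldType(String)` to obtain a field type for the search.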
@@ -193,7 +193,7 @@ public Builder nullValue(Object nullValue) {
return this;
}

public T addMultiField(Mapper.Builder mapperBuilder) {
public T addMultiField(Mapper.Builder<?, ?> mapperBuilder) {
multiFieldsBuilder.add(mapperBuilder);
return builder;
}