Commit b703e1d

Add doc values support for JSON fields. (#40069)
When `doc_values` are enabled, we now add two `SortedSetDocValuesField`s for each token: one containing the raw `value`, and another containing `key\0value`. The root JSON field uses the standard `SortedSetDVOrdinalsIndexFieldData`. For keyed fields, this PR introduces a new type, `KeyedJsonIndexFieldData`, that wraps the standard ordinals field data and filters out values that do not match the right prefix. This gives support for sorting on JSON fields, as well as simple keyword-style aggregations like `terms`.

One slightly tricky aspect is the caching of these doc values. Given a keyed JSON field, we need to make sure we don't store values filtered on one prefix under the same cache key as values filtered on a different prefix. However, we also want to load and cache global ordinals only once per keyed JSON field, rather than keeping a separate cache entry per prefix.
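To make the prefix filtering concrete, here is a minimal sketch in plain Java. It is not the code from this commit: the names `KeyedPrefixFilterSketch`, `keyedValue`, and `valuesForKey` are hypothetical, and a `TreeSet` of strings stands in for Lucene's sorted set doc values. It shows how the values for one key could be recovered from sorted `key\0value` terms by scanning the contiguous range that starts with `key\0`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from the commit.
class KeyedPrefixFilterSketch {
    // Separator between key and value, as described in the commit message.
    static final char SEPARATOR = '\0';

    // Builds the keyed form "key\0value" of a leaf value.
    static String keyedValue(String key, String value) {
        return key + SEPARATOR + value;
    }

    // Returns the values stored under the given key, with the "key\0" prefix removed.
    // Assumes keyedTerms uses natural string ordering, so matching terms are adjacent.
    static List<String> valuesForKey(SortedSet<String> keyedTerms, String key) {
        String prefix = key + SEPARATOR;
        List<String> result = new ArrayList<>();
        // tailSet(prefix) starts at the first term that is >= prefix.
        for (String term : keyedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the contiguous block of terms for this key
            }
            result.add(term.substring(prefix.length()));
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        terms.add(keyedValue("priority", "urgent"));
        terms.add(keyedValue("release", "v1.2"));
        terms.add(keyedValue("release", "v1.3"));
        System.out.println(valuesForKey(terms, "release"));
    }
}
```

Because the terms are sorted and `\0` sorts before every other character, all entries for a key are adjacent, and a key such as `release` does not pick up entries stored under `releases`.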
1 parent b68acea commit b703e1d

13 files changed, with 1065 additions and 93 deletions.

docs/reference/mapping/types/json.asciidoc

Lines changed: 19 additions & 3 deletions
@@ -24,7 +24,7 @@ the following advantages:
 
 However, `json` fields present a trade-off in terms of search functionality.
 Only basic queries are allowed, with no support for numeric range queries or
-aggregations. Further information on the limitations can be found in the
+highlighting. Further information on the limitations can be found in the
 <<supported-operations, Supported operations>> section.
 
 NOTE: The `json` mapping type should **not** be used for indexing all JSON
@@ -107,9 +107,13 @@ Currently, `json` fields can be used with the following query types:
 
 When querying, it is not possible to refer to field keys using wildcards, as in
 `{ "term": {"labels.time*": 1541457010}}`. Note that all queries, including
-`range`, treat the values as string keywords.
+`range`, treat the values as string keywords. Highlighting is not supported on
+`json` fields.
 
-Aggregating, highlighting, or sorting on a `json` field is not supported.
+It is possible to sort on a `json` field, as well as perform simple
+keyword-style aggregations such as `terms`. As with queries, there is no
+special support for numerics -- all values in the JSON object are treated as
+keywords. When sorting, this implies that values are compared lexicographically.
 
 Finally, because of the way leaf values are stored in the index, the null
 character `\0` is not allowed to appear in the keys of the JSON object.
@@ -154,6 +158,18 @@ parameters are accepted:
     objects. If a JSON field exceeds this limit, then an error will be
     thrown. Defaults to `20`.
 
+<<doc-values,`doc_values`>>::
+
+    Should the field be stored on disk in a column-stride fashion, so that it
+    can later be used for sorting, aggregations, or scripting? Accepts `true`
+    (default) or `false`.
+
+<<eager-global-ordinals,`eager_global_ordinals`>>::
+
+    Should global ordinals be loaded eagerly on refresh? Accepts `true` or `false`
+    (default). Enabling this is a good idea on fields that are frequently used for
+    terms aggregations.
+
 <<ignore-above,`ignore_above`>>::
 
     Leaf values longer than this limit will not be indexed. By default, there
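
The documentation change above states that values are compared lexicographically when sorting. A small plain-Java sketch of what that implies for numeric-looking values follows. The class and method names are hypothetical, and `String` natural ordering is used as a stand-in for the byte-wise term comparison in the index; the two agree for ASCII values such as these.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not part of the commit.
class LexicographicSortSketch {
    // Sorts values the way keyword terms are ordered: character by character,
    // with no numeric interpretation.
    static List<String> sortAsKeywords(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // "10" and "100" sort before "9" because '1' < '9'.
        System.out.println(sortAsKeywords(List.of("9", "10", "100")));
    }
}
```

Under this ordering, numeric-looking values sort in numeric order only if they are zero-padded to a fixed width.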

server/src/main/java/org/elasticsearch/index/mapper/JsonFieldMapper.java

Lines changed: 158 additions & 32 deletions
@@ -21,15 +21,19 @@
 
 import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
 import org.apache.lucene.document.StoredField;
+import org.apache.lucene.index.DirectoryReader;
 import org.apache.lucene.index.IndexOptions;
 import org.apache.lucene.index.IndexableField;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.OrdinalMap;
 import org.apache.lucene.index.Term;
+import org.apache.lucene.search.DocValuesFieldExistsQuery;
 import org.apache.lucene.search.MultiTermQuery;
 import org.apache.lucene.search.PrefixQuery;
 import org.apache.lucene.search.Query;
+import org.apache.lucene.search.SortField;
 import org.apache.lucene.search.TermQuery;
 import org.apache.lucene.util.BytesRef;
-import org.elasticsearch.Version;
 import org.elasticsearch.common.bytes.BytesReference;
 import org.elasticsearch.common.lucene.Lucene;
 import org.elasticsearch.common.settings.Settings;
@@ -39,9 +43,21 @@
 import org.elasticsearch.common.xcontent.XContentParser;
 import org.elasticsearch.common.xcontent.json.JsonXContent;
 import org.elasticsearch.common.xcontent.support.XContentMapValues;
+import org.elasticsearch.index.Index;
+import org.elasticsearch.index.IndexSettings;
 import org.elasticsearch.index.analysis.AnalyzerScope;
 import org.elasticsearch.index.analysis.NamedAnalyzer;
+import org.elasticsearch.index.fielddata.AtomicOrdinalsFieldData;
+import org.elasticsearch.index.fielddata.IndexFieldData;
+import org.elasticsearch.index.fielddata.IndexFieldDataCache;
+import org.elasticsearch.index.fielddata.IndexOrdinalsFieldData;
+import org.elasticsearch.index.fielddata.fieldcomparator.BytesRefFieldComparatorSource;
+import org.elasticsearch.index.fielddata.plain.AbstractAtomicOrdinalsFieldData;
+import org.elasticsearch.index.fielddata.plain.DocValuesIndexFieldData;
+import org.elasticsearch.index.fielddata.plain.SortedSetDVOrdinalsIndexFieldData;
 import org.elasticsearch.index.query.QueryShardContext;
+import org.elasticsearch.indices.breaker.CircuitBreakerService;
+import org.elasticsearch.search.MultiValueMode;
 
 import java.io.IOException;
 import java.util.Iterator;
@@ -93,7 +109,7 @@ private static class Defaults {
         static {
             FIELD_TYPE.setTokenized(false);
             FIELD_TYPE.setStored(false);
-            FIELD_TYPE.setHasDocValues(false);
+            FIELD_TYPE.setHasDocValues(true);
             FIELD_TYPE.setIndexOptions(IndexOptions.DOCS);
             FIELD_TYPE.setOmitNorms(true);
             FIELD_TYPE.freeze();
@@ -127,14 +143,6 @@ public Builder indexOptions(IndexOptions indexOptions) {
             return super.indexOptions(indexOptions);
         }
 
-        @Override
-        public Builder docValues(boolean docValues) {
-            if (docValues) {
-                throw new IllegalArgumentException("[" + CONTENT_TYPE + "] fields do not support doc values");
-            }
-            return super.docValues(docValues);
-        }
-
         public Builder depthLimit(int depthLimit) {
             if (depthLimit < 0) {
                 throw new IllegalArgumentException("[depth_limit] must be positive, got " + depthLimit);
@@ -143,6 +151,11 @@ public Builder depthLimit(int depthLimit) {
             return this;
         }
 
+        public Builder eagerGlobalOrdinals(boolean eagerGlobalOrdinals) {
+            fieldType().setEagerGlobalOrdinals(eagerGlobalOrdinals);
+            return builder;
+        }
+
         public Builder ignoreAbove(int ignoreAbove) {
             if (ignoreAbove < 0) {
                 throw new IllegalArgumentException("[ignore_above] must be positive, got " + ignoreAbove);
@@ -166,11 +179,6 @@ public Builder copyTo(CopyTo copyTo) {
             throw new UnsupportedOperationException("[copy_to] is not supported for [" + CONTENT_TYPE + "] fields.");
         }
 
-        @Override
-        protected boolean defaultDocValues(Version indexCreated) {
-            return false;
-        }
-
         @Override
         public JsonFieldMapper build(BuilderContext context) {
             setupFieldType(context);
@@ -194,6 +202,9 @@ public Mapper.Builder<?,?> parse(String name, Map<String, Object> node, ParserCo
                 if (propName.equals("depth_limit")) {
                     builder.depthLimit(XContentMapValues.nodeIntegerValue(propNode, -1));
                     iterator.remove();
+                } else if (propName.equals("eager_global_ordinals")) {
+                    builder.eagerGlobalOrdinals(XContentMapValues.nodeBooleanValue(propNode, "eager_global_ordinals"));
+                    iterator.remove();
                 } else if (propName.equals("ignore_above")) {
                     builder.ignoreAbove(XContentMapValues.nodeIntegerValue(propNode, -1));
                     iterator.remove();
@@ -221,7 +232,7 @@ public static final class KeyedJsonFieldType extends StringFieldType {
         private final String key;
         private boolean splitQueriesOnWhitespace;
 
-        KeyedJsonFieldType(String key) {
+        public KeyedJsonFieldType(String key) {
             setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
             setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
             this.key = key;
@@ -323,6 +334,7 @@ public Query wildcardQuery(String value,
                 CONTENT_TYPE + "] fields.");
         }
 
+        @Override
         public BytesRef indexedValueForSearch(Object value) {
             if (value == null) {
                 return null;
@@ -334,6 +346,108 @@ public BytesRef indexedValueForSearch(Object value) {
             String keyedValue = JsonFieldParser.createKeyedValue(key, stringValue);
             return new BytesRef(keyedValue);
         }
+
+        @Override
+        public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) {
+            failIfNoDocValues();
+            return new KeyedJsonIndexFieldData.Builder(key);
+        }
+    }
+
+    /**
+     * A field data implementation that gives access to the values associated with
+     * a particular JSON key.
+     *
+     * This class wraps the field data that is built directly on the keyed JSON field, and
+     * filters out values whose prefix doesn't match the requested key. Loading and caching
+     * is fully delegated to the wrapped field data, so that different {@link KeyedJsonIndexFieldData}
+     * for the same JSON field share the same global ordinals.
+     */
+    public static class KeyedJsonIndexFieldData implements IndexOrdinalsFieldData {
+        private final String key;
+        private final IndexOrdinalsFieldData delegate;
+
+        private KeyedJsonIndexFieldData(String key, IndexOrdinalsFieldData delegate) {
+            this.delegate = delegate;
+            this.key = key;
+        }
+
+        public String getKey() {
+            return key;
+        }
+
+        @Override
+        public String getFieldName() {
+            return delegate.getFieldName();
+        }
+
+        @Override
+        public SortField sortField(Object missingValue,
+                                   MultiValueMode sortMode,
+                                   XFieldComparatorSource.Nested nested,
+                                   boolean reverse) {
+            XFieldComparatorSource source = new BytesRefFieldComparatorSource(this, missingValue, sortMode, nested);
+            return new SortField(getFieldName(), source, reverse);
+        }
+
+        @Override
+        public void clear() {
+            delegate.clear();
+        }
+
+        @Override
+        public AtomicOrdinalsFieldData load(LeafReaderContext context) {
+            AtomicOrdinalsFieldData fieldData = delegate.load(context);
+            return new KeyedJsonAtomicFieldData(key, fieldData);
+        }
+
+        @Override
+        public AtomicOrdinalsFieldData loadDirect(LeafReaderContext context) throws Exception {
+            AtomicOrdinalsFieldData fieldData = delegate.loadDirect(context);
+            return new KeyedJsonAtomicFieldData(key, fieldData);
+        }
+
+        @Override
+        public IndexOrdinalsFieldData loadGlobal(DirectoryReader indexReader) {
+            IndexOrdinalsFieldData fieldData = delegate.loadGlobal(indexReader);
+            return new KeyedJsonIndexFieldData(key, fieldData);
+        }
+
+        @Override
+        public IndexOrdinalsFieldData localGlobalDirect(DirectoryReader indexReader) throws Exception {
+            IndexOrdinalsFieldData fieldData = delegate.localGlobalDirect(indexReader);
+            return new KeyedJsonIndexFieldData(key, fieldData);
+        }
+
+        @Override
+        public OrdinalMap getOrdinalMap() {
+            return delegate.getOrdinalMap();
+        }
+
+        @Override
+        public Index index() {
+            return delegate.index();
+        }
+
+        public static class Builder implements IndexFieldData.Builder {
+            private final String key;
+
+            Builder(String key) {
+                this.key = key;
+            }
+
+            @Override
+            public IndexFieldData<?> build(IndexSettings indexSettings,
+                                           MappedFieldType fieldType,
+                                           IndexFieldDataCache cache,
+                                           CircuitBreakerService breakerService,
+                                           MapperService mapperService) {
+                String fieldName = fieldType.name();
+                IndexOrdinalsFieldData delegate = new SortedSetDVOrdinalsIndexFieldData(indexSettings,
+                    cache, fieldName, breakerService, AbstractAtomicOrdinalsFieldData.DEFAULT_SCRIPT_FUNCTION);
+                return new KeyedJsonIndexFieldData(key, delegate);
+            }
+        }
     }
 
     /**
@@ -396,7 +510,11 @@ public Object valueForDisplay(Object value) {
 
         @Override
         public Query existsQuery(QueryShardContext context) {
-            return new TermQuery(new Term(FieldNamesFieldMapper.NAME, name()));
+            if (hasDocValues()) {
+                return new DocValuesFieldExistsQuery(name());
+            } else {
+                return new TermQuery(new Term(FieldNamesFieldMapper.NAME, name()));
+            }
         }
 
         @Override
@@ -420,6 +538,12 @@ public Query wildcardQuery(String value,
             throw new UnsupportedOperationException("[wildcard] queries are not currently supported on [" +
                 CONTENT_TYPE + "] fields.");
         }
+
+        @Override
+        public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) {
+            failIfNoDocValues();
+            return new DocValuesIndexFieldData.Builder();
+        }
     }
 
     private final JsonFieldParser fieldParser;
@@ -438,7 +562,7 @@ private JsonFieldMapper(String simpleName,
         this.depthLimit = depthLimit;
         this.ignoreAbove = ignoreAbove;
         this.fieldParser = new JsonFieldParser(fieldType.name(), keyedFieldName(),
-            depthLimit, ignoreAbove, fieldType.nullValueAsString());
+            fieldType, depthLimit, ignoreAbove);
     }
 
     @Override
@@ -476,7 +600,9 @@ protected void parseCreateField(ParseContext context, List<IndexableField> field
             return;
         }
 
-        if (fieldType.indexOptions() == IndexOptions.NONE && !fieldType.stored()) {
+        if (fieldType.indexOptions() == IndexOptions.NONE
+                && !fieldType.hasDocValues()
+                && !fieldType.stored()) {
             context.parser().skipChildren();
             return;
         }
@@ -490,22 +616,22 @@ protected void parseCreateField(ParseContext context, List<IndexableField> field
             fields.add(new StoredField(fieldType.name(), storedValue));
         }
 
-        if (fieldType().indexOptions() != IndexOptions.NONE) {
-            XContentParser indexedFieldsParser = context.parser();
+        XContentParser indexedFieldsParser = context.parser();
 
-            // If store is enabled, we've already consumed the content to produce the stored field. Here we
-            // 'reset' the parser, so that we can traverse the content again.
-            if (storedValue != null) {
-                indexedFieldsParser = JsonXContent.jsonXContent.createParser(context.parser().getXContentRegistry(),
-                    context.parser().getDeprecationHandler(),
-                    storedValue.bytes);
-                indexedFieldsParser.nextToken();
-            }
-
-            fields.addAll(fieldParser.parse(indexedFieldsParser));
+        // If store is enabled, we've already consumed the content to produce the stored field. Here we
+        // 'reset' the parser, so that we can traverse the content again.
+        if (storedValue != null) {
+            indexedFieldsParser = JsonXContent.jsonXContent.createParser(context.parser().getXContentRegistry(),
+                context.parser().getDeprecationHandler(),
+                storedValue.bytes);
+            indexedFieldsParser.nextToken();
         }
 
-        createFieldNamesField(context, fields);
+        fields.addAll(fieldParser.parse(indexedFieldsParser));
+
+        if (!fieldType.hasDocValues()) {
+            createFieldNamesField(context, fields);
+        }
     }
 
     @Override
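
The Javadoc on `KeyedJsonIndexFieldData` above says that loading and caching is fully delegated to the wrapped field data, so that views for different keys share the same global ordinals. The following plain-Java sketch illustrates that delegation idea in isolation. It is not the Elasticsearch implementation: `SharedDelegateSketch`, `Delegate`, `KeyedView`, and `loadCount` are hypothetical names, an array of `key\0value` strings stands in for the loaded global ordinals, and the lazy load is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these types exist in Elasticsearch.
class SharedDelegateSketch {

    // Stand-in for the field data built on the whole keyed field.
    // It owns the expensive state and loads it at most once.
    static class Delegate {
        private final Supplier<String[]> loader;
        private String[] globalTerms; // cached after the first load
        int loadCount = 0;            // how many times the loader actually ran

        Delegate(Supplier<String[]> loader) {
            this.loader = loader;
        }

        String[] loadGlobal() {
            if (globalTerms == null) {
                globalTerms = loader.get();
                loadCount++;
            }
            return globalTerms;
        }
    }

    // Per-key view: holds only the key and a reference to the shared delegate,
    // and filters on the "key\0" prefix at read time.
    static class KeyedView {
        private final String key;
        private final Delegate delegate;

        KeyedView(String key, Delegate delegate) {
            this.key = key;
            this.delegate = delegate;
        }

        List<String> values() {
            String prefix = key + '\0';
            List<String> result = new ArrayList<>();
            for (String term : delegate.loadGlobal()) {
                if (term.startsWith(prefix)) {
                    result.add(term.substring(prefix.length()));
                }
            }
            return result;
        }
    }
}
```

In this sketch, creating views for `release` and `priority` over the same `Delegate` triggers a single load, because nothing filtered by prefix is ever cached; only the unfiltered data is.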
