Commit b2a9fd6

Add a new _ignored meta field. (#29658)
This adds a new `_ignored` meta field which indexes and stores fields that have been ignored at index time because of the `ignore_malformed` option. It makes malformed documents easier to identify by using `exists` or `term(s)` queries on the `_ignored` field. Closes #29494
1 parent 37d0371 commit b2a9fd6

21 files changed

+418
-10
lines changed

docs/CHANGELOG.asciidoc

+6-2
Original file line numberDiff line numberDiff line change
@@ -28,8 +28,12 @@ This section summarizes the changes in each release.
2828
//[float]
2929
//=== Deprecations
3030

31-
//[float]
32-
//=== New Features
31+
[float]
32+
=== New Features
33+
34+
The new <<mapping-ignored-field,`_ignored`>> field lets you find out which fields
35+
got ignored at index time because of the <<ignore-malformed,`ignore_malformed`>>
36+
option. ({pull}29658[#29658])
3337

3438
[float]
3539
=== Enhancements

docs/reference/mapping/fields.asciidoc

+10
@@ -48,6 +48,14 @@ can be customised when a mapping type is created.
4848

4949
All fields in the document which contain non-null values.
5050

51+
[float]
52+
=== Indexing meta-fields
53+
54+
<<mapping-ignored-field,`_ignored`>>::
55+
56+
All fields in the document that have been ignored at index time because of
57+
<<ignore-malformed,`ignore_malformed`>>.
58+
5159
[float]
5260
=== Routing meta-field
5361

@@ -67,6 +75,8 @@ include::fields/all-field.asciidoc[]
6775

6876
include::fields/field-names-field.asciidoc[]
6977

78+
include::fields/ignored-field.asciidoc[]
79+
7080
include::fields/id-field.asciidoc[]
7181

7282
include::fields/index-field.asciidoc[]
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
[[mapping-ignored-field]]
2+
=== `_ignored` field
3+
4+
added[6.4.0]
5+
6+
The `_ignored` field indexes and stores the names of every field in a document
7+
that has been ignored because it was malformed and
8+
<<ignore-malformed,`ignore_malformed`>> was turned on.
9+
10+
This field is searchable with <<query-dsl-term-query,`term`>>,
11+
<<query-dsl-terms-query,`terms`>> and <<query-dsl-exists-query,`exists`>>
12+
queries, and is returned as part of the search hits.
13+
14+
For instance, the query below matches all documents that have one or more fields
15+
that got ignored:
16+
17+
[source,js]
18+
--------------------------------------------------
19+
GET _search
20+
{
21+
"query": {
22+
"exists": {
23+
"field": "_ignored"
24+
}
25+
}
26+
}
27+
--------------------------------------------------
28+
// CONSOLE
29+
30+
Similarly, the query below finds all documents whose `@timestamp` field was
31+
ignored at index time:
32+
33+
[source,js]
34+
--------------------------------------------------
35+
GET _search
36+
{
37+
"query": {
38+
"term": {
39+
"_ignored": "@timestamp"
40+
}
41+
}
42+
}
43+
--------------------------------------------------
44+
// CONSOLE
45+

docs/reference/mapping/params/ignore-malformed.asciidoc

+10
@@ -85,3 +85,13 @@ PUT my_index
8585

8686
<1> The `number_one` field inherits the index-level setting.
8787
<2> The `number_two` field overrides the index-level setting to turn off `ignore_malformed`.
88+
89+
==== Dealing with malformed fields
90+
91+
Malformed fields are silently ignored at indexing time when `ignore_malformed`
92+
is turned on. Whenever possible, keep the number of documents that have a
93+
malformed field small; otherwise queries on this field will
94+
become meaningless. Elasticsearch makes it easy to check how many documents
95+
have malformed fields by using `exists` or `term` queries on the special
96+
<<mapping-ignored-field,`_ignored`>> field.
97+
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,92 @@
1+
---
2+
setup:
3+
- skip:
4+
version: " - 6.3.99"
5+
reason: _ignored was added in 6.4.0
6+
7+
- do:
8+
indices.create:
9+
index: test
10+
body:
11+
mappings:
12+
_doc:
13+
properties:
14+
my_date:
15+
type: date
16+
ignore_malformed: true
17+
store: true
18+
my_ip:
19+
type: ip
20+
ignore_malformed: true
21+
22+
- do:
23+
index:
24+
index: test
25+
type: _doc
26+
id: 1
27+
body: { "my_date": "2018-05-11", "my_ip": ":::1" }
28+
29+
- do:
30+
index:
31+
index: test
32+
type: _doc
33+
id: 2
34+
body: { "my_date": "bar", "my_ip": "192.168.1.42" }
35+
36+
- do:
37+
index:
38+
index: test
39+
type: _doc
40+
id: 3
41+
body: { "my_date": "bar", "my_ip": "quux" }
42+
43+
- do:
44+
indices.refresh: {}
45+
46+
---
47+
"Exists on _ignored":
48+
49+
- do:
50+
search:
51+
body: { query: { exists: { "field": "_ignored" } } }
52+
53+
- length: { hits.hits: 3 }
54+
55+
---
56+
"Search on _ignored with term":
57+
58+
- do:
59+
search:
60+
body: { query: { term: { "_ignored": "my_date" } } }
61+
62+
- length: { hits.hits: 2 }
63+
64+
---
65+
"Search on _ignored with terms":
66+
67+
- do:
68+
search:
69+
body: { query: { terms: { "_ignored": [ "my_date", "my_ip" ] } } }
70+
71+
- length: { hits.hits: 3 }
72+
73+
---
74+
"_ignored is returned by default":
75+
76+
- do:
77+
search:
78+
body: { query: { ids: { "values": [ "3" ] } } }
79+
80+
- length: { hits.hits: 1 }
81+
- length: { hits.hits.0._ignored: 2}
82+
83+
---
84+
"_ignored is still returned with explicit list of stored fields":
85+
86+
- do:
87+
search:
88+
stored_fields: [ "my_date" ]
89+
body: { query: { ids: { "values": [ "3" ] } } }
90+
91+
- length: { hits.hits: 1 }
92+
- is_true: hits.hits.0._ignored
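
The hit counts asserted in the REST test above follow from which values are malformed in each of the three documents. The following is a minimal, self-contained Java model of that reasoning, not Elasticsearch code: the class `IgnoredFieldModel`, its methods, and the IPv4-only check are hypothetical simplifications (the real `ip` type also accepts IPv6, and the real `date` type supports more formats).

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical model of the test fixture; it does not use any Elasticsearch API. */
public class IgnoredFieldModel {

    // Assumption: an IPv4-only check stands in for the real `ip` field parser.
    private static final Pattern IPV4 = Pattern.compile(
        "((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)\\.){3}(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)");

    // Assumption: only ISO yyyy-MM-dd dates are accepted.
    public static boolean isDate(String value) {
        try {
            LocalDate.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static boolean isIp(String value) {
        return IPV4.matcher(value).matches();
    }

    /** Returns the field names that would be recorded in _ignored for one document. */
    public static List<String> ignoredFor(String myDate, String myIp) {
        List<String> ignored = new ArrayList<>();
        if (!isDate(myDate)) {
            ignored.add("my_date");
        }
        if (!isIp(myIp)) {
            ignored.add("my_ip");
        }
        return ignored;
    }

    /** Document id -> _ignored values for the three documents indexed in setup. */
    public static Map<String, List<String>> index() {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("1", ignoredFor("2018-05-11", ":::1"));
        docs.put("2", ignoredFor("bar", "192.168.1.42"));
        docs.put("3", ignoredFor("bar", "quux"));
        return docs;
    }

    /** Counts documents with at least one ignored field (the `exists` query). */
    public static long exists(Map<String, List<String>> docs) {
        return docs.values().stream().filter(ignored -> !ignored.isEmpty()).count();
    }

    /** Counts documents whose _ignored contains any of the names (`term` / `terms`). */
    public static long terms(Map<String, List<String>> docs, String... names) {
        List<String> wanted = Arrays.asList(names);
        return docs.values().stream()
            .filter(ignored -> ignored.stream().anyMatch(wanted::contains))
            .count();
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = index();
        System.out.println("exists: " + exists(docs));
        System.out.println("term my_date: " + terms(docs, "my_date"));
        System.out.println("terms my_date, my_ip: " + terms(docs, "my_date", "my_ip"));
        System.out.println("doc 3 _ignored: " + docs.get("3"));
    }
}
```

Under these assumptions, document 1 ignores only `my_ip`, document 2 only `my_date`, and document 3 both, which gives the 3, 2, 3 and 2 lengths the test checks.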

server/src/main/java/org/elasticsearch/index/fieldvisitor/FieldsVisitor.java

+7
@@ -24,6 +24,7 @@
2424
import org.elasticsearch.common.bytes.BytesArray;
2525
import org.elasticsearch.common.bytes.BytesReference;
2626
import org.elasticsearch.index.mapper.IdFieldMapper;
27+
import org.elasticsearch.index.mapper.IgnoredFieldMapper;
2728
import org.elasticsearch.index.mapper.MappedFieldType;
2829
import org.elasticsearch.index.mapper.MapperService;
2930
import org.elasticsearch.index.mapper.ParentFieldMapper;
@@ -73,6 +74,12 @@ public Status needsField(FieldInfo fieldInfo) throws IOException {
7374
if (requiredFields.remove(fieldInfo.name)) {
7475
return Status.YES;
7576
}
77+
// Always load _ignored to be explicit about ignored fields
78+
// This works because _ignored is added as the first metadata mapper,
79+
// so its stored fields always appear first in the list.
80+
if (IgnoredFieldMapper.NAME.equals(fieldInfo.name)) {
81+
return Status.YES;
82+
}
7683
// All these fields are single-valued so we can stop when the set is
7784
// empty
7885
return requiredFields.isEmpty()
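
The comment in this hunk relies on `_ignored` being stored before the other fields. A simplified sketch of why the order matters is shown below. It is not the real `FieldsVisitor`: the class `StoredFieldsVisitorSketch`, its `load` helper, and the local `Status` enum are hypothetical, and the `STOP`/`NO` continuation of the truncated `return` statement is an assumption about the code outside the hunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a stored-fields visitor with early termination. */
public class StoredFieldsVisitorSketch {

    /** Local copy of the three visitor outcomes; not the Lucene enum itself. */
    enum Status { YES, NO, STOP }

    static final String IGNORED = "_ignored";

    private final Set<String> requiredFields;

    StoredFieldsVisitorSketch(String... required) {
        this.requiredFields = new HashSet<>(Arrays.asList(required));
    }

    Status needsField(String name) {
        if (requiredFields.remove(name)) {
            return Status.YES;
        }
        // Always load _ignored, even when it was not explicitly requested.
        if (IGNORED.equals(name)) {
            return Status.YES;
        }
        // Assumed continuation: stop once every required field has been seen.
        return requiredFields.isEmpty() ? Status.STOP : Status.NO;
    }

    /** Walks stored fields in their stored order and returns the names that were loaded. */
    public static List<String> load(List<String> storedOrder, String... required) {
        StoredFieldsVisitorSketch visitor = new StoredFieldsVisitorSketch(required);
        List<String> loaded = new ArrayList<>();
        for (String name : storedOrder) {
            Status status = visitor.needsField(name);
            if (status == Status.STOP) {
                break;
            }
            if (status == Status.YES) {
                loaded.add(name);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        // _ignored stored first: it is loaded before the visitor can stop.
        System.out.println(load(Arrays.asList("_ignored", "my_date", "_source"), "my_date"));
        // _ignored stored after an unrequested field: the visitor stops before reaching it.
        System.out.println(load(Arrays.asList("my_date", "_source", "_ignored"), "my_date"));
    }
}
```

In this model the second call never sees `_ignored`, because the visitor stops at `_source` once `my_date` has been loaded. That is the situation the "first metadata mapper" ordering is meant to rule out.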

server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java

+1
@@ -456,6 +456,7 @@ protected void parseCreateField(ParseContext context, List<IndexableField> field
456456
timestamp = fieldType().parse(dateAsString);
457457
} catch (IllegalArgumentException e) {
458458
if (ignoreMalformed.value()) {
459+
context.addIgnoredField(fieldType.name());
459460
return;
460461
} else {
461462
throw e;
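
The one-line change in this hunk follows a "catch, record, skip" pattern that the geo mappers below repeat. A minimal standalone sketch of that pattern is given here, assuming a hypothetical `LenientDateSketch` class in place of the real `ParseContext` and ISO `yyyy-MM-dd` dates only; none of these names come from the commit.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashSet;
import java.util.OptionalLong;
import java.util.Set;

/** Hypothetical stand-in for the per-document parse context; not the real ParseContext. */
public class LenientDateSketch {

    /** Names of fields skipped for the current document, in insertion order. */
    private final Set<String> ignoredFields = new LinkedHashSet<>();

    public Set<String> getIgnoredFields() {
        return ignoredFields;
    }

    /**
     * Parses an ISO date into epoch milliseconds (UTC midnight). With ignoreMalformed
     * enabled, a bad value is recorded by field name and skipped; otherwise the parse
     * exception is rethrown.
     */
    public OptionalLong parseDate(String fieldName, String value, boolean ignoreMalformed) {
        try {
            long millis = LocalDate.parse(value)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
            return OptionalLong.of(millis);
        } catch (DateTimeParseException e) {
            if (ignoreMalformed) {
                // Record the field name so it can later be indexed under _ignored.
                ignoredFields.add(fieldName);
                return OptionalLong.empty();
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        LenientDateSketch context = new LenientDateSketch();
        context.parseDate("my_date", "bar", true);
        System.out.println(context.getIgnoredFields());
    }
}
```

Using a set keeps each field name once per document even if several values of a multi-valued field are malformed. Whether the real context deduplicates in the same way is not shown in this diff.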

server/src/main/java/org/elasticsearch/index/mapper/GeoPointFieldMapper.java

+2
@@ -306,6 +306,7 @@ public Mapper parse(ParseContext context) throws IOException {
306306
if (ignoreMalformed.value() == false) {
307307
throw e;
308308
}
309+
context.addIgnoredField(fieldType.name());
309310
}
310311
token = context.parser().nextToken();
311312
}
@@ -353,6 +354,7 @@ public Mapper parse(ParseContext context) throws IOException {
353354
if (ignoreMalformed.value() == false) {
354355
throw e;
355356
}
357+
context.addIgnoredField(fieldType.name());
356358
}
357359
}
358360
}

server/src/main/java/org/elasticsearch/index/mapper/GeoShapeFieldMapper.java

+1
@@ -516,6 +516,7 @@ public Mapper parse(ParseContext context) throws IOException {
516516
if (ignoreMalformed.value() == false) {
517517
throw new MapperParsingException("failed to parse [" + fieldType().name() + "]", e);
518518
}
519+
context.addIgnoredField(fieldType.name());
519520
}
520521
return null;
521522
}
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,154 @@
1+
/*
2+
* Licensed to Elasticsearch under one or more contributor
3+
* license agreements. See the NOTICE file distributed with
4+
* this work for additional information regarding copyright
5+
* ownership. Elasticsearch licenses this file to you under
6+
* the Apache License, Version 2.0 (the "License"); you may
7+
* not use this file except in compliance with the License.
8+
* You may obtain a copy of the License at
9+
*
10+
* http://www.apache.org/licenses/LICENSE-2.0
11+
*
12+
* Unless required by applicable law or agreed to in writing,
13+
* software distributed under the License is distributed on an
14+
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
15+
* KIND, either express or implied. See the License for the
16+
* specific language governing permissions and limitations
17+
* under the License.
18+
*/
19+
20+
package org.elasticsearch.index.mapper;
21+
22+
import org.apache.lucene.document.Field;
23+
import org.apache.lucene.index.IndexOptions;
24+
import org.apache.lucene.index.IndexableField;
25+
import org.apache.lucene.search.Query;
26+
import org.apache.lucene.search.TermRangeQuery;
27+
import org.elasticsearch.common.lucene.Lucene;
28+
import org.elasticsearch.common.settings.Settings;
29+
import org.elasticsearch.common.xcontent.XContentBuilder;
30+
import org.elasticsearch.index.query.QueryShardContext;
31+
32+
import java.io.IOException;
33+
import java.util.List;
34+
import java.util.Map;
35+
36+
/**
37+
* A field mapper that records fields that have been ignored because they were malformed.
38+
*/
39+
public final class IgnoredFieldMapper extends MetadataFieldMapper {
40+
41+
public static final String NAME = "_ignored";
42+
43+
public static final String CONTENT_TYPE = "_ignored";
44+
45+
public static class Defaults {
46+
public static final String NAME = IgnoredFieldMapper.NAME;
47+
48+
public static final MappedFieldType FIELD_TYPE = new IgnoredFieldType();
49+
50+
static {
51+
FIELD_TYPE.setIndexOptions(IndexOptions.DOCS);
52+
FIELD_TYPE.setTokenized(false);
53+
FIELD_TYPE.setStored(true);
54+
FIELD_TYPE.setOmitNorms(true);
55+
FIELD_TYPE.setIndexAnalyzer(Lucene.KEYWORD_ANALYZER);
56+
FIELD_TYPE.setSearchAnalyzer(Lucene.KEYWORD_ANALYZER);
57+
FIELD_TYPE.setName(NAME);
58+
FIELD_TYPE.freeze();
59+
}
60+
}
61+
62+
public static class Builder extends MetadataFieldMapper.Builder<Builder, IgnoredFieldMapper> {
63+
64+
public Builder(MappedFieldType existing) {
65+
super(Defaults.NAME, existing == null ? Defaults.FIELD_TYPE : existing, Defaults.FIELD_TYPE);
66+
}
67+
68+
@Override
69+
public IgnoredFieldMapper build(BuilderContext context) {
70+
return new IgnoredFieldMapper(context.indexSettings());
71+
}
72+
}
73+
74+
public static class TypeParser implements MetadataFieldMapper.TypeParser {
75+
@Override
76+
public MetadataFieldMapper.Builder<?,?> parse(String name, Map<String, Object> node,
77+
ParserContext parserContext) throws MapperParsingException {
78+
return new Builder(parserContext.mapperService().fullName(NAME));
79+
}
80+
81+
@Override
82+
public MetadataFieldMapper getDefault(MappedFieldType fieldType, ParserContext context) {
83+
final Settings indexSettings = context.mapperService().getIndexSettings().getSettings();
84+
return new IgnoredFieldMapper(indexSettings);
85+
}
86+
}
87+
88+
public static final class IgnoredFieldType extends TermBasedFieldType {
89+
90+
public IgnoredFieldType() {
91+
}
92+
93+
protected IgnoredFieldType(IgnoredFieldType ref) {
94+
super(ref);
95+
}
96+
97+
@Override
98+
public IgnoredFieldType clone() {
99+
return new IgnoredFieldType(this);
100+
}
101+
102+
@Override
103+
public String typeName() {
104+
return CONTENT_TYPE;
105+
}
106+
107+
@Override
108+
public Query existsQuery(QueryShardContext context) {
109+
// This query is not performance sensitive, it only helps assess
110+
// quality of the data, so we may use a slow query. It shouldn't
111+
// be too slow in practice since the number of unique terms in this
112+
// field is bounded by the number of fields in the mappings.
113+
return new TermRangeQuery(name(), null, null, true, true);
114+
}
115+
116+
}
117+
118+
private IgnoredFieldMapper(Settings indexSettings) {
119+
super(NAME, Defaults.FIELD_TYPE, Defaults.FIELD_TYPE, indexSettings);
120+
}
121+
122+
@Override
123+
public void preParse(ParseContext context) throws IOException {
124+
}
125+
126+
@Override
127+
public void postParse(ParseContext context) throws IOException {
128+
super.parse(context);
129+
}
130+
131+
@Override
132+
public Mapper parse(ParseContext context) throws IOException {
133+
// done in post-parse
134+
return null;
135+
}
136+
137+
@Override
138+
protected void parseCreateField(ParseContext context, List<IndexableField> fields) throws IOException {
139+
for (String field : context.getIgnoredFields()) {
140+
context.doc().add(new Field(NAME, field, fieldType()));
141+
}
142+
}
143+
144+
@Override
145+
protected String contentType() {
146+
return CONTENT_TYPE;
147+
}
148+
149+
@Override
150+
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
151+
return builder;
152+
}
153+
154+
}
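
The `existsQuery` comment above argues that an unbounded term range is cheap enough because the field holds at most one distinct term per mapped field. A toy inverted index can illustrate the idea; `IgnoredPostingsSketch` and its methods are hypothetical and only approximate what Lucene does with a `TermRangeQuery` whose bounds are both `null`.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical toy inverted index for the _ignored field; not Lucene code. */
public class IgnoredPostingsSketch {

    // Term (ignored field name) -> ids of the documents that contain it.
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();

    /** Mirrors parseCreateField: one indexed value per ignored field name. */
    public void index(int docId, Iterable<String> ignoredFields) {
        for (String field : ignoredFields) {
            postings.computeIfAbsent(field, key -> new TreeSet<>()).add(docId);
        }
    }

    /** Unbounded term range: the union of the postings of every term in the field. */
    public Set<Integer> exists() {
        Set<Integer> matches = new TreeSet<>();
        for (Set<Integer> docs : postings.values()) {
            matches.addAll(docs);
        }
        return matches;
    }

    /** Exact term lookup, as a `term` query on _ignored would do. */
    public Set<Integer> term(String field) {
        return postings.getOrDefault(field, new TreeSet<>());
    }

    /** Number of distinct terms, which is the number of postings lists exists() visits. */
    public int uniqueTerms() {
        return postings.size();
    }
}
```

In this sketch `exists()` visits one postings list per distinct term, so its cost grows with the number of mapped fields that were ever ignored rather than with the number of distinct values in the documents.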
