MappedFieldType should not extend FieldType #57666

romseygeek · 2020-06-04T13:22:10Z

MappedFieldType is a combination of two concerns:

* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched

We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType instead
has a series of boolean flags defining whether or not the field is searchable or aggregatable,
and FieldMapper has a separate FieldType passed to its constructor defining how indexing
should be done.

Relates to #56814
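
To illustrate the shape of the split described above, here is a minimal sketch. It is not the code from this PR: `IndexingOptions` is a hypothetical stand-in for lucene's FieldType, and `SearchFieldType` and `SketchFieldMapper` are simplified, made-up analogues of MappedFieldType and FieldMapper.

```java
// Hypothetical stand-in for lucene's FieldType: index-time settings only.
final class IndexingOptions {
    final boolean indexed;
    final boolean stored;

    IndexingOptions(boolean indexed, boolean stored) {
        this.indexed = indexed;
        this.stored = stored;
    }
}

// Simplified search-side type: plain boolean flags, no index-time superclass.
final class SearchFieldType {
    private final String name;
    private final boolean searchable;
    private final boolean aggregatable;

    SearchFieldType(String name, boolean searchable, boolean aggregatable) {
        this.name = name;
        this.searchable = searchable;
        this.aggregatable = aggregatable;
    }

    String name() { return name; }
    boolean isSearchable() { return searchable; }
    boolean isAggregatable() { return aggregatable; }
}

// The mapper receives both halves separately through its constructor.
final class SketchFieldMapper {
    private final IndexingOptions indexing;
    private final SearchFieldType searchType;

    SketchFieldMapper(IndexingOptions indexing, SearchFieldType searchType) {
        this.indexing = indexing;
        this.searchType = searchType;
    }

    IndexingOptions indexing() { return indexing; }
    SearchFieldType fieldType() { return searchType; }
}
```

Because the two objects no longer share a class hierarchy, query code only ever sees the search-side flags, while document parsing reads the index-time options.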

elasticmachine · 2020-06-04T13:22:12Z

Pinging @elastic/es-search (:Search/Mapping)

romseygeek · 2020-06-04T13:23:10Z

Note that this only changes mappers within server/ - I want to get some feedback before embarking on the rest of the codebase. All tests within server/ pass.

nik9000

I'm very excited! I didn't look at the whole thing but I did leave a couple of comments.

Thanks for all the extra final stuff!

nik9000 · 2020-06-04T13:36:37Z

server/src/main/java/org/elasticsearch/action/admin/indices/analyze/TransportAnalyzeAction.java

@@ -179,7 +179,7 @@ private static Analyzer getAnalyzer(AnalyzeAction.Request request, AnalysisRegis
            }
            MappedFieldType fieldType = indexService.mapperService().fieldType(request.field());
            if (fieldType != null) {
-                if (fieldType.tokenized() || fieldType instanceof KeywordFieldMapper.KeywordFieldType) {


I super hate all of these instanceofs!

But you are making it better.

nik9000 · 2020-06-04T13:37:51Z

server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java


-    private String name;
+    private final String name;


nik9000 · 2020-06-04T13:39:21Z

server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java

@@ -130,7 +121,9 @@ public ValuesSourceType getValuesSourceType() {

    @Override
    public boolean equals(Object o) {
-        if (!super.equals(o)) return false;
+        if (o instanceof MappedFieldType == false) {


I've always been a fan of if (o == null || getClass() != o.getClass()) for this. Every equals implementation is ugly, but at least mine bails early if the subtypes don't line up.

+1 to check classes directly instead of using instanceof, I expect that MappedFieldType instances should never be considered equal if they are different implementations?
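
For reference, a small sketch of the `getClass()` style suggested above. The classes `BaseType`, `KeywordType` and `TextType` are hypothetical and only stand in for MappedFieldType and its subclasses.

```java
import java.util.Objects;

// Hypothetical base class standing in for MappedFieldType.
class BaseType {
    private final String name;

    BaseType(String name) {
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        // Bail out early on null or on any other implementation class.
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        BaseType that = (BaseType) o;
        return name.equals(that.name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}

// Two hypothetical implementations with no extra state.
class KeywordType extends BaseType {
    KeywordType(String name) { super(name); }
}

class TextType extends BaseType {
    TextType(String name) { super(name); }
}
```

With an `instanceof BaseType` check, a `KeywordType` and a `TextType` with the same name would compare equal here; the class comparison rules that out.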

nik9000 · 2020-06-04T13:41:34Z

server/src/main/java/org/elasticsearch/index/mapper/MapperService.java

+        if (mapper == null) {
+            return null;
+        }
+        if (mapper instanceof FieldMapper == false) {


Could Mapper have a method that returns null if it isn't a FieldMapper?

nik9000

I don't really have the background to truly approve of this direction, but it feels pretty good to me.

nik9000 · 2020-06-04T13:44:38Z

server/src/main/java/org/elasticsearch/index/mapper/LegacyGeoShapeFieldMapper.java

-                               Explicit<Boolean> ignoreZValue, Settings indexSettings,
-                               MultiFields multiFields, CopyTo copyTo) {
-        super(simpleName, fieldType, defaultFieldType, ignoreMalformed, coerce, ignoreZValue, orientation, indexSettings,
+    public LegacyGeoShapeFieldMapper(String simpleName, FieldType fieldType, MappedFieldType mappedFieldType,


Maybe just run the formatter on the method declaration while you are here. It isn't really my favorite formatter, but we're going to hit the whole code base with it eventually.

romseygeek · 2020-06-04T13:28:53Z

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

-                    fieldType.setIndexOptions(options);
-                }
-            } else {
-                fieldType.setIndexOptions(IndexOptions.NONE);


The fact that we only have a single type helps simplify a lot here

romseygeek · 2020-06-04T13:34:26Z

server/src/main/java/org/elasticsearch/index/mapper/KeywordFieldMapper.java

+        public KeywordField(String field, BytesRef term, FieldType ft) {
+            super(field, term, ft);
+        }
+


We have a few places that were doing instanceof checks on lucene IndexableField fieldtypes - this allows us to continue doing those for now, as we can instead do an instanceof check for KeywordField

romseygeek · 2020-06-04T13:36:32Z

server/src/main/java/org/elasticsearch/index/mapper/MappedFieldType.java

    private NamedAnalyzer indexAnalyzer;
    private NamedAnalyzer searchAnalyzer;
    private NamedAnalyzer searchQuoteAnalyzer;
    private SimilarityProvider similarity;
-    private Object nullValue;


nullValue and nullValueAsString are now dealt with directly by FieldMappers that require it

romseygeek · 2020-06-04T13:39:10Z

server/src/main/java/org/elasticsearch/index/mapper/MapperService.java

@@ -166,6 +167,17 @@ public DocumentMapperParser documentMapperParser() {
        return this.documentParser;
    }

+    public FieldType getLuceneFieldType(String field) {


This is a bit of an unfortunate hack - there are still places that require access directly to the lucene field type, particularly in the highlighter code. They can almost certainly be refactored to use either information from the MappedFieldType or from the Mapper, but this PR is big enough already.

romseygeek · 2020-06-04T13:45:45Z

server/src/test/java/org/elasticsearch/index/mapper/DateFieldTypeTests.java


-    @Before


Formatter and Resolution are non-modifiable, so the tests on DateFieldMapperTests replace these bits

romseygeek · 2020-06-04T13:47:42Z

.../java/org/elasticsearch/search/aggregations/bucket/terms/SignificantTextAggregatorTests.java

@@ -76,10 +77,7 @@ protected AggregationBuilder createAggBuilderForTypeTest(MappedFieldType fieldTy
    @Override
    protected List<ValuesSourceType> getSupportedValuesSourceTypes() {
        // TODO it is likely accidental that SigText supports anything other than Bytes, and then only text fields
-        return List.of(CoreValuesSourceType.NUMERIC,


This ends up making this TODO official

romseygeek · 2020-06-04T13:48:58Z

...c/test/java/org/elasticsearch/search/aggregations/bucket/range/DateRangeAggregatorTests.java

@@ -70,6 +71,7 @@ public void testNoMatchingField() throws IOException {
        });
    }

+    @AwaitsFix(bugUrl="https://github.com/elastic/elasticsearch/issues/57651")


So it turns out that these tests didn't test what they thought they were testing, and the functionality they should have been testing for is broken. I opened #57651 to deal with them separately.

romseygeek · 2020-06-04T13:53:19Z

test/framework/src/main/java/org/elasticsearch/index/mapper/FieldMapperTestCase.java

+
+    protected void assertSerializes(String indexname, T builder) throws IOException {
+
+        // TODO can we do this without building an entire index?


This is a bit slow, unfortunately - the idea is to check that serializing a mapper and then using it as a merge input always ends up with the same mapper, for each possible modification. But we need a new MapperService for each test, because some modifications are unmergeable. And that means we have to rebuild a whole new index each time, so that we load mappers from plugins, etc. Which is slow.

romseygeek · 2020-06-04T16:23:29Z

I think eventually we will want to remove the notion of a FieldMapper necessarily having a lucene FieldType entirely - it should just say whether things should be indexed, stored, or have doc-values, and then things like termvectors and indexoptions would sit on TextFieldMapper directly (or a TermsIndexFieldMapper abstract base class). But that will require some fairly major refactoring elsewhere, particularly in the MapperService, and I'd like to keep this as targeted as possible (for a PR that currently changes 169 files!). So if we're happy with the change here as a start, then I'll begin to work on other modules and work towards a green CI.

jpountz · 2020-06-09T10:14:07Z

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

    protected MultiFields multiFields;
    protected CopyTo copyTo;

-    protected FieldMapper(String simpleName, MappedFieldType fieldType, MappedFieldType defaultFieldType,
+    protected FieldMapper(String simpleName, FieldType fieldType, MappedFieldType mappedFieldType,


How hard would it be to get rid of FieldType here? It's a bit annoying because usually there isn't a single FieldType that applies, e.g. numeric fields leverage both points and numeric doc values, but it's impossible to configure both on a FieldType, which is why we create different Fields for points and doc values.

Oh apologies, I just saw your previous message where you explain exactly this. +1 for progress over perfection and keeping it for a follow-up refactoring.

javanna · 2020-06-15T18:58:42Z

modules/mapper-extras/src/main/java/org/elasticsearch/index/mapper/TokenCountFieldMapper.java

-        this.enablePositionIncrements = ((TokenCountFieldMapper) other).enablePositionIncrements;
+        // TODO we should ban updating analyzers and null values as well
+        if (this.enablePositionIncrements != ((TokenCountFieldMapper)other).enablePositionIncrements) {
+            conflicts.add("mapper [" + name() + "] has a different [enable_position_increments] setting");


this seems like a new error that was not returned before?

You're right. I think this is the right choice (if you've indexed some docs while disabling increments, and then you index further docs after they have been enabled, the numbers stored in the two docs won't be comparable), but it should be a warning in 7x instead of an error.
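
A rough sketch of the behaviour change under discussion: the merge records a conflict instead of silently adopting the incoming value. `TokenCountSettings` and `checkMerge` are hypothetical names; only the message text is taken from the diff above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the one setting being compared.
final class TokenCountSettings {
    private final String name;
    private final boolean enablePositionIncrements;

    TokenCountSettings(String name, boolean enablePositionIncrements) {
        this.name = name;
        this.enablePositionIncrements = enablePositionIncrements;
    }

    // Adds a conflict message when the setting differs; leaves this object unchanged.
    void checkMerge(TokenCountSettings other, List<String> conflicts) {
        if (this.enablePositionIncrements != other.enablePositionIncrements) {
            conflicts.add("mapper [" + name + "] has a different [enable_position_increments] setting");
        }
    }
}
```

Whether the collected conflicts end up as an error or only as a warning is then a decision for the caller.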

javanna · 2020-06-15T19:37:07Z

server/src/main/java/org/elasticsearch/index/mapper/AbstractGeometryFieldMapper.java

@@ -328,13 +306,13 @@ public void parse(ParseContext context) throws IOException {
            }

            List<IndexableField> fields = new ArrayList<>();
-            if (fieldType.indexOptions() != IndexOptions.NONE || fieldType.hasDocValues()) {
+            if (mappedFieldType.isSearchable() || mappedFieldType.hasDocValues()) {


what happens with these checks once isSearchable returns true yet there is no index? Do we need to distinguish between the two?

We will need to distinguish between them in the future, yes. This is something of a shim though - the plan is to eventually move everything to parametrized mappers (see #58663) which will change how this works again.

javanna · 2020-06-15T19:38:09Z

server/src/main/java/org/elasticsearch/index/mapper/BinaryFieldMapper.java

+        @Override
+        public Builder index(boolean index) {
+            if (index) {
+                throw new MapperParsingException("Binary field [" + name() + "] cannot be indexed");


is this a new error that gets returned compared to before?

It would have previously been caught by the rather byzantine logic in the base class.

javanna · 2020-06-15T19:39:11Z

server/src/main/java/org/elasticsearch/index/mapper/CompletionFieldMapper.java

+        @Override
+        public Builder index(boolean index) {
+            if (index == false) {
+                throw new MapperParsingException("Completion field type must be indexed");


It was ignored before, should be a warning in 7x

javanna · 2020-06-15T19:44:38Z

server/src/main/java/org/elasticsearch/index/mapper/FieldMapper.java

        if (indexed != mergeWithIndexed) {
            conflicts.add("mapper [" + name() + "] has different [index] values");
        }
+        // TODO: should be validating if index options go "up" (but "down" is ok)
+        if (fieldType.indexOptions() != other.indexOptions()) {
+            conflicts.add("mapper [" + name() + "] has different [index_options] values");


new or existing error?

Hm, I thought this was an existing error but it looks like it's a new one. I'll add to the 'warnings in 7x' list.

javanna · 2020-06-15T19:46:22Z

server/src/main/java/org/elasticsearch/index/mapper/IdFieldMapper.java

@@ -160,9 +169,6 @@ public Query termsQuery(List<?> values, QueryShardContext context) {

        @Override
        public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) {
-            if (indexOptions() == IndexOptions.NONE) {
-                throw new IllegalArgumentException("Fielddata access on the _id field is disallowed");


have we changed behaviour here?

No, the ID field is always indexed (you can't configure metadata).

javanna · 2020-06-15T19:51:57Z

server/src/main/java/org/elasticsearch/index/mapper/MetadataFieldMapper.java

+        @Override
+        public T index(boolean index) {
+            if (index == false) {
+                throw new IllegalArgumentException("Metadata fields must be indexed");


It should be an assertion really - you can't configure metadata fields, so this should never be called.

javanna · 2020-06-15T19:52:55Z

server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

@@ -1114,8 +1129,8 @@ protected void doXContentBody(XContentBuilder builder, boolean includeDefaults,
            builder.field("coerce", coerce.value());
        }

-        if (includeDefaults || fieldType().nullValue() != null) {
-            builder.field("null_value", fieldType().nullValue());
+        if (nullValue != null) {


is it ok that includeDefaults is no longer checked?

javanna · 2020-06-15T20:04:31Z

server/src/main/java/org/elasticsearch/index/termvectors/TermVectorsService.java

+                    result.add(field.binaryValue().utf8ToString());
+                } else {
+                    result.add(field.stringValue());
+                }


I got lost trying to figure out if the updated if is equivalent to the previous one.

Point fields all have IndexOptions.NONE set, so we should only encounter keyword (binary) fields and text fields here. Text fields return null from field.binaryValue(), so we check that first to see if it's a keyword field and if so extract the binary value; otherwise it's a text field and so we get the string value. The tests all seem happy...
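
The branch order described above can be sketched as follows. `SketchField` is a hypothetical stand-in for the two accessors used on lucene's IndexableField, and it uses `byte[]` where lucene returns a BytesRef, so this is an illustration rather than the code in TermVectorsService.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two IndexableField accessors used here.
interface SketchField {
    byte[] binaryValue();   // null for text fields in this sketch
    String stringValue();   // null for keyword fields in this sketch
}

final class TermValueExtractor {

    // Keyword-like field: carries only a binary value.
    static SketchField keyword(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return new SketchField() {
            public byte[] binaryValue() { return bytes; }
            public String stringValue() { return null; }
        };
    }

    // Text-like field: carries only a string value.
    static SketchField text(String value) {
        return new SketchField() {
            public byte[] binaryValue() { return null; }
            public String stringValue() { return value; }
        };
    }

    // Check the binary value first; fall back to the string value otherwise.
    static List<String> values(List<SketchField> fields) {
        List<String> result = new ArrayList<>();
        for (SketchField field : fields) {
            byte[] bytes = field.binaryValue();
            if (bytes != null) {
                result.add(new String(bytes, StandardCharsets.UTF_8));
            } else {
                result.add(field.stringValue());
            }
        }
        return result;
    }
}
```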

javanna · 2020-06-15T20:22:45Z

I did another round and left a bunch of comments. Most of them are around errors that I am not sure we were returning before this change.

I was also wondering if you plan on opening issues to address the newly added TODOs, it seems that there are quite some added and I fear that some may get forgotten if we don't track them somewhere.

Last but not least: in some places we replaced checking for indexOptions with calls to isSearchable. Based on that we make decisions like indexing a field, and although that works today I worry that the distinction between indexing a field and that field being searchable was good to have. Once we will make fields that are not indexed searchable, those conditionals will no longer work, though they would have worked fine looking at indexOptions.


romseygeek · 2020-07-01T11:03:23Z

Last but not least: in some places we replaced checking for indexOptions with calls to isSearchable. Based on that we make decisions like indexing a field, and although that works today I worry that the distinction between indexing a field and that field being searchable was good to have. Once we will make fields that are not indexed searchable, those conditionals will no longer work, though they would have worked fine looking at indexOptions.

I agree, this distinction is important. One of the steps in #56814 is to make the fieldType method on FieldMapper abstract, and stop pre-building the MappedFieldType. When we do this, we'll want to change the indexing code so that it checks an internal isIndexed value, rather than calling isSearchable. #58663 should also help here.
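
As a sketch of why the distinction matters, consider a field that is searchable without being indexed. `QueryCapabilities` and `IndexAwareMapper` are hypothetical names, and the internal `indexed` flag plays the role of the isIndexed value mentioned above; none of this is the actual follow-up code.

```java
// Hypothetical search-side flags, as seen by query builders.
final class QueryCapabilities {
    private final boolean searchable;

    QueryCapabilities(boolean searchable) {
        this.searchable = searchable;
    }

    boolean isSearchable() { return searchable; }
}

// Hypothetical mapper that keeps its own index-time flags.
final class IndexAwareMapper {
    private final boolean indexed;
    private final boolean hasDocValues;
    private final QueryCapabilities capabilities;

    IndexAwareMapper(boolean indexed, boolean hasDocValues, QueryCapabilities capabilities) {
        this.indexed = indexed;
        this.hasDocValues = hasDocValues;
        this.capabilities = capabilities;
    }

    // Index-time decision: driven by the internal flag, not by isSearchable().
    boolean shouldWriteIndexedField() { return indexed; }

    boolean shouldWriteDocValues() { return hasDocValues; }

    QueryCapabilities fieldType() { return capabilities; }
}
```

A mapper built with `indexed = false`, `hasDocValues = true` and a searchable field type would write doc values only, whereas a check on `isSearchable()` would wrongly try to index it.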

The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields, leading to an increase in memory usage. This commit disables norms on these fields again.

In #57666 we changed when null_value was parsed for ip and date fields. Previously, the null value was stored as a string, and parsed into a date or InetAddress whenever a document containing a null value was encountered. Now, the values are parsed when the mappings are built, which means that bad values are detected up front; if you try and add a mapping with a badly-parsed ip or date for a null_value, the mapping will be rejected. This causes problems for upgrades in the case when you have a badly-formed null_value in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only fail on indexes created in 8x and later. For earlier indexes, we log a warning on the badly formed value and ignore it, replicating the earlier behaviour. Fixes #62363
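
A minimal sketch of the version-gated behaviour this commit message describes, assuming a plain ISO date and a hypothetical `indexCreatedMajor` parameter. `NullValueParser` and its warning list are illustrative and are not the Elasticsearch implementation.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: parses null_value when the mapping is built.
final class NullValueParser {
    // Stand-in for a logger; collects warning messages.
    final List<String> warnings = new ArrayList<>();

    // indexCreatedMajor is an assumed parameter: the major version that created the index.
    LocalDate parseDateNullValue(String field, String raw, int indexCreatedMajor) {
        if (raw == null) {
            return null;
        }
        try {
            return LocalDate.parse(raw); // ISO-8601 date, e.g. 2020-06-04
        } catch (DateTimeParseException e) {
            if (indexCreatedMajor >= 8) {
                // Newer indexes: reject the mapping up front.
                throw new IllegalArgumentException(
                    "Error parsing [null_value] on field [" + field + "]: " + raw, e);
            }
            // Older indexes: warn and ignore the bad value, as before.
            warnings.add("Ignoring malformed [null_value] on field [" + field + "]: " + raw);
            return null;
        }
    }
}
```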

In #57666 we changed when null_value was parsed for ip and date fields. Previously, the null value was stored as a string, and parsed into a date or InetAddress whenever a document containing a null value was encountered. Now, the values are parsed when the mappings are built, which means that bad values are detected up front; if you try and add a mapping with a badly-parsed ip or date for a null_value, the mapping will be rejected. This causes problems for upgrades in the case when you have a badly-formed null_value in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only log a warning on the badly formed value, replicating the earlier behaviour. Fixes #62363

romseygeek added 2 commits June 3, 2020 15:22

Cut 1: /server compiles, mapping tests pass

23a35eb

server/ tests passing

ac2cc4a

romseygeek added :Search Foundations/Mapping Index mappings, including merging and defining field types >breaking-java >refactoring v8.0.0 v7.9.0 labels Jun 4, 2020

romseygeek requested review from nik9000, jpountz and javanna June 4, 2020 13:22

romseygeek self-assigned this Jun 4, 2020

elasticmachine added the Team:Search Meta label for search team label Jun 4, 2020

romseygeek marked this pull request as draft June 4, 2020 13:22

nik9000 requested a review from jtibshirani June 4, 2020 13:36

nik9000 reviewed Jun 4, 2020


romseygeek commented Jun 4, 2020


Merge remote-tracking branch 'origin/master' into mapper/fieldtype

b18c1b9

romseygeek added 8 commits June 7, 2020 16:41

compilation

344520b

Merge remote-tracking branch 'origin/master' into mapper/fieldtype

b26fd80

mapper-extras, analytics

f331482

percolator

a146b6f

server integtests

fc79105

checkstyle

3772c31

flattened; precommit

bb8a244

tests

5e6ce42

jpountz reviewed Jun 9, 2020


javanna reviewed Jun 15, 2020


romseygeek mentioned this pull request Jun 16, 2020

MappedFieldType should not extend FieldType (#57666) #58160

Merged

jtibshirani mentioned this pull request Jun 16, 2020

Field mapping API unexpectedly returns 404 response #58188

Closed

This was referenced Jun 17, 2020

Add serialization test for FieldMappers when include_defaults=true #58235

Merged

Watcher never fuly starts in rolling upgrade tests #58282

Closed

Correct default formatting of binary fields #58338

Merged

romseygeek added a commit that referenced this pull request Jun 18, 2020

Correct default formatting of binary fields (#58338)

409306e

This was picked up in the backport of #57666 but missed in the original commit on master, leading to failures such as #58282

romseygeek mentioned this pull request Jun 23, 2020

Rework FieldMapper and MappedFieldType #56814

Closed

10 tasks

This was referenced Jul 1, 2020

FieldMapper null values, serialization and include_defaults #58823

Closed

Percolator keyword fields should not store norms #58899

Merged

romseygeek mentioned this pull request Aug 25, 2020

Regression in 7.9: using PUT mapping API to update the settings on an existing field no longer works #61393

Closed

romseygeek mentioned this pull request Sep 16, 2020

Allow empty null values for date and IP field mappers #62487

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

