Skip to content

MappedFieldType should not extend FieldType #57666

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jun 15, 2020

Conversation

romseygeek
Copy link
Contributor

@romseygeek romseygeek commented Jun 4, 2020

MappedFieldType is a combination of two concerns:

  • an extension of lucene's FieldType, defining how a field should be indexed
  • a set of query factory methods, defining how a field should be searched

We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType instead
has a series of boolean flags defining whether or not the field is searchable or aggregatable,
and FieldMapper has a separate FieldType passed to its constructor defining how indexing
should be done.

Relates to #56814

@romseygeek romseygeek added :Search Foundations/Mapping Index mappings, including merging and defining field types >breaking-java >refactoring v8.0.0 v7.9.0 labels Jun 4, 2020
@romseygeek romseygeek requested review from nik9000, jpountz and javanna June 4, 2020 13:22
@romseygeek romseygeek self-assigned this Jun 4, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jun 4, 2020
@romseygeek romseygeek marked this pull request as draft June 4, 2020 13:22
@romseygeek
Copy link
Contributor Author

Note that this only changes mappers within server/ - I want to get some feedback before embarking on the rest of the codebase. All tests within server/ pass.

@nik9000 nik9000 requested a review from jtibshirani June 4, 2020 13:36
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very excited! I didn't look at the whole thing but I did leave a couple of comments.

Thanks for all the extra final stuff!

@@ -179,7 +179,7 @@ private static Analyzer getAnalyzer(AnalyzeAction.Request request, AnalysisRegis
}
MappedFieldType fieldType = indexService.mapperService().fieldType(request.field());
if (fieldType != null) {
if (fieldType.tokenized() || fieldType instanceof KeywordFieldMapper.KeywordFieldType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I super hate all of these instanceofs!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you are making it better.


private String name;
private final String name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@@ -130,7 +121,9 @@ public ValuesSourceType getValuesSourceType() {

@Override
public boolean equals(Object o) {
if (!super.equals(o)) return false;
if (o instanceof MappedFieldType == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always been a fan of if (o == null || getClass() != o.getClass()) for this. Every equals implementation is ugly, but at least mine bails early if the subtypes don't line up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to check classes directly instead of using instanceof, I expect that MappedFieldType instances should never be considered equal if they are different implementations?

if (mapper == null) {
return null;
}
if (mapper instanceof FieldMapper == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could Mapper has a method that returns null if it isn't a FieldMapper?

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have the background to truly approve of this direction, but it feels pretty good to me.

Explicit<Boolean> ignoreZValue, Settings indexSettings,
MultiFields multiFields, CopyTo copyTo) {
super(simpleName, fieldType, defaultFieldType, ignoreMalformed, coerce, ignoreZValue, orientation, indexSettings,
public LegacyGeoShapeFieldMapper(String simpleName, FieldType fieldType, MappedFieldType mappedFieldType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just run the formatter on the method declaration while you are here. It isn't really my favorite formatter, but we're going to hit the whole code base with it eventually.

fieldType.setIndexOptions(options);
}
} else {
fieldType.setIndexOptions(IndexOptions.NONE);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that we only have a single type helps simplify a lot here

public KeywordField(String field, BytesRef term, FieldType ft) {
super(field, term, ft);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a few places that were doing instanceof checks on lucene IndexableField fieldtypes - this allows us to continue doing those for now, as we can instead do an instanceof check for KeywordField

private NamedAnalyzer indexAnalyzer;
private NamedAnalyzer searchAnalyzer;
private NamedAnalyzer searchQuoteAnalyzer;
private SimilarityProvider similarity;
private Object nullValue;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nullValue and nullValueAsString are now dealt with directly by FieldMappers that require it

@@ -166,6 +167,17 @@ public DocumentMapperParser documentMapperParser() {
return this.documentParser;
}

public FieldType getLuceneFieldType(String field) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of an unfortunate hack - there are still places that require access directly to the lucene field type, particularly in the highlighter code. They can almost certainly be refactored to use either information from the MappedFieldType or from the Mapper, but this PR is big enough already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++.


@Before
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatter and Resolution are non-modifiable, so the tests on DateFieldMapperTests replace these bits

@@ -76,10 +77,7 @@ protected AggregationBuilder createAggBuilderForTypeTest(MappedFieldType fieldTy
@Override
protected List<ValuesSourceType> getSupportedValuesSourceTypes() {
// TODO it is likely accidental that SigText supports anything other than Bytes, and then only text fields
return List.of(CoreValuesSourceType.NUMERIC,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ends up making this TODO official

@@ -70,6 +71,7 @@ public void testNoMatchingField() throws IOException {
});
}

@AwaitsFix(bugUrl="https://github.com/elastic/elasticsearch/issues/57651")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it turns out that these tests didn't test what they thought they were testing, and the functionality they should have been testing for is broken. I opened #57651 to deal with them separately.


protected void assertSerializes(String indexname, T builder) throws IOException {

// TODO can we do this without building an entire index?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit slow, unfortunately - the idea is to check that serializing a mapper and then using it as a merge input always ends up with the same mapper, for each possible modification. But we need a new MapperService for each test, because some modifications are unmergeable. And that means we have to rebuild a whole new index each time, so that we load mappers from plugins, etc. Which is slow.

@romseygeek
Copy link
Contributor Author

I think eventually we will want to remove the notion of a FieldMapper necessarily having a lucene FieldType entirely - it should just say whether things should be indexed, stored, or have doc-values, and then things like termvectors and indexoptions would sit on TextFieldMapper directly (or a TermsIndexFieldMapper abstract base class). But that will require some fairly major refactoring elsewhere, particularly in the MapperService, and I'd like to keep this as targeted as possible (for a PR that currently changes 169 files!). So if we're happy with the change here as a start, then I'll begin to work on other modules and work towards a green CI.

protected MultiFields multiFields;
protected CopyTo copyTo;

protected FieldMapper(String simpleName, MappedFieldType fieldType, MappedFieldType defaultFieldType,
protected FieldMapper(String simpleName, FieldType fieldType, MappedFieldType mappedFieldType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How hard would it be to get rid of FieldType here? It's a bit annoying because usually there isn't a single FieldType that applies, e.g. numeric fields leverage both points and numeric doc values, but it's impossible to configure both on a FieldType, which is why we create different Fields for points and doc values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh apologies, I just saw your previous message where you explain exactly this. +1 for progress over perfection and keeping it for a follow-up refactoring.

this.enablePositionIncrements = ((TokenCountFieldMapper) other).enablePositionIncrements;
// TODO we should ban updating analyzers and null values as well
if (this.enablePositionIncrements != ((TokenCountFieldMapper)other).enablePositionIncrements) {
conflicts.add("mapper [" + name() + "] has a different [enable_position_increments] setting");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a new error that was not returned before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I think this is the right choice (if you've indexed some docs while disabling increments, and then you index further docs after they have been enabled, the numbers stored in the two docs won't be comparable), but it should be a warning in 7x instead of an error.

@@ -328,13 +306,13 @@ public void parse(ParseContext context) throws IOException {
}

List<IndexableField> fields = new ArrayList<>();
if (fieldType.indexOptions() != IndexOptions.NONE || fieldType.hasDocValues()) {
if (mappedFieldType.isSearchable() || mappedFieldType.hasDocValues()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens with these checks once isSearchable returns true yet there is no index? Do we need to distinguish between the two?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to distinguish between them in the future, yes. This is something of a shim though - the plan is to eventually move everything to parametrized mappers (see #58663) which will change how this works again.

@Override
public Builder index(boolean index) {
if (index) {
throw new MapperParsingException("Binary field [" + name() + "] cannot be indexed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a new error that gets returned compared to before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have previously been caught by the rather byzantine logic in the base class.

@Override
public Builder index(boolean index) {
if (index == false) {
throw new MapperParsingException("Completion field type must be indexed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was ignored before, should be a warning in 7x

if (indexed != mergeWithIndexed) {
conflicts.add("mapper [" + name() + "] has different [index] values");
}
// TODO: should be validating if index options go "up" (but "down" is ok)
if (fieldType.indexOptions() != other.indexOptions()) {
conflicts.add("mapper [" + name() + "] has different [index_options] values");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new or existing error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I thought this was an existing error but it looks like it's a new one. I'll add to the 'warnings in 7x' list.

@@ -160,9 +169,6 @@ public Query termsQuery(List<?> values, QueryShardContext context) {

@Override
public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) {
if (indexOptions() == IndexOptions.NONE) {
throw new IllegalArgumentException("Fielddata access on the _id field is disallowed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have we changed behaviour here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the ID field is always indexed (you can't configure metadata).

@Override
public T index(boolean index) {
if (index == false) {
throw new IllegalArgumentException("Metadata fields must be indexed");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be an assertion really - you can't configure metadata fields, so this should never be called.

@@ -1114,8 +1129,8 @@ protected void doXContentBody(XContentBuilder builder, boolean includeDefaults,
builder.field("coerce", coerce.value());
}

if (includeDefaults || fieldType().nullValue() != null) {
builder.field("null_value", fieldType().nullValue());
if (nullValue != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it ok that includeDefaults is no longer checked?

result.add(field.binaryValue().utf8ToString());
} else {
result.add(field.stringValue());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got lost trying to figure out if the updated if is equivalent to the previous one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point fields all have IndexOptions.NONE set, so we should only encounter keyword (binary) fields and text fields here. Text fields return null from field.binaryValue(), so we check that first to see if it's a keyword field and if so extract the binary value; otherwise it's a text field and so we get the string value. The tests all seem happy...

@javanna
Copy link
Member

javanna commented Jun 15, 2020

I did another round and left a bunch of comments. Most of them are around errors that I am not sure we were returning before this change.

I was also wondering if you plan on opening issues to address the newly added TODOs, it seems that there are quite some added and I fear that some may get forgotten if we don't track them somewhere.

Last but not least: in some places we replaced checking for indexOptions with calls to isSearchable. Based on that we make decisions like indexing a field, and although that works today I worry that the distinction between indexing a field and that field being searchable was good to have. Once we will make fields that are not indexed searchable, those conditionals will no longer work, though they would have worked fine looking at indexOptions.

romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Jun 16, 2020
MappedFieldType is a combination of two concerns:

* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched

We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.

Relates to elastic#56814
romseygeek added a commit that referenced this pull request Jun 18, 2020
This was picked up in the backport of #57666 but missed in the original commit
on master, leading to failures such as #58282
@romseygeek
Copy link
Contributor Author

Last but not least: in some places we replaced checking for indexOptions with calls to isSearchable. Based on that we make decisions like indexing a field, and although that works today I worry that the distinction between indexing a field and that field being searchable was good to have. Once we will make fields that are not indexed searchable, those conditionals will no longer work, though they would have worked fine looking at indexOptions.

I agree, this distinction is important. One of the steps in #56814 is to make the fieldType method on FieldMapper abstract, and stop pre-building the MappedFieldType. When we do this, we'll want to change the indexing code so that it checks an internal isIndexed value, rather than calling isSearchable. #58663 should also help here.

romseygeek added a commit that referenced this pull request Jul 2, 2020
The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields,
leading to an increase in memory usage. This commit disables norms on these fields again.
romseygeek added a commit that referenced this pull request Jul 2, 2020
The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields,
leading to an increase in memory usage. This commit disables norms on these fields again.
romseygeek added a commit that referenced this pull request Sep 17, 2020
In #57666 we changed when null_value was parsed for ip and date fields. Previously,
the null value was stored as a string, and parsed into a date or InetAddress whenever
a document containing a null value was encountered. Now, the values are parsed when
the mappings are built, which means that bad values are detected up front; if you try and
add a mapping with a badly-parsed ip or date for a null_value, the mapping will be
rejected.

This causes problems for upgrades in the case when you have a badly-formed null_value
in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only
fail on indexes created in 8x and later. For earlier indexes, we log a warning on the
badly formed value and ignore it, replicating the earlier behaviour.

Fixes #62363
romseygeek added a commit that referenced this pull request Sep 17, 2020
In #57666 we changed when null_value was parsed for ip and date fields. Previously,
the null value was stored as a string, and parsed into a date or InetAddress whenever
a document containing a null value was encountered. Now, the values are parsed when
the mappings are built, which means that bad values are detected up front; if you try and
add a mapping with a badly-parsed ip or date for a null_value, the mapping will be
rejected.

This causes problems for upgrades in the case when you have a badly-formed null_value
in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only
logging a warning on the badly formed value, replicating the earlier behaviour.

Fixes #62363
romseygeek added a commit that referenced this pull request Sep 17, 2020
In #57666 we changed when null_value was parsed for ip and date fields. Previously,
the null value was stored as a string, and parsed into a date or InetAddress whenever
a document containing a null value was encountered. Now, the values are parsed when
the mappings are built, which means that bad values are detected up front; if you try and
add a mapping with a badly-parsed ip or date for a null_value, the mapping will be
rejected.

This causes problems for upgrades in the case when you have a badly-formed null_value
in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only
logging a warning on the badly formed value, replicating the earlier behaviour.

Fixes #62363
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking-java >refactoring :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v7.9.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants