-
Notifications
You must be signed in to change notification settings - Fork 25.2k
MappedFieldType should not extend FieldType #57666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-search (:Search/Mapping) |
Note that this only changes mappers within |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very excited! I didn't look at the whole thing but I did leave a couple of comments.
Thanks for all the extra final
stuff!
@@ -179,7 +179,7 @@ private static Analyzer getAnalyzer(AnalyzeAction.Request request, AnalysisRegis | |||
} | |||
MappedFieldType fieldType = indexService.mapperService().fieldType(request.field()); | |||
if (fieldType != null) { | |||
if (fieldType.tokenized() || fieldType instanceof KeywordFieldMapper.KeywordFieldType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I super hate all of these instanceof
s!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you are making it better.
|
||
private String name; | ||
private final String name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
@@ -130,7 +121,9 @@ public ValuesSourceType getValuesSourceType() { | |||
|
|||
@Override | |||
public boolean equals(Object o) { | |||
if (!super.equals(o)) return false; | |||
if (o instanceof MappedFieldType == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always been a fan of if (o == null || getClass() != o.getClass())
for this. Every equals
implementation is ugly, but at least mine bails early if the subtypes don't line up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to check classes directly instead of using instanceof, I expect that MappedFieldType instances should never be considered equal if they are different implementations?
if (mapper == null) { | ||
return null; | ||
} | ||
if (mapper instanceof FieldMapper == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could Mapper
has a method that returns null if it isn't a FieldMapper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really have the background to truly approve of this direction, but it feels pretty good to me.
Explicit<Boolean> ignoreZValue, Settings indexSettings, | ||
MultiFields multiFields, CopyTo copyTo) { | ||
super(simpleName, fieldType, defaultFieldType, ignoreMalformed, coerce, ignoreZValue, orientation, indexSettings, | ||
public LegacyGeoShapeFieldMapper(String simpleName, FieldType fieldType, MappedFieldType mappedFieldType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just run the formatter on the method declaration while you are here. It isn't really my favorite formatter, but we're going to hit the whole code base with it eventually.
fieldType.setIndexOptions(options); | ||
} | ||
} else { | ||
fieldType.setIndexOptions(IndexOptions.NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that we only have a single type helps simplify a lot here
public KeywordField(String field, BytesRef term, FieldType ft) { | ||
super(field, term, ft); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a few places that were doing instanceof checks on lucene IndexableField fieldtypes - this allows us to continue doing those for now, as we can instead do an instanceof check for KeywordField
private NamedAnalyzer indexAnalyzer; | ||
private NamedAnalyzer searchAnalyzer; | ||
private NamedAnalyzer searchQuoteAnalyzer; | ||
private SimilarityProvider similarity; | ||
private Object nullValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nullValue and nullValueAsString are now dealt with directly by FieldMappers that require it
@@ -166,6 +167,17 @@ public DocumentMapperParser documentMapperParser() { | |||
return this.documentParser; | |||
} | |||
|
|||
public FieldType getLuceneFieldType(String field) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of an unfortunate hack - there are still places that require access directly to the lucene field type, particularly in the highlighter code. They can almost certainly be refactored to use either information from the MappedFieldType or from the Mapper, but this PR is big enough already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++.
|
||
@Before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatter and Resolution are non-modifiable, so the tests on DateFieldMapperTests replace these bits
@@ -76,10 +77,7 @@ protected AggregationBuilder createAggBuilderForTypeTest(MappedFieldType fieldTy | |||
@Override | |||
protected List<ValuesSourceType> getSupportedValuesSourceTypes() { | |||
// TODO it is likely accidental that SigText supports anything other than Bytes, and then only text fields | |||
return List.of(CoreValuesSourceType.NUMERIC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends up making this TODO official
@@ -70,6 +71,7 @@ public void testNoMatchingField() throws IOException { | |||
}); | |||
} | |||
|
|||
@AwaitsFix(bugUrl="https://github.com/elastic/elasticsearch/issues/57651") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it turns out that these tests didn't test what they thought they were testing, and the functionality they should have been testing for is broken. I opened #57651 to deal with them separately.
|
||
protected void assertSerializes(String indexname, T builder) throws IOException { | ||
|
||
// TODO can we do this without building an entire index? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit slow, unfortunately - the idea is to check that serializing a mapper and then using it as a merge input always ends up with the same mapper, for each possible modification. But we need a new MapperService for each test, because some modifications are unmergeable. And that means we have to rebuild a whole new index each time, so that we load mappers from plugins, etc. Which is slow.
I think eventually we will want to remove the notion of a FieldMapper necessarily having a lucene |
protected MultiFields multiFields; | ||
protected CopyTo copyTo; | ||
|
||
protected FieldMapper(String simpleName, MappedFieldType fieldType, MappedFieldType defaultFieldType, | ||
protected FieldMapper(String simpleName, FieldType fieldType, MappedFieldType mappedFieldType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How hard would it be to get rid of FieldType
here? It's a bit annoying because usually there isn't a single FieldType
that applies, e.g. numeric fields leverage both points and numeric doc values, but it's impossible to configure both on a FieldType
, which is why we create different Field
s for points and doc values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh apologies, I just saw your previous message where you explain exactly this. +1 for progress over perfection and keeping it for a follow-up refactoring.
this.enablePositionIncrements = ((TokenCountFieldMapper) other).enablePositionIncrements; | ||
// TODO we should ban updating analyzers and null values as well | ||
if (this.enablePositionIncrements != ((TokenCountFieldMapper)other).enablePositionIncrements) { | ||
conflicts.add("mapper [" + name() + "] has a different [enable_position_increments] setting"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a new error that was not returned before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I think this is the right choice (if you've indexed some docs while disabling increments, and then you index further docs after they have been enabled, the numbers stored in the two docs won't be comparable), but it should be a warning in 7x instead of an error.
@@ -328,13 +306,13 @@ public void parse(ParseContext context) throws IOException { | |||
} | |||
|
|||
List<IndexableField> fields = new ArrayList<>(); | |||
if (fieldType.indexOptions() != IndexOptions.NONE || fieldType.hasDocValues()) { | |||
if (mappedFieldType.isSearchable() || mappedFieldType.hasDocValues()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens with these checks once isSearchable returns true yet there is no index? Do we need to distinguish between the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to distinguish between them in the future, yes. This is something of a shim though - the plan is to eventually move everything to parametrized mappers (see #58663) which will change how this works again.
@Override | ||
public Builder index(boolean index) { | ||
if (index) { | ||
throw new MapperParsingException("Binary field [" + name() + "] cannot be indexed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a new error that gets returned compared to before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would have previously been caught by the rather byzantine logic in the base class.
@Override | ||
public Builder index(boolean index) { | ||
if (index == false) { | ||
throw new MapperParsingException("Completion field type must be indexed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was ignored before, should be a warning in 7x
if (indexed != mergeWithIndexed) { | ||
conflicts.add("mapper [" + name() + "] has different [index] values"); | ||
} | ||
// TODO: should be validating if index options go "up" (but "down" is ok) | ||
if (fieldType.indexOptions() != other.indexOptions()) { | ||
conflicts.add("mapper [" + name() + "] has different [index_options] values"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new or existing error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I thought this was an existing error but it looks like it's a new one. I'll add to the 'warnings in 7x' list.
@@ -160,9 +169,6 @@ public Query termsQuery(List<?> values, QueryShardContext context) { | |||
|
|||
@Override | |||
public IndexFieldData.Builder fielddataBuilder(String fullyQualifiedIndexName) { | |||
if (indexOptions() == IndexOptions.NONE) { | |||
throw new IllegalArgumentException("Fielddata access on the _id field is disallowed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have we changed behaviour here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the ID field is always indexed (you can't configure metadata).
@Override | ||
public T index(boolean index) { | ||
if (index == false) { | ||
throw new IllegalArgumentException("Metadata fields must be indexed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be an assertion really - you can't configure metadata fields, so this should never be called.
@@ -1114,8 +1129,8 @@ protected void doXContentBody(XContentBuilder builder, boolean includeDefaults, | |||
builder.field("coerce", coerce.value()); | |||
} | |||
|
|||
if (includeDefaults || fieldType().nullValue() != null) { | |||
builder.field("null_value", fieldType().nullValue()); | |||
if (nullValue != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it ok that includeDefaults is no longer checked?
result.add(field.binaryValue().utf8ToString()); | ||
} else { | ||
result.add(field.stringValue()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got lost trying to figure out if the updated if is equivalent to the previous one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point fields all have IndexOptions.NONE set, so we should only encounter keyword (binary) fields and text fields here. Text fields return null
from field.binaryValue()
, so we check that first to see if it's a keyword field and if so extract the binary value; otherwise it's a text field and so we get the string value. The tests all seem happy...
I did another round and left a bunch of comments. Most of them are around errors that I am not sure we were returning before this change. I was also wondering if you plan on opening issues to address the newly added TODOs, it seems that there are quite some added and I fear that some may get forgotten if we don't track them somewhere. Last but not least: in some places we replaced checking for |
MappedFieldType is a combination of two concerns: * an extension of lucene's FieldType, defining how a field should be indexed * a set of query factory methods, defining how a field should be searched We want to break these two concerns apart. This commit is a first step to doing this, breaking the inheritance relationship between MappedFieldType and FieldType. MappedFieldType instead has a series of boolean flags defining whether or not the field is searchable or aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining how indexing should be done. Relates to elastic#56814
I agree, this distinction is important. One of the steps in #56814 is to make the |
The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields, leading to an increase in memory usage. This commit disables norms on these fields again.
The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields, leading to an increase in memory usage. This commit disables norms on these fields again.
In #57666 we changed when null_value was parsed for ip and date fields. Previously, the null value was stored as a string, and parsed into a date or InetAddress whenever a document containing a null value was encountered. Now, the values are parsed when the mappings are built, which means that bad values are detected up front; if you try and add a mapping with a badly-parsed ip or date for a null_value, the mapping will be rejected. This causes problems for upgrades in the case when you have a badly-formed null_value in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only fail on indexes created in 8x and later. For earlier indexes, we log a warning on the badly formed value and ignore it, replicating the earlier behaviour. Fixes #62363
In #57666 we changed when null_value was parsed for ip and date fields. Previously, the null value was stored as a string, and parsed into a date or InetAddress whenever a document containing a null value was encountered. Now, the values are parsed when the mappings are built, which means that bad values are detected up front; if you try and add a mapping with a badly-parsed ip or date for a null_value, the mapping will be rejected. This causes problems for upgrades in the case when you have a badly-formed null_value in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only logging a warning on the badly formed value, replicating the earlier behaviour. Fixes #62363
In #57666 we changed when null_value was parsed for ip and date fields. Previously, the null value was stored as a string, and parsed into a date or InetAddress whenever a document containing a null value was encountered. Now, the values are parsed when the mappings are built, which means that bad values are detected up front; if you try and add a mapping with a badly-parsed ip or date for a null_value, the mapping will be rejected. This causes problems for upgrades in the case when you have a badly-formed null_value in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only logging a warning on the badly formed value, replicating the earlier behaviour. Fixes #62363
MappedFieldType is a combination of two concerns:
We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType instead
has a series of boolean flags defining whether or not the field is searchable or aggregatable,
and FieldMapper has a separate FieldType passed to its constructor defining how indexing
should be done.
Relates to #56814