-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Remove 'external values', and replace with swapped out XContentParsers #72203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-search (Team:Search) |
This doesn't remove the special handling in GeoPointFieldMapper and CompletionFieldMapper, but it preserves that behaviour while cleaning up all the other field mappers so I think it's worth it as a first step. We can look at better ways of dealing with multiple completion setups separately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a followup I think it would be a good idea to always pass a wrapped XContentParser to multifields.parse() that throws an exception on nextToken(), to ensure that subfields don't inadvertently advance the parent parser.
@@ -117,9 +117,6 @@ public RankFeaturesFieldType fieldType() { | |||
|
|||
@Override | |||
public void parse(ParseContext context) throws IOException { | |||
if (context.externalValueSet()) { | |||
throw new IllegalArgumentException("[rank_features] fields can't be used in multi-fields"); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved these checks (where they existed) to the TypeParser so they are now caught at mapping time, instead of when you try and index your first document. There are probably more places where we could do this, but for now I've limited it to just those mappers that were already checking. I've added tests for them as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice clean-up!
Thanks @romseygeek for tackling #56063, it is an important simplification.
For me an important goal of #56063 is to actually remove support for this entirely, whereas this PR proposes to still support it (but in a more solid/ elegant way). The problem is that this flexibility makes it hard to understand what data ended up getting indexed for a field, since a parent field is allowed to supply literally anything as an 'external value'. In fact the When the analytics+geo team discussed the issue, I think there was support for removing the ability for geopoints to pass geohashes to their subfields. And in our discussion with @jimczi he clarified that we don't have a strong use case for completion multi-fields (#34081) -- we added it preemptively, not in response to user feedback. So since all current uses of external values aren't necessary, we could just entirely remove the ability to set special data to be passed to a multi-field. |
I agree! But I don't think we can remove it in one shot, we need to keep supporting this for indexes created in 7x and forbid new indexes created using it, etc. So I view this PR as a first step, clearing up the general case and narrowing down the specific exceptions to only affect the two classes in question. We can follow up by preventing multifields on geo_point fields for new indexes. Removing it from completion field mappers might be a bit more complicated, but that's why I think this PR is worth doing to begin with, as it lets us consider that in isolation from the broader cleanup. |
It sounds like we're on the same page. I was just confused because I saw "Closes #56063" which will close the issue automatically. This refactor is really nice in that it removes the need for each field mapper to check external values. Only the parent mappers geopoint and completion need to consider special multi-field parsing. Is this the rough plan? Maybe we could add a list like this to #56063 to clarify.
|
@@ -117,9 +117,6 @@ public RankFeaturesFieldType fieldType() { | |||
|
|||
@Override | |||
public void parse(ParseContext context) throws IOException { | |||
if (context.externalValueSet()) { | |||
throw new IllegalArgumentException("[rank_features] fields can't be used in multi-fields"); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice clean-up!
@@ -304,6 +306,11 @@ public void testCompletionWithContextAndSubCompletion() throws Exception { | |||
contextSuggestField("timmy"), | |||
contextSuggestField("starbucks") | |||
)); | |||
try (TokenStream ts = indexableFields.getFields("field.subsuggest")[0].tokenStream(Lucene.WHITESPACE_ANALYZER, null)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what this tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my initial implementation contexts weren't getting transferred correctly to subfields, and iterating over the field tokenstream would throw an exception. This was picked up by rest tests, but we didn't have a unit test for it. As it's not easy to check that the field has been correctly built via introspection, I used this as a way to trigger the bug in a unit test so I could track it down more easily. I'll add a comment, I think it's probably worth keeping here just because it fails faster than a rest test.
@@ -572,6 +562,15 @@ public ContentPath path() { | |||
}; | |||
} | |||
|
|||
public final ParseContext switchParser(XContentParser parser) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should mark this as deprecated, with a note we are actively deprecating + removing usages of it. That will prevent us from adding more usages accidentally. If there are any plugin authors using external values (which I hope there aren't), this will discourage them from just switching onto this method, which we hope to eventually remove.
|
||
private static class CompletionParser extends FilterXContentParser { | ||
|
||
boolean advanced = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is trickier than I realized! I really hope we can eventually remove it.
@@ -596,6 +596,12 @@ public ContentPath path() { | |||
}; | |||
} | |||
|
|||
/** | |||
* We are actively deprecating and removing the ability to pass complex objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment, I think we usually style it like @deprecated <explanation text>
elastic#72203) The majority of field mappers read a single value from their positioned XContentParser, and do not need to call nextToken. There is a general assumption that the same holds for any multifields defined on them, and so the XContentParser is passed down to their multifields builder as-is. This assumption does not hold for mappers that accept json objects, and so we have a second mechanism for passing values around called 'external values', where a mapper can set a specific value on its context and child mappers can then check for these external values before reading from xcontent. The disadvantage of this is that every field mapper now needs to check its context for external values. Because the values are defined by their java class, we can also know that in the vast majority of cases this functionality is unused. We have only two mappers that actually make use of this, CompletionFieldMapper and GeoPointFieldMapper. This commit removes external values entirely, and replaces it with the ability to pass a modified XContentParser to multifields. FieldMappers can just check the parser attached to their context for data and don't need to worry about multiple sources. Plugins implementing field mappers will need to take the removal of external values into account. Implementations that are passing structured objects as external values should instead use ParseContext.switchParser and wrap the objects using MapXContentParser.wrapObject(). GeoPointFieldMapper passes on a fake parser that just wraps its input data formatted as a geohash; CompletionFieldMapper has a slightly more complicated parser that in general wraps its metadata, but if textOrNull() is called without the parser being advanced just returns its text input. Relates to elastic#56063
#72203) (#72448) The majority of field mappers read a single value from their positioned XContentParser, and do not need to call nextToken. There is a general assumption that the same holds for any multifields defined on them, and so the XContentParser is passed down to their multifields builder as-is. This assumption does not hold for mappers that accept json objects, and so we have a second mechanism for passing values around called 'external values', where a mapper can set a specific value on its context and child mappers can then check for these external values before reading from xcontent. The disadvantage of this is that every field mapper now needs to check its context for external values. Because the values are defined by their java class, we can also know that in the vast majority of cases this functionality is unused. We have only two mappers that actually make use of this, CompletionFieldMapper and GeoPointFieldMapper. This commit removes external values entirely, and replaces it with the ability to pass a modified XContentParser to multifields. FieldMappers can just check the parser attached to their context for data and don't need to worry about multiple sources. Plugins implementing field mappers will need to take the removal of external values into account. Implementations that are passing structured objects as external values should instead use ParseContext.switchParser and wrap the objects using MapXContentParser.wrapObject(). GeoPointFieldMapper passes on a fake parser that just wraps its input data formatted as a geohash; CompletionFieldMapper has a slightly more complicated parser that in general wraps its metadata, but if textOrNull() is called without the parser being advanced just returns its text input. Relates to #56063
The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call
nextToken
. There is a generalassumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.
This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.
Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use
ParseContext.switchParser
andwrap the objects using
MapXContentParser.wrapObject()
.GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if
textOrNull()
is called withoutthe parser being advanced just returns its text input.
Relates to #56063