Skip to content

Remove 'external values', and replace with swapped out XContentParsers #72203

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 29, 2021

Conversation

romseygeek
Copy link
Contributor

@romseygeek romseygeek commented Apr 26, 2021

The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.

This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.

Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().

GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.

Relates to #56063

@romseygeek romseygeek added :Search Foundations/Mapping Index mappings, including merging and defining field types >breaking-java >refactoring v8.0.0 v7.14.0 labels Apr 26, 2021
@romseygeek romseygeek self-assigned this Apr 26, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Apr 26, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@romseygeek
Copy link
Contributor Author

This doesn't remove the special handling in GeoPointFieldMapper and CompletionFieldMapper, but it preserves that behaviour while cleaning up all the other field mappers so I think it's worth it as a first step. We can look at better ways of dealing with multiple completion setups separately.

Copy link
Contributor Author

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a followup I think it would be a good idea to always pass a wrapped XContentParser to multifields.parse() that throws an exception on nextToken(), to ensure that subfields don't inadvertently advance the parent parser.

@@ -117,9 +117,6 @@ public RankFeaturesFieldType fieldType() {

@Override
public void parse(ParseContext context) throws IOException {
if (context.externalValueSet()) {
throw new IllegalArgumentException("[rank_features] fields can't be used in multi-fields");
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved these checks (where they existed) to the TypeParser so they are now caught at mapping time, instead of when you try and index your first document. There are probably more places where we could do this, but for now I've limited it to just those mappers that were already checking. I've added tests for them as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice clean-up!

@jtibshirani
Copy link
Contributor

jtibshirani commented Apr 26, 2021

Thanks @romseygeek for tackling #56063, it is an important simplification.

GeoPointFieldMapper passes on a fake parser that just wraps its input data formatted as a geohash, CompletionFieldMapper has a slightly more complicated parser that in general wraps its metadata

For me an important goal of #56063 is to actually remove support for this entirely, whereas this PR proposes to still support it (but in a more solid/ elegant way). The problem is that this flexibility makes it hard to understand what data ended up getting indexed for a field, since a parent field is allowed to supply literally anything as an 'external value'. In fact the fields option doesn't currently work correctly on a multi-field with a geopoint as a parent.

When the analytics+geo team discussed the issue, I think there was support for removing the ability for geopoints to pass geohashes to their subfields. And in our discussion with @jimczi he clarified that we don't have a strong use case for completion multi-fields (#34081) -- we added it preemptively, not in response to user feedback. So since all current uses of external values aren't necessary, we could just entirely remove the ability to set special data to be passed to a multi-field.

@romseygeek
Copy link
Contributor Author

For me an important goal of #56063 is to actually remove support for this entirely, whereas this PR proposes to still support it (but in a more solid/ elegant way)

I agree! But I don't think we can remove it in one shot, we need to keep supporting this for indexes created in 7x and forbid new indexes created using it, etc. So I view this PR as a first step, clearing up the general case and narrowing down the specific exceptions to only affect the two classes in question. We can follow up by preventing multifields on geo_point fields for new indexes. Removing it from completion field mappers might be a bit more complicated, but that's why I think this PR is worth doing to begin with, as it lets us consider that in isolation from the broader cleanup.

@jtibshirani
Copy link
Contributor

It sounds like we're on the same page. I was just confused because I saw "Closes #56063" which will close the issue automatically. This refactor is really nice in that it removes the need for each field mapper to check external values. Only the parent mappers geopoint and completion need to consider special multi-field parsing.

Is this the rough plan? Maybe we could add a list like this to #56063 to clarify.

  1. Merge this PR, which cleans up external value parsing, making it easier to maintain going forward. It also forces plugin authors to stop using external values.
  2. Deprecate ability to have string-like field as subfields on geopoints. Or maybe deprecate multi-fields on geopoints entirely?
  3. Deprecate having completion multi-fields, as these aren't helpful.
  4. Eventually remove support for ParseContext.switchParser once 2 and 3 are fully removed.

@@ -117,9 +117,6 @@ public RankFeaturesFieldType fieldType() {

@Override
public void parse(ParseContext context) throws IOException {
if (context.externalValueSet()) {
throw new IllegalArgumentException("[rank_features] fields can't be used in multi-fields");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice clean-up!

@@ -304,6 +306,11 @@ public void testCompletionWithContextAndSubCompletion() throws Exception {
contextSuggestField("timmy"),
contextSuggestField("starbucks")
));
try (TokenStream ts = indexableFields.getFields("field.subsuggest")[0].tokenStream(Lucene.WHITESPACE_ANALYZER, null)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what this tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my initial implementation contexts weren't getting transferred correctly to subfields, and iterating over the field tokenstream would throw an exception. This was picked up by rest tests, but we didn't have a unit test for it. As it's not easy to check that the field has been correctly built via introspection, I used this as a way to trigger the bug in a unit test so I could track it down more easily. I'll add a comment, I think it's probably worth keeping here just because it fails faster than a rest test.

@@ -572,6 +562,15 @@ public ContentPath path() {
};
}

public final ParseContext switchParser(XContentParser parser) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should mark this as deprecated, with a note we are actively deprecating + removing usages of it. That will prevent us from adding more usages accidentally. If there are any plugin authors using external values (which I hope there aren't), this will discourage them from just switching onto this method, which we hope to eventually remove.


private static class CompletionParser extends FilterXContentParser {

boolean advanced = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is trickier than I realized! I really hope we can eventually remove it.

@romseygeek
Copy link
Contributor Author

I saw "Closes #56063" which will close the issue

That was definitely an error! I've changed it to relates to so that we keep the parent issue open.

Is this the rough plan?

Precisely so, yes. I'll update #56063 with your steps so that we can keep track of things.

@@ -596,6 +596,12 @@ public ContentPath path() {
};
}

/**
* We are actively deprecating and removing the ability to pass complex objects
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, I think we usually style it like @deprecated <explanation text>

@romseygeek romseygeek merged commit b27eaa3 into elastic:master Apr 29, 2021
@romseygeek romseygeek deleted the mapper/externalvalues branch April 29, 2021 08:17
romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Apr 29, 2021
elastic#72203)

The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.

This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.

Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().

GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.

Relates to elastic#56063
romseygeek added a commit that referenced this pull request Apr 29, 2021
#72203) (#72448)

The majority of field mappers read a single value from their positioned
XContentParser, and do not need to call nextToken. There is a general
assumption that the same holds for any multifields defined on them, and
so the XContentParser is passed down to their multifields builder as-is.
This assumption does not hold for mappers that accept json objects,
and so we have a second mechanism for passing values around called
'external values', where a mapper can set a specific value on its context
and child mappers can then check for these external values before reading
from xcontent. The disadvantage of this is that every field mapper now
needs to check its context for external values. Because the values are
defined by their java class, we can also know that in the vast majority of
cases this functionality is unused. We have only two mappers that actually
make use of this, CompletionFieldMapper and GeoPointFieldMapper.

This commit removes external values entirely, and replaces it with the ability
to pass a modified XContentParser to multifields. FieldMappers can just check
the parser attached to their context for data and don't need to worry about
multiple sources.

Plugins implementing field mappers will need to take the removal of external
values into account. Implementations that are passing structured objects
as external values should instead use ParseContext.switchParser and
wrap the objects using MapXContentParser.wrapObject().

GeoPointFieldMapper passes on a fake parser that just wraps its input data
formatted as a geohash; CompletionFieldMapper has a slightly more complicated
parser that in general wraps its metadata, but if textOrNull() is called without
the parser being advanced just returns its text input.

Relates to #56063
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking-java >refactoring :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v7.14.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants