Skip to content

Rework geo mappers to index value by value #71696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 19, 2021

Conversation

romseygeek
Copy link
Contributor

@romseygeek romseygeek commented Apr 14, 2021

The various geo field mappers are organised in a hierarchy that shares
parsing and indexing code. This ends up over-complicating things,
particularly when we have some mappers that accept multiple values
and others that only accept singletons. It also leads to confusing
behaviour around ignore_malformed behaviour: geo fields will ignore
all values if a single one is badly formed, while all other field mappers
will only ignore the problem value and index the rest. Finally, this
structure makes adding index-time scripts to geo_point needlessly
complex.

This commit refactors the indexing logic of the hierarchy to move the
individual value indexing logic into the concrete implementations,
and aligns the ignore_malformed behaviour with that of other mappers.

It contains two breaking changes:

  • the geo field mappers no longer check for external field values on the
    parse context. This added considerable complication to the refactored
    parse methods, and is unused anywhere in our codebase, but may
    impact plugin-based field mappers which expect to use geo fields
    as multifields
  • the geo_point field mapper now passes geohashes to its multifields
    one-by-one, instead of formatting them into a comma-delimited
    string and passing them all at once. Completion multifields using
    this as an input should still behave as normal because by default
    they would split this combined geohash string on the commas in any
    case, but keyword subfields may look different.

Fixes #69601

@romseygeek romseygeek added :Search Foundations/Mapping Index mappings, including merging and defining field types >refactoring v8.0.0 v7.13.0 labels Apr 14, 2021
@romseygeek romseygeek self-assigned this Apr 14, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Apr 14, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Contributor Author

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geo_point is the only mapper that actually parses multifields (passing a geohash String down as an external value) and all of the other mappers just silently ignore any defined multifields. As a followup we should deprecate adding multifields to those mappers that do not support it, and throw an error in v8.

}
// TODO phase out geohash (which is currently used in the CompletionSuggester)
multiFields.parse(this, context.createExternalValueContext(geometry.geohash()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this changes the way that multiple values are sent to sub fields, in that we now call multifields.parse() once for each value, instead of combining them all together. This should not cause any change in behaviour for completion fields because the completion field mapper by default uses an analyzer that splits the combined values up; however, it will change the output of keyword subfields. The new behaviour is in line with what all other field mappers do so I think it's worth the change, but we should document it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this new behavior is more intuitive. It seemed weird before that we would pass the string [geohash1, geohash2, ...] to subfields. Could we add a quick test to document/ solidify the new behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test to GeoPointFieldMapperTests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This could also be considered a breaking change, I agree we should document it.

@@ -35,12 +33,14 @@ public void testExternalValues() throws Exception {
assertThat(doc.rootDoc().getField("field.bool"), notNullValue());
assertThat(doc.rootDoc().getField("field.bool").stringValue(), is("T"));

/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ExternalFieldMapper tests that values set by a parent field mapper are propagated down to its multifields for a few field types, including a point and shape mapper. However, no other mapper sets points or shapes as an external field value, so this test was the only place that was using this logic for geo fields. Given that it is never used in production code and adds a fair amount of complexity to the indexing logic, I've removed it and commented out the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this logic is okay to remove. Like @romseygeek, I couldn't find any user-facing behavior this would change. I guess plugin authors could have been relying on this, but it seems very unlikely. In general, we'd like to remove support for external value parsing (#56063), and this is in line with that goal.

One note: could we just remove these test checks? There's not a strong reason to leave the logic here if we don't plan to restore it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still this is a breaking change so maybe we need to document it accordingly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've marked it as >breaking and will add notes to the migration docs in the backport PR

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not deeply familiar with the geo mapping code, but I looked over the the changes we discussed around external value parsing and they make sense to me.


/**
* Parses the given value, then formats it according to the 'format' string.
*
* Used by value fetchers to validate and format geo objects
*/
public Object parseAndFormatObject(Object value, String format) {
Parsed geometry;
Object[] formatted = new Object[1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it'd be more straightforward to use SetOnce instead of an array.

@@ -1068,14 +1068,14 @@ public void testWithDynamicTemplates() throws Exception {
ParsedDocument doc = mapper.parse(source("1", b -> b.field(field, "41.12,-71.34"), null, Map.of(field, "points")));
IndexableField[] fields = doc.rootDoc().getFields(field);
assertThat(fields, arrayWithSize(2));
assertThat(fields[0].fieldType(), sameInstance(LatLonDocValuesField.TYPE));
assertThat(fields[1].fieldType(), sameInstance(LatLonPoint.TYPE));
assertThat(fields[1].fieldType(), sameInstance(LatLonDocValuesField.TYPE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, probably clearest to keep the order where we check field[0] then field[1]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, I've swapped these around so that they're in order

@@ -35,12 +33,14 @@ public void testExternalValues() throws Exception {
assertThat(doc.rootDoc().getField("field.bool"), notNullValue());
assertThat(doc.rootDoc().getField("field.bool").stringValue(), is("T"));

/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this logic is okay to remove. Like @romseygeek, I couldn't find any user-facing behavior this would change. I guess plugin authors could have been relying on this, but it seems very unlikely. In general, we'd like to remove support for external value parsing (#56063), and this is in line with that goal.

One note: could we just remove these test checks? There's not a strong reason to leave the logic here if we don't plan to restore it.

}
// TODO phase out geohash (which is currently used in the CompletionSuggester)
multiFields.parse(this, context.createExternalValueContext(geometry.geohash()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think this new behavior is more intuitive. It seemed weird before that we would pass the string [geohash1, geohash2, ...] to subfields. Could we add a quick test to document/ solidify the new behavior?

}
context.addIgnoredField(mappedFieldType.name());
}
public final void parse(ParseContext context) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should remove the comment on this method as there is not parsing logic and add documentation ton the new index() method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me! I'll defer to @iverase for the final approval since I'm not an expert on this code.

One suggestion: I think it'd help to describe the two breaking changes in a PR comment (or in the description) -- this makes it much easier to track down for release notes.

@romseygeek
Copy link
Contributor Author

Thanks @jtibshirani. I've updated the PR description to include the two breaking changes.

@romseygeek romseygeek requested a review from iverase April 19, 2021 07:54
Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, I think this changes make easier to understand the mappers. Thanks!

@romseygeek romseygeek merged commit 289d202 into elastic:master Apr 19, 2021
@romseygeek romseygeek deleted the mapper/geo-indexing branch April 19, 2021 11:38
romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Apr 19, 2021
The various geo field mappers are organised in a hierarchy that shares
parsing and indexing code. This ends up over-complicating things,
particularly when we have some mappers that accept multiple values
and others that only accept singletons. It also leads to confusing
behaviour around ignore_malformed behaviour: geo fields will ignore
all values if a single one is badly formed, while all other field mappers
will only ignore the problem value and index the rest. Finally, this
structure makes adding index-time scripts to geo_point needlessly
complex.

This commit refactors the indexing logic of the hierarchy to move the
individual value indexing logic into the concrete implementations,
and aligns the ignore_malformed behaviour with that of other mappers.

It contains two breaking changes:

* The geo field mappers no longer check for external field values on the
  parse context. This added considerable complication to the refactored
  parse methods, and is unused anywhere in our codebase, but may
  impact plugin-based field mappers which expect to use geo fields
  as multifields
* The geo_point field mapper now passes geohashes to its multifields
  one-by-one, instead of formatting them into a comma-delimited
  string and passing them all at once. Completion multifields using
  this as an input should still behave as normal because by default
  they would split this combined geohash string on the commas in any
  case, but keyword subfields may look different.

Fixes elastic#69601
romseygeek added a commit that referenced this pull request Apr 19, 2021
The various geo field mappers are organised in a hierarchy that shares
parsing and indexing code. This ends up over-complicating things,
particularly when we have some mappers that accept multiple values
and others that only accept singletons. It also leads to confusing
behaviour around ignore_malformed behaviour: geo fields will ignore
all values if a single one is badly formed, while all other field mappers
will only ignore the problem value and index the rest. Finally, this
structure makes adding index-time scripts to geo_point needlessly
complex.

This commit refactors the indexing logic of the hierarchy to move the
individual value indexing logic into the concrete implementations,
and aligns the ignore_malformed behaviour with that of other mappers.

It contains two breaking changes:

* The geo field mappers no longer check for external field values on the
  parse context. This added considerable complication to the refactored
  parse methods, and is unused anywhere in our codebase, but may
  impact plugin-based field mappers which expect to use geo fields
  as multifields
* The geo_point field mapper now passes geohashes to its multifields
  one-by-one, instead of formatting them into a comma-delimited
  string and passing them all at once. Completion multifields using
  this as an input should still behave as normal because by default
  they would split this combined geohash string on the commas in any
  case, but keyword subfields may look different.

Fixes #69601
romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Apr 20, 2021
romseygeek added a commit that referenced this pull request Apr 20, 2021
romseygeek added a commit that referenced this pull request Apr 21, 2021
#71696 introduced a regression to the various shape field mappers,
where they would no longer handle null values. This commit fixes
that regression and adds a testNullValues method to MapperTestCase
to ensure that all field mappers correctly handle nulls.

Fixes #71874
romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Apr 21, 2021
elastic#71696 introduced a regression to the various shape field mappers,
where they would no longer handle null values. This commit fixes
that regression and adds a testNullValues method to MapperTestCase
to ensure that all field mappers correctly handle nulls.

Fixes elastic#71874
romseygeek added a commit that referenced this pull request Apr 22, 2021
#71696 introduced a regression to the various shape field mappers,
where they would no longer handle null values. This commit fixes
that regression and adds a testNullValues method to MapperTestCase
to ensure that all field mappers correctly handle nulls.

Fixes #71874
romseygeek added a commit that referenced this pull request Apr 22, 2021
#71696 introduced a regression to the various shape field mappers,
where they would no longer handle null values. This commit fixes
that regression and adds a testNullValues method to MapperTestCase
to ensure that all field mappers correctly handle nulls.

Fixes #71874
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking >refactoring :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v7.13.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Point field mappers have odd ignore_malformed semantics
5 participants