-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Rework geo mappers to index value by value #71696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
geo_point is the only mapper that actually parses multifields (passing a geohash String down as an external value) and all of the other mappers just silently ignore any defined multifields. As a followup we should deprecate adding multifields to those mappers that do not support it, and throw an error in v8.
} | ||
// TODO phase out geohash (which is currently used in the CompletionSuggester) | ||
multiFields.parse(this, context.createExternalValueContext(geometry.geohash())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this changes the way that multiple values are sent to sub fields, in that we now call multifields.parse() once for each value, instead of combining them all together. This should not cause any change in behaviour for completion fields because the completion field mapper by default uses an analyzer that splits the combined values up; however, it will change the output of keyword subfields. The new behaviour is in line with what all other field mappers do so I think it's worth the change, but we should document it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this new behavior is more intuitive. It seemed weird before that we would pass the string [geohash1, geohash2, ...]
to subfields. Could we add a quick test to document/ solidify the new behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a test to GeoPointFieldMapperTests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This could also be considered a breaking change, I agree we should document it.
@@ -35,12 +33,14 @@ public void testExternalValues() throws Exception { | |||
assertThat(doc.rootDoc().getField("field.bool"), notNullValue()); | |||
assertThat(doc.rootDoc().getField("field.bool").stringValue(), is("T")); | |||
|
|||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ExternalFieldMapper tests that values set by a parent field mapper are propagated down to its multifields for a few field types, including a point and shape mapper. However, no other mapper sets points or shapes as an external field value, so this test was the only place that was using this logic for geo fields. Given that it is never used in production code and adds a fair amount of complexity to the indexing logic, I've removed it and commented out the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this logic is okay to remove. Like @romseygeek, I couldn't find any user-facing behavior this would change. I guess plugin authors could have been relying on this, but it seems very unlikely. In general, we'd like to remove support for external value parsing (#56063), and this is in line with that goal.
One note: could we just remove these test checks? There's not a strong reason to leave the logic here if we don't plan to restore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still this is a breaking change so maybe we need to document it accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've marked it as >breaking and will add notes to the migration docs in the backport PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not deeply familiar with the geo mapping code, but I looked over the the changes we discussed around external value parsing and they make sense to me.
|
||
/** | ||
* Parses the given value, then formats it according to the 'format' string. | ||
* | ||
* Used by value fetchers to validate and format geo objects | ||
*/ | ||
public Object parseAndFormatObject(Object value, String format) { | ||
Parsed geometry; | ||
Object[] formatted = new Object[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it'd be more straightforward to use SetOnce
instead of an array.
@@ -1068,14 +1068,14 @@ public void testWithDynamicTemplates() throws Exception { | |||
ParsedDocument doc = mapper.parse(source("1", b -> b.field(field, "41.12,-71.34"), null, Map.of(field, "points"))); | |||
IndexableField[] fields = doc.rootDoc().getFields(field); | |||
assertThat(fields, arrayWithSize(2)); | |||
assertThat(fields[0].fieldType(), sameInstance(LatLonDocValuesField.TYPE)); | |||
assertThat(fields[1].fieldType(), sameInstance(LatLonPoint.TYPE)); | |||
assertThat(fields[1].fieldType(), sameInstance(LatLonDocValuesField.TYPE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment, probably clearest to keep the order where we check field[0]
then field[1]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++, I've swapped these around so that they're in order
@@ -35,12 +33,14 @@ public void testExternalValues() throws Exception { | |||
assertThat(doc.rootDoc().getField("field.bool"), notNullValue()); | |||
assertThat(doc.rootDoc().getField("field.bool").stringValue(), is("T")); | |||
|
|||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this logic is okay to remove. Like @romseygeek, I couldn't find any user-facing behavior this would change. I guess plugin authors could have been relying on this, but it seems very unlikely. In general, we'd like to remove support for external value parsing (#56063), and this is in line with that goal.
One note: could we just remove these test checks? There's not a strong reason to leave the logic here if we don't plan to restore it.
} | ||
// TODO phase out geohash (which is currently used in the CompletionSuggester) | ||
multiFields.parse(this, context.createExternalValueContext(geometry.geohash())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this new behavior is more intuitive. It seemed weird before that we would pass the string [geohash1, geohash2, ...]
to subfields. Could we add a quick test to document/ solidify the new behavior?
} | ||
context.addIgnoredField(mappedFieldType.name()); | ||
} | ||
public final void parse(ParseContext context) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should remove the comment on this method as there is not parsing logic and add documentation ton the new index() method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me! I'll defer to @iverase for the final approval since I'm not an expert on this code.
One suggestion: I think it'd help to describe the two breaking changes in a PR comment (or in the description) -- this makes it much easier to track down for release notes.
Thanks @jtibshirani. I've updated the PR description to include the two breaking changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, I think this changes make easier to understand the mappers. Thanks!
The various geo field mappers are organised in a hierarchy that shares parsing and indexing code. This ends up over-complicating things, particularly when we have some mappers that accept multiple values and others that only accept singletons. It also leads to confusing behaviour around ignore_malformed behaviour: geo fields will ignore all values if a single one is badly formed, while all other field mappers will only ignore the problem value and index the rest. Finally, this structure makes adding index-time scripts to geo_point needlessly complex. This commit refactors the indexing logic of the hierarchy to move the individual value indexing logic into the concrete implementations, and aligns the ignore_malformed behaviour with that of other mappers. It contains two breaking changes: * The geo field mappers no longer check for external field values on the parse context. This added considerable complication to the refactored parse methods, and is unused anywhere in our codebase, but may impact plugin-based field mappers which expect to use geo fields as multifields * The geo_point field mapper now passes geohashes to its multifields one-by-one, instead of formatting them into a comma-delimited string and passing them all at once. Completion multifields using this as an input should still behave as normal because by default they would split this combined geohash string on the commas in any case, but keyword subfields may look different. Fixes elastic#69601
The various geo field mappers are organised in a hierarchy that shares parsing and indexing code. This ends up over-complicating things, particularly when we have some mappers that accept multiple values and others that only accept singletons. It also leads to confusing behaviour around ignore_malformed behaviour: geo fields will ignore all values if a single one is badly formed, while all other field mappers will only ignore the problem value and index the rest. Finally, this structure makes adding index-time scripts to geo_point needlessly complex. This commit refactors the indexing logic of the hierarchy to move the individual value indexing logic into the concrete implementations, and aligns the ignore_malformed behaviour with that of other mappers. It contains two breaking changes: * The geo field mappers no longer check for external field values on the parse context. This added considerable complication to the refactored parse methods, and is unused anywhere in our codebase, but may impact plugin-based field mappers which expect to use geo fields as multifields * The geo_point field mapper now passes geohashes to its multifields one-by-one, instead of formatting them into a comma-delimited string and passing them all at once. Completion multifields using this as an input should still behave as normal because by default they would split this combined geohash string on the commas in any case, but keyword subfields may look different. Fixes #69601
elastic#71696 introduced a regression to the various shape field mappers, where they would no longer handle null values. This commit fixes that regression and adds a testNullValues method to MapperTestCase to ensure that all field mappers correctly handle nulls. Fixes elastic#71874
The various geo field mappers are organised in a hierarchy that shares
parsing and indexing code. This ends up over-complicating things,
particularly when we have some mappers that accept multiple values
and others that only accept singletons. It also leads to confusing
behaviour around
ignore_malformed
behaviour: geo fields will ignoreall values if a single one is badly formed, while all other field mappers
will only ignore the problem value and index the rest. Finally, this
structure makes adding index-time scripts to geo_point needlessly
complex.
This commit refactors the indexing logic of the hierarchy to move the
individual value indexing logic into the concrete implementations,
and aligns the
ignore_malformed
behaviour with that of other mappers.It contains two breaking changes:
parse context. This added considerable complication to the refactored
parse methods, and is unused anywhere in our codebase, but may
impact plugin-based field mappers which expect to use geo fields
as multifields
one-by-one, instead of formatting them into a comma-delimited
string and passing them all at once. Completion multifields using
this as an input should still behave as normal because by default
they would split this combined geohash string on the commas in any
case, but keyword subfields may look different.
Fixes #69601