Skip to content

Correct how field retrieval handles multifields and copy_to. #61309

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 19, 2020

Conversation

jtibshirani
Copy link
Contributor

@jtibshirani jtibshirani commented Aug 19, 2020

Before when a value was copied to a field through a parent field or copy_to,
we parsed it using the FieldMapper from the source field. Instead we should
parse it using the target FieldMapper. This ensures that we apply the
appropriate mapping type and options to the copied value.

To implement the fix cleanly, this PR refactors the value parsing strategy. Now
instead of looking up values directly, field mappers produce a helper object
ValueFetcher. The value fetchers are responsible for almost all aspects of
fetching, including looking up the right paths in the _source.

The PR is fairly big but each commit can be reviewed individually.

Fixes #61033.

@jtibshirani jtibshirani added >bug :Search/Search Search-related issues that do not fall into other categories labels Aug 19, 2020
@jtibshirani jtibshirani changed the title Correct how 'fields' option handles multifields and copy_to. Correct how field retrieval handles multifields and copy_to. Aug 19, 2020
continue;
}

if (mapper instanceof FieldAliasMapper) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unfortunate workaround. I think we should consider moving value fetching support from FieldMapper to MappedFieldType in a follow-up. That would clean this up, because when looking up a field's MappedFieldType we will already resolve aliases to their targets (and can avoid doing it manually here).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Before these cases asserted the wrong result. Now they capture the right
behavior, but currently fail.
This will help to implement the fix in a clean way.
Before when a value was copied to a field through a parent field or copy_to, we
parsed it using the FieldMapper from the source field. Instead we should parse
it using the target FieldMapper.

This fix works but is a bit messy. The next commit will clean up the approach.
Now SourceValueFetcher owns all aspects of looking up values from _source,
instead of requiring some information to be passed in externally.
This is no longer needed after the refactor to use ValueFetcher.
@jtibshirani jtibshirani marked this pull request as ready for review August 19, 2020 19:09
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Search)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Aug 19, 2020
@jtibshirani jtibshirani requested a review from nik9000 August 19, 2020 19:11
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return List.of();
}

if (parsesArrayValue) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be cleaner if this were handled by a differently named subclass. I know we have the boolean in the mapper, but it just feels funny to pass a boolean swapping out the guts of the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely seems cleaner in the abstract, but this approach matches the logic DocumentParser where we check FieldMapper#parsesArrayValue to decide what value to pass. I also think using parsesArrayValue in both places encourages field mappers to have a consistent approach between parseCreateField and parseSourceValue.

I agree this is not ideal, I really wish there were a solid way to tie together the index-time parsing and value fetching logic...

* format. This parsing logic should closely mirror the value parsing in
* {@link FieldMapper#parseCreateField} or {@link FieldMapper#parse}.
*/
protected abstract Object parseSourceValue(Object value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think this is cleaner than passing the CheckedFunction but if you like it better that is cool with me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think passing a function argument is readable too. It seems totally fine if the implementation evolved in that direction.

I had a slight preference for this approach since it associates a name with the function argument: when you look at an implementation like KeywordFieldMapper#valueFetcher, the purpose of the function is very obvious.

continue;
}

if (mapper instanceof FieldAliasMapper) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if (format != null) {
throw new IllegalArgumentException("Field [" + name() + "] of type [" + typeName() + "] doesn't support formats.");
}

return fieldType().value == null
return lookup -> fieldType().value == null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fieldType.value == null ? lookup -> List.of() : lookup -> List.of(fieldType.value)? It isn't really important, but it'd make me feel good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 will update this, feeling good is worth it !

@jtibshirani
Copy link
Contributor Author

Thanks @nik9000 for the feedback! I incorporated some of your suggestions -- I'm going to merge but can happily make follow-up changes if you feel strongly about some point.

@jtibshirani jtibshirani merged commit 5457b34 into elastic:master Aug 19, 2020
@jtibshirani jtibshirani deleted the fields-source-path branch August 19, 2020 23:50
jtibshirani added a commit to jtibshirani/elasticsearch that referenced this pull request Aug 20, 2020
…#61309)

Before when a value was copied to a field through a parent field or `copy_to`,
we parsed it using the `FieldMapper` from the source field. Instead we should
parse it using the target `FieldMapper`. This ensures that we apply the
appropriate mapping type and options to the copied value.

To implement the fix cleanly, this PR refactors the value parsing strategy. Now
instead of looking up values directly, field mappers produce a helper object
`ValueFetcher`. The value fetchers are responsible for almost all aspects of
fetching, including looking up the right paths in the _source.

The PR is fairly big but each commit can be reviewed individually.

Fixes elastic#61033.
@jakelandis jakelandis removed the v8.0.0 label Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.10.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"fields" API return a text instead of a number for a numeric sub-field
4 participants