[ML] Exclude nested fields in data frame analytics #71400

dimitris-athanasiou
Contributor

Previously, the destination index was sorted, which meant it could
not have `nested` fields. Since this has changed, `nested` fields
may be present. These were handled incorrectly: the `_explain` API
would report that they can be included in the analysis, which
is not the case.

This commit fixes the issue by detecting `nested` fields and the
children of those `nested` fields and excluding them from the analysis.
A `nested` field may contain multiple inner fields. To avoid noise
in the API response, we collapse them into a single entry with the
path to the top-level `nested` field.
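
The collapsing step described above can be illustrated with a minimal sketch. This is not the code from the commit: the class `NestedFieldCollapse`, its methods, and the sample field names are hypothetical, and it assumes the set of `nested` paths has already been read from the index mappings.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; names are not taken from the Elasticsearch code base.
public class NestedFieldCollapse {

    /**
     * Returns the outermost nested path that is a prefix of (or equal to) the
     * given field, or null if the field is not under any nested path.
     */
    public static String topLevelNestedPath(String field, Set<String> nestedPaths) {
        StringBuilder prefix = new StringBuilder();
        for (String part : field.split("\\.")) {
            if (prefix.length() > 0) {
                prefix.append('.');
            }
            prefix.append(part);
            // Prefixes are checked shortest first, so the first hit is the top level.
            if (nestedPaths.contains(prefix.toString())) {
                return prefix.toString();
            }
        }
        return null;
    }

    /** Collapses every field under a nested path into one entry per top-level nested path. */
    public static Set<String> collapseNested(List<String> fields, Set<String> nestedPaths) {
        Set<String> excluded = new TreeSet<>();
        for (String field : fields) {
            String top = topLevelNestedPath(field, nestedPaths);
            if (top != null) {
                excluded.add(top);
            }
        }
        return excluded;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("price", "reviews.author", "reviews.rating", "reviews.replies.text");
        Set<String> nested = Set.of("reviews", "reviews.replies");
        // All three inner fields collapse into the single entry "reviews".
        System.out.println(collapseNested(fields, nested));
    }
}
```

Reporting one entry per top-level path keeps the response short even when a `nested` object has many inner fields.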
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Apr 7, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@dimitris-athanasiou dimitris-athanasiou merged commit 9f6d3c7 into elastic:master Apr 7, 2021
@dimitris-athanasiou dimitris-athanasiou deleted the exclude-children-of-nested-fields-in-dfa branch April 7, 2021 14:48
dimitris-athanasiou added a commit that referenced this pull request Apr 7, 2021
)

Backport of #71400
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Apr 15, 2021
This fixes a bug that was introduced in elastic#71400.

The bug occurs when the `_explain` API is called for a
data frame analytics job and there are `nested` fields
that are also of an incompatible type. In that case, we end
up calling `iterator.remove()` twice in the same iteration,
which throws `IllegalStateException`.

I also took the chance to move the nested field check
first, as I think it is more informative to explain that a
field is excluded because it is nested than because
it has an incompatible type.

The PR is marked as `non-issue` as the bug has not been
released yet.
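
The failure mode described above can be reproduced with plain `java.util` collections. The sketch below is hypothetical and is not the actual Elasticsearch code: `ExplainRemovalSketch`, `buggyFilter`, and `fixedExclusions` are invented names. The buggy variant runs two independent checks that can both call `remove()` for the same element, while the fixed variant checks for nested fields first and removes each element at most once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names are not taken from the Elasticsearch code base.
public class ExplainRemovalSketch {

    /** Buggy shape: a field that is both incompatible and nested is removed twice. */
    public static List<String> buggyFilter(List<String> fields, Set<String> nested, Set<String> incompatible) {
        List<String> remaining = new ArrayList<>(fields);
        Iterator<String> it = remaining.iterator();
        while (it.hasNext()) {
            String field = it.next();
            if (incompatible.contains(field)) {
                it.remove();
            }
            if (nested.contains(field)) {
                // A second remove() without an intervening next() throws IllegalStateException.
                it.remove();
            }
        }
        return remaining;
    }

    /** Fixed shape: nested check first, at most one remove() per element. */
    public static Map<String, String> fixedExclusions(List<String> fields, Set<String> nested, Set<String> incompatible) {
        Map<String, String> reasons = new LinkedHashMap<>();
        List<String> remaining = new ArrayList<>(fields);
        Iterator<String> it = remaining.iterator();
        while (it.hasNext()) {
            String field = it.next();
            if (nested.contains(field)) {
                reasons.put(field, "nested");
                it.remove();
            } else if (incompatible.contains(field)) {
                reasons.put(field, "incompatible type");
                it.remove();
            }
        }
        return reasons;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("price", "reviews");
        Set<String> nested = Set.of("reviews");
        Set<String> incompatible = Set.of("reviews");
        try {
            buggyFilter(fields, nested, incompatible);
        } catch (IllegalStateException e) {
            System.out.println("buggy variant threw IllegalStateException");
        }
        System.out.println(fixedExclusions(fields, nested, incompatible));
    }
}
```

Putting the nested check first also means a field that fails both checks is reported as nested, which matches the ordering preference stated in the commit message.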
dimitris-athanasiou added a commit that referenced this pull request Apr 15, 2021
dimitris-athanasiou added a commit that referenced this pull request Apr 15, 2021
…#71740)

Backport of #71736
Labels
>bug :ml Machine learning Team:ML Meta label for the ML team v7.13.0 v8.0.0-alpha1