Skip to content

[Ml] Validate tree feature index is within range #52460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 19, 2020

Conversation

davidkyle
Copy link
Member

This changes the tree validation code to ensure no node in the tree has a feature index that is beyond the bounds of the feature_names array.

Specifically this handles the situation where the C++ emits a tree containing a single node and an empty feature_names list. This is valid tree used to centre the data in the ensemble but the validation code would reject this as feature_names is empty.

  "tree": {
    "feature_names": [],
    "tree_structure": [
      {
        "decision_type": "lte",
        "node_index": 0,
        "leaf_value": 146.86245256196642,
        "default_left": false
      }
    ],
    "target_type": "regression"
  }

Validation is done when the model is PUT not when the model is parsed from C++. You cannot GET a valid model and PUT it somewhere else as it will be rejected due to the validation failure. That is a broken workflow, for that reason I'm labelling this a bug and backporting to 7.6.1.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Must have missed this during one of the refactors before release :/

@davidkyle
Copy link
Member Author

@elasticmachine update branch

@davidkyle davidkyle merged commit 7722360 into elastic:master Feb 19, 2020
@davidkyle davidkyle deleted the feature-names-validation branch February 19, 2020 12:51
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Feb 19, 2020
This changes the tree validation code to ensure no node in the tree has a
feature index that is beyond the bounds of the feature_names array.
Specifically this handles the situation where the C++ emits a tree containing
a single node and an empty feature_names list. This is valid tree used to
centre the data in the ensemble but the validation code would reject this
as feature_names is empty. This meant a broken workflow as you cannot GET
the model and PUT it back
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Feb 19, 2020
This changes the tree validation code to ensure no node in the tree has a
feature index that is beyond the bounds of the feature_names array.
Specifically this handles the situation where the C++ emits a tree containing
a single node and an empty feature_names list. This is valid tree used to
centre the data in the ensemble but the validation code would reject this
as feature_names is empty. This meant a broken workflow as you cannot GET
the model and PUT it back
sbourke pushed a commit to sbourke/elasticsearch that referenced this pull request Feb 19, 2020
This changes the tree validation code to ensure no node in the tree has a 
feature index that is beyond the bounds of the feature_names array. 
Specifically this handles the situation where the C++ emits a tree containing
a single node and an empty feature_names list. This is valid tree used to 
centre the data in the ensemble but the validation code would reject this 
as feature_names is empty. This meant a broken workflow as you cannot GET
the model and PUT it back
davidkyle added a commit that referenced this pull request Feb 20, 2020
This changes the tree validation code to ensure no node in the tree has a
feature index that is beyond the bounds of the feature_names array.
Specifically this handles the situation where the C++ emits a tree containing
a single node and an empty feature_names list. This is valid tree used to
centre the data in the ensemble but the validation code would reject this
as feature_names is empty. This meant a broken workflow as you cannot GET
the model and PUT it back
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants