-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Expose gain p-values in new RegressionTree interface #2658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Could we expand a but on the scenario here @klausmh ? I can understand gain, since I am aware of mulltiple internal scenarios that make use of this, but p-values was something I added ... geez... eight years ago that has never proven useful as far as I'm aware, and I think my math was a little shaky to be honest, so I'm a little surprised to hear that it is important. And prior leaf values is something that is included so that certain splits can be effectively "undone," and is not intended to be public. So I might be inclined to expose gain, but p-values and prior leaf values, I'd like to understand the specific application. |
We walked through the previous version of RegressionTree (in 0.10.0) to extract all paths in the tree, stopping if the p-value was above a threshold. Gain and prior leaf values, we use(d) for debugging purposes. (We essentially render the complete tree with this information using GraphViz.) |
I see. That was not how it was intended to be used at all; rather it was just a sort of loose test for the confidence that an observed gain was "valid." To use it in the way you were doing is not really correct, given that the boosting on subsequent trees did happen under the assumption that the split did occur. So, just stopping at arbitrary points doesn't seem mathematically valid, and I'm not sure we should encourage this use. So I'm not sure I am too excited about adding it. The second part, it just seems like you were accustomed to seeing the value, but had no particular usage of it. So what I see is I see a strong reason to hide p-value since I see evidence it was being misinterpreted and misused if (which is one of the dangers we are trying to avoid in finalizing our public surface), and I see something else that can be safely hidden, that seem fair? Gains though, these seem fine. |
@TomFinley, @klausmh, so what's our decision? Only gain is required? |
For the new RegressionTree interface introduced with 0.11, can you also include gain p-values, gain, and "previous leaf value"?
The text was updated successfully, but these errors were encountered: