Skip to content

Expose gain p-values in new RegressionTree interface #2658

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
klausmh opened this issue Feb 20, 2019 · 6 comments
Closed

Expose gain p-values in new RegressionTree interface #2658

klausmh opened this issue Feb 20, 2019 · 6 comments
Assignees
Labels
API Issues pertaining the friendly API
Milestone

Comments

@klausmh
Copy link
Contributor

klausmh commented Feb 20, 2019

For the new RegressionTree interface introduced with 0.11, can you also include gain p-values, gain, and "previous leaf value"?

@eerhardt
Copy link
Member

/cc @najeeb-kazmi @wschin @Ivanidzo4ka

@Ivanidzo4ka
Copy link
Contributor

Yes we can. Looks we just lost them in #2243 but I can be wrong, @wschin can you comment?

@Ivanidzo4ka Ivanidzo4ka added the API Issues pertaining the friendly API label Feb 20, 2019
@TomFinley
Copy link
Contributor

Could we expand a but on the scenario here @klausmh ?

I can understand gain, since I am aware of mulltiple internal scenarios that make use of this, but p-values was something I added ... geez... eight years ago that has never proven useful as far as I'm aware, and I think my math was a little shaky to be honest, so I'm a little surprised to hear that it is important. And prior leaf values is something that is included so that certain splits can be effectively "undone," and is not intended to be public.

So I might be inclined to expose gain, but p-values and prior leaf values, I'd like to understand the specific application.

@klausmh
Copy link
Contributor Author

klausmh commented Feb 20, 2019

We walked through the previous version of RegressionTree (in 0.10.0) to extract all paths in the tree, stopping if the p-value was above a threshold.

Gain and prior leaf values, we use(d) for debugging purposes. (We essentially render the complete tree with this information using GraphViz.)

@TomFinley
Copy link
Contributor

I see. That was not how it was intended to be used at all; rather it was just a sort of loose test for the confidence that an observed gain was "valid." To use it in the way you were doing is not really correct, given that the boosting on subsequent trees did happen under the assumption that the split did occur. So, just stopping at arbitrary points doesn't seem mathematically valid, and I'm not sure we should encourage this use. So I'm not sure I am too excited about adding it.

The second part, it just seems like you were accustomed to seeing the value, but had no particular usage of it.

So what I see is I see a strong reason to hide p-value since I see evidence it was being misinterpreted and misused if (which is one of the dangers we are trying to avoid in finalizing our public surface), and I see something else that can be safely hidden, that seem fair?

Gains though, these seem fine.

@shauheen shauheen added this to the 0219 milestone Feb 21, 2019
@wschin
Copy link
Member

wschin commented Feb 26, 2019

@TomFinley, @klausmh, so what's our decision? Only gain is required?

@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

No branches or pull requests

6 participants