Expose gain p-values in new RegressionTree interface #2658

klausmh · 2019-02-20T17:05:35Z

For the new RegressionTree interface introduced with 0.11, can you also include gain p-values, gain, and "previous leaf value"?

eerhardt · 2019-02-20T18:03:59Z

/cc @najeeb-kazmi @wschin @Ivanidzo4ka

Ivanidzo4ka · 2019-02-20T18:43:58Z

Yes we can. Looks we just lost them in #2243 but I can be wrong, @wschin can you comment?

TomFinley · 2019-02-20T21:43:34Z

Could we expand a but on the scenario here @klausmh ?

I can understand gain, since I am aware of mulltiple internal scenarios that make use of this, but p-values was something I added ... geez... eight years ago that has never proven useful as far as I'm aware, and I think my math was a little shaky to be honest, so I'm a little surprised to hear that it is important. And prior leaf values is something that is included so that certain splits can be effectively "undone," and is not intended to be public.

So I might be inclined to expose gain, but p-values and prior leaf values, I'd like to understand the specific application.

klausmh · 2019-02-20T21:49:23Z

We walked through the previous version of RegressionTree (in 0.10.0) to extract all paths in the tree, stopping if the p-value was above a threshold.

Gain and prior leaf values, we use(d) for debugging purposes. (We essentially render the complete tree with this information using GraphViz.)

TomFinley · 2019-02-20T22:47:51Z

I see. That was not how it was intended to be used at all; rather it was just a sort of loose test for the confidence that an observed gain was "valid." To use it in the way you were doing is not really correct, given that the boosting on subsequent trees did happen under the assumption that the split did occur. So, just stopping at arbitrary points doesn't seem mathematically valid, and I'm not sure we should encourage this use. So I'm not sure I am too excited about adding it.

The second part, it just seems like you were accustomed to seeing the value, but had no particular usage of it.

So what I see is I see a strong reason to hide p-value since I see evidence it was being misinterpreted and misused if (which is one of the dangers we are trying to avoid in finalizing our public surface), and I see something else that can be safely hidden, that seem fair?

Gains though, these seem fine.

wschin · 2019-02-26T00:04:41Z

@TomFinley, @klausmh, so what's our decision? Only gain is required?

Ivanidzo4ka assigned wschin Feb 20, 2019

Ivanidzo4ka added the API Issues pertaining the friendly API label Feb 20, 2019

shauheen added this to the 0219 milestone Feb 21, 2019

wschin mentioned this issue Feb 26, 2019

Expose split gains in tree models #2747

Merged

wschin closed this as completed in #2747 Feb 26, 2019

ghost locked as resolved and limited conversation to collaborators Mar 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose gain p-values in new RegressionTree interface #2658

Expose gain p-values in new RegressionTree interface #2658

klausmh commented Feb 20, 2019 •

edited

Loading

eerhardt commented Feb 20, 2019

Ivanidzo4ka commented Feb 20, 2019

TomFinley commented Feb 20, 2019

klausmh commented Feb 20, 2019

TomFinley commented Feb 20, 2019

wschin commented Feb 26, 2019

Expose gain p-values in new RegressionTree interface #2658

Expose gain p-values in new RegressionTree interface #2658

Comments

klausmh commented Feb 20, 2019 • edited Loading

eerhardt commented Feb 20, 2019

Ivanidzo4ka commented Feb 20, 2019

TomFinley commented Feb 20, 2019

klausmh commented Feb 20, 2019

TomFinley commented Feb 20, 2019

wschin commented Feb 26, 2019

klausmh commented Feb 20, 2019 •

edited

Loading