[ML] Logistic regression loss function for boosted tree training #713
Conversation
Good work! I have just minor comments on improving readability.
I've now addressed all your review comments. Can you take another look, @valeriy42?
LGTM. Good work on writing explanation comments. I left a couple of minor comments. No need for me to look over it again.
lib/maths/CBoostedTree.cc (Outdated)
// We are searching for the value x which minimises
//
// x^* = argmin_x{ sum_i{(a_i - (p_i + x))^2} + lambda * x^2 }
//
// This is convex so there is one minimum where the derivative w.r.t. x is zero
// and x^* = 1 / (n + lambda) sum_i{ a_i - p_i }. Denoting the mean prediction
// error m = 1/n sum_i{ a_i - p_i } we have x^* = n / (n + lambda) m.
Good job on explaining what the function does! 👍
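As an aside (not code from this PR), the closed form above amounts to a shrunk mean of the prediction errors. A minimal C++ sketch with hypothetical names:

```cpp
#include <cstddef>
#include <vector>

// Sketch: optimal leaf value for squared error loss with an L2 penalty on
// the leaf weight. The minimiser of sum_i{(a_i - (p_i + x))^2} + lambda * x^2
// is x^* = sum_i{ a_i - p_i } / (n + lambda) = n / (n + lambda) * m.
double optimalLeafValue(const std::vector<double>& actuals,
                        const std::vector<double>& predictions,
                        double lambda) {
    double sumError{0.0};
    for (std::size_t i = 0; i < actuals.size(); ++i) {
        sumError += actuals[i] - predictions[i];
    }
    double n{static_cast<double>(actuals.size())};
    // Equivalent to n / (n + lambda) * m with m the mean prediction error,
    // i.e. the mean error shrunk towards zero by the regularisation term.
    return sumError / (n + lambda);
}
```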
// This is true if and only if all the predictions were identical. In this
// case we only need one pass over the data and can compute the optimal
👍
// zero to close to one. In particular, the idea is to minimize the leaf
// weight on an interval [a, b] where if we add "a" the log-odds for all
// rows <= -5, i.e. max prediction + a = -5, and if we add "b" the log-odds
// for all rows >= 5, i.e. min prediction + b = 5.
Nice explanation! 👍
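For illustration only (a hypothetical helper, not the PR's implementation), bracketing the leaf weight search interval described above could look like this: adding "a" pushes every row's log-odds to -5 or below and adding "b" pushes every row's log-odds to 5 or above, so the minimiser lies in [a, b].

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sketch: search interval [a, b] for the leaf weight under logistic loss.
// With weight a, max prediction + a = -5, so every row's log-odds is <= -5;
// with weight b, min prediction + b = 5, so every row's log-odds is >= 5.
// A one-dimensional minimisation of the leaf objective on [a, b] then
// suffices to find the optimal leaf weight.
std::pair<double, double> leafWeightSearchInterval(const std::vector<double>& predictions) {
    auto minmax = std::minmax_element(predictions.begin(), predictions.end());
    double a{-5.0 - *minmax.second}; // shifts the largest log-odds down to -5
    double b{5.0 - *minmax.first};   // shifts the smallest log-odds up to 5
    return {a, b};
}
```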
Co-Authored-By: Valeriy Khakhutskyy <[email protected]>
This implements binomial logistic regression for the boosted tree. In particular, this targets cross entropy and builds a forest to predict the class log-odds.
We should also have been including the sum-square leaf weight penalty in the calculation of the optimum tree leaf values, since the splits are chosen targeting the regularised objective. (Note that the regularisation applies to the log-odds for logistic regression, i.e. we'll shrink the log-odds towards zero and so the predicted probabilities towards 0.5.)
I haven't wired this in yet, since that work depends on #701.
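To make the log-odds framing above concrete, here is a small, hypothetical sketch (not code from this PR) of mapping a forest prediction in log-odds space to a class probability; shrinking the log-odds towards zero pulls the probability towards 0.5.

```cpp
#include <cmath>
#include <iostream>

// Sketch: the forest predicts the class log-odds; the probability of class 1
// is recovered with the logistic function. Regularising the leaf weights
// shrinks the log-odds towards zero and hence the probability towards 0.5.
double classProbability(double logOdds) {
    return 1.0 / (1.0 + std::exp(-logOdds));
}

int main() {
    std::cout << classProbability(2.0) << '\n'; // ~0.881, confident prediction
    std::cout << classProbability(0.5) << '\n'; // ~0.622, shrunk towards 0.5
    std::cout << classProbability(0.0) << '\n'; // 0.5, fully shrunk log-odds
    return 0;
}
```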