
[ML] Logistic regression loss function for boosted tree training #713


Merged
23 commits merged into elastic:master on Oct 9, 2019

Conversation

@tveasey (Contributor) commented Oct 2, 2019

This implements binomial logistic regression for the boosted tree. In particular, this targets cross entropy and builds a forest to predict the class log-odds.
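To make the loss concrete, here is a minimal, self-contained sketch (illustrative names only, not the actual ml-cpp implementation) of the cross entropy loss evaluated on the log-odds, together with the gradient and curvature a gradient boosted tree implementation typically aggregates per leaf:

```cpp
#include <cmath>

// Illustrative sketch: cross entropy for a single row with actual class
// label y in {0, 1} and predicted log-odds f, where p = sigmoid(f).
double sigmoid(double f) {
    return 1.0 / (1.0 + std::exp(-f));
}

// Cross entropy: -[y * log(p) + (1 - y) * log(1 - p)].
double crossEntropy(double y, double f) {
    double p = sigmoid(f);
    return -(y * std::log(p) + (1.0 - y) * std::log(1.0 - p));
}

// First derivative w.r.t. the log-odds f: p - y.
double gradient(double y, double f) {
    return sigmoid(f) - y;
}

// Second derivative w.r.t. the log-odds f: p * (1 - p).
double curvature(double /*y*/, double f) {
    double p = sigmoid(f);
    return p * (1.0 - p);
}
```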

We should also have been including the sum-square leaf weight penalty in the calculation of the optimum tree leaf values, since the splits are chosen targeting the regularised objective. (Note that for logistic regression the regularisation applies to the log-odds, i.e. we'll shrink the log-odds towards zero and so the predicted probabilities towards 0.5.)

I haven't wired this in yet, since that work depends on #701.
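As an aside, a sketch (not taken verbatim from this PR) of why the penalty shrinks the log-odds: writing the per-row loss as l(y_i, f_i + x) for leaf value x and expanding to second order about the current log-odds f_i,

```latex
\[
  x^* = \operatorname*{arg\,min}_x \Big\{ \sum_i l(y_i, f_i + x) + \lambda x^2 \Big\}
      \approx -\frac{\sum_i g_i}{\sum_i h_i + 2\lambda},
  \qquad g_i = \frac{\partial l}{\partial f_i}, \quad
         h_i = \frac{\partial^2 l}{\partial f_i^2},
\]
```

so increasing lambda pulls the leaf value, and hence the log-odds adjustment, towards zero and the predicted probabilities towards 0.5. (With the squared loss this reduces to the closed form quoted in the review comments below.)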

@valeriy42 (Contributor) left a comment

Good work! I have just minor comments on improving readability.

@tveasey (Contributor, Author) commented Oct 9, 2019

I've now addressed all your review comments. Could you take another look, @valeriy42?

@valeriy42 (Contributor) left a comment

LGTM. Good work on the explanatory comments. I left a couple of minor comments; no need for me to look over it again.

Comment on lines 65 to 73

// We are searching for the value x which minimises
//
// x^* = argmin_x{ sum_i{(a_i - (p_i + x))^2} + lambda * x^2 }
//
// This is convex so there is one minimum where derivative w.r.t. x is zero
// and x^* = 1 / (n + lambda) sum_i{ a_i - p_i }. Denoting the mean prediction
// error m = 1/n sum_i{ a_i - p_i } we have x^* = n / (n + lambda) m.


Good job on explaining what the function does! 👍
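For reference, a hedged sketch (hypothetical names, not the PR's actual code) of the closed form described in this comment:

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: the minimiser of
//   sum_i{(a_i - (p_i + x))^2} + lambda * x^2,
// i.e. x* = n / (n + lambda) * m with m the mean prediction error.
double minimumLossLeafValue(const std::vector<double>& actuals,
                            const std::vector<double>& predictions,
                            double lambda) {
    double n = static_cast<double>(actuals.size());
    double meanError = 0.0;
    for (std::size_t i = 0; i < actuals.size(); ++i) {
        meanError += (actuals[i] - predictions[i]) / n;
    }
    return n / (n + lambda) * meanError;
}
```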

Comment on lines +136 to +137
// This is true if and only if all the predictions were identical. In this
// case we only need one pass over the data and can compute the optimal

👍

Comment on lines +175 to +178
// zero to close to one. In particular, the idea is to minimize the leaf
// weight on an interval [a, b] where if we add "a" the log-odds for all
// rows <= -5, i.e. max prediction + a = -5, and if we add "b" the log-odds
// for all rows >= 5, i.e. min prediction + b = 5.

Nice explanation! 👍
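For illustration, a small sketch (hypothetical names, not the PR's code) of how such a bracketing interval could be computed from the current log-odds predictions; any minimiser of the leaf objective then lies inside [a, b], so a one-dimensional search over that interval suffices:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Illustrative only: choose [a, b] so that adding "a" drives every row's
// log-odds to <= -5 and adding "b" drives every row's log-odds to >= 5.
std::pair<double, double>
leafValueSearchInterval(const std::vector<double>& logOddsPredictions) {
    auto [minIt, maxIt] = std::minmax_element(logOddsPredictions.begin(),
                                              logOddsPredictions.end());
    double a = -5.0 - *maxIt; // max prediction + a = -5
    double b = 5.0 - *minIt;  // min prediction + b =  5
    return {a, b};
}
```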

Co-Authored-By: Valeriy Khakhutskyy <[email protected]>
@tveasey merged commit 60c9e02 into elastic:master on Oct 9, 2019
@tveasey deleted the logistic-regression branch on October 9, 2019 at 13:26
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Oct 11, 2019
2 participants