SLEP006 on Sample Properties #16
Conversation
and (somehow) move slep004 to rejected?
  validation
* (maybe in scope) passing sample properties (e.g. `sample_weight`) to some
  scorers and not others in a multi-metric cross-validation setup
* (likely out of scope) passing sample properties to non-fit methods, for
is this particularly "harder" to implement than the others?
Well, firstly the use cases for it will need further definition; we don't currently pass around anything like weights to predict or transform methods. But yes, it is hard in part because we have fused methods like fit_transform.
Yeah, I guess the sample_weight use cases are maybe less frequent than things which take gender or race into account. There, predict may involve some postprocessing on the output of another estimator based on these sample properties.
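A minimal sketch of that predict-time postprocessing use case, assuming a purely hypothetical wrapper: the GroupThresholdClassifier name, the group prop, and the way it is passed are illustrative, not part of any proposed API.

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone

class GroupThresholdClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical sketch: post-process another estimator's scores in
    predict using a sample-aligned group property (e.g. gender or race)."""

    def __init__(self, estimator, threshold_by_group=None):
        self.estimator = estimator
        self.threshold_by_group = threshold_by_group

    def fit(self, X, y, group=None):
        # group is sample-aligned; it is accepted here so that a router
        # could pass it consistently to both fit and predict.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def predict(self, X, group=None):
        # Predict-time postprocessing based on the sample property:
        # apply a per-group decision threshold to the base model's scores.
        thresholds = self.threshold_by_group or {}
        scores = self.estimator_.predict_proba(X)[:, 1]
        cutoffs = np.array([thresholds.get(g, 0.5) for g in group])
        return (scores >= cutoffs).astype(int)

Such a wrapper only helps if whatever drives it (a pipeline, a CV splitter, a scorer) can route group to both fit and predict, which is the routing problem this SLEP is about.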
Could I somehow be of any help here, @jnothman?
Hi, is there any way I can help move this SLEP forward, so that we can accelerate the AIF360 and scikit-learn integration? The issue has been referenced above.
@adrinjalali @jnothman any updates on this would be useful - this is something we need in the context of the AIF360 work we are doing.
There are lots of competing proposals and I will need to find some time to write them up.
I consider each of the solutions here a family of solutions, rather than an entirely specific syntax. The way forward involves defining a possible syntax for each, then coding up each of the test cases for each solution.
Apparently I was pushing to the wrong remote...
* we are able to consider kwargs to `fit` that are not sample-aligned, so that
  we can add further functionality (some that have been proposed:
  `with_warm_start`, `feature_names_in`, `feature_meta`).
seems like feature_names_in_ would be through NamedArray?
slep006/proposal.rst
Benefits:

* This will not need to affect legacy estimators, since no props will be
  passed.
We can also think about the cases where sample_weight is not passed now, but that's because we've forgotten to pass it along (not sure how many of those we have left, though).
* The implementation changes in meta-estimators may be easy to provide via a
  helper or two (perhaps even `call_with_props(method, target, props)`).
* Easy to reconfigure what props an estimator gets in a grid search.
* Could make use of existing `**fit_params` syntax rather than introducing new
this is kinda orthogonal to this solution, isn't it? I still like the idea of passing things that are not sample aligned.
Yes, it's orthogonal, but the benefit of reusing **fit_params syntax is that legacy estimators inheriting from BaseEstimator will automatically work; no need to require that they accept props.
Also, because Solution 4 requires estimators to explicitly declare that a fit parameter is a sample prop, it's possible (at least in theory) for other fit parameters to not be sample-aligned, and to be passed by some other mechanism...
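The `call_with_props(method, target, props)` helper mentioned in the hunk above is only named, never specified. As an assumption about one possible semantics (forward only the props that the target method actually accepts), a minimal sketch could look like:

import inspect

def call_with_props(method, target, props, *args):
    # Hypothetical helper for meta-estimators: call getattr(target, method)
    # with the positional args, plus whichever entries of `props` the bound
    # method's signature accepts. A sketch, not the SLEP's final API.
    bound = getattr(target, method)
    accepted = inspect.signature(bound).parameters
    kwargs = {name: value for name, value in props.items() if name in accepted}
    return bound(*args, **kwargs)

# e.g. inside a meta-estimator's fit:
#     call_with_props('fit', self.base_estimator, {'sample_weight': sw}, X, y)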
slep006/proposal.rst
from sklearn.metrics import accuracy

# Case A:
weighted_acc = make_scorer(accuracy, request_props=['sample_weight'])
I still don't think having the same argument in the __init__ of all estimators would be a bad idea. We could also probably call it sample_props instead; it'd probably be more intuitive for users, or even required_sample_props.
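Both spellings here are proposals rather than existing scikit-learn API: neither request_props in make_scorer (as in the hunk above) nor a sample_props/required_sample_props constructor argument exists. Purely as a self-contained toy, the constructor-argument idea could look roughly like:

from sklearn.metrics import accuracy_score

class PropRequestingScorer:
    # Toy illustration of the suggestion above: the scorer object itself
    # declares which sample props it wants via a constructor argument.
    def __init__(self, metric, required_sample_props=()):
        self.metric = metric
        self.required_sample_props = tuple(required_sample_props)

    def __call__(self, estimator, X, y, **props):
        # A router (e.g. cross-validation) would consult
        # required_sample_props to decide what to forward; the scorer then
        # passes only those props on to the metric.
        kwargs = {k: props[k] for k in self.required_sample_props if k in props}
        return self.metric(y, estimator.predict(X), **kwargs)

weighted_acc = PropRequestingScorer(accuracy_score,
                                    required_sample_props=['sample_weight'])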
Awesome, I really like solution 4.
What's the status of this? Does it need more reviews? LMK if I can help in any way.
It needs to go from conceptual approaches to example code of each use case... I'm unlikely to find time this month.
There are also important use cases for needing to pass test sample metadata to transform calls. One use case that I'm needing it for is batch effect correction methods, where I need to know which batch each test sample is in, along with the fitted train parameters, to perform the test transform.
I've looked over the BaseSearchCV and Pipeline codebase, and a possible issue with using multiple **kwargs arguments like **transform_params is: how do we keep them separate from the existing **predict_params?
This might be one argument for using a single kwarg such as sample_prop across the board.
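A minimal sketch of the batch-effect use case described above, with a hypothetical transformer whose transform consumes a per-sample batch label (the class name, the batch prop, and the centering scheme are all illustrative assumptions):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class BatchCenterer(BaseEstimator, TransformerMixin):
    # Hypothetical: transform needs to know each *test* sample's batch as
    # well as the parameters fitted on the training data.
    def fit(self, X, y=None, batch=None):
        X, batch = np.asarray(X), np.asarray(batch)
        self.batch_means_ = {b: X[batch == b].mean(axis=0)
                             for b in np.unique(batch)}
        self.global_mean_ = X.mean(axis=0)
        return self

    def transform(self, X, batch=None):
        X = np.asarray(X)
        # Subtract each sample's (train-estimated) batch mean, falling back
        # to the global mean for batches unseen at fit time.
        offsets = np.array([self.batch_means_.get(b, self.global_mean_)
                            for b in np.asarray(batch)])
        return X - offsets + self.global_mean_

Note that the inherited TransformerMixin.fit_transform would forward batch only to fit and then call transform(X) without it, which is the fused-method wrinkle raised elsewhere in this thread.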
Also, I forgot, in relation to my comment above about needing to pass test metadata: this is also related to your bullet point on needing to pass test ... Though again, what do we call these...
I'm keen to push this soon towards a vote, so that we can consider @adrinjalali's PR for v0.25 (2021Q2). Needs some review. Then I can review Successive Halving, and the world will be a better place. Are you with me, @adrinjalali and @hermidalc?
Yeah, I'm down.
Having considered the above solutions, we propose:

TODO

* which solution?
Are we leaning toward solution 4 and using prop?
I think we're leaning towards solution 4, and I personally prefer a **kwargs kind of thing rather than a dict.
I've been using an implementation based on @jnothman's solution 3 in my own sklearn extensions and code for a long time. I also personally prefer **kwargs over a dict. One thing that I've found to be crucial is that, whatever framework/API is implemented, it needs to support specifying which properties you want to pass to each of fit/transform/predict/score, etc. Sometimes you only want it to go to fit, sometimes both fit and transform, sometimes only predict, etc.
The implementation in scikit-learn/scikit-learn#16079 supports this.
The syntax I've currently got in this proposal for Soln 4 doesn't cover the "send to transform only" case. Adrin has adopted something more like:
trs = MyTrs().set_props_request(None).set_props_request(
    {'fit': {'new_param': 'my_sw'}})
My apologies, I've not had a use case yet where you need to send a param only to transform, but I've had all the others. Though I would say, and you guys know this better than I do, that it's maybe a good idea to handle use cases not yet encountered... I'm not sure, up to you. Intuitively I don't see a case where a param goes to transform and not to fit, but who knows.
BTW, does this Solution 4 handle fit_transform properly? This is something I've had to handle when extending from @jnothman's Solution 3.
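To make the fused-method question concrete, here is a sketch under the assumption of a per-method request mapping like the one Adrin's syntax above implies; nothing in it is an agreed API, and the helper name and `requests` argument are invented for illustration.

def fit_transform_with_props(est, X, y=None, *, props, requests):
    # Hypothetical: a fused fit_transform must split the routed props
    # between its two halves. `requests` maps method name -> the prop keys
    # that method asked for, mirroring the per-method mapping above.
    fit_props = {k: props[k] for k in requests.get('fit', ()) if k in props}
    trans_props = {k: props[k]
                   for k in requests.get('transform', ()) if k in props}
    return est.fit(X, y, **fit_props).transform(X, **trans_props)

# e.g. with the hypothetical BatchCenterer sketched earlier:
#     fit_transform_with_props(BatchCenterer(), X, props={'batch': b},
#                              requests={'fit': ['batch'],
#                                        'transform': ['batch']})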
I totally forgot how this is made easier by having kwarg-only arguments! It might have actually been one of the initial motivations, lol.
It might be nice for this to be merged and rendered, and available for more incremental improvements soon. Not sure what the right time/process for merging is.
We haven't passed it, but we kinda sorta agreed that it can be merged in "draft" status if the author and another maintainer are happy with it. So I'm happy to merge and work on it in separate issues.
Then let's merge before the monthly meeting?