SLEP006 on Sample Properties #16
Conversation
and (somehow) move slep004 to rejected?
  validation
* (maybe in scope) passing sample properties (e.g. `sample_weight`) to some
  scorers and not others in a multi-metric cross-validation setup
* (likely out of scope) passing sample properties to non-fit methods, for
is this particularly "harder" to implement than the others?
Well, firstly the use cases for it will need further definition; we don't currently pass around anything like weights to predict or transform methods. But yes, it is hard in part because we have fused methods like fit_transform.
Yeah, I guess the sample_weight use cases are maybe less frequent than things which take gender or race into account. There, predict may involve some postprocessing on the output of another estimator based on these sample properties.
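A minimal sketch of that predict-time postprocessing use case, assuming a purely hypothetical wrapper: the GroupThresholdClassifier name, the group prop, and the way it is passed are illustrative, not part of any proposed API.

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone

class GroupThresholdClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical sketch: post-process another estimator's scores in
    predict using a sample-aligned group property (e.g. gender or race)."""

    def __init__(self, estimator, threshold_by_group=None):
        self.estimator = estimator
        self.threshold_by_group = threshold_by_group

    def fit(self, X, y, group=None):
        # group is sample-aligned; it is accepted here so that a router
        # could pass it consistently to both fit and predict.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def predict(self, X, group=None):
        # Predict-time postprocessing based on the sample property:
        # apply a per-group decision threshold to the base model's scores.
        thresholds = self.threshold_by_group or {}
        scores = self.estimator_.predict_proba(X)[:, 1]
        cutoffs = np.array([thresholds.get(g, 0.5) for g in group])
        return (scores >= cutoffs).astype(int)

Such a wrapper only helps if whatever drives it (a pipeline, a CV splitter, a scorer) can route group to both fit and predict, which is the routing problem this SLEP is about.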
Could I somehow be of any help here, @jnothman?
Hi, is there any way I can help move this SLEP forward, so that we can accelerate the AIF360 and scikit-learn integration? The issue has been referenced above.
@adrinjalali @jnothman any updates on this would be useful - this is something we need in the context of the AIF360 work we are doing.
There are lots of competing proposals and I will need to find some time to write them up.
I consider each of the solutions here a family of solutions, rather than an entirely specific syntax. The way forward involves defining a possible syntax for each, then coding up each of the test cases for each solution.
Apparently I was pushing to the wrong remote...
* we are able to consider kwargs to `fit` that are not sample-aligned, so that
  we can add further functionality (some that have been proposed:
  `with_warm_start`, `feature_names_in`, `feature_meta`).
seems like feature_names_in_ would be through NamedArray?
slep006/proposal.rst
Benefits:

* This will not need to affect legacy estimators, since no props will be
  passed.
We can also think about the cases where sample_weight is not passed now, but that's because we've forgotten to pass it along (not sure how many of those we have left, though).
* The implementation changes in meta-estimators may be easy to provide via a
  helper or two (perhaps even `call_with_props(method, target, props)`).
* Easy to reconfigure what props an estimator gets in a grid search.
* Could make use of existing `**fit_params` syntax rather than introducing new
this is kinda orthogonal to this solution, isn't it? I still like the idea of passing things that are not sample aligned.
Yes, it's orthogonal, but the benefit of reusing **fit_params syntax is that legacy estimators inheriting from BaseEstimator will automatically work; no need to require that they accept props.
Also, because Solution 4 requires estimators to explicitly declare that a fit parameter is a sample prop, it's possible (at least in theory) for other fit parameters to not be sample-aligned, and to be passed by some other mechanism...
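The `call_with_props(method, target, props)` helper mentioned in the hunk above is only named, never specified. As an assumption about one possible semantics (forward only the props that the target method actually accepts), a minimal sketch could look like:

import inspect

def call_with_props(method, target, props, *args):
    # Hypothetical helper for meta-estimators: call getattr(target, method)
    # with the positional args, plus whichever entries of `props` the bound
    # method's signature accepts. A sketch, not the SLEP's final API.
    bound = getattr(target, method)
    accepted = inspect.signature(bound).parameters
    kwargs = {name: value for name, value in props.items() if name in accepted}
    return bound(*args, **kwargs)

# e.g. inside a meta-estimator's fit:
#     call_with_props('fit', self.base_estimator, {'sample_weight': sw}, X, y)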
slep006/proposal.rst
from sklearn.metrics import accuracy

# Case A:
weighted_acc = make_scorer(accuracy, request_props=['sample_weight'])
I still don't think having the same argument in the __init__ of all estimators would be a bad idea. We could also probably call it sample_props instead; it'd probably be more intuitive for users, or even required_sample_props.
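Both spellings here are proposals rather than existing scikit-learn API: neither request_props in make_scorer (as in the hunk above) nor a sample_props/required_sample_props constructor argument exists. Purely as a self-contained toy, the constructor-argument idea could look roughly like:

from sklearn.metrics import accuracy_score

class PropRequestingScorer:
    # Toy illustration of the suggestion above: the scorer object itself
    # declares which sample props it wants via a constructor argument.
    def __init__(self, metric, required_sample_props=()):
        self.metric = metric
        self.required_sample_props = tuple(required_sample_props)

    def __call__(self, estimator, X, y, **props):
        # A router (e.g. cross-validation) would consult
        # required_sample_props to decide what to forward; the scorer then
        # passes only those props on to the metric.
        kwargs = {k: props[k] for k in self.required_sample_props if k in props}
        return self.metric(y, estimator.predict(X), **kwargs)

weighted_acc = PropRequestingScorer(accuracy_score,
                                    required_sample_props=['sample_weight'])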
Awesome, I really like solution 4.
What's the status of this? Does it need more reviews? LMK if I can help in any way.
It needs to go from conceptual approaches to example code of each use case... I'm unlikely to find time this month.
There are also important use cases for needing to pass test sample metadata to transform calls. One use case that I'm needing it for is batch effect correction methods, where I need to know which batch each test sample is in, along with the fitted train parameters, to perform the test transform.
I've looked over the BaseSearchCV and Pipeline codebase, and a possible issue with using multiple **kwargs arguments like **transform_params is: how do we keep them separate from the existing **predict_params?
This might be one argument for using a single kwarg such as sample_prop across the board.
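A minimal sketch of the batch-effect use case described above, with a hypothetical transformer whose transform consumes a per-sample batch label (the class name, the batch prop, and the centering scheme are all illustrative assumptions):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class BatchCenterer(BaseEstimator, TransformerMixin):
    # Hypothetical: transform needs to know each *test* sample's batch as
    # well as the parameters fitted on the training data.
    def fit(self, X, y=None, batch=None):
        X, batch = np.asarray(X), np.asarray(batch)
        self.batch_means_ = {b: X[batch == b].mean(axis=0)
                             for b in np.unique(batch)}
        self.global_mean_ = X.mean(axis=0)
        return self

    def transform(self, X, batch=None):
        X = np.asarray(X)
        # Subtract each sample's (train-estimated) batch mean, falling back
        # to the global mean for batches unseen at fit time.
        offsets = np.array([self.batch_means_.get(b, self.global_mean_)
                            for b in np.asarray(batch)])
        return X - offsets + self.global_mean_

Note that the inherited TransformerMixin.fit_transform would forward batch only to fit and then call transform(X) without it, which is the fused-method wrinkle raised elsewhere in this thread.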
Also, I forgot, in relation to my comment above about needing to pass test metadata: this is also related to your bullet point on needing to pass test ... Though again, what do we call these...
I'm keen to push this soon towards a vote, so that we can consider @adrinjalali's PR for v0.25 (2021Q2). Needs some review. Then I can review Successive Halving, and the world will be a better place. Are you with me, @adrinjalali and @hermidalc?
Yeah, I'm down.
Having considered the above solutions, we propose:

TODO

* which solution?
Are we leaning toward solution 4 and using prop?
I think we're leaning towards solution 4, and I personally prefer a **kwargs kind of thing rather than a dict.
I've been using an implementation based on @jnothman's solution 3 in my own sklearn extensions and code for a long time. I also personally prefer **kwargs over a dict. One thing that I've found to be crucial is that, whatever framework/API is implemented, it needs to support specifying which properties you want to pass to each of fit/transform/predict/score, etc. Sometimes you only want it to go to fit, sometimes both fit and transform, sometimes only predict, etc.
The implementation in scikit-learn/scikit-learn#16079 supports this.
The syntax I've currently got in this proposal for Soln 4 doesn't cover the "send to transform only" case. Adrin has adopted something more like:
trs = MyTrs().set_props_request(None).set_props_request(
    {'fit': {'new_param': 'my_sw'}})
My apologies, I've not had a use case yet where you need to send a param only to transform, but I've had all the others. Though I would say, and you guys know this better than I do, that it's maybe a good idea to handle use cases not yet encountered... I'm not sure, up to you. Intuitively I don't see a case where a param goes to transform and not to fit, but who knows.
BTW, does this Solution 4 handle fit_transform properly? This is something I've had to handle when extending from @jnothman's Solution 3.
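To make the fused-method question concrete, here is a sketch under the assumption of a per-method request mapping like the one Adrin's syntax above implies; nothing in it is an agreed API, and the helper name and `requests` argument are invented for illustration.

def fit_transform_with_props(est, X, y=None, *, props, requests):
    # Hypothetical: a fused fit_transform must split the routed props
    # between its two halves. `requests` maps method name -> the prop keys
    # that method asked for, mirroring the per-method mapping above.
    fit_props = {k: props[k] for k in requests.get('fit', ()) if k in props}
    trans_props = {k: props[k]
                   for k in requests.get('transform', ()) if k in props}
    return est.fit(X, y, **fit_props).transform(X, **trans_props)

# e.g. with the hypothetical BatchCenterer sketched earlier:
#     fit_transform_with_props(BatchCenterer(), X, props={'batch': b},
#                              requests={'fit': ['batch'],
#                                        'transform': ['batch']})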
I totally forgot how this is made easier by having kwarg-only arguments! It might have actually been one of the initial motivations, lol.
It might be nice for this to be merged and rendered, and available for more incremental improvements soon. Not sure what the right time/process for merging is.
We haven't passed it, but we kinda sorta agreed that it can be merged in "draft" status if the author and another maintainer are happy with it. So I'm happy to merge and work on it in separate issues.
Then let's merge before the monthly meeting?