Skip to content

Improve the ratio explanation in the docstring of the samplers #406

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
amueller opened this issue Feb 27, 2018 · 4 comments · Fixed by #413
Closed

Improve the ratio explanation in the docstring of the samplers #406

amueller opened this issue Feb 27, 2018 · 4 comments · Fixed by #413
Labels
For: Documentation Indicates a need for improvements or additions to documentation good first issue Indicates a good issue for first-time contributors Status: Help Wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@amueller
Copy link
Member

http://contrib.scikit-learn.org/imbalanced-learn/dev/generated/imblearn.ensemble.BalancedBaggingClassifier.html#imblearn.ensemble.BalancedBaggingClassifier

'auto': correspond to 'all' with for over-sampling methods and 'not minority' for under-sampling methods. The classes targeted will be over-sampled or under-sampled to achieve an equal number of sample with the majority or minority class.

This is always undersampling in BalancedBaggingClassifier, right? So auto is always 'not minority'? If so, I feel this docstring can be simplified.

@glemaitre
Copy link
Member

True. This is copy paste from other sampler. We should probably address that in all samplers.
Thanks.

@glemaitre glemaitre added For: Documentation Indicates a need for improvements or additions to documentation Status: Help Wanted Indicates that a maintainer wants help on an issue or pull request good first issue Indicates a good issue for first-time contributors labels Feb 27, 2018
@glemaitre
Copy link
Member

Having some feedback it seems that our docstring is not good enough regarding the ratio.
I think that we should avoid to mention over-/under- sampling option whenever it is not useful and also, we should link to the example in the documentation.

Better explanation regarding the possibility to give a dict should be explained.

@glemaitre glemaitre changed the title BalancedBaggingClassifier ratio explanation odd Improve the ratio explanation in the docstring of the samplers Mar 1, 2018
@glemaitre
Copy link
Member

@amueller I would appreciate some input regarding the ratio argument.

Naming

ratio does not apply anymore. It was true at the 0.0.1 of imblearn but we support multi classes since then. Do you think it is worth to rename it to something more related to what it does. Any input is welcomed. ping @jorisvandenbossche since it does not like the naming ;)

API

Right now, we can pass some string, dict and callable. We decided to do so to not have 2 arguments (targeted classes and #samples to get) avoid tedious sanity check between those arguments (length, order, etc.). Right now, we have an issue with the cleaning under-sampling method which discard the number of samples to select and just considered the class. This is bringing some confusion to users.

So if you see a way to make this parameter easier to manipulate and intuitive for the end-user, it would be great.

@glemaitre
Copy link
Member

Regarding API I moved the discussion #411

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
For: Documentation Indicates a need for improvements or additions to documentation good first issue Indicates a good issue for first-time contributors Status: Help Wanted Indicates that a maintainer wants help on an issue or pull request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants