VOTE SLEP018 - Pandas Output for Transformers #72
Conversation
I'd say most users don't have a separate pipeline for their transform steps and another pipeline for adding the final predictor. What would a usual pipeline (transformers plus a final predictor) look like, and what should users do to get pandas output from it? Otherwise I'm happy with the SLEP.
This is the behavior I implemented and was going for.
Then the example in the SLEP could also mirror that to make it clear. But it's a +1 for me anyway :)
In this SLEP, I updated the pipeline example to include a final classifier, showcasing how `set_output` works for that kind of pipeline.
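For illustration, a minimal sketch of that kind of example, assuming the `set_output(transform="pandas")` API as proposed (the dataset and estimators here are illustrative, not taken from the SLEP):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True, as_frame=True)

# A typical pipeline: transform steps followed by a final classifier.
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# One call configures every transformer in the pipeline to output DataFrames.
pipe.set_output(transform="pandas")
pipe.fit(X, y)

# The scaled features passed to LogisticRegression are now a DataFrame
# with the original column names preserved.
```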
+1
Can I clarify how this applies to non-transformers that take a transformer as a parameter? (Do we have other non-transformers that have a transformer for a parameter, aside from TransformedTargetRegressor?)
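For context, a hedged sketch of the case being asked about; whether `set_output` on the meta-estimator should reach the inner transformer is the open question, so here it is configured directly (an assumption, not something the SLEP specifies):

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import QuantileTransformer

# A non-transformer meta-estimator that takes a transformer as a parameter.
# Open question: should set_output on TransformedTargetRegressor also
# configure this inner transformer (as the SLEP special-cases for Pipeline),
# or must the user configure it directly, as done here?
quantile = QuantileTransformer(n_quantiles=10)
quantile.set_output(transform="pandas")

reg = TransformedTargetRegressor(regressor=Ridge(), transformer=quantile)
```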
Thinking about it more, the special case for `Pipeline` can influence other meta-estimators. For example, with a `VotingClassifier` built from pipelines:

```python
voting = VotingClassifier([
    ("pipe1", pipe1), ("pipe2", pipe2), ("pipe3", pipe3)
])

# If `VotingClassifier` defines a `set_output`, then the whole ensemble of
# pipelines can be configured with:
voting.set_output(transform="pandas")

# If not, then every pipeline needs to be set individually:
voting2 = VotingClassifier([
    ("pipe1", pipe1.set_output(transform="pandas")), ...
])
```

For a better UX, I think all first-party meta-estimators should define a `set_output`.
+1.
+1
ping @GaelVaroquaux ;)
Thanks for the thoughtful SLEP, the discussions and the prototype, @thomasjpfan.
Thanks for the ping, @amueller :)
``ValueError`` if ``set_output(transform="pandas")``. Dealing with sparse output
might be the scope of another future SLEP.
Dealing with sparse output might be the scope of another future SLEP.
Nit: Might it be worth mentioning it in the Discussion section?
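To make the quoted behavior concrete, a hedged sketch; `OneHotEncoder` and the exact point at which the error surfaces are assumptions for illustration, since the SLEP only states that sparse output combined with `set_output(transform="pandas")` raises a `ValueError`:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X = pd.DataFrame({"pet": ["dog", "cat", "dog"]})

# OneHotEncoder produces sparse output by default.
enc = OneHotEncoder()
enc.set_output(transform="pandas")

# Requesting pandas output for a sparse result is expected to raise a
# ValueError rather than silently densifying the data.
try:
    enc.fit_transform(X)
except ValueError as exc:
    print(exc)
```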
Co-authored-by: Julien Jerphanion <[email protected]>
+1
I pushed a commit with a typo fix: StandardScalar -> StandardScaler
I am also +1 on this SLEP. Including my vote, we have 12 in favor and 0 against, which means this enhancement proposal is accepted. Thank you everyone for making this possible!
This PR is for us to discuss and collect votes for SLEP018 - Pandas Output for Transformers. The current implementation is available at scikit-learn/scikit-learn#23734. Note that this vote is for the API and the implementation can be adjusted.
According to our governance model, the vote will be open for a month (till 17th August), and the motion is accepted if 2/3 of the cast votes are in favor.
@scikit-learn/core-devs