-
Notifications
You must be signed in to change notification settings - Fork 418
DataFrameMapper.inverse_transform() for simple transformations #133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Not sure what's going wrong - when I tested the solution, all tests passed. Should I have tested differently?
|
@erikjandevries Click Details link near CircleCI message to see what is going wrong. Mostly - PEP8 violations, as I can see. |
Merge branch 'master' of github.com:erikjandevries/sklearn-pandas # Conflicts: # sklearn_pandas/dataframe_mapper.py # tests/test_dataframe_mapper.py
@devforfu Thanks for the hints, indeed they were PEP8 violations, which I've now fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the suggested change to avoid creating more internal attributes?
@@ -283,6 +285,10 @@ def transform(self, X): | |||
self.transformed_names_ += self.get_names( | |||
columns, transformers, Xt, alias) | |||
|
|||
self.transformed_cols_ += [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we really need to store this. We already have the columns
and transformers
at self.built_features
, and can get the names from self.transformed_names_
.
|
||
# Let's keep track of the column we've processed | ||
prev_col = 0 | ||
for columns, transformers, transformed_cols in self.transformed_cols_: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be replaced by:
for built_feature, transformed_cols in zip(self.built_features, self.transformed_names_):
transformed_cols = self.get_names(columns, transformers, X, alias)
columns, transformers, _ = built_feature
@erikjandevries Do you think that it is possible to address the issues pointed by @dukebody? Then we can do a final review and merge into |
After a failed PR and some fiddling around, I figured out why that new sub-field was necessary. In the case of one-to-many transformers, it is necessary to maintain a label list that preserves the grouping of the columns. (i.e. ['A'_1, 'A_2', 'A_3'] in the case of the label encoder) The field that @dukebody suggested to use only has these columns preserved in a flat structure. I would vote to merge this PR in (@devforfu) as it looks good otherwise. |
With
Though it isn't critical at all, the feature is nice and useful and can be merged as is. Just pointing a direction for further improvement. edit: Actually, I was unable to make it work for me... |
你好,已收到,谢谢。
|
I've added an
inverse_transform()
method to theDataFrameMapper
that works for simple transformations.I've included tests using the
LabelEncoder
andLabelBinarizer
, which are passed.This still fails for more complicated transformations such as
Pipeline
s. I hope it's a useful start at least.