DataFrameMapper.inverse_transform() for simple transformations #133

Closed
wants to merge 3 commits into from

erikjandevries

I've added an inverse_transform() method to the DataFrameMapper that works for simple transformations.
I've included tests using the LabelEncoder and LabelBinarizer, which pass.

This still fails for more complicated transformations such as Pipelines. I hope it's a useful start at least.
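The mechanism described above can be sketched without any dependencies: each feature's transformer calls its own inverse_transform() on the slice of output columns it produced. This is not the code from this PR. ToyLabelEncoder, ToyBinarizer and ToyMapper are hypothetical stand-ins for sklearn's LabelEncoder, LabelBinarizer and the sklearn-pandas DataFrameMapper, and the sketch assumes every transformer defines inverse_transform().

```python
class ToyLabelEncoder:
    """Hypothetical LabelEncoder stand-in: one output column per feature."""

    def fit(self, values):
        self.classes_ = sorted(set(values))
        return self

    def transform(self, values):
        # a single column holding the index of each value's class
        return [[self.classes_.index(v)] for v in values]

    def inverse_transform(self, rows):
        return [self.classes_[row[0]] for row in rows]


class ToyBinarizer:
    """Hypothetical LabelBinarizer stand-in: one output column per class."""

    def fit(self, values):
        self.classes_ = sorted(set(values))
        return self

    def transform(self, values):
        # one-hot encoding, one column per class
        return [[int(v == c) for c in self.classes_] for v in values]

    def inverse_transform(self, rows):
        return [self.classes_[row.index(1)] for row in rows]


class ToyMapper:
    """Hypothetical mapper; `features` is a list of (column, transformer)."""

    def __init__(self, features):
        self.features = features

    def fit_transform(self, data):
        # data: dict mapping column name -> list of values
        self.widths_ = []  # number of output columns produced by each feature
        blocks = []
        for column, transformer in self.features:
            block = transformer.fit(data[column]).transform(data[column])
            self.widths_.append(len(block[0]))
            blocks.append(block)
        # concatenate the per-feature blocks row by row
        return [sum(parts, []) for parts in zip(*blocks)]

    def inverse_transform(self, rows):
        result, start = {}, 0
        for (column, transformer), width in zip(self.features, self.widths_):
            # slice out this feature's columns and undo its transformation
            block = [row[start:start + width] for row in rows]
            result[column] = transformer.inverse_transform(block)
            start += width
        return result
```

Recording how many columns each feature produced (`widths_` here) is what makes the slicing possible. A transformer without inverse_transform(), or a None entry, would break this loop.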

@erikjandevries
Author

Not sure what's going wrong: when I tested the solution, all tests passed. Should I have tested differently?

$ python -m pytest -s -q tests/test_dataframe_mapper.py
/usr/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
...................................................
============================================================================================================= warnings summary ==============================================================================================================
tests/test_dataframe_mapper.py::test_list_transformers
  /usr/lib/python3.6/site-packages/sklearn/utils/validation.py:444: DataConversionWarning: Data with input dtype int64 was converted to float64 by StandardScaler.
    warnings.warn(msg, DataConversionWarning)

-- Docs: http://doc.pytest.org/en/latest/warnings.html
51 passed, 1 warnings in 4.82 seconds

@devforfu
Collaborator

@erikjandevries Click the Details link next to the CircleCI status to see what is going wrong. From what I can see, it is mostly PEP8 violations.

Merge branch 'master' of github.com:erikjandevries/sklearn-pandas

# Conflicts:
#	sklearn_pandas/dataframe_mapper.py
#	tests/test_dataframe_mapper.py
@erikjandevries
Author

@devforfu Thanks for the hints, indeed they were PEP8 violations, which I've now fixed.
In my opinion, some PEP8 rules make my code less readable, but I understand the need for standardisation when working in (larger) teams :)

Collaborator

@dukebody dukebody left a comment

Can you make the suggested change to avoid creating more internal attributes?

@@ -283,6 +285,10 @@ def transform(self, X):
self.transformed_names_ += self.get_names(
columns, transformers, Xt, alias)

self.transformed_cols_ += [
Collaborator

I don't think we really need to store this. We already have the columns and transformers in self.built_features, and can get the names from self.transformed_names_.


# Let's keep track of the column we've processed
prev_col = 0
for columns, transformers, transformed_cols in self.transformed_cols_:
Collaborator

Can be replaced by:

for built_feature, transformed_cols in zip(self.built_features, self.transformed_names_):
    columns, transformers, _ = built_feature
    transformed_cols = self.get_names(columns, transformers, X, alias)

@devforfu
Collaborator

devforfu commented Sep 5, 2018

@erikjandevries Do you think it is possible to address the issues pointed out by @dukebody? Then we can do a final review and merge into master.

@adithyabsk

adithyabsk commented Nov 9, 2018

After a failed PR and some fiddling around, I figured out why that new sub-field is necessary. For one-to-many transformers, we need to maintain a label list that preserves the grouping of the columns (e.g. ['A_1', 'A_2', 'A_3'] for the LabelBinarizer). The field that @dukebody suggested only preserves these columns in a flat structure. @devforfu, I would vote to merge this PR in, as it looks good otherwise.
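To illustrate why the grouping matters: a flat list of output names does not say where one feature's columns end and the next begin, whereas per-feature groups give each feature's slice directly. The names below are hypothetical, and `feature_slices` is an illustrative helper, not part of sklearn-pandas.

```python
from itertools import accumulate

# Hypothetical output names, grouped by the source feature that produced them.
grouped_names = [["A_1", "A_2", "A_3"], ["B"], ["C_x", "C_y"]]

# The flat form keeps the names but loses the feature boundaries.
flat_names = [name for group in grouped_names for name in group]

# A different grouping that flattens to exactly the same list,
# so the flat list alone cannot tell the two apart.
other_grouping = [["A_1"], ["A_2", "A_3", "B"], ["C_x", "C_y"]]


def feature_slices(groups):
    """Return one slice per feature into the transformed column axis."""
    widths = [len(group) for group in groups]
    ends = list(accumulate(widths))
    starts = [end - width for end, width in zip(ends, widths)]
    return [slice(start, end) for start, end in zip(starts, ends)]


slices = feature_slices(grouped_names)
```

Each slice can then be used to pick out the columns that a given feature's inverse_transform() should receive.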

@anatol-grabowski

anatol-grabowski commented Nov 9, 2018

mapper = sklearn_pandas.DataFrameMapper([
    ('index', None),
   ...

With None transformers, I get the error:

'NoneType' object has no attribute 'inverse_transform'

It isn't critical, though: the feature is nice and useful and can be merged as is. I'm just pointing out a direction for further improvement.

Edit: Actually, I was unable to make it work for me at all: TypeError: unhashable type: 'slice'
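One way to avoid the 'NoneType' error is to treat a None transformer as a pass-through and skip the inverse step for it. The sketch below assumes that None means the column was passed through unchanged. `inverse_feature` and `ToyDoubler` are hypothetical names used only for illustration, and the sketch does not address the 'slice' TypeError.

```python
class ToyDoubler:
    """Hypothetical transformer that doubled a single numeric column."""

    def inverse_transform(self, block):
        return [row[0] / 2 for row in block]


def inverse_feature(transformer, block):
    """Undo one feature's transformation on its block of output columns."""
    if transformer is None:
        # None means pass-through, so there is nothing to invert.
        # Single-column blocks are unwrapped back to a flat list of values.
        if all(len(row) == 1 for row in block):
            return [row[0] for row in block]
        return block
    return transformer.inverse_transform(block)


# Usage: one pass-through column and one transformed column,
# assuming each feature produced exactly one output column.
features = [("index", None), ("price", ToyDoubler())]
rows = [[0, 20.0], [1, 5.0]]
restored = {
    column: inverse_feature(transformer, [[row[i]] for row in rows])
    for i, (column, transformer) in enumerate(features)
}
```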

@erikjandevries erikjandevries closed this by deleting the head repository Dec 14, 2023
@hu-minghao

hu-minghao commented Dec 14, 2023 via email


6 participants