ENH: Series Mapping na_action=ignore result is misleading. #47262

Open
chapmanderek opened this issue Jun 7, 2022 · 5 comments
Labels
Enhancement, Missing-data, Needs Discussion, Series

Comments

@chapmanderek

chapmanderek commented Jun 7, 2022

Is your feature request related to a problem?

More a misleading flag... When mapping a dictionary onto a pandas Series with na_action='ignore', I would expect it to ignore unknown values. Currently it replaces them with NaN. For example:

pd.Series(['calf', 'foal', 'bunny']).map({'calf':'cow','bunny':'rabbit'}, na_action='ignore')

returns:

0       cow
1       NaN
2    rabbit

Describe the solution you'd like

I would expect it to ignore items that aren't in the dictionary instead of replacing them. I would expect this to be returned:

0       cow
1      foal
2    rabbit

The current behavior is problematic for large dataframes with many different elements in a series where perhaps you only want to rename some of them. In this case you would have to either do a replace for each one... or make a mapping for every one and hope you didn't miss one.

Describe alternatives you've considered

Trying to differentiate the current action of ignore from the action of ignoring-but-not-replacing will be difficult. Perhaps na_action could accept a new value such as dont_replace, so that it is very apparent what you are asking to happen with NaNs, while keeping the current (albeit confusing) behavior of ignore.

Additional context

Similar-ish to #14210

@chapmanderek added the Enhancement and Needs Triage labels Jun 7, 2022
@rhshadrach
Member

With ser being the pandas Series:

ser = ser.map(mapper).fillna(ser)

will map only values that exist in mapper, leaving other values untouched.
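As a minimal sketch of that one-liner applied to the example from the issue (the variable names mapped and result are only illustrative):

```python
import pandas as pd

ser = pd.Series(["calf", "foal", "bunny"])
mapper = {"calf": "cow", "bunny": "rabbit"}

# Step 1: values without a key in mapper ("foal") come back as NaN.
mapped = ser.map(mapper)

# Step 2: fill those NaN positions from the original Series, aligned by index.
result = mapped.fillna(ser)

print(result.tolist())  # ['cow', 'foal', 'rabbit']
```

One caveat with this approach: fillna cannot tell an unmatched key apart from a key that was deliberately mapped to NaN, so both end up restored to the original value.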

i would expect it to ignore unknown values

I could see that as a reasonable interpretation when viewing the name alone, but do you have that same expectation from the documentation?

@rhshadrach added the Missing-data and Series labels Jun 7, 2022
@simonjayhawkins simonjayhawkins changed the title Series Mapping na_action=ignore result is misleading. ENH: ENH: Series Mapping na_action=ignore result is misleading. Jun 9, 2022
@chapmanderek
Author

@rhshadrach
No, I think the documentation spells it out. It specifically says:

If ‘ignore’, propagate NaN values

Also, in one of the examples (where "rabbit" is missing) it fills in a NaN value. I just think that the actual flag is misleading. The label isn't the contents of the can.

The ignore flag only really makes sense in terms of the last example, where the value is getting passed on to another function. When you are trying to replace items (possibly the more common scenario) it doesn't work as I would expect given the name.
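To illustrate what the flag does in that function case, here is a small sketch modeled on the documentation example. The helper name describe is made up for illustration. With na_action='ignore', NaN inputs are skipped and passed through instead of being handed to the function:

```python
import numpy as np
import pandas as pd

ser = pd.Series(["cat", "dog", np.nan])

def describe(value):
    # Illustrative helper, not part of pandas.
    return f"I am a {value}"

# Default: the function is called on the NaN entry as well.
with_nan = ser.map(describe)

# na_action='ignore': the NaN entry is left as NaN and never reaches describe().
skipped = ser.map(describe, na_action="ignore")

print(with_nan.tolist())
print(skipped.tolist())
```

So the flag is about missing values in the input Series, not about keys missing from a dictionary mapper.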

@topper-123
Contributor

topper-123 commented Mar 25, 2023

For what it's worth, I think a mapping that keeps the existing value would be more logical to me than replacing it with NaN. But I agree the current behavior is as intended.

@topper-123
Contributor

topper-123 commented Mar 25, 2023

Thinking a bit further, I think it's probably always better to use the Series.replace method for this:

>>> import pandas as pd
>>> ser = pd.Series(['calf', 'foal', 'bunny'])
>>> ser.map({'calf':'cow','bunny':'rabbit'}, na_action='ignore')
0       cow
1       NaN
2    rabbit
dtype: object
>>> ser.replace({'calf':'cow','bunny':'rabbit'})
0       cow
1      foal
2    rabbit
dtype: object

IMO we should add replace under the See also doc section under Series.map.

Thinking even further, should we not just deprecate passing dict-likes to Series.map and direct users to Series.replace for dict-likes instead? That seems very logical to me, especially if we see Series.map as an element-level version of Series.apply after #52140.

@rhshadrach

@rhshadrach
Member

It seems natural for Series.map to take a mapping. While there is an overlap in functionality here, I lean toward not deprecating unless there is an issue with map taking a mapping.
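For readers who want to stay with map and a mapping, one option that relies on documented behavior is a dict subclass that defines __missing__: Series.map then uses that default instead of NaN for unmatched values. The class name KeepUnmapped below is hypothetical, and this is a sketch rather than a recommended pattern:

```python
import pandas as pd

class KeepUnmapped(dict):
    """Hypothetical helper: fall back to the key itself when it has no entry."""

    def __missing__(self, key):
        return key

ser = pd.Series(["calf", "foal", "bunny"])
mapper = KeepUnmapped({"calf": "cow", "bunny": "rabbit"})

# "foal" has no entry, so __missing__ returns it unchanged instead of NaN.
result = ser.map(mapper)

print(result.tolist())  # ['cow', 'foal', 'rabbit']
```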

@mroeschke added the Needs Discussion label and removed the Needs Triage label Jul 16, 2024
4 participants