Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: standardize fill_value behavior across the API #15587

Closed
wants to merge 4 commits into from
Closed
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 24 additions & 0 deletions pandas/types/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -491,3 +491,27 @@ def pandas_dtype(dtype):
return dtype

return np.dtype(dtype)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of this should be in pandas.types.missing

def _is_fillable_value(value):
pandas_ts_types = ('Timestamp', 'Period', 'Timedelta')
pandas_block_types = ('Series', 'DataFrame')

if any([isinstance(value, (list, dict)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the way to do this is

def validate_fill_value(value):
   def _validate(v):
        # do validation on a scalar
        return boolean
   if is_list_like(value) or is_dict_like(value):
       return all(_validate(v) for v in list(values))
   return _validate(value)

Copy link
Contributor Author

@ResidentMario ResidentMario Mar 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed these is_list_like/is_dict_like helpers, these are important, thanks. But, why the evaluation that comes afterwards? As I understand it, we should be rejecting list and dict type inputs outright.

The former is valid in fillna, though passing a dict isn't implemented in any of the fill_value parameters. Lists, meanwhile, are never a valid fill value.

What I mean is that I think the implementation would be something like:

def validate_fill_value(value):
   def _validate(v):
        # do validation on a scalar
        return boolean
   if is_list_like(value) or is_dict_like(value):
       return False
   else:
       return _validate(value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idiom i gave us for using s validation function in a scalar or in each element of a list

adapt 2 what u need

callable(value),
(not (isinstance(value, string_types) or
isinstance(value, (int, float, complex, str, None.__class__)) or
is_numeric_dtype(value) or
is_datetime_or_timedelta_dtype(value) or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what you actually need though is to pass in 2 values at the top-level

def validate_fill_value(value, dtype):
     def _validate(v):
           # only a sample
           if is_datetime64_any_dtype(dtype):
              return isinstance(value, (np.datetime64, datetime))
           elif is_numeric_dtype(dtype):
              return is_float(value) or is_integer(value)
           else:
             # string
              return isinstance(value, compat.string_type)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when I call this method, would I need to do something like:

dtype = value.dtype if hasattr(value, dtype) else None

Beforehand?

Not sure I grok this separate parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the dtype must be passed in otherwise how do unknow is the filll_value is the right type
e.g. an int is not valid if u have a datetime array

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I've been following the fillna behavior thus far. Right now fillna would convert that input to a timestamp in ns. Same with float or bool. And upcast the column to object dtype to fit a str fill.

fillna behaves that way AFAIK because it's convenient to propagate a 0 or an np.nan or whatever other out-of-type sentinel value across an entire DataFrame all at once, instead of having to go column-by-column.

The same argument might apply for fill_value, but, I do see it being a far weaker one. So if you think that it's OK for fill_value to have a separate, stricter behavior than fillna, sure.

is_period_dtype(value) or
type(value).__name__ in pandas_ts_types)),
type(value).__name__ in pandas_block_types]):
return False
else:
return True


def validate_fill_value(value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just do this in one function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to separate creating the validity boolean from raising a ValueError for it, in case in the future there's a need to do the former without the latter. Fine with it being just one method tho, if you think that's better.

if not _is_fillable_value(value):
raise TypeError('"value" parameter must be a scalar, but '
'you passed a "{0}"'.format(type(value).__name__))