
ENH: GH3371 support timedelta fillna #4684

Merged: 3 commits, Sep 8, 2013
5 changes: 5 additions & 0 deletions doc/source/release.rst
@@ -67,6 +67,9 @@ pandas 0.13
- A Series of dtype ``timedelta64[ns]`` can now be divided by another
``timedelta64[ns]`` object to yield a ``float64`` dtyped Series. This
is frequency conversion.
- Timedelta64 now supports ``fillna/ffill/bfill``, with an integer interpreted
  as seconds, or a ``timedelta`` (:issue:`3371`)
- Datetime64 now supports ``ffill/bfill``
- Performance improvements with ``__getitem__`` on ``DataFrames``
  when the key is a column
- Support for using a ``DatetimeIndex/PeriodIndex`` directly in a datelike calculation
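The frequency-conversion behavior described in the timedelta-division bullet above is easy to see with raw numpy arrays — a rough analogue of the Series case, assuming a recent numpy:

```python
import numpy as np

# dividing timedelta64 values by a timedelta64 yields plain floats,
# i.e. a frequency conversion to the divisor's unit
td = np.array([1, 2, 3], dtype='m8[s]')
ratio = td / np.timedelta64(1, 's')
```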
@@ -154,6 +157,8 @@ pandas 0.13
- Remove undocumented/unused ``kind`` keyword argument from ``read_excel``, and ``ExcelFile``. (:issue:`4713`, :issue:`4712`)
- The ``method`` argument of ``NDFrame.replace()`` is valid again, so that a
  list can be passed to ``to_replace`` (:issue:`4743`).
- Provide automatic dtype conversions on ``_reduce`` operations (:issue:`3371`)
- Exclude non-numeric dtypes when mixed with datelike dtypes in ``_reduce`` operations (:issue:`3371`)

**Internal Refactoring**

9 changes: 9 additions & 0 deletions doc/source/timeseries.rst
@@ -1195,6 +1195,15 @@ issues). ``idxmin, idxmax`` are supported as well.
   df.min().idxmax()
   df.min(axis=1).idxmin()

You can ``fillna`` on timedeltas: an integer is interpreted as seconds, or
pass a ``timedelta`` to fill with a particular value.

.. ipython:: python

   y.fillna(0)
   y.fillna(10)
   y.fillna(timedelta(days=-1, seconds=5))

.. _timeseries.timedeltas_convert:

Time Deltas & Conversions
16 changes: 12 additions & 4 deletions doc/source/v0.13.0.txt
@@ -195,6 +195,7 @@ Enhancements
- NaN handling in ``get_dummies`` (:issue:`4446`) with ``dummy_na``

.. ipython:: python

   # previously, nan was erroneously counted as 2 here
   # now it is not counted at all
   get_dummies([1, 2, np.nan])
@@ -237,10 +238,17 @@ Enhancements
   from pandas import offsets
   td + offsets.Minute(5) + offsets.Milli(5)

- ``fillna`` is now supported for timedeltas

.. ipython:: python

   td.fillna(0)
   td.fillna(timedelta(days=1, seconds=5))

- ``plot(kind='kde')`` now accepts the optional parameters ``bw_method`` and
  ``ind``, passed to scipy.stats.gaussian_kde() (for scipy >= 0.11.0) to set
  the bandwidth, and to gkde.evaluate() to specify the indices at which it
  is evaluated, respectively. See scipy docs.

.. _whatsnew_0130.refactoring:

51 changes: 50 additions & 1 deletion pandas/core/common.py
@@ -705,6 +705,54 @@ def diff(arr, n, axis=0):
return out_arr


def _coerce_scalar_to_timedelta_type(r):
Member: this is so insane 😄

Contributor:
Just a suggestion - it might be easier to read if you separated the under-1.7 handling, like this (I'm pretty sure this does the same thing as the current function). To me, this also makes it easier to see that under 1.7 it returns a ``timedelta`` and on 1.7+ it returns a ``timedelta64``.

def _coerce_scalar_to_timedelta_type(r):
    # kludgy here until we have a timedelta scalar
    # handle the numpy < 1.7 case

    if is_integer(r):
        r = timedelta(microseconds=r/1000)

    if _np_version_under1p7:
        if not isinstance(r, timedelta):
            raise AssertionError("Invalid type for timedelta scalar: %s" % type(r))
        return r

    if isinstance(r, timedelta):
        r = np.timedelta64(r)
    elif not isinstance(r, np.timedelta64):
        raise AssertionError("Invalid type for timedelta scalar: %s" % type(r))
    return r.astype('timedelta64[ns]')

    # kludgy here until we have a timedelta scalar
    # handle the numpy < 1.7 case

    if is_integer(r):
        r = timedelta(microseconds=r/1000)

    if _np_version_under1p7:
        if not isinstance(r, timedelta):
            raise AssertionError("Invalid type for timedelta scalar: %s" % type(r))
        if compat.PY3:
            # convert to microseconds in timedelta64
            r = np.timedelta64(int(r.total_seconds()*1e9 + r.microseconds*1000))
        else:
            return r

    if isinstance(r, timedelta):
        r = np.timedelta64(r)
    elif not isinstance(r, np.timedelta64):
        raise AssertionError("Invalid type for timedelta scalar: %s" % type(r))
    return r.astype('timedelta64[ns]')
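On numpy >= 1.7 the helper above boils down to a few coercions. A minimal standalone sketch (the function name is hypothetical; note the integer path treats the value as nanoseconds, matching the ``timedelta(microseconds=r/1000)`` line above):

```python
from datetime import timedelta

import numpy as np


def coerce_scalar_to_td64(r):
    # hypothetical standalone version of the numpy >= 1.7 branch above
    if isinstance(r, int):
        # an integer is taken as nanoseconds (1000 ns -> 1 us)
        r = timedelta(microseconds=r / 1000)
    if isinstance(r, timedelta):
        r = np.timedelta64(r)
    elif not isinstance(r, np.timedelta64):
        raise TypeError("Invalid type for timedelta scalar: %s" % type(r))
    return r.astype('timedelta64[ns]')
```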

def _coerce_to_dtypes(result, dtypes):
    """ given a dtypes and a result set, coerce the result elements to the dtypes """
    if len(result) != len(dtypes):
        raise AssertionError("_coerce_to_dtypes requires equal len arrays")

    def conv(r, dtype):
        try:
            if isnull(r):
                pass
            elif dtype == _NS_DTYPE:
                r = Timestamp(r)
            elif dtype == _TD_DTYPE:
                r = _coerce_scalar_to_timedelta_type(r)
            elif dtype == np.bool_:
                r = bool(r)
            elif dtype.kind == 'f':
                r = float(r)
            elif dtype.kind == 'i':
                r = int(r)
        except:
            pass

        return r

    return np.array([conv(r, dtype) for r, dtype in zip(result, dtypes)])
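The item-by-item coercion above can be sketched in isolation. This simplified version (hypothetical name, covering only the NaN/bool/int/float arms) shows the best-effort flavor of ``conv``: any failure simply leaves the value untouched:

```python
import numpy as np


def coerce_to_dtypes(result, dtypes):
    # simplified sketch of the helper above: coerce each reduction
    # result back toward its column's original dtype, best-effort
    def conv(r, dtype):
        try:
            if r != r:          # NaN: leave missing values alone
                return r
            if dtype == np.bool_:
                return bool(r)
            if dtype.kind == 'f':
                return float(r)
            if dtype.kind == 'i':
                return int(r)
        except (TypeError, ValueError):
            pass
        return r

    return np.array([conv(r, d) for r, d in zip(result, dtypes)], dtype=object)
```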

def _infer_dtype_from_scalar(val):
""" interpret the dtype from a scalar, upcast floats and ints
return the new value and the dtype """
@@ -1288,7 +1336,7 @@ def _possibly_cast_to_timedelta(value, coerce=True):
    # coercion compatibility
    if coerce == 'compat' and _np_version_under1p7:

        def convert(td, dtype):

            # we have an array with a non-object dtype
            if hasattr(td, 'item'):
@@ -1317,6 +1365,7 @@ def convert(td, type):
    # < 1.7 coercion
    if not is_list_like(value):
        value = np.array([ value ])

    dtype = value.dtype
    return np.array([ convert(v, dtype) for v in value ], dtype='m8[ns]')

22 changes: 19 additions & 3 deletions pandas/core/frame.py
@@ -23,7 +23,8 @@

from pandas.core.common import (isnull, notnull, PandasError, _try_sort,
                                _default_index, _maybe_upcast, _is_sequence,
                                _infer_dtype_from_scalar, _values_from_object,
                                _coerce_to_dtypes, _DATELIKE_DTYPES)
from pandas.core.generic import NDFrame
from pandas.core.index import Index, MultiIndex, _ensure_index
from pandas.core.indexing import (_NDFrameIndexer, _maybe_droplevels,
@@ -4235,11 +4236,24 @@ def _reduce(self, op, axis=0, skipna=True, numeric_only=None,
        axis = self._get_axis_number(axis)
        f = lambda x: op(x, axis=axis, skipna=skipna, **kwds)
        labels = self._get_agg_axis(axis)

        # exclude timedelta/datetime unless we are uniform types
        if axis == 1 and self._is_mixed_type and len(set(self.dtypes) & _DATELIKE_DTYPES):
            numeric_only = True

        if numeric_only is None:
            try:
                values = self.values
                result = f(values)
            except Exception as e:

                # try by-column first
                if filter_type is None and axis == 0:
                    try:
                        return self.apply(f).iloc[0]
                    except:
                        pass

                if filter_type is None or filter_type == 'numeric':
                    data = self._get_numeric_data()
                elif filter_type == 'bool':
@@ -4273,9 +4287,11 @@ def _reduce(self, op, axis=0, skipna=True, numeric_only=None,
                    result = result.astype(np.float64)
                elif filter_type == 'bool' and notnull(result).all():
                    result = result.astype(np.bool_)

            # otherwise, accept it
            except (ValueError, TypeError):
                pass

        # try to coerce to the original dtypes item by item if we can
        if axis == 0:
            result = com._coerce_to_dtypes(result, self.dtypes)

        return Series(result, index=labels)
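The control flow above — try the reduction over the whole block first, then fall back column by column when mixed dtypes get in the way — can be sketched independently of pandas internals (names here are hypothetical):

```python
import numpy as np


def reduce_with_fallback(columns, op):
    # columns: dict mapping name -> 1-D numpy array; op: reduction callable
    try:
        # fast path: reduce the whole homogeneous block at once
        block = np.vstack(list(columns.values()))
        return dict(zip(columns, op(block, axis=1)))
    except TypeError:
        # mixed dtypes: fall back to reducing column by column
        return {name: op(col) for name, col in columns.items()}
```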

19 changes: 15 additions & 4 deletions pandas/core/generic.py
@@ -20,12 +20,9 @@
_infer_dtype_from_scalar, _maybe_promote,
ABCSeries)



def is_dictlike(x):
    return isinstance(x, (dict, com.ABCSeries))


def _single_replace(self, to_replace, method, inplace, limit):
    orig_dtype = self.dtype
    result = self if inplace else self.copy()
@@ -1906,7 +1903,21 @@ def abs(self):
        abs: type of caller
        """
        obj = np.abs(self)
        obj = com._possibly_cast_to_timedelta(obj, coerce=False)

        # numpy < 1.7 hack: abs can come back with microsecond
        # resolution, so normalize any m8[us] results to m8[ns]
        if com._np_version_under1p7:
            if self.ndim == 1:
                if obj.dtype == 'm8[us]':
                    obj = obj.astype('m8[ns]')
            elif self.ndim == 2:
                def f(x):
                    if x.dtype == 'm8[us]':
                        x = x.astype('m8[ns]')
                    return x

                if 'm8[us]' in obj.dtypes.values:
                    obj = obj.apply(f)

        return obj
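The unit normalization above is plain numpy. A minimal illustration of the ``m8[us]`` to ``m8[ns]`` cast the hack performs:

```python
import numpy as np

# older numpy could produce microsecond-resolution timedeltas;
# pandas normalizes these back to the nanosecond unit it expects
arr = np.array([1, 2], dtype='m8[us]')
if arr.dtype == np.dtype('m8[us]'):
    arr = arr.astype('m8[ns]')
```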

def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,