Skip to content

BUG/ENH: Handle NonexistentTimeError in date rounding #23406

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 2, 2018
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,7 @@ Other Enhancements
- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`)
- :class:`DatetimeIndex` gained :attr:`DatetimeIndex.timetz` attribute. Returns local time with timezone information. (:issue:`21358`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times (:issue:`22647`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an nonexistent --> a nonexistent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this requires a mini-section to explain behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can link the section in the timeseries.rst that explains nonexistent behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. That works just as well.

- :class:`Resampler` now is iterable like :class:`GroupBy` (:issue:`15314`).
- :meth:`Series.resample` and :meth:`DataFrame.resample` have gained the :meth:`Resampler.quantile` (:issue:`15023`).
- :meth:`pandas.core.dtypes.is_list_like` has gained a keyword ``allow_sets`` which is ``True`` by default; if ``False``,
Expand Down
11 changes: 8 additions & 3 deletions pandas/_libs/tslibs/conversion.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -27,11 +27,10 @@ from np_datetime import OutOfBoundsDatetime

from util cimport (is_string_object,
is_datetime64_object,
is_integer_object, is_float_object, is_array)
is_integer_object, is_float_object)

from timedeltas cimport cast_from_unit
from timezones cimport (is_utc, is_tzlocal, is_fixed_offset,
treat_tz_as_dateutil, treat_tz_as_pytz,
get_utcoffset, get_dst_info,
get_timezone, maybe_get_tz, tz_compare)
from parsing import parse_datetime_string
Expand Down Expand Up @@ -852,6 +851,8 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
Py_ssize_t i, idx, pos, ntrans, n = len(vals)
int64_t *tdata
int64_t v, left, right, val, v_left, v_right
int64_t remaining_minutes, new_local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add with the previous line here

int delta_idx_offset, delta_idx
ndarray[int64_t] result, result_a, result_b, dst_hours
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we use Py_ssize_t for indexers?

npy_datetimestruct dts
bint infer_dst = False, is_dst = False, fill = False
Expand Down Expand Up @@ -1007,7 +1008,11 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
# time
remaining_minutes = val % HOURS_NS
new_local = val + (HOURS_NS - remaining_minutes)
delta_idx = trans.searchsorted(new_local, side='right') - 1
delta_idx = trans.searchsorted(new_local, side='right')
# Need to subtract 1 from the delta_idx if the UTC offset of
# the target tz is greater than 0
delta_idx_offset = int(deltas[0] > 0)
delta_idx = delta_idx - delta_idx_offset
result[i] = new_local - deltas[delta_idx]
elif fill_nonexist:
result[i] = NPY_NAT
Expand Down
33 changes: 33 additions & 0 deletions pandas/_libs/tslibs/nattype.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -484,6 +484,17 @@ class NaTType(_NaT):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if any way to share these doc-strings would be great, maybe templating?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel do you happen to know why for some of these NaT methods we aren't just passing is Timestamp.[method].__doc__ instead of copying these docstrings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the dependency structure. We import NaT in timestznps but not the other way around.

A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times

.. versionadded:: 0.24.0

Raises
------
Expand All @@ -503,6 +514,17 @@ class NaTType(_NaT):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times

.. versionadded:: 0.24.0

Raises
------
Expand All @@ -522,6 +544,17 @@ class NaTType(_NaT):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times

.. versionadded:: 0.24.0

Raises
------
Expand Down
53 changes: 45 additions & 8 deletions pandas/_libs/tslibs/timestamps.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -721,7 +721,7 @@ class Timestamp(_Timestamp):

return create_timestamp_from_ts(ts.value, ts.dts, ts.tzinfo, freq)

def _round(self, freq, mode, ambiguous='raise'):
def _round(self, freq, mode, ambiguous='raise', nonexistent='raise'):
if self.tz is not None:
value = self.tz_localize(None).value
else:
Expand All @@ -733,10 +733,12 @@ class Timestamp(_Timestamp):
r = round_nsint64(value, mode, freq)[0]
result = Timestamp(r, unit='ns')
if self.tz is not None:
result = result.tz_localize(self.tz, ambiguous=ambiguous)
result = result.tz_localize(
self.tz, ambiguous=ambiguous, nonexistent=nonexistent
)
return result

def round(self, freq, ambiguous='raise'):
def round(self, freq, ambiguous='raise', nonexistent='raise'):
"""
Round the Timestamp to the specified resolution

Expand All @@ -754,14 +756,27 @@ class Timestamp(_Timestamp):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above about the doc-strings


.. versionadded:: 0.24.0

Raises
------
ValueError if the freq cannot be converted
"""
return self._round(freq, RoundTo.NEAREST_HALF_EVEN, ambiguous)
return self._round(
freq, RoundTo.NEAREST_HALF_EVEN, ambiguous, nonexistent
)

def floor(self, freq, ambiguous='raise'):
def floor(self, freq, ambiguous='raise', nonexistent='raise'):
"""
return a new Timestamp floored to this resolution

Expand All @@ -775,14 +790,25 @@ class Timestamp(_Timestamp):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times

.. versionadded:: 0.24.0

Raises
------
ValueError if the freq cannot be converted
"""
return self._round(freq, RoundTo.MINUS_INFTY, ambiguous)
return self._round(freq, RoundTo.MINUS_INFTY, ambiguous, nonexistent)

def ceil(self, freq, ambiguous='raise'):
def ceil(self, freq, ambiguous='raise', nonexistent='raise'):
"""
return a new Timestamp ceiled to this resolution

Expand All @@ -796,12 +822,23 @@ class Timestamp(_Timestamp):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times

.. versionadded:: 0.24.0

Raises
------
ValueError if the freq cannot be converted
"""
return self._round(freq, RoundTo.PLUS_INFTY, ambiguous)
return self._round(freq, RoundTo.PLUS_INFTY, ambiguous, nonexistent)

@property
def tz(self):
Expand Down
37 changes: 27 additions & 10 deletions pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,17 @@ class TimelikeOps(object):
Only relevant for DatetimeIndex

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times

.. versionadded:: 0.24.0

Returns
-------
Expand Down Expand Up @@ -182,7 +193,7 @@ class TimelikeOps(object):
"""
)

def _round(self, freq, mode, ambiguous):
def _round(self, freq, mode, ambiguous, nonexistent):
# round the local times
values = _ensure_datetimelike_to_i8(self)
result = round_nsint64(values, mode, freq)
Expand All @@ -194,20 +205,22 @@ def _round(self, freq, mode, ambiguous):
if 'tz' in attribs:
attribs['tz'] = None
return self._ensure_localized(
self._shallow_copy(result, **attribs), ambiguous
self._shallow_copy(result, **attribs), ambiguous, nonexistent
)

@Appender((_round_doc + _round_example).format(op="round"))
def round(self, freq, ambiguous='raise'):
return self._round(freq, RoundTo.NEAREST_HALF_EVEN, ambiguous)
def round(self, freq, ambiguous='raise', nonexistent='raise'):
return self._round(
freq, RoundTo.NEAREST_HALF_EVEN, ambiguous, nonexistent
)

@Appender((_round_doc + _floor_example).format(op="floor"))
def floor(self, freq, ambiguous='raise'):
return self._round(freq, RoundTo.MINUS_INFTY, ambiguous)
def floor(self, freq, ambiguous='raise', nonexistent='raise'):
return self._round(freq, RoundTo.MINUS_INFTY, ambiguous, nonexistent)

@Appender((_round_doc + _ceil_example).format(op="ceil"))
def ceil(self, freq, ambiguous='raise'):
return self._round(freq, RoundTo.PLUS_INFTY, ambiguous)
def ceil(self, freq, ambiguous='raise', nonexistent='raise'):
return self._round(freq, RoundTo.PLUS_INFTY, ambiguous, nonexistent)


class DatetimeIndexOpsMixin(DatetimeLikeArrayMixin):
Expand Down Expand Up @@ -278,7 +291,8 @@ def _evaluate_compare(self, other, op):
except TypeError:
return result

def _ensure_localized(self, arg, ambiguous='raise', from_utc=False):
def _ensure_localized(self, arg, ambiguous='raise', nonexistent='raise',
from_utc=False):
"""
ensure that we are re-localized

Expand All @@ -289,6 +303,7 @@ def _ensure_localized(self, arg, ambiguous='raise', from_utc=False):
----------
arg : DatetimeIndex / i8 ndarray
ambiguous : str, bool, or bool-ndarray, default 'raise'
nonexistent : str, default 'raise'
from_utc : bool, default False
If True, localize the i8 ndarray to UTC first before converting to
the appropriate tz. If False, localize directly to the tz.
Expand All @@ -305,7 +320,9 @@ def _ensure_localized(self, arg, ambiguous='raise', from_utc=False):
if from_utc:
arg = arg.tz_localize('UTC').tz_convert(self.tz)
else:
arg = arg.tz_localize(self.tz, ambiguous=ambiguous)
arg = arg.tz_localize(
self.tz, ambiguous=ambiguous, nonexistent=nonexistent
)
return arg

def _box_values_as_index(self):
Expand Down
21 changes: 19 additions & 2 deletions pandas/tests/scalar/timestamp/test_unary_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -134,8 +134,8 @@ def test_floor(self):
assert result == expected

@pytest.mark.parametrize('method', ['ceil', 'round', 'floor'])
def test_round_dst_border(self, method):
# GH 18946 round near DST
def test_round_dst_border_ambiguous(self, method):
# GH 18946 round near "fall back" DST
ts = Timestamp('2017-10-29 00:00:00', tz='UTC').tz_convert(
'Europe/Madrid'
)
Expand All @@ -155,6 +155,23 @@ def test_round_dst_border(self, method):
with pytest.raises(pytz.AmbiguousTimeError):
getattr(ts, method)('H', ambiguous='raise')

@pytest.mark.parametrize('method, ts_str, freq', [
['ceil', '2018-03-11 01:59:00-0600', '5min'],
['round', '2018-03-11 01:59:00-0600', '5min'],
['floor', '2018-03-11 03:01:00-0500', '2H']])
def test_round_dst_border_nonexistent(self, method, ts_str, freq):
# GH 23324 round near "spring forward" DST
ts = Timestamp(ts_str, tz='America/Chicago')
result = getattr(ts, method)(freq, nonexistent='shift')
expected = Timestamp('2018-03-11 03:00:00', tz='America/Chicago')
assert result == expected

result = getattr(ts, method)(freq, nonexistent='NaT')
assert result is NaT

with pytest.raises(pytz.NonExistentTimeError):
Copy link
Member

@gfyoung gfyoung Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything meaningful in the error message? (question applies for all of your tests)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this just raises NonExistentTimeError: [time that was nonexistent] but I can test for the message as well.

getattr(ts, method)(freq, nonexistent='raise')

@pytest.mark.parametrize('timestamp', [
'2018-01-01 0:0:0.124999360',
'2018-01-01 0:0:0.125000367',
Expand Down
22 changes: 21 additions & 1 deletion pandas/tests/series/test_datetime_values.py
Original file line number Diff line number Diff line change
Expand Up @@ -252,7 +252,7 @@ def test_dt_round_tz(self):

@pytest.mark.parametrize('method', ['ceil', 'round', 'floor'])
def test_dt_round_tz_ambiguous(self, method):
# GH 18946 round near DST
# GH 18946 round near "fall back" DST
df1 = pd.DataFrame([
pd.to_datetime('2017-10-29 02:00:00+02:00', utc=True),
pd.to_datetime('2017-10-29 02:00:00+01:00', utc=True),
Expand Down Expand Up @@ -281,6 +281,26 @@ def test_dt_round_tz_ambiguous(self, method):
with pytest.raises(pytz.AmbiguousTimeError):
getattr(df1.date.dt, method)('H', ambiguous='raise')

@pytest.mark.parametrize('method, ts_str, freq', [
['ceil', '2018-03-11 01:59:00-0600', '5min'],
['round', '2018-03-11 01:59:00-0600', '5min'],
['floor', '2018-03-11 03:01:00-0500', '2H']])
def test_dt_round_tz_nonexistent(self, method, ts_str, freq):
# GH 23324 round near "spring forward" DST
s = Series([pd.Timestamp(ts_str, tz='America/Chicago')])
result = getattr(s.dt, method)(freq, nonexistent='shift')
expected = Series(
[pd.Timestamp('2018-03-11 03:00:00', tz='America/Chicago')]
)
tm.assert_series_equal(result, expected)

result = getattr(s.dt, method)(freq, nonexistent='NaT')
expected = Series([pd.NaT]).dt.tz_localize(result.dt.tz)
tm.assert_series_equal(result, expected)

with pytest.raises(pytz.NonExistentTimeError):
getattr(s.dt, method)(freq, nonexistent='raise')
Copy link
Member

@gfyoung gfyoung Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost inclined to have parameterization on nonexistent, but up to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to keep the format of this test. IMO parameterization over nonexistent would obfuscate the test too much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem. Just a thought. 👍

More general question, to what extent are we using pytest.raises vs tm.assert_raises_regex ?

cc @jreback

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok with pytest.raises to the extent we are just checking the type of the error. so ok here

Copy link
Member

@gfyoung gfyoung Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback : pytest.raises is also checking the error message here...

(that's why I asked about this)


def test_dt_namespace_accessor_categorical(self):
# GH 19468
dti = DatetimeIndex(['20171111', '20181212']).repeat(2)
Expand Down