-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG/ENH: Handle NonexistentTimeError in date rounding #23406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
21cbe4f
35711b0
eb19fff
580d231
621f921
a60b689
b31f034
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,11 +27,10 @@ from np_datetime import OutOfBoundsDatetime | |
|
||
from util cimport (is_string_object, | ||
is_datetime64_object, | ||
is_integer_object, is_float_object, is_array) | ||
is_integer_object, is_float_object) | ||
|
||
from timedeltas cimport cast_from_unit | ||
from timezones cimport (is_utc, is_tzlocal, is_fixed_offset, | ||
treat_tz_as_dateutil, treat_tz_as_pytz, | ||
get_utcoffset, get_dst_info, | ||
get_timezone, maybe_get_tz, tz_compare) | ||
from parsing import parse_datetime_string | ||
|
@@ -852,6 +851,8 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None, | |
Py_ssize_t i, idx, pos, ntrans, n = len(vals) | ||
int64_t *tdata | ||
int64_t v, left, right, val, v_left, v_right | ||
int64_t remaining_minutes, new_local | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add with the previous line here |
||
int delta_idx_offset, delta_idx | ||
ndarray[int64_t] result, result_a, result_b, dst_hours | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. don't we use Py_ssize_t for indexers? |
||
npy_datetimestruct dts | ||
bint infer_dst = False, is_dst = False, fill = False | ||
|
@@ -1007,7 +1008,11 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None, | |
# time | ||
remaining_minutes = val % HOURS_NS | ||
new_local = val + (HOURS_NS - remaining_minutes) | ||
delta_idx = trans.searchsorted(new_local, side='right') - 1 | ||
delta_idx = trans.searchsorted(new_local, side='right') | ||
# Need to subtract 1 from the delta_idx if the UTC offset of | ||
# the target tz is greater than 0 | ||
delta_idx_offset = int(deltas[0] > 0) | ||
delta_idx = delta_idx - delta_idx_offset | ||
result[i] = new_local - deltas[delta_idx] | ||
elif fill_nonexist: | ||
result[i] = NPY_NAT | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -484,6 +484,17 @@ class NaTType(_NaT): | |
- 'raise' will raise an AmbiguousTimeError for an ambiguous time | ||
|
||
.. versionadded:: 0.24.0 | ||
nonexistent : 'shift', 'NaT', default 'raise' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if any way to share these doc-strings would be great, maybe templating? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jbrockmendel do you happen to know why for some of these NaT methods we aren't just passing is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because of the dependency structure. We import NaT in timestznps but not the other way around. |
||
A nonexistent time does not exist in a particular timezone | ||
where clocks moved forward due to DST. | ||
|
||
- 'shift' will shift the nonexistent time forward to the closest | ||
existing time | ||
- 'NaT' will return NaT where there are nonexistent times | ||
- 'raise' will raise an NonExistentTimeError if there are | ||
nonexistent times | ||
|
||
.. versionadded:: 0.24.0 | ||
|
||
Raises | ||
------ | ||
|
@@ -503,6 +514,17 @@ class NaTType(_NaT): | |
- 'raise' will raise an AmbiguousTimeError for an ambiguous time | ||
|
||
.. versionadded:: 0.24.0 | ||
nonexistent : 'shift', 'NaT', default 'raise' | ||
A nonexistent time does not exist in a particular timezone | ||
where clocks moved forward due to DST. | ||
|
||
- 'shift' will shift the nonexistent time forward to the closest | ||
existing time | ||
- 'NaT' will return NaT where there are nonexistent times | ||
- 'raise' will raise an NonExistentTimeError if there are | ||
nonexistent times | ||
|
||
.. versionadded:: 0.24.0 | ||
|
||
Raises | ||
------ | ||
|
@@ -522,6 +544,17 @@ class NaTType(_NaT): | |
- 'raise' will raise an AmbiguousTimeError for an ambiguous time | ||
|
||
.. versionadded:: 0.24.0 | ||
nonexistent : 'shift', 'NaT', default 'raise' | ||
A nonexistent time does not exist in a particular timezone | ||
where clocks moved forward due to DST. | ||
|
||
- 'shift' will shift the nonexistent time forward to the closest | ||
existing time | ||
- 'NaT' will return NaT where there are nonexistent times | ||
- 'raise' will raise an NonExistentTimeError if there are | ||
nonexistent times | ||
|
||
.. versionadded:: 0.24.0 | ||
|
||
Raises | ||
------ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -721,7 +721,7 @@ class Timestamp(_Timestamp): | |
|
||
return create_timestamp_from_ts(ts.value, ts.dts, ts.tzinfo, freq) | ||
|
||
def _round(self, freq, mode, ambiguous='raise'): | ||
def _round(self, freq, mode, ambiguous='raise', nonexistent='raise'): | ||
if self.tz is not None: | ||
value = self.tz_localize(None).value | ||
else: | ||
|
@@ -733,10 +733,12 @@ class Timestamp(_Timestamp): | |
r = round_nsint64(value, mode, freq)[0] | ||
result = Timestamp(r, unit='ns') | ||
if self.tz is not None: | ||
result = result.tz_localize(self.tz, ambiguous=ambiguous) | ||
result = result.tz_localize( | ||
self.tz, ambiguous=ambiguous, nonexistent=nonexistent | ||
) | ||
return result | ||
|
||
def round(self, freq, ambiguous='raise'): | ||
def round(self, freq, ambiguous='raise', nonexistent='raise'): | ||
""" | ||
Round the Timestamp to the specified resolution | ||
|
||
|
@@ -754,14 +756,27 @@ class Timestamp(_Timestamp): | |
- 'raise' will raise an AmbiguousTimeError for an ambiguous time | ||
|
||
.. versionadded:: 0.24.0 | ||
nonexistent : 'shift', 'NaT', default 'raise' | ||
A nonexistent time does not exist in a particular timezone | ||
where clocks moved forward due to DST. | ||
|
||
- 'shift' will shift the nonexistent time forward to the closest | ||
existing time | ||
- 'NaT' will return NaT where there are nonexistent times | ||
- 'raise' will raise an NonExistentTimeError if there are | ||
nonexistent times | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same comment as above about the doc-strings |
||
|
||
.. versionadded:: 0.24.0 | ||
|
||
Raises | ||
------ | ||
ValueError if the freq cannot be converted | ||
""" | ||
return self._round(freq, RoundTo.NEAREST_HALF_EVEN, ambiguous) | ||
return self._round( | ||
freq, RoundTo.NEAREST_HALF_EVEN, ambiguous, nonexistent | ||
) | ||
|
||
def floor(self, freq, ambiguous='raise'): | ||
def floor(self, freq, ambiguous='raise', nonexistent='raise'): | ||
""" | ||
return a new Timestamp floored to this resolution | ||
|
||
|
@@ -775,14 +790,25 @@ class Timestamp(_Timestamp): | |
- 'raise' will raise an AmbiguousTimeError for an ambiguous time | ||
|
||
.. versionadded:: 0.24.0 | ||
nonexistent : 'shift', 'NaT', default 'raise' | ||
A nonexistent time does not exist in a particular timezone | ||
where clocks moved forward due to DST. | ||
|
||
- 'shift' will shift the nonexistent time forward to the closest | ||
existing time | ||
- 'NaT' will return NaT where there are nonexistent times | ||
- 'raise' will raise an NonExistentTimeError if there are | ||
nonexistent times | ||
|
||
.. versionadded:: 0.24.0 | ||
|
||
Raises | ||
------ | ||
ValueError if the freq cannot be converted | ||
""" | ||
return self._round(freq, RoundTo.MINUS_INFTY, ambiguous) | ||
return self._round(freq, RoundTo.MINUS_INFTY, ambiguous, nonexistent) | ||
|
||
def ceil(self, freq, ambiguous='raise'): | ||
def ceil(self, freq, ambiguous='raise', nonexistent='raise'): | ||
""" | ||
return a new Timestamp ceiled to this resolution | ||
|
||
|
@@ -796,12 +822,23 @@ class Timestamp(_Timestamp): | |
- 'raise' will raise an AmbiguousTimeError for an ambiguous time | ||
|
||
.. versionadded:: 0.24.0 | ||
nonexistent : 'shift', 'NaT', default 'raise' | ||
A nonexistent time does not exist in a particular timezone | ||
where clocks moved forward due to DST. | ||
|
||
- 'shift' will shift the nonexistent time forward to the closest | ||
existing time | ||
- 'NaT' will return NaT where there are nonexistent times | ||
- 'raise' will raise an NonExistentTimeError if there are | ||
nonexistent times | ||
|
||
.. versionadded:: 0.24.0 | ||
|
||
Raises | ||
------ | ||
ValueError if the freq cannot be converted | ||
""" | ||
return self._round(freq, RoundTo.PLUS_INFTY, ambiguous) | ||
return self._round(freq, RoundTo.PLUS_INFTY, ambiguous, nonexistent) | ||
|
||
@property | ||
def tz(self): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -134,8 +134,8 @@ def test_floor(self): | |
assert result == expected | ||
|
||
@pytest.mark.parametrize('method', ['ceil', 'round', 'floor']) | ||
def test_round_dst_border(self, method): | ||
# GH 18946 round near DST | ||
def test_round_dst_border_ambiguous(self, method): | ||
# GH 18946 round near "fall back" DST | ||
ts = Timestamp('2017-10-29 00:00:00', tz='UTC').tz_convert( | ||
'Europe/Madrid' | ||
) | ||
|
@@ -155,6 +155,23 @@ def test_round_dst_border(self, method): | |
with pytest.raises(pytz.AmbiguousTimeError): | ||
getattr(ts, method)('H', ambiguous='raise') | ||
|
||
@pytest.mark.parametrize('method, ts_str, freq', [ | ||
['ceil', '2018-03-11 01:59:00-0600', '5min'], | ||
['round', '2018-03-11 01:59:00-0600', '5min'], | ||
['floor', '2018-03-11 03:01:00-0500', '2H']]) | ||
def test_round_dst_border_nonexistent(self, method, ts_str, freq): | ||
# GH 23324 round near "spring forward" DST | ||
ts = Timestamp(ts_str, tz='America/Chicago') | ||
result = getattr(ts, method)(freq, nonexistent='shift') | ||
expected = Timestamp('2018-03-11 03:00:00', tz='America/Chicago') | ||
assert result == expected | ||
|
||
result = getattr(ts, method)(freq, nonexistent='NaT') | ||
assert result is NaT | ||
|
||
with pytest.raises(pytz.NonExistentTimeError): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Anything meaningful in the error message? (question applies for all of your tests) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this just raises |
||
getattr(ts, method)(freq, nonexistent='raise') | ||
|
||
@pytest.mark.parametrize('timestamp', [ | ||
'2018-01-01 0:0:0.124999360', | ||
'2018-01-01 0:0:0.125000367', | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -252,7 +252,7 @@ def test_dt_round_tz(self): | |
|
||
@pytest.mark.parametrize('method', ['ceil', 'round', 'floor']) | ||
def test_dt_round_tz_ambiguous(self, method): | ||
# GH 18946 round near DST | ||
# GH 18946 round near "fall back" DST | ||
df1 = pd.DataFrame([ | ||
pd.to_datetime('2017-10-29 02:00:00+02:00', utc=True), | ||
pd.to_datetime('2017-10-29 02:00:00+01:00', utc=True), | ||
|
@@ -281,6 +281,26 @@ def test_dt_round_tz_ambiguous(self, method): | |
with pytest.raises(pytz.AmbiguousTimeError): | ||
getattr(df1.date.dt, method)('H', ambiguous='raise') | ||
|
||
@pytest.mark.parametrize('method, ts_str, freq', [ | ||
['ceil', '2018-03-11 01:59:00-0600', '5min'], | ||
['round', '2018-03-11 01:59:00-0600', '5min'], | ||
['floor', '2018-03-11 03:01:00-0500', '2H']]) | ||
def test_dt_round_tz_nonexistent(self, method, ts_str, freq): | ||
# GH 23324 round near "spring forward" DST | ||
s = Series([pd.Timestamp(ts_str, tz='America/Chicago')]) | ||
result = getattr(s.dt, method)(freq, nonexistent='shift') | ||
expected = Series( | ||
[pd.Timestamp('2018-03-11 03:00:00', tz='America/Chicago')] | ||
) | ||
tm.assert_series_equal(result, expected) | ||
|
||
result = getattr(s.dt, method)(freq, nonexistent='NaT') | ||
expected = Series([pd.NaT]).dt.tz_localize(result.dt.tz) | ||
tm.assert_series_equal(result, expected) | ||
|
||
with pytest.raises(pytz.NonExistentTimeError): | ||
getattr(s.dt, method)(freq, nonexistent='raise') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm almost inclined to have parameterization on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm inclined to keep the format of this test. IMO parameterization over nonexistent would obfuscate the test too much. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No problem. Just a thought. 👍 More general question, to what extent are we using cc @jreback There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback : (that's why I asked about this) |
||
|
||
def test_dt_namespace_accessor_categorical(self): | ||
# GH 19468 | ||
dti = DatetimeIndex(['20171111', '20181212']).repeat(2) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an
nonexistent
--> anonexistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this requires a mini-section to explain behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can link the section in the
timeseries.rst
that explainsnonexistent
behavior.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome. That works just as well.