Fix incorrect DTI/TDI indexing; warn before dropping tzinfo #22549

Merged
merged 9 commits into from
Sep 8, 2018
5 changes: 4 additions & 1 deletion doc/source/whatsnew/v0.24.0.txt
@@ -485,6 +485,7 @@ Datetimelike API Changes
 - :class:`DateOffset` objects are now immutable. Attempting to alter one of these will now raise ``AttributeError`` (:issue:`21341`)
 - :class:`PeriodIndex` subtraction of another ``PeriodIndex`` will now return an object-dtype :class:`Index` of :class:`DateOffset` objects instead of raising a ``TypeError`` (:issue:`20049`)
 - :func:`cut` and :func:`qcut` now returns a :class:`DatetimeIndex` or :class:`TimedeltaIndex` bins when the input is datetime or timedelta dtype respectively and ``retbins=True`` (:issue:`19891`)
+- :meth:`DatetimeIndex.to_period` and :meth:`Timestamp.to_period` will issue a warning when timezone information will be lost (:issue:`21333`)

 .. _whatsnew_0240.api.other:

@@ -585,6 +586,8 @@ Datetimelike
 - Bug in :class:`DataFrame` with mixed dtypes including ``datetime64[ns]`` incorrectly raising ``TypeError`` on equality comparisons (:issue:`13128`,:issue:`22163`)
 - Bug in :meth:`DataFrame.eq` comparison against ``NaT`` incorrectly returning ``True`` or ``NaN`` (:issue:`15697`,:issue:`22163`)
 - Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise `OverflowError` (:issue:`22492`, :issue:`22508`)
+- Bug in :class:`DatetimeIndex` incorrectly allowing indexing with ``Timedelta`` object (:issue:`20464`)
+-

 Timedelta
 ^^^^^^^^^
@@ -593,7 +596,7 @@ Timedelta
 - Bug in multiplying a :class:`Series` with numeric dtype against a ``timedelta`` object (:issue:`22390`)
 - Bug in :class:`Series` with numeric dtype when adding or subtracting an an array or ``Series`` with ``timedelta64`` dtype (:issue:`22390`)
 - Bug in :class:`Index` with numeric dtype when multiplying or dividing an array with dtype ``timedelta64`` (:issue:`22390`)
--
+- Bug in :class:`TimedeltaIndex` incorrectly allowing indexing with ``Timestamp`` object (:issue:`20464`)
 -
 -

5 changes: 5 additions & 0 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -737,6 +737,11 @@ class Timestamp(_Timestamp):
         """
         from pandas import Period

+        if self.tz is not None:
+            # GH#21333
+            warnings.warn("Converting to Period representation will "
+                          "drop timezone information.")
Member

Warning type and stack level?

Member

This question applies everywhere you placed warnings.

Member Author

No idea what warning type to use. Suggestions?

As to stack level, I tried a bunch to get that to work with tm.assert_produces_warning and eventually threw in the towel.

Member

As you have it, UserWarning makes sense, but I think being explicit about it is good.

I know what you mean regarding stacklevel. We generally try to get one that makes sense, and if the tests don't cooperate, we can just use check_stacklevel=False.

Member

Since the warnings are raised in direct user-api methods, normally putting a stacklevel=2 should do the correct thing.
What did you not get working in the tests? (which kind of code sample)
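
As an illustration of the `stacklevel=2` suggestion above, here is a minimal sketch using only the standard-library `warnings` module. `to_period_like` and `user_code` are hypothetical stand-ins, not pandas code. The idea is that with an explicit category and `stacklevel=2`, the warning is attributed to the line in the caller rather than to the line inside the library function.

```python
import warnings


def to_period_like(has_tz):
    """Hypothetical stand-in for a user-facing method that drops tzinfo."""
    if has_tz:
        # Explicit category; stacklevel=2 attributes the warning to the caller.
        warnings.warn("Converting to Period representation will "
                      "drop timezone information.",
                      UserWarning, stacklevel=2)
    return "period"


def user_code():
    return to_period_like(has_tz=True)  # the warning should point at this line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()

record = caught[0]
print(record.category.__name__, record.lineno)
```

The recorded line number is the call site inside `user_code`, which is the kind of property a stacklevel check in a test helper would be expected to verify.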

Member Author

> What did you not get working in the tests?

IIRC the problem is that the affected code paths issue both the new FutureWarning and in some cases also a PerformanceWarning. tm.assert_produces_warning doesn't support multiple expected warnings, and my attempt to modify it led to tests failing the stacklevel checks.


+
         if freq is None:
             freq = self.freq
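
The thread above mentions code paths that emit more than one warning category at once. One possible way to check for several categories, sketched here with the standard library only, is to record everything and compare the set of categories. `PerformanceWarningLike`, `noisy_operation`, and `categories_emitted` are hypothetical names for illustration; this is not how `tm.assert_produces_warning` is implemented.

```python
import warnings


class PerformanceWarningLike(Warning):
    """Hypothetical stand-in for a second warning category."""


def noisy_operation():
    # Emits two different categories, as described in the thread.
    warnings.warn("timezone information will be dropped", UserWarning,
                  stacklevel=2)
    warnings.warn("falling back to a slow path", PerformanceWarningLike,
                  stacklevel=2)
    return 42


def categories_emitted(func):
    """Run func and return (result, set of warning categories it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func()
    return result, {w.category for w in caught}


result, seen = categories_emitted(noisy_operation)
missing = {UserWarning, PerformanceWarningLike} - seen
print(result, sorted(c.__name__ for c in seen), missing)
```

An empty `missing` set means every expected category was observed. This sketch deliberately ignores stack levels, which is the part the author reports having trouble with.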

14 changes: 12 additions & 2 deletions pandas/core/indexes/datetimes.py
@@ -2,7 +2,7 @@
 from __future__ import division
 import operator
 import warnings
-from datetime import time, datetime
+from datetime import time, datetime, timedelta

 import numpy as np
 from pytz import utc
@@ -727,6 +727,10 @@ def to_period(self, freq=None):
         """
         from pandas.core.indexes.period import PeriodIndex

+        if self.tz is not None:
+            warnings.warn("Converting to PeriodIndex representation will "
+                          "drop timezone information.")
+
         if freq is None:
             freq = self.freqstr or self.inferred_freq

@@ -737,7 +741,7 @@ def to_period(self, freq=None):

             freq = get_period_alias(freq)

-        return PeriodIndex(self.values, name=self.name, freq=freq, tz=self.tz)
+        return PeriodIndex(self.values, name=self.name, freq=freq)

     def snap(self, freq='S'):
         """
@@ -1201,6 +1205,12 @@ def get_loc(self, key, method=None, tolerance=None):
             key = Timestamp(key, tz=self.tz)
             return Index.get_loc(self, key, method, tolerance)

+        if isinstance(key, timedelta):
Contributor

if/elif here?

+            # GH#20464
+            raise TypeError("Cannot index {cls} with {other}"
+                            .format(cls=type(self).__name__,
+                                    other=type(key).__name__))
+
         if isinstance(key, time):
             if method is not None:
                 raise NotImplementedError('cannot yet lookup inexact labels '
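
To make the intent of the new guard easier to see in isolation, here is a minimal sketch with a toy class. `ToyDatetimeIndex` is hypothetical and far simpler than the real `DatetimeIndex`; only the early `TypeError` for `timedelta` keys mirrors the diff.

```python
from datetime import datetime, timedelta


class ToyDatetimeIndex:
    """Hypothetical, heavily simplified index of datetime labels."""

    def __init__(self, labels):
        self._labels = list(labels)

    def get_loc(self, key):
        if isinstance(key, timedelta):
            # Mirrors the guard in the diff: reject the wrong scalar type
            # with a TypeError rather than attempting a lookup.
            raise TypeError("Cannot index {cls} with {other}"
                            .format(cls=type(self).__name__,
                                    other=type(key).__name__))
        try:
            return self._labels.index(key)
        except ValueError:
            raise KeyError(key) from None


idx = ToyDatetimeIndex(datetime(1970, 1, d) for d in range(1, 11))
print(idx.get_loc(datetime(1970, 1, 2)))

try:
    idx.get_loc(timedelta(0))
except TypeError as err:
    print(err)
```

Raising `TypeError` instead of `KeyError` signals that the key has the wrong type for this index, not merely that the label is absent.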
4 changes: 3 additions & 1 deletion pandas/core/indexes/timedeltas.py
@@ -1,5 +1,6 @@
 """ implement the TimedeltaIndex """
 import operator
+from datetime import datetime

 import numpy as np
 from pandas.core.dtypes.common import (
@@ -487,7 +488,8 @@ def get_loc(self, key, method=None, tolerance=None):
         -------
         loc : int
         """
-        if is_list_like(key):
+        if is_list_like(key) or (isinstance(key, datetime) and key is not NaT):
Contributor

Not sure, do we use isna elsewhere for NaT checking?

Member Author

I think in this context isna is less clear. Since there is a specific na-like object we are catching here, we should be explicit about it.

Contributor

ok this comment looks good

+            # GH#20464 for datetime case
Contributor

make this more explicit, meaning datetime dtype for all-NaT

             raise TypeError

         if isna(key):
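
The `key is not NaT` part of the condition matters because a missing-value sentinel can itself pass an `isinstance(key, datetime)` check. The sketch below models that with the standard library only. `NaTLike`, `NAT`, and `check_key` are hypothetical names, and the sentinel is an assumption standing in for pandas' `NaT`.

```python
from datetime import datetime


class NaTLike(datetime):
    """Hypothetical missing-value sentinel that subclasses datetime."""


# A single shared instance, compared by identity (the date is arbitrary).
NAT = NaTLike(1, 1, 1)


def check_key(key):
    # Reject real datetimes, but let the sentinel through to the NA branch.
    if isinstance(key, datetime) and key is not NAT:
        raise TypeError("datetime keys are not valid here")
    if key is NAT:
        return "handled as missing"
    return "handled as regular key"


print(check_key(NAT))
print(check_key(5))
```

Without the identity check, the sentinel would be rejected by the first branch and never reach the missing-value handling that follows it.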
106 changes: 62 additions & 44 deletions pandas/tests/indexes/datetimes/test_astype.py
@@ -246,15 +246,19 @@ def setup_method(self, method):
     def test_to_period_millisecond(self):
         index = self.index

-        period = index.to_period(freq='L')
+        with tm.assert_produces_warning(UserWarning):
+            # warning that timezone info will be lost
+            period = index.to_period(freq='L')
         assert 2 == len(period)
         assert period[0] == Period('2007-01-01 10:11:12.123Z', 'L')
         assert period[1] == Period('2007-01-01 10:11:13.789Z', 'L')

     def test_to_period_microsecond(self):
         index = self.index

-        period = index.to_period(freq='U')
+        with tm.assert_produces_warning(UserWarning):
+            # warning that timezone info will be lost
+            period = index.to_period(freq='U')
         assert 2 == len(period)
         assert period[0] == Period('2007-01-01 10:11:12.123456Z', 'U')
         assert period[1] == Period('2007-01-01 10:11:13.789123Z', 'U')
@@ -266,81 +270,95 @@ def test_to_period_tz_pytz(self):

         ts = date_range('1/1/2000', '4/1/2000', tz='US/Eastern')

-        result = ts.to_period()[0]
-        expected = ts[0].to_period()
+        with tm.assert_produces_warning(UserWarning):
+            # warning that timezone info will be lost
+            result = ts.to_period()[0]
+            expected = ts[0].to_period()

-        assert result == expected
-        tm.assert_index_equal(ts.to_period(), xp)
+            assert result == expected
+            tm.assert_index_equal(ts.to_period(), xp)

-        ts = date_range('1/1/2000', '4/1/2000', tz=UTC)
+            ts = date_range('1/1/2000', '4/1/2000', tz=UTC)

-        result = ts.to_period()[0]
-        expected = ts[0].to_period()
+            result = ts.to_period()[0]
+            expected = ts[0].to_period()

-        assert result == expected
-        tm.assert_index_equal(ts.to_period(), xp)
+            assert result == expected
+            tm.assert_index_equal(ts.to_period(), xp)

-        ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal())
+            ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal())

-        result = ts.to_period()[0]
-        expected = ts[0].to_period()
+            result = ts.to_period()[0]
+            expected = ts[0].to_period()

-        assert result == expected
-        tm.assert_index_equal(ts.to_period(), xp)
+            assert result == expected
+            tm.assert_index_equal(ts.to_period(), xp)

+    def test_to_period_tz_warning(self):
+        # GH#21333 make sure a warning is issued when timezone
+        # info is lost
+        dti = date_range('1/1/2000', '4/1/2000', tz='US/Eastern')
+        with tm.assert_produces_warning(UserWarning):
+            # warning that timezone info will be lost
+            dti.to_period()
+
     def test_to_period_tz_explicit_pytz(self):
         xp = date_range('1/1/2000', '4/1/2000').to_period()

         ts = date_range('1/1/2000', '4/1/2000', tz=pytz.timezone('US/Eastern'))

-        result = ts.to_period()[0]
-        expected = ts[0].to_period()
+        with tm.assert_produces_warning(UserWarning):
+            # warning that timezone info will be lost
+            result = ts.to_period()[0]
+            expected = ts[0].to_period()

Contributor

in future PR these should be parameterized

-        assert result == expected
-        tm.assert_index_equal(ts.to_period(), xp)
+            assert result == expected
+            tm.assert_index_equal(ts.to_period(), xp)

-        ts = date_range('1/1/2000', '4/1/2000', tz=pytz.utc)
+            ts = date_range('1/1/2000', '4/1/2000', tz=pytz.utc)

-        result = ts.to_period()[0]
-        expected = ts[0].to_period()
+            result = ts.to_period()[0]
+            expected = ts[0].to_period()

-        assert result == expected
-        tm.assert_index_equal(ts.to_period(), xp)
+            assert result == expected
+            tm.assert_index_equal(ts.to_period(), xp)

-        ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal())
+            ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal())

-        result = ts.to_period()[0]
-        expected = ts[0].to_period()
+            result = ts.to_period()[0]
+            expected = ts[0].to_period()

-        assert result == expected
-        tm.assert_index_equal(ts.to_period(), xp)
+            assert result == expected
+            tm.assert_index_equal(ts.to_period(), xp)

     def test_to_period_tz_dateutil(self):
         xp = date_range('1/1/2000', '4/1/2000').to_period()

         ts = date_range('1/1/2000', '4/1/2000', tz='dateutil/US/Eastern')

-        result = ts.to_period()[0]
-        expected = ts[0].to_period()
+        with tm.assert_produces_warning(UserWarning):
+            # warning that timezone info will be lost
+            result = ts.to_period()[0]
+            expected = ts[0].to_period()

-        assert result == expected
-        tm.assert_index_equal(ts.to_period(), xp)
+            assert result == expected
+            tm.assert_index_equal(ts.to_period(), xp)

-        ts = date_range('1/1/2000', '4/1/2000', tz=dateutil.tz.tzutc())
+            ts = date_range('1/1/2000', '4/1/2000', tz=dateutil.tz.tzutc())

-        result = ts.to_period()[0]
-        expected = ts[0].to_period()
+            result = ts.to_period()[0]
+            expected = ts[0].to_period()

-        assert result == expected
-        tm.assert_index_equal(ts.to_period(), xp)
+            assert result == expected
+            tm.assert_index_equal(ts.to_period(), xp)

-        ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal())
+            ts = date_range('1/1/2000', '4/1/2000', tz=tzlocal())

-        result = ts.to_period()[0]
-        expected = ts[0].to_period()
+            result = ts.to_period()[0]
+            expected = ts[0].to_period()

-        assert result == expected
-        tm.assert_index_equal(ts.to_period(), xp)
+            assert result == expected
+            tm.assert_index_equal(ts.to_period(), xp)

     def test_to_period_nofreq(self):
         idx = DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-04'])
8 changes: 8 additions & 0 deletions pandas/tests/indexes/datetimes/test_indexing.py
@@ -586,3 +586,11 @@ def test_reasonable_keyerror(self):
         with pytest.raises(KeyError) as excinfo:
             index.get_loc('1/1/2000')
         assert '2000' in str(excinfo.value)
+
+    def test_timedelta_invalid_key(self):
+        # GH#20464
+        dti = pd.date_range('1970-01-01', periods=10)
+        with pytest.raises(TypeError):
+            dti.get_loc(pd.Timedelta(0))
+        with pytest.raises(TypeError):
+            dti.get_loc(pd.Timedelta(1))
8 changes: 8 additions & 0 deletions pandas/tests/indexes/timedeltas/test_indexing.py
@@ -41,6 +41,14 @@ def test_getitem(self):
             tm.assert_index_equal(result, expected)
             assert result.freq == expected.freq

+    def test_timestamp_invalid_key(self):
Contributor

can you parameterize over a datetime as well

+        # GH#20464
+        tdi = pd.timedelta_range(0, periods=10)
Contributor

do we have a test that indexes with NaT? (both Timedelta and Datetime dtype)

Member Author

We have one for timedelta, just added one for datetime

+        with pytest.raises(TypeError):
+            tdi.get_loc(pd.Timestamp('1970-01-01'))
+        with pytest.raises(TypeError):
+            tdi.get_loc(pd.Timestamp('1970-01-02'))
+

 class TestWhere(object):
     # placeholder for symmetry with DatetimeIndex and PeriodIndex tests
8 changes: 8 additions & 0 deletions pandas/tests/scalar/timestamp/test_timestamp.py
@@ -929,3 +929,11 @@ def test_to_datetime_bijective(self):
         with tm.assert_produces_warning(exp_warning, check_stacklevel=False):
             assert (Timestamp(Timestamp.min.to_pydatetime()).value / 1000 ==
                     Timestamp.min.value / 1000)
+
+    def test_to_period_tz_warning(self):
+        # GH#21333 make sure a warning is issued when timezone
+        # info is lost
+        ts = Timestamp('2009-04-15 16:17:18', tz='US/Eastern')
+        with tm.assert_produces_warning(UserWarning):
+            # warning that timezone info will be lost
+            ts.to_period('D')
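
From the user's side, one way to avoid the new warning is to drop the timezone explicitly before converting. The sketch below assumes pandas is installed and behaves as this PR describes. It uses `Timestamp.tz_localize(None)` to remove the timezone while keeping the wall-clock time, and turns warnings into errors to show that none is issued on this path.

```python
import warnings

import pandas as pd

ts = pd.Timestamp("2009-04-15 16:17:18", tz="UTC")

with warnings.catch_warnings():
    # Turn any warning into an error to show that none is issued here.
    warnings.simplefilter("error")
    # Dropping the timezone explicitly keeps the wall-clock time.
    period = ts.tz_localize(None).to_period("D")

print(period)
```

Whether to keep the local wall time with `tz_localize(None)` or to convert to another zone first with `tz_convert` depends on what the resulting period is meant to represent.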