
BUG: astype with timedelta and datetime string (#22100) #22107

Closed
wants to merge 8 commits into from
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.0.txt
@@ -500,7 +500,7 @@ Datetimelike
Timedelta
^^^^^^^^^

-
- Fixed bug where :meth:`DataFrame.astype` could not convert timedelta strings (:issue:`22100`)
-
-

3 changes: 2 additions & 1 deletion pandas/core/dtypes/cast.py
@@ -712,7 +712,8 @@ def astype_nansafe(arr, dtype, copy=True):
elif is_object_dtype(arr):

# work around NumPy brokenness, #1987
- if np.issubdtype(dtype.type, np.integer):
+ is_time = is_timedelta64_dtype(dtype)
+ if np.issubdtype(dtype.type, np.integer) and is_time:
Contributor

Use is_integer_dtype, and also add a comment here (the existing work-around comment can then be removed).
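A minimal sketch of why the two checks differ, assuming a reasonably recent NumPy and pandas: NumPy's scalar hierarchy places np.timedelta64 under np.signedinteger, so np.issubdtype(dtype.type, np.integer) is true for a timedelta target dtype, and with the pre-PR condition an object array of timedelta strings would be sent down the integer work-around. pandas' is_integer_dtype deliberately excludes datetime-like dtypes.

```python
import numpy as np
from pandas.api.types import is_integer_dtype

td = np.dtype("m8[ns]")

# NumPy: timedelta64 subclasses signedinteger, so this is True
numpy_says_integer = np.issubdtype(td.type, np.integer)

# pandas: is_integer_dtype excludes datetime64/timedelta64, so this is False
pandas_says_integer = is_integer_dtype(td)

print(numpy_says_integer, pandas_says_integer)  # prints: True False
```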

Contributor Author

OK, but that change breaks this test:

def test_other_timedelta_unit(self, unit):
    # GH 13389
    df1 = pd.DataFrame({'entity_id': [101, 102]})
    s = pd.Series([None, None], index=[101, 102], name='days')
    dtype = "m8[{}]".format(unit)
    df2 = s.astype(dtype).to_frame('days')
    assert df2['days'].dtype == 'm8[ns]'

This is because of the following branch:
if ((PY3 and dtype not in [_INT64_DTYPE, _TD_DTYPE]) or
        (not PY3 and dtype != _TD_DTYPE)):
    # allow frequency conversions
    # we return a float here!
    if dtype.kind == 'm':
        mask = isna(arr)
        result = arr.astype(dtype).astype(np.float64)
        result[mask] = np.nan
        return result

Is this necessary? Here, timedelta64 values are converted to float64.
Should I change the test or this part of astype_nansafe?
Also, I've noticed that conversions like pd.Series([None, 1, 2]).astype('timedelta64') currently do not work. Should I open a new issue?
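For reference, a NumPy-only sketch of what the quoted branch does to a timedelta64[ns] array. td_to_float is a hypothetical helper, and np.isnat stands in for pandas' isna. The values are re-expressed in the target unit and then returned as float64, with NaT mapped to NaN, which is why the result is no longer a timedelta64 dtype.

```python
import numpy as np


def td_to_float(arr, unit):
    """Hypothetical helper mirroring the quoted frequency-conversion branch."""
    dtype = np.dtype("m8[{}]".format(unit))
    mask = np.isnat(arr)  # remember which entries are missing
    result = arr.astype(dtype).astype(np.float64)
    result[mask] = np.nan  # NaT becomes NaN in the float result
    return result


# 90 minutes and a missing value, stored at nanosecond resolution
arr = np.array([np.timedelta64(90, "m"), np.timedelta64("NaT")], dtype="m8[ns]")
out = td_to_float(arr, "h")
print(out.dtype, out)
```

For these positive inputs the sub-hour remainder is dropped by the unit cast, so 90 minutes becomes 1.0.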

Member

I might be missing something, but shouldn't it avoid this branch entirely? That is, isn't the condition dtype not in [_INT64_DTYPE, _TD_DTYPE] supposed to be false here?

Member

Is IntNADType handled correctly here?

Member

@WillAyd yeah, at this point I'm not sure whether it's tangential, but it looks like the _TD_DTYPE check should go above the (PY3 ...) check on line 688, after which that check is always true.
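A control-flow sketch of the suggested reordering, assuming (as the comment implies) that the int64 target is already handled before this point. Once both int64 and _TD_DTYPE return early, dtype not in [_INT64_DTYPE, _TD_DTYPE] can no longer be false, so the (PY3 ...) guard stops filtering anything. branch_for is hypothetical and only returns a label for the branch taken; it does not convert data.

```python
import numpy as np

# Module-level constants mirroring the names used in the quoted code
_INT64_DTYPE = np.dtype(np.int64)
_TD_DTYPE = np.dtype("m8[ns]")


def branch_for(dtype):
    """Hypothetical: report which branch a timedelta64[ns] input would take
    if the _TD_DTYPE check ran before the frequency-conversion guard."""
    dtype = np.dtype(dtype)
    if dtype == _INT64_DTYPE:
        return "view as int64"
    if dtype == _TD_DTYPE:
        return "astype m8[ns]"
    # After the two early returns this always holds, so the
    # (PY3 ...) guard would be redundant at this point.
    assert dtype not in [_INT64_DTYPE, _TD_DTYPE]
    if dtype.kind == "m":
        return "frequency conversion to float64"
    return "other"


for target in ["int64", "m8[ns]", "m8[D]", "m8"]:
    print(target, "->", branch_for(target))
```

Under this sketch a generic 'm8' target lands in the float64 branch, which matches the expectation in the second case of the new test_astype_nansafe.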

return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)

# if we have a datetime/timedelta array of objects
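The comment above introduces the object-array branch whose code is collapsed in this view. Assuming that branch coerces timedelta targets through pandas.to_timedelta (an assumption, since the code is not shown here), the sketch below shows the string parsing the new test ultimately relies on for values like '0:0:1', read as hours:minutes:seconds.

```python
import numpy as np
import pandas as pd

# Object array of timedelta strings, as in the new test's parameters
arr = np.array(["0:0:1"], dtype=object)

# Parse the strings into timedelta64 values
converted = pd.to_timedelta(arr)
print(converted.dtype, converted[0])
```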
13 changes: 12 additions & 1 deletion pandas/tests/dtypes/test_cast.py
@@ -24,7 +24,8 @@
find_common_type,
construct_1d_object_array_from_listlike,
construct_1d_ndarray_preserving_na,
- construct_1d_arraylike_from_scalar)
+ construct_1d_arraylike_from_scalar,
+ astype_nansafe)
from pandas.core.dtypes.dtypes import (
CategoricalDtype,
DatetimeTZDtype,
@@ -456,3 +457,13 @@ def test_cast_1d_arraylike_from_scalar_categorical(self):
def test_construct_1d_ndarray_preserving_na(values, dtype, expected):
    result = construct_1d_ndarray_preserving_na(values, dtype=dtype)
    tm.assert_numpy_array_equal(result, expected)


@pytest.mark.parametrize('arr, dtype, expected', [
    (np.array(['0:0:1'], dtype='O'), 'm8[ns]', 'm8[ns]'),
    (np.array(['0:0:1'], dtype='O'), 'm8', 'float64'),
])
def test_astype_nansafe(arr, dtype, expected):
    # GH #22100
    result = astype_nansafe(arr, dtype)
    assert is_dtype_equal(result.dtype, expected)