BUG: astype with timedelta and datetime string (#22100) #22107

Closed
wants to merge 8 commits into from
Changes from 6 commits
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.0.txt
@@ -500,7 +500,7 @@ Datetimelike
Timedelta
^^^^^^^^^

-
- Fixed bug where :meth:`DataFrame.astype` could not convert timedelta and datetime strings (:issue:`#22100`)
-
-

3 changes: 2 additions & 1 deletion pandas/core/dtypes/cast.py
@@ -712,7 +712,8 @@ def astype_nansafe(arr, dtype, copy=True):
    elif is_object_dtype(arr):

        # work around NumPy brokenness, #1987
        if np.issubdtype(dtype.type, np.integer):
        is_time = is_datetime64_dtype(dtype) or dtype == _TD_DTYPE
Member

_TD_DTYPE is specifically timedelta64[ns]. Are there paths that get here with e.g. timedelta64[D]? (I'm assuming those should be excluded too.)

IIRC a datetime64 dtype won't trigger np.issubdtype(dtype.type, np.integer); am I remembering incorrectly?

Contributor Author

Oh, it seems I was wrong about datetime64; those dtypes worked fine before.
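
The two questions in this thread can be checked directly against NumPy. The following is a minimal sketch, not part of the PR: TD_NS is a hypothetical stand-in for the private _TD_DTYPE constant, assumed here to equal timedelta64[ns] as the reviewer states.

```python
import numpy as np

# Hypothetical stand-in for pandas' private _TD_DTYPE (assumed: timedelta64[ns]).
TD_NS = np.dtype("timedelta64[ns]")

checks = {}
for name in ("timedelta64[ns]", "timedelta64[D]", "datetime64[ns]"):
    dtype = np.dtype(name)
    checks[name] = {
        # Would the dtype enter the "integer" branch guarded in astype_nansafe?
        "integer_subtype": bool(np.issubdtype(dtype.type, np.integer)),
        # Would the `dtype == _TD_DTYPE` comparison in the patch catch it?
        "equals_td_ns": dtype == TD_NS,
    }

for name, result in checks.items():
    print(name, result)
```

Under these assumptions, timedelta64[D] counts as an integer subtype but does not compare equal to timedelta64[ns], so an equality check against a single unit would not exclude it. datetime64 is not an integer subtype, which matches the author's reply.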

        if np.issubdtype(dtype.type, np.integer) and not is_time:
            return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)

        # if we have a datetime/timedelta array of objects
17 changes: 16 additions & 1 deletion pandas/tests/dtypes/test_cast.py
@@ -24,7 +24,8 @@
    find_common_type,
    construct_1d_object_array_from_listlike,
    construct_1d_ndarray_preserving_na,
    construct_1d_arraylike_from_scalar)
    construct_1d_arraylike_from_scalar,
    astype_nansafe)
from pandas.core.dtypes.dtypes import (
    CategoricalDtype,
    DatetimeTZDtype,
@@ -456,3 +457,17 @@ def test_cast_1d_arraylike_from_scalar_categorical(self):
def test_construct_1d_ndarray_preserving_na(values, dtype, expected):
    result = construct_1d_ndarray_preserving_na(values, dtype=dtype)
    tm.assert_numpy_array_equal(result, expected)


@pytest.mark.parametrize('arr, dtype, expected', [
    (np.array(['0:0:1'], dtype='object'),
     'timedelta64[ns]', 'timedelta64[ns]'),
    (np.array(['2000'], dtype='object'),
     'datetime64[ns]', 'datetime64[ns]'),
    (np.array(['2000'], dtype='object'),
     'datetime64', 'datetime64[ns]'),
Member

These can be made less verbose (easier to read if on one line, but not a huge deal) by changing 'object' --> 'O', 'timedelta64[ns]' --> 'm8[ns]', and 'datetime64[ns]' --> 'M8[ns]'.
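
For reference, the short codes suggested here are standard NumPy dtype strings and resolve to the same dtypes as the long spellings. A quick sketch to confirm that (the alias table is only illustrative, not taken from the PR):

```python
import numpy as np

# Short NumPy dtype codes mapped to the long spellings used in the test.
aliases = {
    "O": "object",
    "m8[ns]": "timedelta64[ns]",
    "M8[ns]": "datetime64[ns]",
}

# True for a pair when both spellings resolve to the same dtype.
same_dtype = {
    short: np.dtype(short) == np.dtype(long_name)
    for short, long_name in aliases.items()
}
print(same_dtype)
```

Either spelling could therefore be used in the parametrization without changing what the test checks.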

])
def test_astype_nansafe(arr, dtype, expected):
Member

Are datetime64[ns, TZ] dtypes relevant here?

Member

FYI whatever the solution is here would impact the other limitation with orient='table' around timezone data:

# Cannot directly use as_type with timezone data on object; raise for now

I'd be fine with just raising as part of this PR until it's explicitly resolved; just bringing that up for visibility.

    # GH #22100
    result = astype_nansafe(arr, dtype)
    assert is_dtype_equal(result.dtype, expected)
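
As a point of comparison for the inputs used in this test, here is a hedged sketch of the same string-to-time conversions done through the public pd.to_timedelta and pd.to_datetime functions instead of the internal astype_nansafe. It is not the code path the PR changes, and the resulting unit may differ between pandas versions, so only the parsed values and the dtype kind are inspected.

```python
import numpy as np
import pandas as pd

# Same object-dtype string inputs as the parametrized test above.
td_strings = np.array(["0:0:1"], dtype="object")
dt_strings = np.array(["2000"], dtype="object")

# Public conversion helpers; not the internal astype_nansafe path.
td_result = pd.to_timedelta(td_strings)
dt_result = pd.to_datetime(dt_strings)

print(td_result.dtype, td_result[0])
print(dt_result.dtype, dt_result[0])
```

The string '0:0:1' is read as hours:minutes:seconds, giving a one-second timedelta, and '2000' is read as the start of that year.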