BUG: astype with timedelta and datetime string (#22100) #22107

fjdiod · 2018-07-28T21:12:36Z

closes astype not converting timedelta and datetime strings #22100
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
Not sure about test. Should I test DataFrame method astype instead? If so, where to put test?

pep8speaks · 2018-07-28T21:12:41Z

Hello @fjdiod! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 31, 2018 at 10:45 Hours UTC

jbrockmendel · 2018-07-28T21:39:41Z

pandas/tests/dtypes/test_cast.py

+def test_astype_nansafe(arr, dtype):
+    # GH #22100
+    result = astype_nansafe(arr, dtype)
+    is_dtype_equal(result.dtype, dtype)


Should there be an assertion here?

Piggybacking on this I think it would be better to explicitly parametrize the expected value and make assertions around that.

On a somewhat related note how do we want to handle datetime64[ns, tz]? Off the top of my head I can't think of how to do a conversion to that via astype so maybe should still raise for that?

fjdiod · 2018-07-29T00:43:54Z

Why do we need this?

pandas/pandas/core/dtypes/cast.py

Lines 688 to 699 in 0b7a08b

    
           if ((PY3 and dtype not in [_INT64_DTYPE, _TD_DTYPE]) or 
        
                   (not PY3 and dtype != _TD_DTYPE)): 
        
               # allow frequency conversions 
        
               # we return a float here! 
        
               if dtype.kind == 'm': 
        
                   mask = isna(arr) 
        
                   result = arr.astype(dtype).astype(np.float64) 
        
                   result[mask] = np.nan 
        
                   return result 
        
           elif dtype == _TD_DTYPE: 
        
               return arr.astype(_TD_DTYPE, copy=copy)

With that we have:

>>> s = pd.Series([None])
>>> s.astype('m8')
0   NaT
dtype: timedelta64[ns]
>>> s.astype('m8').astype('m8')
0   NaN
dtype: float64

which looks strange.
Also, I've noticed:

>>> s = pd.Series([None, 1, 2])
>>> s.astype('timedelta64[ns]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sergey/projects/pandas/pandas/util/_decorators.py", line 177, in wrapper
    return func(*args, **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/generic.py", line 5182, in astype
    **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 554, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/managers.py", line 421, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 564, in astype
    **kwargs)
  File "/home/sergey/projects/pandas/pandas/core/internals/blocks.py", line 655, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/home/sergey/projects/pandas/pandas/core/dtypes/cast.py", line 709, in astype_nansafe
    raise ValueError('Cannot convert non-finite values (NA or inf) to '
ValueError: Cannot convert non-finite values (NA or inf) to integer

Is it expected?

codecov · 2018-07-30T12:52:16Z

Codecov Report

Merging #22107 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22107      +/-   ##
==========================================
- Coverage   92.07%   92.06%   -0.02%     
==========================================
  Files         170      170              
  Lines       50688    50706      +18     
==========================================
+ Hits        46671    46682      +11     
- Misses       4017     4024       +7

Flag	Coverage Δ
#multiple	`90.47% <100%> (-0.02%)`	⬇️
#single	`42.3% <51.56%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/json/json.py	`92.47% <ø> (ø)`	⬆️
pandas/core/arrays/categorical.py	`95.95% <ø> (ø)`	⬆️
pandas/core/frame.py	`97.26% <ø> (ø)`	⬆️
pandas/core/series.py	`94.11% <ø> (ø)`	⬆️
pandas/io/html.py	`89.17% <ø> (ø)`	⬆️
pandas/util/testing.py	`85.69% <ø> (-0.21%)`	⬇️
pandas/core/generic.py	`96.47% <ø> (ø)`	⬆️
pandas/core/arrays/interval.py	`92.55% <ø> (-0.05%)`	⬇️
pandas/core/internals/blocks.py	`94.45% <100%> (ø)`	⬆️
pandas/core/arrays/datetimes.py	`95.5% <100%> (+0.04%)`	⬆️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 26b3e7d...74e670b. Read the comment docs.

fjdiod · 2018-07-30T19:25:11Z

Hello, @jbrockmendel, @WillAyd
Are there any other changes I should make?

jbrockmendel · 2018-07-30T20:23:37Z

pandas/core/dtypes/cast.py

@@ -712,7 +712,8 @@ def astype_nansafe(arr, dtype, copy=True):
    elif is_object_dtype(arr):

        # work around NumPy brokenness, #1987
-        if np.issubdtype(dtype.type, np.integer):
+        is_time = is_datetime64_dtype(dtype) or dtype == _TD_DTYPE


_TD_DTYPE is specifically timedelta64[ns]. Are there paths that get here with e.g. timedelta64[D]? (I'm assuming those should be excluded too.

IIRC datetimete64 dtype won't trigger np.issubtype(dtype.type, np.integer), am I remembering incorrectly?

Oh, seems I was wrong about datetime64, they worked fine before.

jbrockmendel · 2018-07-30T20:24:27Z

pandas/tests/dtypes/test_cast.py

+    (np.array(['2000'], dtype='object'),
+     'datetime64', 'datetime64[ns]'),
+])
+def test_astype_nansafe(arr, dtype, expected):


are datetime64[ns, TZ] dtypes relevant here?

FYI whatever the solution is here would impact the other limitation with orient='table' around timezone data:

pandas/pandas/io/json/table_schema.py

Line 304 in d010469

# Cannot directly use as_type with timezone data on object; raise for now

I'd be fine with just raising as part of this PR until its explicitly resolved but just bringing that up for visibility

jbrockmendel · 2018-07-30T20:25:56Z

pandas/tests/dtypes/test_cast.py

+    (np.array(['2000'], dtype='object'),
+     'datetime64[ns]', 'datetime64[ns]'),
+    (np.array(['2000'], dtype='object'),
+     'datetime64', 'datetime64[ns]'),


These can be made less verbose (easier to read if on one line, but not a huge deal) by changing 'object' --> 'O', 'timedelta64[ns]' --> 'm8[ns]', and 'datetime64[ns]'' --> M8[ns]

jreback · 2018-07-31T12:25:48Z

pandas/core/dtypes/cast.py

@@ -712,7 +712,8 @@ def astype_nansafe(arr, dtype, copy=True):
    elif is_object_dtype(arr):

        # work around NumPy brokenness, #1987
-        if np.issubdtype(dtype.type, np.integer):
+        is_time = is_timedelta64_dtype(dtype)
+        if np.issubdtype(dtype.type, np.integer) and is_time:


use is_integer_dtype and also add a comment here (and can remove the existing work around comment)

Ok, but it breaks:

pandas/pandas/tests/reshape/merge/test_merge.py

Lines 531 to 538 in f76a3f3

def test_other_timedelta_unit(self, unit):

# GH 13389

df1 = pd.DataFrame({'entity_id': [101, 102]})

s = pd.Series([None, None], index=[101, 102], name='days')

dtype = "m8[{}]".format(unit)

df2 = s.astype(dtype).to_frame('days')

assert df2['days'].dtype == 'm8[ns]'

Because of:

pandas/pandas/core/dtypes/cast.py

Lines 688 to 697 in f76a3f3

if ((PY3 and dtype not in [_INT64_DTYPE, _TD_DTYPE]) or

(not PY3 and dtype != _TD_DTYPE)):

# allow frequency conversions

# we return a float here!

if dtype.kind == 'm':

mask = isna(arr)

result = arr.astype(dtype).astype(np.float64)

result[mask] = np.nan

return result

Are this necessary? Here timedelta64 are turned to float64.
Should I change the test or this part of astype_nansafe?
Also, I've notices that conversions like pd.Series([None, 1, 2]).astype('timedelta64') do not work now. Should I open a new issue?

Might be missing something but shouldn't it not be going down this branch? i.e. isn't the statement dtype not in [_INT64_DTYPE, _TD_DTYPE] supposed to be false?

Is IntNADType handled correctly here?

@WillAyd yah at this point I’m not sure if it’s tangential, but it looks like the TDDTYPE check should go above the (PY3 ...) check in line 688, after which that check is always-true.

jreback · 2018-11-01T01:44:24Z

closing as stale. if you'd like to continue pls ping.

BUG: astype with timedelta and datetime string (pandas-dev#22100)

8cb0654

fix style

52257e3

fjdiod mentioned this pull request Jul 28, 2018

ENH: Add Timedelta Support to JSON Reader with orient=table (#21140) #21827

Closed

4 tasks

jbrockmendel reviewed Jul 28, 2018

View reviewed changes

fjdiod added 2 commits July 29, 2018 01:12

fix test

7196cf5

fix m8

e508d93

Sergey Solovev added 2 commits July 30, 2018 15:43

Merge remote-tracking branch 'upstream/master' into astype-time

3a182a3

add whatsnew entry

7be8de9

gfyoung added Bug Datetime Datetime data dtype labels Jul 30, 2018

jbrockmendel reviewed Jul 30, 2018

View reviewed changes

Sergey Solovev added 2 commits July 31, 2018 11:43

fix test_astype_nansafe

21c0520

fix datetime64 in astype_nansafe

74e670b

jreback requested changes Jul 31, 2018

View reviewed changes

jreback closed this Nov 1, 2018

	def test_other_timedelta_unit(self, unit):
	# GH 13389
	df1 = pd.DataFrame({'entity_id': [101, 102]})
	s = pd.Series([None, None], index=[101, 102], name='days')

	dtype = "m8[{}]".format(unit)
	df2 = s.astype(dtype).to_frame('days')
	assert df2['days'].dtype == 'm8[ns]'

	if ((PY3 and dtype not in [_INT64_DTYPE, _TD_DTYPE]) or
	(not PY3 and dtype != _TD_DTYPE)):

	# allow frequency conversions
	# we return a float here!
	if dtype.kind == 'm':
	mask = isna(arr)
	result = arr.astype(dtype).astype(np.float64)
	result[mask] = np.nan
	return result

Uh oh!

BUG: astype with timedelta and datetime string (#22100) #22107

BUG: astype with timedelta and datetime string (#22100) #22107

Uh oh!

Conversation

fjdiod commented Jul 28, 2018 • edited by WillAyd Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pep8speaks commented Jul 28, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comment last updated on July 31, 2018 at 10:45 Hours UTC

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fjdiod commented Jul 29, 2018

Uh oh!

codecov bot commented Jul 30, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

fjdiod commented Jul 30, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback commented Nov 1, 2018

Uh oh!

Uh oh!

fjdiod commented Jul 28, 2018 •

edited by WillAyd

Loading

pep8speaks commented Jul 28, 2018 •

edited

Loading

codecov bot commented Jul 30, 2018 •

edited

Loading