BUG: Fix insertion of wrong-dtypes NaT into Series[m8ns] #27323

Merged: 10 commits merged into pandas-dev:master on Jul 17, 2019

Conversation

jbrockmendel
Member

Broken off from #27311 to troubleshoot build-specific failures. Also rewrote the test to be parametrized and more succinct.

Contributor

@jreback jreback left a comment

not sure I agree with the point of this PR. the np.*-based 'NaT' should never exist in pandas at all, ever; we simply need to cast them appropriately. I get that you want to turn an M8 array into object when presented with an m8('NaT'), but I don't think we currently do. However, I don't view this distinction as very important, and it adds a lot of complexity. We now treat NaT, np.nan, None and np.*('nat') synonymously as missing values when presented to, say, an M8 (or an m8); I don't see why we don't simply say: hey, I have a missing value, great, use the value for this dtype.

What you are trying to do here, IMHO, is insert even more numpy issues into pandas when we are trying to do the opposite.

@jreback jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Jul 10, 2019
@jbrockmendel
Member Author

> not sure I agree with the point of this PR. the np.* based 'NaT' should never exist in pandas at all, ever, we simply need to cast them appropriately

xref #24983. What you are suggesting is 100% impossible to do correctly.

> I get that you want to turn an M8 array when presented with an m8('NaT') into object, but I don't think we currently do

Well you're right that it is not what we currently do, but that is a bug. It is unambiguously the correct behavior.

> However I don't view this distinction as very important and it adds a lot of complexity

It's your prerogative (a word whose spelling never ceases to amaze me) to consider it a low-priority bug. On the flip side, I see complexity stemming from the fact that we have to guess what dtype NaT is behaving as in any given situation, and the impossibility of always guessing correctly.

> We now treat NaT, np.nan, None and np.*('nat') synonymously as missing values when presented to say a M8 (or a m8); I don't see why we don't simply say, hey I have a missing value, great use the value for this dtype.

Consider: if a user passes pd.NaT, that does basically your "hey I have a missing value" thing. But if a user specifically passes np.datetime64("NaT"), they did that on purpose.
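
To illustrate the distinction drawn in this reply, here is a minimal sketch (not code from this PR) showing that pandas' missing-value check accepts all of these sentinels, while the numpy NaT scalars still carry a dtype kind that says whether they are datetime-like or timedelta-like. The variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Typed numpy NaT scalars: one datetime-like, one timedelta-like.
dt_nat = np.datetime64("NaT")
td_nat = np.timedelta64("NaT")

# pandas' missing-value check treats every one of these as missing.
all_missing = all(pd.isna(v) for v in (pd.NaT, np.nan, None, dt_nat, td_nat))

# The numpy scalars still record which kind of NaT they are:
# "M" for datetime64, "m" for timedelta64.
kinds = (dt_nat.dtype.kind, td_nat.dtype.kind)

print(all_missing, kinds)
```

The dtype kind is the extra information a caller supplies by passing `np.datetime64("NaT")` rather than `pd.NaT`.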

elif util.is_datetime64_object(value):
    # exclude np.datetime64("NaT") which would otherwise be picked up
    # by the `value != value` check below
    pass
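
As a rough sketch of why that branch has to come before the self-inequality check: NaN and both numpy NaT flavours all compare unequal to themselves, so `value != value` alone cannot tell a datetime64 NaT apart from the others. The helper `is_datetime64_nat` below is hypothetical and written only for illustration; it is not the pandas internal used in the diff.

```python
import numpy as np


def is_datetime64_nat(value) -> bool:
    """Hypothetical helper (not part of pandas): True only for a datetime64 NaT."""
    return isinstance(value, np.datetime64) and bool(np.isnat(value))


candidates = {
    "float nan": float("nan"),
    "datetime64 NaT": np.datetime64("NaT"),
    "timedelta64 NaT": np.timedelta64("NaT"),
}

# Each candidate is unequal to itself, so a bare `value != value`
# test would lump all three together as "missing".
self_unequal = {name: bool(v != v) for name, v in candidates.items()}

# An explicit type check is what singles out the datetime64 NaT.
flagged = [name for name, v in candidates.items() if is_datetime64_nat(v)]

print(self_unequal)
print(flagged)
```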
Member Author

really, everything from line 531 down to 555 should be ripped out and this handled by the DatetimeArray/TimedeltaArray __setitem__ implementation

@jbrockmendel
Member Author

changed test class to just a function

@jbrockmendel
Member Author

rebased following can_hold_element fix

@jbrockmendel
Member Author

gentle ping; this one is a bugfix, admittedly a small one

Contributor

@TomAugspurger TomAugspurger left a comment

LGTM. @jreback?

@jreback jreback added this to the 0.25.0 milestone Jul 17, 2019
@jreback
Contributor

jreback commented Jul 17, 2019

sure

@jreback jreback merged commit 479d003 into pandas-dev:master Jul 17, 2019
@jreback
Contributor

jreback commented Jul 17, 2019

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the wrong_nat2 branch July 17, 2019 22:18
another-green pushed a commit to another-green/pandas that referenced this pull request Jul 20, 2019
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

4 participants