BUG - remove scaling multiplier from Period diff result #23915

Merged
merged 23 commits into from
Dec 9, 2018

Contributor

@ms7463 ms7463 commented Nov 26, 2018

@pep8speaks

pep8speaks commented Nov 26, 2018

Hello @ArtinSarraf! Thanks for updating the PR.

Comment last updated on December 07, 2018 at 04:37 Hours UTC

@codecov

codecov bot commented Nov 26, 2018

Codecov Report

Merging #23915 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23915      +/-   ##
==========================================
- Coverage    92.2%    92.2%   -0.01%     
==========================================
  Files         162      162              
  Lines       51701    51700       -1     
==========================================
- Hits        47672    47670       -2     
- Misses       4029     4030       +1

Flag        Coverage Δ
#multiple   90.6% <100%> (-0.01%) ⬇️
#single     43.02% <0%> (-0.01%) ⬇️

Impacted Files                        Coverage Δ
pandas/core/arrays/datetimelike.py    96.35% <100%> (ø) ⬆️
pandas/core/internals/blocks.py       93.65% <0%> (-0.07%) ⬇️
pandas/core/indexes/base.py           96.27% <0%> (-0.01%) ⬇️
pandas/core/frame.py                  96.91% <0%> (ø) ⬆️
pandas/core/generic.py                96.65% <0%> (ø) ⬆️
pandas/core/arrays/interval.py        92.98% <0%> (ø) ⬆️
pandas/core/groupby/groupby.py        96.5% <0%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b841374...9ed3629.

@gfyoung gfyoung added labels Bug, Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), and Period (Period data type) Nov 26, 2018
Member

@gfyoung gfyoung left a comment

@ArtinSarraf : Good start! I have a couple of comments regarding the tests.

Also, don't forget to add a whatsnew for this bug.

])
def test_period_diff(self, freq, expected):
    # GH 23878
    for i in range(1, 4):
Member

Parameterize on i as well.

Contributor Author

Will do.

tm.assert_equal(result, expected)
# This test is broken
# result = to_offset('3M') + pi
# tm.assert_equal(result, expected)
Member

@gfyoung gfyoung Nov 26, 2018

Why are we commenting this out?

Existing tests shouldn't be broken (or xfailed) unless we have a very good reason for this...

Contributor Author

Ah yea, sorry meant to make a comment addressing this in the PR discussion. This was failing for me on a clean checkout of the latest master code, before I made my changes. I can provide more details tomorrow.

Member

@gfyoung gfyoung Nov 26, 2018

If you could revert the change here and let CI run on it, that would be great actually, so that we can also check if it's just a local failure or a more likely implementation issue.

Contributor Author

put it back and tests passed, looks like it was some transient local issue.

@pandas-dev pandas-dev deleted a comment from mroeschke Nov 26, 2018
@pandas-dev pandas-dev deleted a comment from mroeschke Nov 26, 2018
@ms7463
Contributor Author

ms7463 commented Nov 26, 2018

@gfyoung - the reason I didn’t add a whatsnew entry is that this Offset result from diffing periods is already new behavior in 0.24 so the bug had not been released yet. Should I still add an entry for it?

@gfyoung
Member

gfyoung commented Nov 26, 2018

the reason I didn’t add a whatsnew entry is that this Offset result from diffing periods is already new behavior in 0.24 so the bug had not been released yet. Should I still add an entry for it?

Ah, gotcha. In that case, just add a reference to your issue number to the existing whatsnew entry instead.

@@ -1685,7 +1685,7 @@ cdef class _Period(object):
 if other.freq != self.freq:
     msg = _DIFFERENT_FREQ.format(self.freqstr, other.freqstr)
     raise IncompatibleFrequency(msg)
-return (self.ordinal - other.ordinal) * self.freq
+return (self.ordinal - other.ordinal) * type(self.freq)()
Member

Won't this be wrong for offsets that have relevant keywords?

Contributor Author

@ms7463 ms7463 Nov 27, 2018

You’re right. So one option would be to do

... * type(self.freq)(normalize=self.freq.normalize, **self.freq.kwds)

This way no other classes need to be modified. However, I think it might be worth adding a property to the DateOffset object that does the above. Something like DateOffset.base?
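
For readers following the thread, here is a minimal sketch of why the bare `type(self.freq)()` call loses offset keywords and how forwarding `normalize` and `kwds` keeps them. `ToyYearEnd` is a hypothetical stand-in written for this illustration, not the pandas class, and it assumes `kwds` holds only the keywords other than `n` and `normalize`, as the proposed call implies.

```python
class ToyYearEnd:
    """Toy stand-in for an offset with an extra keyword; not the pandas class."""

    def __init__(self, n=1, normalize=False, month=12):
        self.n = n
        self.normalize = normalize
        self.month = month

    @property
    def kwds(self):
        # Assumption: kwds excludes n and normalize, which are passed separately.
        return {"month": self.month}

    def __repr__(self):
        return (f"ToyYearEnd(n={self.n}, normalize={self.normalize}, "
                f"month={self.month})")


freq = ToyYearEnd(n=2, normalize=True, month=6)

# Bare constructor call: every argument falls back to its default,
# so month=6 and normalize=True are silently lost.
naive = type(freq)()

# Forwarding normalize and kwds keeps the configuration and
# still drops the multiplier, because n is not forwarded.
rebuilt = type(freq)(normalize=freq.normalize, **freq.kwds)

print(naive)    # ToyYearEnd(n=1, normalize=False, month=12)
print(rebuilt)  # ToyYearEnd(n=1, normalize=True, month=6)
```

Only the multiplier is meant to be reset here; anything that changes which period the offset describes has to survive the rebuild.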

Contributor Author

@ms7463 ms7463 Nov 28, 2018

fixed

@@ -1085,3 +1085,21 @@ def test_pi_sub_period_nat(self):
exp = pd.TimedeltaIndex([np.nan, np.nan, np.nan, np.nan], name='idx')
tm.assert_index_equal(idx - pd.Period('NaT', freq='M'), exp)
tm.assert_index_equal(pd.Period('NaT', freq='M') - idx, exp)


class TestPeriodArithmetic(object):
Member

This all belongs in pandas.tests.scalar.period. If you want to make a new file test_arithmetic.py in that directory, that'd be OK. Otherwise put in test_period.py

Contributor Author

moved

(pd.offsets.Day, 214),
(pd.offsets.MonthEnd, 7),
(pd.offsets.YearEnd, 1),
])
Member

definitely needs cases with kwargs passed to offset constructors. Putting it in tests.tseries.offsets might be useful since there are test classes that construct a bunch of these

Contributor Author

Parameterized the tests on some kwargs too. Only YearEnd takes a keyword besides normalize ('month'), and only non-Tick offsets (in this case MonthEnd and YearEnd) can take normalize=True.

@ms7463
Contributor Author

ms7463 commented Dec 1, 2018

@gfyoung / @jbrockmendel - any other changes to consider?

@@ -1685,7 +1685,9 @@ cdef class _Period(object):
 if other.freq != self.freq:
     msg = _DIFFERENT_FREQ.format(self.freqstr, other.freqstr)
     raise IncompatibleFrequency(msg)
-return (self.ordinal - other.ordinal) * self.freq
+base_freq = type(self.freq)(normalize=self.freq.normalize,
+                            **self.freq.kwds)
Member

pass n=1 explicitly here.

add a reference # GH#23915 for future readers

Contributor Author

done
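
As a rough illustration of what the scaling bug looks like and why rebuilding the offset with an explicit `n=1` removes it, here is a toy sketch. `ToyMonthEnd`, `diff_before`, and `diff_after` are hypothetical names made up for this example. It assumes Period ordinals count base units (months here) regardless of the frequency's multiplier, which is how the diff reads.

```python
class ToyMonthEnd:
    """Toy offset whose n counts base units (months); not the pandas class."""

    def __init__(self, n=1, normalize=False):
        self.n = n
        self.normalize = normalize
        self.kwds = {}  # this toy offset has no extra keywords

    def __rmul__(self, count):
        # int * offset scales the multiplier.
        return type(self)(n=count * self.n, normalize=self.normalize)

    def __repr__(self):
        return f"<{self.n} * ToyMonthEnd>"


def diff_before(ordinal_a, ordinal_b, freq):
    # Old behaviour: the frequency's own multiplier leaks into the result.
    return (ordinal_a - ordinal_b) * freq


def diff_after(ordinal_a, ordinal_b, freq):
    # Rebuild the offset with n=1 so only the ordinal difference scales it.
    base_freq = type(freq)(n=1, normalize=freq.normalize, **freq.kwds)
    return (ordinal_a - ordinal_b) * base_freq


three_monthly = ToyMonthEnd(n=3)

# Two periods three months apart, i.e. one step at a 3-month frequency.
print(diff_before(10, 7, three_monthly))  # <9 * ToyMonthEnd>
print(diff_after(10, 7, three_monthly))   # <3 * ToyMonthEnd>
```

Passing `n=1` explicitly, rather than relying on the constructor default, makes the intent visible at the call site.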

expected = 0
else:
return
# Only non-Tick frequencies can have normalize set to True
Member

probably cleaner to test separately

also this gets a couple of non-tick frequencies, but using the structure in tests.tseries.offsets should make it feasible to be a lot more thorough

Contributor Author

Separated.
Only 4 of the non-tick frequencies in pd.offsets are valid Period frequencies so I explicitly parameterized those in the tests. The Tick fixture worked well though.

@jbrockmendel
Member

It looks like the same bug exists in the analogous PeriodIndex op. Want to fix it there while you're at it?

@jreback
Contributor

jreback commented Dec 2, 2018

@gfyoung if you have any comments.

@jreback jreback added this to the 0.24.0 milestone Dec 2, 2018
@jreback
Contributor

jreback commented Dec 2, 2018

@jbrockmendel if you'd have a look at the tests and see if they are sufficient

@jreback
Contributor

jreback commented Dec 3, 2018

@ArtinSarraf can you merge master and see if you can resolve failures

@jreback jreback removed this from the 0.24.0 milestone Dec 3, 2018
@ms7463
Contributor Author

ms7463 commented Dec 4, 2018

@jreback / @jbrockmendel - is there any way to recreate the testing envs of the automated tests? My tests run fine locally, and from the failed test output it's not obvious what the error is (the repr of the result and expected are the same). I'm assuming the normalize or kwds attributes are differing somehow.

@ms7463
Contributor Author

ms7463 commented Dec 6, 2018

@jreback / @jbrockmendel

Found the issue. Looks like this is due to an existing bug with PeriodIndex (I will open a separate issue for this). See the example below.

>>> pd.PeriodIndex(['19910905'], freq=pd.offsets.YearEnd(normalize=True)).freq.normalize
True
>>> pd.PeriodIndex(['19910905'], freq=pd.offsets.YearEnd(normalize=False)).freq.normalize
True

Restart the process

>>> pd.PeriodIndex(['19910905'], freq=pd.offsets.YearEnd(normalize=False)).freq.normalize
False
>>> pd.PeriodIndex(['19910905'], freq=pd.offsets.YearEnd(normalize=True)).freq.normalize
False

Looks like the normalize option gets cached for PeriodIndex somehow. This causes my tests to fail because I iterate through normalize = True | False.

I will remove the normalize parameterization from the tests for now.

@jbrockmendel
Member

Looks like the normalize option gets cached for PeriodIndex somehow

Good catch. Best guess is it is in pandas.tseries.frequencies
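
Nothing in this thread confirms where the caching actually happens, so the following is only a sketch of one mechanism that would produce the "first value wins until restart" behaviour shown above: a cache whose key ignores `normalize`. All names here (`get_offset_buggy`, `get_offset_fixed`, the cache dicts, the "A-DEC" key) are hypothetical and do not refer to pandas internals.

```python
_buggy_cache = {}
_fixed_cache = {}


def get_offset_buggy(name, normalize=False):
    # Bug pattern: the key ignores normalize, so whichever value is
    # requested first is returned for every later call in the process.
    if name not in _buggy_cache:
        _buggy_cache[name] = {"name": name, "normalize": normalize}
    return _buggy_cache[name]


def get_offset_fixed(name, normalize=False):
    # Including every distinguishing argument in the key avoids the collision.
    key = (name, normalize)
    if key not in _fixed_cache:
        _fixed_cache[key] = {"name": name, "normalize": normalize}
    return _fixed_cache[key]


first = get_offset_buggy("A-DEC", normalize=True)["normalize"]
second = get_offset_buggy("A-DEC", normalize=False)["normalize"]
print(first, second)  # True True (the second result is stale)

fixed_first = get_offset_fixed("A-DEC", normalize=True)["normalize"]
fixed_second = get_offset_fixed("A-DEC", normalize=False)["normalize"]
print(fixed_first, fixed_second)  # True False
```

A cache like this would also explain why the result flips after a restart: the cache starts empty, so the order of the first calls decides what sticks.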

@ms7463
Contributor Author

ms7463 commented Dec 6, 2018

@jbrockmendel - looks like the only failures in the pandas-dev.pandas tests now are linting errors, caused by importing the tick_classes fixture (since it's not in the discovery path for those tests).

from pandas.tests.tseries.offsets.conftest import tick_classes

Is there another way to discover this fixture? I could move the test, but I think it makes sense to keep it grouped where it is.

@@ -16,6 +16,7 @@
from pandas.core import ops
from pandas import Period, PeriodIndex, period_range, Series
from pandas.tseries.frequencies import to_offset
Contributor

We can't do this; instead, move the tick_classes fixture to pandas/conftest.py. I think everything should still work.

Contributor Author

done.

@ms7463
Contributor Author

ms7463 commented Dec 7, 2018

@jreback / @jbrockmendel - anything else to consider?

@jreback jreback added this to the 0.24.0 milestone Dec 7, 2018
@jreback
Contributor

jreback commented Dec 7, 2018

@ArtinSarraf lgtm. can you add a whatsnew note. ping on green.

@gfyoung good?

@ms7463
Contributor Author

ms7463 commented Dec 8, 2018

@jreback - tests are all clean.

@jreback jreback merged commit 8dc22d8 into pandas-dev:master Dec 9, 2018
@jreback
Contributor

jreback commented Dec 9, 2018

thanks @ArtinSarraf

Labels
Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Period (Period data type)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: non-standard frequency Period arithmetic
6 participants