"AssertionError: Index length did not match values" when resampling with kind='period' #3609

Closed
JackKelly opened this issue May 15, 2013 · 13 comments · Fixed by #5432

@JackKelly
Contributor

Should raise an informative error stating that kind='period' is not accepted when resampling a DatetimeIndex
Possible issue with PeriodIndex resampling hanging (see @cpcloud's example below)

version = 0.12.0.dev-f61d7e3

This bug also exists in 0.11.

The bug

In [20]: s.resample('T', kind='period')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-79-c290c0578332> in <module>()
----> 1 s.resample('T', kind='period')

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/generic.py in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
    255                               fill_method=fill_method, convention=convention,
    256                               limit=limit, base=base)
--> 257         return sampler.resample(self)
    258 
    259     def first(self, offset):

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/tseries/resample.py in resample(self, obj)
     81 
     82         if isinstance(axis, DatetimeIndex):
---> 83             rs = self._resample_timestamps(obj)
     84         elif isinstance(axis, PeriodIndex):
     85             offset = to_offset(self.freq)

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/tseries/resample.py in _resample_timestamps(self, obj)
    224             # Irregular data, have to use groupby
    225             grouped = obj.groupby(grouper, axis=self.axis)
--> 226             result = grouped.aggregate(self._agg_method)
    227 
    228             if self.fill_method is not None:

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   1410         if isinstance(func_or_funcs, basestring):
-> 1411             return getattr(self, func_or_funcs)(*args, **kwargs)
   1412 
   1413         if hasattr(func_or_funcs, '__iter__'):

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/groupby.py in mean(self)
    356         except Exception:  # pragma: no cover
    357             f = lambda x: x.mean(axis=self.axis)
--> 358             return self._python_agg_general(f)
    359 
    360     def median(self):

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/groupby.py in _python_agg_general(self, func, *args, **kwargs)
    498                 output[name] = self._try_cast(values[mask],result)
    499 
--> 500         return self._wrap_aggregated_output(output)
    501 
    502     def _wrap_applied_output(self, *args, **kwargs):

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/groupby.py in _wrap_aggregated_output(self, output, names)
   1473             return DataFrame(output, index=index, columns=names)
   1474         else:
-> 1475             return Series(output, index=index, name=self.name)
   1476 
   1477     def _wrap_applied_output(self, keys, values, not_indexed_same=False):

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/series.py in __new__(cls, data, index, dtype, name, copy)
    494         else:
    495             subarr = subarr.view(Series)
--> 496         subarr.index = index
    497         subarr.name = name
    498 

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/lib.so in pandas.lib.SeriesIndex.__set__ (pandas/lib.c:29775)()

AssertionError: Index length did not match values

A workaround / expected behaviour

In [81]: s.resample('T').to_period()
Out[81]: 
2013-04-12 19:15    325.000000
2013-04-12 19:16    326.899994
...
2013-04-12 22:58    305.600006
2013-04-12 22:59    320.444458
Freq: T, Length: 225, dtype: float32

More information

In [83]: s
Out[83]: 
2013-04-12 19:15:25    323
2013-04-12 19:15:28    NaN
...
2013-04-12 22:59:55    319
2013-04-12 22:59:56    NaN
2013-04-12 22:59:57    NaN
2013-04-12 22:59:58    NaN
2013-04-12 22:59:59    NaN
Name: aggregate, Length: 13034, dtype: float32

In [76]: s.index
Out[76]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-04-12 19:15:25, ..., 2013-04-12 22:59:59]
Length: 13034, Freq: None, Timezone: None

In [77]: s.head()
Out[77]: 
2013-04-12 19:15:25    323
2013-04-12 19:15:28    NaN
2013-04-12 19:15:29    NaN
2013-04-12 19:15:30    NaN
2013-04-12 19:15:31    327
Name: aggregate, dtype: float32

In [78]: s.resample('T')
Out[78]: 
2013-04-12 19:15:00    325.000000
2013-04-12 19:16:00    326.899994
...
2013-04-12 22:58:00    305.600006
2013-04-12 22:59:00    320.444458
Freq: T, Length: 225, dtype: float32

In [80]: pd.__version__
Out[80]: '0.12.0.dev-f61d7e3'

In [84]: type(s)
Out[84]: pandas.core.series.TimeSeries

(Please let me know if you need more info! I'm using Ubuntu 13.04. It's entirely possible that this isn't a bug but instead I am doing something stupid. Oh, and let me take this opportunity to thank the Pandas dev team! Pandas is awesome!!! THANK YOU!)

@jreback
Contributor

jreback commented May 15, 2013

This is not supported. As you indicated, you can resample and then call to_period, or

s.to_period().resample('T', kind='period') will also work

I'll make this an enhancement/bug request, because it should raise a helpful message (or be implemented)

thanks
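
A minimal sketch of what such a helpful message could look like. The helper name `check_resample_kind` and the error wording are hypothetical, invented for illustration; this is not how pandas validates the argument and not necessarily how the issue was eventually resolved.

```python
import pandas as pd


def check_resample_kind(index, kind=None):
    """Hypothetical guard: reject kind='period' for a DatetimeIndex up front."""
    if kind == "period" and isinstance(index, pd.DatetimeIndex):
        raise ValueError(
            "kind='period' is not supported when resampling a DatetimeIndex; "
            "resample first and then call .to_period(), or convert the index "
            "with .to_period() before resampling"
        )


# A DatetimeIndex combined with kind='period' fails early with a clear message
# instead of an AssertionError raised deep inside the groupby machinery.
idx = pd.date_range("2013-01-01", periods=3, freq="s")
try:
    check_resample_kind(idx, kind="period")
    message = None
except ValueError as err:
    message = str(err)
```

The check runs before any grouping happens, so the user sees both workarounds named in the message rather than a length mismatch from the Series constructor.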

@JackKelly
Contributor Author

wow, thanks for the very swift response ;)

@cpcloud
Member

cpcloud commented May 15, 2013

I actually don't get an error here. It just hangs.

@jreback
Contributor

jreback commented May 15, 2013

@cpcloud can you post what you did?

@cpcloud
Member

cpcloud commented May 15, 2013

@jreback Yeah, sorry to hit and run like that :)

dind = period_range('1/1/2001', '1/1/2002').to_timestamp()
s = Series(randn(dind.size), dind)
s.resample('T', kind='period')  # hangs here

The other ways of doing this (from above) work fine. It doesn't hang (it throws the above error) for the simple case of

dind = period_range('1/1/2001', '1/2/2001').to_timestamp()
s = Series(randn(dind.size), dind)
s.resample('T', kind='period')  # raises the AssertionError above

and starts to hang for dind.size > 2.

@jreback
Contributor

jreback commented May 15, 2013

hmm...that might be something else
I replicated what @JackKelly was doing with this:

In [16]: s = Series(range(100),index=date_range('20130101',freq='s',periods=100),dtype='float')

In [17]: s[10:30] = np.nan

In [18]: s.to_period().resample('T',kind='period')
Out[18]: 
2013-01-01 00:00    34.5
2013-01-01 00:01    79.5
Freq: T, dtype: float64

In [19]: s.resample('T',kind='period')
AssertionError: Index length did not match values
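
For reference, the replication above can be written as a self-contained script that takes the resample-then-to_period route and avoids `kind` altogether. This is a sketch under assumptions: it spells the minute frequency as "min" rather than 'T' and calls `.mean()` explicitly, because recent pandas versions no longer aggregate implicitly in `resample`.

```python
import numpy as np
import pandas as pd

# Same synthetic data as above: 100 one-second samples with a block of NaNs.
idx = pd.date_range("2013-01-01", periods=100, freq="s")
s = pd.Series(np.arange(100, dtype="float64"), index=idx)
s.iloc[10:30] = np.nan

# Aggregate on the DatetimeIndex first, then relabel the bins as periods.
per_minute = s.resample("min").mean().to_period("min")
print(per_minute)
```

The two resulting values should match the 34.5 and 79.5 shown in Out[18], now indexed by minute periods instead of timestamps.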

@cpcloud
Member

cpcloud commented May 15, 2013

Yeah that works.

@cpcloud
Member

cpcloud commented May 15, 2013

should i open an issue for the above? seems to be a day frequency issue.

@jreback
Contributor

jreback commented May 15, 2013

yeh....(I put it in the header) so just ref this issue too

@cpcloud
Member

cpcloud commented Jun 22, 2013

@jreback I would like to clear this up, since doing so would actually close 3 issues: this one (#3609), #3612, and #3899. What's the original reason for not supporting this? My current fix loses the last element when resampling from datetimes to periods, so I'm guessing that might be one issue. But that happens because datetime resampling lets you choose whether each bin includes its start or its end point, a choice that periods don't have.
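
One way to read the point about start and end points is the `closed` argument of `resample`. The sketch below is an illustration on made-up data, not the fix under discussion: it shows that the same three timestamps land in different bins depending on which edge is inclusive, a choice that has no direct counterpart once the bins are labelled as periods.

```python
import pandas as pd

# Three samples, one per minute, starting exactly on a bin edge.
idx = pd.date_range("2013-01-01 00:00", periods=3, freq="min")
s = pd.Series([1, 2, 3], index=idx)

# Left-closed bins: [00:00, 00:02) and [00:02, 00:04).
left = s.resample("2min", closed="left").sum()

# Right-closed bins: (23:58, 00:00] and (00:00, 00:02].
right = s.resample("2min", closed="right").sum()

print(left)
print(right)
```

Both results account for all three samples, but they group them differently, so a datetime-to-period conversion has to commit to one convention for its bin edges.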

@cpcloud
Member

cpcloud commented Oct 4, 2013

one issue: the two-element case is not handled

@cpcloud
Member

cpcloud commented Oct 4, 2013

now i'm getting a segfault when i try to use sum ... joy

@jreback
Contributor

jreback commented Oct 4, 2013

move back to 0.13 then?

kevinastone added a commit to kevinastone/pandas that referenced this issue Nov 5, 2013
…cal timezone.

Related to pandas-dev#5340.

Signed-off-by: Kevin Stone <[email protected]>

Added Test Case for pandas-dev#3609.

Signed-off-by: Kevin Stone <[email protected]>

Fixes Grouping by Period with Timezones

The timestamp generated to partition the data frame doesn't include timezone information, so it was creating the wrong groups.  It also had the frequency ('D') hard coded.

Fixes pandas-dev#5340 and pandas-dev#3609.

Signed-off-by: Kevin Stone <[email protected]>