"AssertionError: Index length did not match values" when resampling with kind='period' #3609

JackKelly · 2013-05-15T12:44:42Z

Should raise a helpful error saying that kind='period' is not accepted when resampling a DatetimeIndex
Possible separate issue: resampling with kind='period' hangs for some inputs (see @cpcloud's example below)

version = 0.12.0.dev-f61d7e3

This bug also exists in 0.11.

The bug

In [20]: s.resample('T', kind='period')
-----------------
AssertionError                            Traceback (most recent call last)
<ipython-input-79-c290c0578332> in <module>()
----> 1 s.resample('T', kind='period')

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/generic.py in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
    255                               fill_method=fill_method, convention=convention,
    256                               limit=limit, base=base)
--> 257         return sampler.resample(self)
    258 
    259     def first(self, offset):

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/tseries/resample.py in resample(self, obj)
     81 
     82         if isinstance(axis, DatetimeIndex):
---> 83             rs = self._resample_timestamps(obj)
     84         elif isinstance(axis, PeriodIndex):
     85             offset = to_offset(self.freq)

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/tseries/resample.py in _resample_timestamps(self, obj)
    224             # Irregular data, have to use groupby
    225             grouped = obj.groupby(grouper, axis=self.axis)
--> 226             result = grouped.aggregate(self._agg_method)
    227 
    228             if self.fill_method is not None:

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   1410         if isinstance(func_or_funcs, basestring):
-> 1411             return getattr(self, func_or_funcs)(*args, **kwargs)
   1412 
   1413         if hasattr(func_or_funcs, '__iter__'):

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/groupby.py in mean(self)
    356         except Exception:  # pragma: no cover
    357             f = lambda x: x.mean(axis=self.axis)
--> 358             return self._python_agg_general(f)
    359 
    360     def median(self):

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/groupby.py in _python_agg_general(self, func, *args, **kwargs)
    498                 output[name] = self._try_cast(values[mask],result)
    499 
--> 500         return self._wrap_aggregated_output(output)
    501 
    502     def _wrap_applied_output(self, *args, **kwargs):

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/groupby.py in _wrap_aggregated_output(self, output, names)
   1473             return DataFrame(output, index=index, columns=names)
   1474         else:
-> 1475             return Series(output, index=index, name=self.name)
   1476 
   1477     def _wrap_applied_output(self, keys, values, not_indexed_same=False):

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/core/series.py in __new__(cls, data, index, dtype, name, copy)
    494         else:
    495             subarr = subarr.view(Series)
--> 496         subarr.index = index
    497         subarr.name = name
    498 

/home/dk3810/workspace/python/pda/scripts/src/pandas/pandas/lib.so in pandas.lib.SeriesIndex.__set__ (pandas/lib.c:29775)()

AssertionError: Index length did not match values

A workaround / expected behaviour

In [81]: s.resample('T').to_period()
Out[81]: 
2013-04-12 19:15    325.000000
2013-04-12 19:16    326.899994
...
2013-04-12 22:58    305.600006
2013-04-12 22:59    320.444458
Freq: T, Length: 225, dtype: float32
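For reference, here is a sketch of the same workaround in the current pandas API. Recent releases return a Resampler from `resample`, so the aggregation (`.mean()`, the old default `how`) has to be spelled out. They also use `'min'` rather than `'T'` for minutes and deprecate the `kind` argument altogether. The data below is made up to mimic the reported series (irregular timestamps, NaNs, `Freq: None`). It also shows that `to_period()` cannot be applied directly to such an index without an explicit `freq`.

```python
import numpy as np
import pandas as pd

# Illustrative data only: an irregular, NaN-containing float32 series shaped
# like the one in the report (its DatetimeIndex has Freq: None).
idx = pd.DatetimeIndex([
    "2013-04-12 19:15:25",
    "2013-04-12 19:15:28",
    "2013-04-12 19:15:31",
    "2013-04-12 19:16:02",
    "2013-04-12 19:16:40",
])
s = pd.Series([323.0, np.nan, 327.0, 330.0, 324.0], index=idx, dtype="float32")

# An irregular index has no frequency to infer, so converting it straight to
# periods needs an explicit freq; without one pandas raises ValueError.
try:
    s.to_period()
    direct_conversion_failed = False
except ValueError:
    direct_conversion_failed = True

# Workaround from the report: aggregate on the DatetimeIndex first
# (mean skips NaNs), then relabel the regular minute bins as periods.
per_minute = s.resample("min").mean().to_period()
print(per_minute)
```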

More information

In [83]: s
Out[83]: 
2013-04-12 19:15:25    323
2013-04-12 19:15:28    NaN
...
2013-04-12 22:59:55    319
2013-04-12 22:59:56    NaN
2013-04-12 22:59:57    NaN
2013-04-12 22:59:58    NaN
2013-04-12 22:59:59    NaN
Name: aggregate, Length: 13034, dtype: float32

In [76]: s.index
Out[76]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-04-12 19:15:25, ..., 2013-04-12 22:59:59]
Length: 13034, Freq: None, Timezone: None

In [77]: s.head()
Out[77]: 
2013-04-12 19:15:25    323
2013-04-12 19:15:28    NaN
2013-04-12 19:15:29    NaN
2013-04-12 19:15:30    NaN
2013-04-12 19:15:31    327
Name: aggregate, dtype: float32

In [78]: s.resample('T')
Out[78]: 
2013-04-12 19:15:00    325.000000
2013-04-12 19:16:00    326.899994
...
2013-04-12 22:58:00    305.600006
2013-04-12 22:59:00    320.444458
Freq: T, Length: 225, dtype: float32

In [80]: pd.__version__
Out[80]: '0.12.0.dev-f61d7e3'

In [84]: type(s)
Out[84]: pandas.core.series.TimeSeries

(Please let me know if you need more info! I'm using Ubuntu 13.04. It's entirely possible that this isn't a bug but instead I am doing something stupid. Oh, and let me take this opportunity to thank the Pandas dev team! Pandas is awesome!!! THANK YOU!)


jreback · 2013-05-15T12:56:25Z

This is not supported. As you indicated you can resample then to_period, or

s.to_period().resample('T',kind='period') will also work.

I'll make this an enhancement/bug request, because it should raise a helpful message (or be implemented)

thanks
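One way to get "raise a helpful message (or be implemented)" is to compose existing operations. The function below, `resample_to_periods`, is a hypothetical helper and not a pandas API. It is a sketch that assumes a mean-style aggregation given by name. It accepts a DatetimeIndex or PeriodIndex, bins on timestamps, and relabels the bins as periods. Any other index gets a clear `TypeError` instead of an internal `AssertionError`.

```python
import pandas as pd


def resample_to_periods(obj, rule, how="mean"):
    """Hypothetical helper, not part of pandas: resample, then label bins as periods.

    Sketches the two outcomes asked for above: do the conversion that
    kind='period' implies, or fail with a clear message instead of an
    internal AssertionError.
    """
    index = obj.index
    if isinstance(index, pd.PeriodIndex):
        # Bin on timestamps (the start of each period), then convert back.
        obj = obj.to_timestamp()
    elif not isinstance(index, pd.DatetimeIndex):
        raise TypeError(
            "resample_to_periods needs a DatetimeIndex or PeriodIndex, "
            f"got {type(index).__name__}"
        )
    # Aggregate on the DatetimeIndex, then relabel the regular bins as periods.
    return obj.resample(rule).agg(how).to_period()


s = pd.Series([1.0, 2.0, 3.0], index=pd.date_range("2013-01-01", periods=3, freq="30s"))
print(resample_to_periods(s, "min"))
print(resample_to_periods(s.to_period("s"), "min"))
```

Routing the PeriodIndex case through `to_timestamp()` keeps a single binning path, which sidesteps any differences between datetime and period resampling.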

JackKelly · 2013-05-15T12:57:52Z

wow, thanks for the very swift response ;)

cpcloud · 2013-05-15T15:59:01Z

I actually don't get an error here. It just hangs.

jreback · 2013-05-15T16:26:07Z

@cpcloud can you post what you did?

cpcloud · 2013-05-15T17:37:17Z

@jreback Yeah, sorry to hit and run like that :)

dind = period_range('1/1/2001', '1/1/2002').to_timestamp()
s = Series(randn(dind.size), dind)
s.resample('T', kind='period')  # hangs here

The other ways of doing this (from above) work fine. It doesn't hang (it raises the error above) in the simple case of

dind = period_range('1/1/2001', '1/2/2001').to_timestamp()
s = Series(randn(dind.size), dind)
s.resample('T', kind='period')  # raises the AssertionError instead of hanging

and starts to hang for dind.size > 2.
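Here is a sketch of the same daily-to-minute upsample done through the resample-then-convert route. It assumes current pandas (`'min'` instead of `'T'`, an explicit `.mean()`). A seeded NumPy generator stands in for `randn` only to make the run repeatable. It uses three daily points, the size at which the hang started, and shows the expected shape of the result. It makes no claim about how `kind='period'` itself behaves in any particular version.

```python
import numpy as np
import pandas as pd

# cpcloud's construction, shrunk to three daily timestamps; the seeded
# generator (an assumption, replacing randn) only makes the run repeatable.
dind = pd.period_range("1/1/2001", "1/3/2001", freq="D").to_timestamp()
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(dind.size), index=dind)

# Upsample daily -> minute bins on the DatetimeIndex, then relabel as periods.
# Bins that contain no observation come back as NaN.
minutely = s.resample("min").mean().to_period()

print(len(minutely))                # one row per minute, 2001-01-01 00:00 through 2001-01-03 00:00
print(int(minutely.notna().sum()))  # only the three original observations are non-NaN
```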

jreback · 2013-05-15T17:47:34Z

hmm...that might be something else
I replicated what @JackKelly was doing with this:

In [16]: s = Series(range(100),index=date_range('20130101',freq='s',periods=100),dtype='float')

In [17]: s[10:30] = np.nan

In [18]: s.to_period().resample('T',kind='period')
Out[18]: 
2013-01-01 00:00    34.5
2013-01-01 00:01    79.5
Freq: T, dtype: float64

In [19]: s.resample('T',kind='period')
AssertionError: Index length did not match values
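Here is the same reproduction rewritten as a sketch for the current Resampler API, converting to periods after aggregating instead of passing `kind='period'`. `s.iloc[10:30]` replaces the positional `s[10:30]` only to make the positional intent explicit. Minute 0 averages 0..9 and 30..59 (1380 / 40), and minute 1 averages 60..99. This should give the same 34.5 and 79.5 shown in Out[18].

```python
import numpy as np
import pandas as pd

# 100 one-second observations with a block of NaNs, as in the session above.
s = pd.Series(range(100), index=pd.date_range("20130101", freq="s", periods=100), dtype="float")
s.iloc[10:30] = np.nan

# Aggregate on the DatetimeIndex (mean skips NaNs), then relabel as periods.
result = s.resample("min").mean().to_period()
print(result)
```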

cpcloud · 2013-05-15T17:55:10Z

Yeah that works.

cpcloud · 2013-05-15T18:00:05Z

should i open an issue for the above? seems to be a day frequency issue.

jreback · 2013-05-15T18:08:42Z

yeh....(I put it in the header) so just ref this issue too

cpcloud · 2013-06-22T01:57:55Z

@jreback i would like to clear this up, since doing so would close 3 issues: this one (#3609), #3612, and #3899. What's the original reason for not supporting this? My current fix loses the last element when resampling from datetimes to periods, so I'm guessing that might be one issue. That happens because datetime resampling lets you choose whether each bin includes its start or end point, which periods don't have.
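The start/end-point mismatch can be illustrated with the public `closed` and `label` arguments of `resample`. This is a sketch on made-up data, not the fix being discussed. With `closed='right', label='right'`, an observation at exactly 00:01:00 lands in the bin (00:00, 00:01] labelled 00:01. After `to_period()`, that label becomes the period 00:01, which nominally covers [00:01, 00:02).

```python
import pandas as pd

# Three made-up observations; the middle one sits exactly on a minute boundary.
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2013-01-01 00:00:30", "2013-01-01 00:01:00", "2013-01-01 00:01:30"]),
)

# Default for minute bins: closed='left', label='left', i.e. [00:00, 00:01) labelled 00:00.
left = s.resample("min").mean()
# closed='right', label='right': (00:00, 00:01] labelled 00:01, so 00:01:00 joins the first bin.
right = s.resample("min", closed="right", label="right").mean()

# Periods carry no closed/label information: both results become plain minute periods.
print(left.to_period())
print(right.to_period())
```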

cpcloud · 2013-10-04T18:35:13Z

one issue: the two-element case is not handled

cpcloud · 2013-10-04T18:36:07Z

now i'm getting a segfault when i try to use sum ... joy

jreback · 2013-10-04T18:58:32Z

move back to 0.13 then?

…cal timezone. Related to pandas-dev#5340.
Signed-off-by: Kevin Stone
Added Test Case for pandas-dev#3609.
Signed-off-by: Kevin Stone
Fixes Grouping by Period with Timezones
The timestamp generated to partition the data frame doesn't include timezone information, so it was creating the wrong groups. It also had the frequency ('D') hard coded. Fixes pandas-dev#5340 and pandas-dev#3609.
Signed-off-by: Kevin Stone
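The commit message says the timestamps used to partition the data lacked timezone information. This is a rough illustration using only public API, not the patch itself. It shows why that matters: day bins follow the timezone the timestamps are expressed in. The four hourly readings are made up. They straddle local midnight in New York on 2013-11-01/02, which is before the DST change, so the offset is UTC-4 throughout.

```python
import pandas as pd

# Made-up hourly readings around local midnight in New York (EDT, UTC-4).
idx = pd.date_range("2013-11-01 22:00", periods=4, freq="h", tz="America/New_York")
s = pd.Series([1, 2, 3, 4], index=idx)

# Day bins follow the index's own timezone: local midnight splits the data.
local_days = s.resample("D").sum()
# The same instants binned on UTC days all fall on 2013-11-02.
utc_days = s.tz_convert("UTC").resample("D").sum()

print(local_days)
print(utc_days)
```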

cpcloud mentioned this issue May 15, 2013

DatetimeIndexes with day frequency hang when they have more than 2 elements #3612

Closed

cpcloud mentioned this issue Jul 22, 2013

ENH: add expression evaluation functionality via eval #4164

Merged


ghost assigned cpcloud Oct 4, 2013

cpcloud mentioned this issue Nov 4, 2013

Resampling a Series with a timezone using kind='period' Crashes with ~6000 Values #5430

Closed

kevinastone mentioned this issue Nov 4, 2013

BUG: fix Resampling a Series with a timezone using kind='period' (GH5430) #5432

Merged

jtratner closed this as completed in #5432 Nov 6, 2013

wesm unassigned cpcloud Oct 12, 2016
