
Different behavior with hf_x with type int64 vs float64 #63


Closed
PandaGab opened this issue May 18, 2022 · 3 comments

@PandaGab

PandaGab commented May 18, 2022

Hello!
This is a really neat project! Great job!

I have run into some odd behavior. I work in a Jupyter notebook inside VS Code. The data has a shape of (6261089,).

This snippet works perfectly fine:

fig = FigureResampler(go.Figure())
fig.add_trace(go.Scattergl(name='Trace', showlegend=True), hf_x=x.astype(np.int64), hf_y=raw_data)
fig.show_dash(mode='inline')

But when I change the parameter to hf_x=x.astype(np.float64):

fig = FigureResampler(go.Figure())
fig.add_trace(go.Scattergl(name='Trace', showlegend=True), hf_x=x.astype(np.float64), hf_y=raw_data)
fig.show_dash(mode='inline')

I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/data_inspection.ipynb Cell 9' in <cell line: 13>()
     10 print(raw_data.shape, times.shape)
     12 fig = FigureResampler(go.Figure())
---> 13 fig.add_trace(go.Scattergl(name='Trace', showlegend=True), hf_x=ranged_arr.astype(np.float64), hf_y=raw_data)
     14 fig.show_dash(mode='inline')

File ~/.venv/lib/python3.9/site-packages/plotly_resampler/figure_resampler/figure_resampler_interface.py:743, in AbstractFigureAggregator.add_trace(self, trace, max_n_samples, downsampler, limit_to_view, hf_x, hf_y, hf_hovertext, **trace_kwargs)
    732 trace = {
    733     k: trace[k]
    734     for k in set(trace.keys()).difference(
    735         {"text", "hovertext", "x", "y"}
    736     )
    737 }
    739 # NOTE:
    740 # If all the raw data needs to be sent to the javascript, and the trace
    741 # is high-frequency, this would take significant time!
    742 # Hence, you first downsample the trace.
--> 743 trace = self._check_update_trace_data(trace)
    744 assert trace is not None
    745 super(self._figure_class, self).add_trace(trace=trace, **trace_kwargs)

File ~/.venv/lib/python3.9/site-packages/plotly_resampler/figure_resampler/figure_resampler_interface.py:240, in AbstractFigureAggregator._check_update_trace_data(self, trace, start, end)
    238 # Downsample the data and store it in the trace-fields
    239 downsampler: AbstractSeriesAggregator = hf_trace_data["downsampler"]
--> 240 s_res: pd.Series = downsampler.aggregate(
    241     hf_series, hf_trace_data["max_n_samples"]
    242 )
    243 trace["x"] = s_res.index
    244 trace["y"] = s_res.values

File ~/.venv/lib/python3.9/site-packages/plotly_resampler/aggregation/aggregation_interface.py:142, in AbstractSeriesAggregator.aggregate(self, s, n_out)
    138     s = s.astype("uint8")
    140 if len(s) > n_out:
    141     # More samples that n_out -> perform data aggregation
--> 142     s = self._aggregate(s, n_out=n_out)
    144     # When data aggregation is performed -> we do not "insert" gaps but replace
    145     # The end of gap periods (i.e. the first non-gap sample) with None to
    146     # induce such gaps
    147     if self.interleave_gaps:

File ~/.venv/lib/python3.9/site-packages/plotly_resampler/aggregation/aggregators.py:240, in EfficientLTTB._aggregate(self, s, n_out)
    238 def _aggregate(self, s: pd.Series, n_out: int) -> pd.Series:
    239     if s.shape[0] > n_out * 1_000:
--> 240         s = self.minmax._aggregate(s, n_out * 50)
    241     return self.lttb._aggregate(s, n_out)

File ~/.venv/lib/python3.9/site-packages/plotly_resampler/aggregation/aggregators.py:134, in MinMaxOverlapAggregator._aggregate(self, s, n_out)
    127 offset = np.arange(
    128     0, stop=s.shape[0] - block_size - argmax_offset, step=block_size
    129 )
    131 # Calculate the argmin & argmax on the reshaped view of `s` &
    132 # add the corresponding offset
    133 argmin = (
--> 134     s[: block_size * offset.shape[0]]
    135     .values.reshape(-1, block_size)
    136     .argmin(axis=1)
    137     + offset
    138 )
    139 argmax = (
    140     s[argmax_offset : block_size * offset.shape[0] + argmax_offset]
    141     .values.reshape(-1, block_size)
   (...)
    144     + argmax_offset
    145 )
    146 # Sort the argmin & argmax (where we append the first and last index item)
    147 # and then slice the original series on these indexes.

ValueError: cannot reshape array of size 6260945 into shape (251)
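
As a quick sanity check on the message above, the two numbers it reports can be divided directly. This is only a sketch: reading 251 as the block size passed to `reshape(-1, block_size)` is an assumption based on the traceback, not something the error states explicitly.

```python
# Numbers copied from the ValueError above.
size = 6_260_945   # size of the array that could not be reshaped
block_size = 251   # trailing dimension reported in the error (assumed block size)

# A reshape to (-1, block_size) only works when size is an exact multiple
# of block_size, i.e. when the remainder is zero.
full_blocks, leftover = divmod(size, block_size)
print(full_blocks, leftover)  # 24944 1
```

Under that assumption, the sliced array is exactly one element longer than a whole number of blocks.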

Thank you!

@jonasvdd
Member

jonasvdd commented May 18, 2022

Hey @PandaGab,

Good catch! I'm able to reproduce this issue and will look into it further. 🔍
I'll keep you posted!

And as always, thanks for pointing out these bugs. This is the true power of open-source development! 😄

For now, I can say it is related to the MinMaxAggregator, so the error will not occur when you use another aggregator (e.g. EveryNthPoint).

@jonasvdd jonasvdd self-assigned this May 18, 2022
@jonasvdd jonasvdd added the bug Something isn't working label May 18, 2022
jonasvdd added a commit that referenced this issue May 18, 2022
@jonasvdd
Member

Found the problem! Because of the float index, the integer-based pd.Series slicing is treated as label-based rather than positional, so it does not select the intended rows. Switching the slicing to .iloc should resolve this!
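
The difference between label-based and positional slicing can be illustrated with a minimal sketch. It is not the library's code: the toy series, `block_size`, and `stop` are hypothetical, and `.loc` is used here to make the label-based behavior explicit, since how a plain `s[:stop]` is interpreted on a float index depends on the pandas version.

```python
import numpy as np
import pandas as pd

block_size = 4  # hypothetical block size for the reshape below

# Toy series with a float64 index, standing in for hf_x=x.astype(np.float64).
x = np.arange(10).astype(np.float64)
s = pd.Series(np.arange(10) * 10, index=x)

stop = 2 * block_size  # intended number of rows: 8

# Label-based slice: keeps every row whose index value is <= 8.0,
# so the endpoint is included.
by_label = s.loc[:stop]

# Positional slice: keeps exactly the first 8 rows, whatever the index holds.
by_position = s.iloc[:stop]

print(len(by_label))     # 9
print(len(by_position))  # 8

# Only the positional slice has a length that is a multiple of block_size.
blocks = by_position.values.reshape(-1, block_size)

try:
    by_label.values.reshape(-1, block_size)
except ValueError as err:
    print(err)
```

In this sketch the label-based slice is one row too long, which is the same kind of off-by-one mismatch that makes the reshape fail.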

jonasvdd added a commit that referenced this issue May 18, 2022
@jonasvdd
Member

Released a new version, 0.6.4.1; the issue should be fixed there! 😄
