Scattergl plot is sometimes slow to interact with even with moderate number of points #5927

Open
bruot opened this issue Sep 7, 2021 · 10 comments
Labels: bug (something broken), P3 (backlog), performance (something is slow)

Comments

bruot commented Sep 7, 2021

With a constant number of points and otherwise identical settings, Scattergl rendering times vary a lot. With 100,000 points, there are cases where it is virtually impossible to interact with the plot (e.g. to zoom), and just moving the mouse seems to trigger lengthy calculations.

For example, the following code from the doc works well:

import plotly.graph_objects as go
import numpy as np

N = 100000
fig = go.Figure(data=go.Scattergl(
    x = np.random.randn(N),
    y = np.random.randn(N),
    mode='markers',
    marker=dict(
        color=np.random.randn(N),
        colorscale='Viridis',
        line_width=1
    )
))

fig.show()

but if I replace the x data by zeros,

    x = np.zeros(N),

performance is significantly degraded (so much that using Scatter instead of Scattergl becomes better when trying to zoom in a box).

Any ideas where the difference is coming from and how performance could be improved in this case?

Thanks.

bruot commented Sep 7, 2021

With this x data, things are equally slow:

x = np.random.randn(N)
x[0] = 100

Related: #5881. But the problem is not limited to zeros.

CmpCtrl commented Sep 15, 2021

Just to add to this, the issue is related to the hover events and is sensitive to the distribution of the data. Randomly distributed data runs very quickly, but real world data tends to be extremely slow. Artificial data like the linear plot in the example below is also slow, as is random data that has an outlier like @bruot's example above.

I came across the issue using Dash, which is new to me, so please excuse my poor diagnostics. Hopefully this can help point someone in the right direction.

I made a sample Python script to demonstrate the issue in Dash.

from dash import Dash, callback, html, dcc, Input, Output
import numpy as np

app = Dash(__name__)

layout = html.Div([
    html.H1('Plotly Scatter Perf Demo'),
    
    html.Div([
        dcc.Input(value='100000',id='nSamples',type='number'),
        html.Div([
            dcc.Graph(id='plot1'),
            dcc.Graph(id='plot2'),
        ],id='row')
    ],id='plot')
])

app.layout = layout

@callback([Output('plot1', 'figure'), Output('plot2', 'figure')],
          Input('nSamples', 'value'))
def doPlots(nSamples):
    n = int(nSamples)
    # Uniformly random y values: hovering over this plot responds fine
    y = np.random.random(n) * 5000
    # Evenly spaced (linear) y values: hovering over this plot is very slow
    y2 = np.linspace(0, 5000, n)

    trace = {'type': 'scattergl', 'y': y,
             'x0': 0, 'dx': 1, 'name': 'Random', 'mode': 'markers'}
    layout = {'title': {'text': f'Random Data n:{nSamples}'}}
    fig_random = dict(data=[trace], layout=layout)

    trace = {'type': 'scattergl', 'y': y2,
             'x0': 0, 'dx': 1, 'name': 'Linear', 'mode': 'markers'}
    layout = {'title': {'text': f'linear Data n:{nSamples}'}}
    fig_linear = dict(data=[trace], layout=layout)

    # One figure per Output, in the same order as the decorator
    return [fig_random, fig_linear]


if __name__ == '__main__':
    app.run_server(debug=True)

In the image below I used Chrome's performance profiling tool. For the first 3 seconds or so I was hovering over the Random Data plot, which responds fine. Then at about 3 s I moved the mouse over the linear data plot, where it appears to hang for about 6 seconds before the hover label appears, and then takes around another full second to update the hover label after each move.
[image: Chrome performance profile of the hover interaction]

It appears to spend most of its time in this L function in the minified async-plotlyjs.js:
[image: profile detail showing the L function]

I think this is also related to #1698

Hopefully this helps someone who understands this a lot better than I do to diagnose the issue.

CmpCtrl commented Sep 15, 2021

Clearly I shouldn't rely on Google to effectively search the issues list... #5790 is also related.

@Entropy5

I have the same problem. 250k points from my own dataset lag more than 1M random points from their example.

jvdd commented Mar 2, 2022

It might be interesting to consider plotly-resampler when you want to visualize large datasets.

This extension adds resampling functionality to Plotly figures (by running a Dash app under the hood), making it possible to visualize large numbers of data points while staying responsive.
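
To give a rough idea of what resampling means here, the sketch below keeps only the minimum and maximum y value of each bucket of consecutive points. It is an illustration only and is not plotly-resampler's actual algorithm or API; the function name `minmax_downsample` and the bucket count are assumptions made for the example.

```python
def minmax_downsample(xs, ys, n_buckets):
    """Hypothetical helper: keep the min and max y of each bucket of points.

    Returns at most 2 * n_buckets points, in their original order.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    n = len(ys)
    # Nothing to gain if the data is already small enough.
    if n <= 2 * n_buckets:
        return list(xs), list(ys)

    out_x, out_y = [], []
    for b in range(n_buckets):
        # Integer bucket boundaries that cover every index exactly once.
        start = b * n // n_buckets
        stop = (b + 1) * n // n_buckets
        indices = range(start, stop)
        lo = min(indices, key=lambda i: ys[i])
        hi = max(indices, key=lambda i: ys[i])
        # A set avoids duplicating the point when min and max coincide.
        for i in sorted({lo, hi}):
            out_x.append(xs[i])
            out_y.append(ys[i])
    return out_x, out_y


# Example: reduce 100,000 points to at most 1,000 before plotting.
xs = list(range(100_000))
ys = [(i * 37) % 101 for i in range(100_000)]
small_x, small_y = minmax_downsample(xs, ys, n_buckets=500)
```

The reduced lists could then be passed as the x and y of a Scattergl trace. The trade-off is that only the retained points are available for hover and zoom.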

siiruli commented Aug 29, 2023

I isolated the issue to commit 907415a, which changes the default value of spikedistance from 20 to -1. This means there is now no cutoff for searching data for spikelines.

Setting spikedistance: 0 fixed the problem for me. This is weird because my app was not drawing any spikelines, so it shouldn't need to calculate anything related to them. It looks like the code finds the points first and then filters out everything that doesn't allow spikelines.

// Now if there is range to look in, find the points to draw the spikelines
// Do it only if there is no hoverData
if(hasCartesian && (spikedistance !== 0)) {
    if(hoverData.length === 0) {
        pointData.distance = spikedistance;
        pointData.index = false;
        var closestPoints = trace._module.hoverPoints(pointData, xval, yval, 'closest', {
            hoverLayer: fullLayout._hoverlayer
        });
        if(closestPoints) {
            closestPoints = closestPoints.filter(function(point) {
                // some hover points, like scatter fills, do not allow spikes,
                // so will generate a hover point but without a valid spikeDistance
                return point.spikeDistance <= spikedistance;
            });
        }
        if(closestPoints && closestPoints.length) {
            var tmpPoint;
            var closestVPoints = closestPoints.filter(function(point) {
                return point.xa.showspikes && point.xa.spikesnap !== 'hovered data';
            });

Setting hovermode: 'x' helped slightly, but not enough. Disabling hover entirely also fixed the issue, but hover labels were necessary in my application.

I'm pretty sure that issues #6792, #5881, #6054, and maybe #5790 are also caused by this. The change was added in plotly.js version v2.0.0-rc.2 and affects dash versions 1.21.0 and later.
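
For dict-style figures like the ones in the Dash demo earlier in this thread, the workaround can be sketched as below. This is a minimal sketch, not an official fix: the helper name `limit_spike_search` is hypothetical, and it only sets the `layout.spikedistance` attribute discussed above on a copy of the figure.

```python
import copy


def limit_spike_search(figure, spikedistance=0):
    """Hypothetical helper: return a copy of a dict-style figure with
    layout.spikedistance set.

    0 skips the spikeline point search entirely; 20 matches the default
    used before the change described above; -1 means no cutoff.
    """
    if spikedistance < -1:
        raise ValueError("spikedistance must be -1 or greater")
    patched = copy.deepcopy(figure)
    # Create the layout dict if the figure does not have one yet.
    patched.setdefault("layout", {})["spikedistance"] = spikedistance
    return patched


# Dict-style figure with linear data, similar to the slow plot in the demo.
figure = {
    "data": [{
        "type": "scattergl",
        "mode": "markers",
        "x0": 0,
        "dx": 1,
        "y": [i * 0.05 for i in range(100_000)],
    }],
    "layout": {"title": {"text": "Linear data"}},
}
patched = limit_spike_search(figure)
```

With a `plotly.graph_objects` figure, the same setting can be applied with `fig.update_layout(spikedistance=0)`. Whether this is enough will depend on the application; it has only been reported to help in the cases described in this thread.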

CmpCtrl commented Aug 30, 2023

Cool, glad to hear about some progress on this issue. I haven't had a chance to work with it recently (and won't in the near term), but I'm excited to look into it further. Thanks.

CmpCtrl commented Jun 23, 2024

Another year later, and I'm surprised this isn't a bigger issue for more users. I took a quick look at the issues, and one more that I think is related is #6174.

@siiruli's finding about the default spike distance seems to fix the issue for me. Setting the default back to 20 pixels works great. Does anyone know why that shouldn't be reverted?

gvwilson self-assigned this Jul 12, 2024
gvwilson removed their assignment Aug 2, 2024
gvwilson added the labels bug (something broken), P3 (backlog), and performance (something is slow) Aug 9, 2024

CmpCtrl commented Nov 22, 2024

@gvwilson Any updates on this? It seems like a straightforward fix to revert the spikedistance to 20 pixels. Is there a reason not to do that?

From a quick glance at more recent issues, it seems like #7065 may be related.

@gvwilson
Contributor

Hi @CmpCtrl - I'm sorry we haven't been able to get to it, but we're trying to wrap up the next major release of plotly.js (which I hope will be done next week, but I've been saying that for several weeks now). After that we have a handful of high-priority fixes for internal needs; I hope we can prioritize this once those are out of the way. Thanks - @gvwilson
