Skip to content

Plotly fails when using "scattergl" and value greater than 2^24 [Only on M1] #6820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ravikirankhareedi opened this issue Mar 2, 2022 · 16 comments · Fixed by #6830
Closed
Labels
bug something broken P2 considered for next cycle sev-2 serious problem

Comments

@ravikirankhareedi
Copy link

When plotting using Plotly.newplot() and type scattergl on M1 machine, the plotting fails if the value of x-axis is greater than 2^24.
Example:

var data = [
  {
    x: [16777217],
    y: [1],
    type: "scattergl",
  }
];

Plotly.newPlot('myDiv', data);

Screen Shot 2022-03-01 at 9 27 12 PM

The data gets plotted on the margin which cannot be zoomed/panned. (16777217 is 2^24 + 1)
Attached screen shot. The issue only happens on M1 machine. On Intel hardware everything works well.

@c-lyu
Copy link

c-lyu commented Jul 5, 2022

The same issue here.
This issue happens only when using the WebGL mode. The plot becomes normal when manually switched back to the SVG mode.

image

image

@grorg
Copy link

grorg commented Oct 5, 2022

This reproduces in both Chrome and Safari on M1 devices, despite them using different WebGL backends. Could plotly be sending different WebGL commands based on the architecture?

@grorg
Copy link

grorg commented Oct 5, 2022

Or is it regl?

@djg
Copy link

djg commented Nov 24, 2022

Safari's implementation of WebGL passes the -ffast-math option to the Metal compiler and this is causing the vertex shader to be rewritten from (just concentrating on the x coord of the vertex position):

position.x =
        (((_ux + _utranslate.x) * _uscale.x) +
        ((_uxFract + _utranslateFract.x) * _uscale.x) +
        ((_ux + _utranslate.x) * _uscaleFract.x) +
        ((_uxFract + _utranslateFract.x) * _uscaleFract.x)) * 2.0 - 1.0f;

to:

position.x = (2.0 * (_uscaleFract.x + _uscale.x)) * ((_utranslateFract.x + _utranslate.x) + (_uxFract + _ux)) - 1.0

This change in the order of floating-point operations leads to a difference in the calculation causing an incorrect position:

position.x = (2.0 * (_uscaleFract.x + _uscale.x)) * ((_utranslateFract.x + _utranslate.x) + (_uxFract + _ux)) - 1.0
= (2.0 * (0.0 + 0.5)) * ((0.0 + -16777216.0) + (1.0 + 1677216.0)) - 1.0

The position and translation 16777216.0 are exactly representable because they’re power-of-2 (2 ** 24), but a 32-bit float can’t represent 16777217.0, so 16777216.0 + 1.0 becomes 16777216.0.

= (2.0 * 0.5) * (-16777216.0 + 16777216) - 1.0
= 1.0 * 0.0 - 1.0
= -1.0

Which is the incorrect position shown in the initial bug report.

As a workaround, -ffast-math can be disabled by marking gl_Position as invariant in the GLSL shader, but it should be noted that this comes with a potential impact on the performance of the shader.

@timon-schmelzer-gcx
Copy link

I have the same issue for both, Safari + Chrome and M1 Pro chip. One should note that this is also a problem when plotting a time series on the x axis, as the timestamps are internally represented as large numbers (I guess unix timestamps?).

@rickywesker
Copy link

Same issue here for M1 Pro chip + Chrome. Everything's fine using devices without the new M1 chip.
M1
img_v2_c9e5de32-8fda-425f-bb84-dd670cf1c3bg
Other devices
image

@grorg
Copy link

grorg commented Jan 9, 2023

Yeah, @djg pointed out what the problem is, and why it only affects M1 devices. (well technically it could impact any device, but luckily/unluckily doesn't)

@nicolaskruchten
Copy link
Contributor

@alexcjohnson @archmoj can you guys weigh in plz?

@alexcjohnson
Copy link
Collaborator

As a workaround, -ffast-math can be disabled by marking gl_Position as invariant in the GLSL shader, but it should be noted that this comes with a potential impact on the performance of the shader.

@djg thanks for the tip! I'm not too worried about the performance implications, in most practical cases we're limited more by data transfer than by the shaders for these charts, but of course this remains to be tested. I'd like to understand the extent of this a bit better though before we move on to implementation, and I don't have access to an M1 device so would be grateful if others could chime in:

  • Does it apply to the 3D trace types as well, if you give them the same datetime data as 2D?
  • Can we replicate the problem with numeric axes as well? ie reduced precision if you have data with small variations around a large baseline? Looks like on M1 we drop down to ~10^7 (24 bits) resolution on the absolute data. Here's a codepen you can use to test: https://codepen.io/alexcjohnson/pen/RwJQGdZ - on my machine with a baseline of 10^9 I have no problem with integer resolution, but if I increase the baseline to 10^17 I see the granularity, and oddly it's different for markers and lines:

Screenshot 2023-01-11 at 13 27 22

@justinjhendrick
Copy link

justinjhendrick commented Dec 7, 2023

Can we replicate the problem with numeric axes as well?

Yes. Here's what your example looks like on my M1 mac (v13.4.1) with Firefox (v120.0.1) with Plotly (v2.27.1). I changed the markers to red to make them easier to see.

Screenshot 2023-12-07 at 11 48 19 AM

@justinjhendrick
Copy link

justinjhendrick commented Dec 7, 2023

Does it apply to the 3D trace types as well?

I'm seeing precision issues with 3D traces, but not specific to M1 macs. I couldn't find a difference in behavior between M1 Mac and Intel Windows 10. At the 1e9 baseline, both windows and mac are falling down.

Here's Intel Windows 10 (v22H2) on Firefox (v120.0.1) with Plotly (v2.27.1):

Screenshot 2023-12-07 at 12 15 10 PM

And Here's M1 mac (v13.4.1) with Firefox (v120.0.1) with Plotly (v2.27.1):

Screenshot 2023-12-07 at 12 15 44 PM

@justinjhendrick
Copy link

As a workaround, -ffast-math can be disabled by marking gl_Position as invariant in the GLSL shader, but it should be noted that this comes with a potential impact on the performance of the shader.

I can confirm that @djg's suggestion fixed the issue for me. I added invariant gl_Position; after these two lines in these shaders: regl-scatter2d/marker-vert.glsl and regl-scatter2d/marker-vert.glsl. With those changed, scattergl was plotting points in the correct places with x values near 1e9.

@Coding-with-Adam Coding-with-Adam transferred this issue from plotly/plotly.py Dec 15, 2023
@Coding-with-Adam Coding-with-Adam added bug something broken sev-2 serious problem P2 considered for next cycle labels Dec 15, 2023
@Coding-with-Adam
Copy link
Contributor

hi @justinjhendrick
Thank you for confirming the solution. Would you be able to create a pull request with the fix?

@justinjhendrick
Copy link

justinjhendrick commented Dec 22, 2023

@Coding-with-Adam, yes, I'll give it a shot!

I have a couple questions about how to do this:

  1. I'm going to try to fix this for all the relevant shaders, but I'm not totally sure how to find all the affected shaders and a good test case for each one. I might need to start by just fixing this on just scattergl. And fix each webgl plot type, one at a time with multiple PRs.
  2. Some of the affected shaders are in dependent packages (such as https://github.com/gl-vis/regl-scatter2d). Are the maintainers of these dependencies as motivated to fix this as plotly devs are?
  3. [Related to point 2] is there a performance penalty? If so, how big? According to this comment, plotly is likely not bottlenecked on webgl render. But if I'm making changes to dependencies, other packages might have bottlenecks in different places. Are there benchmarks I can use to quantify the performance penalty?

@archmoj
Copy link
Contributor

archmoj commented Dec 22, 2023

@Coding-with-Adam, yes, I'll give it a shot!

I have a couple questions about how to do this:

  1. I'm going to try to fix this for all the relevant shaders, but I'm not totally sure how to find all the affected shaders and a good test case for each one. I might need to start by just fixing this on just scattergl. And fix each webgl plot type, one at a time with multiple PRs.
  2. Some of the affected shaders are in dependent packages (such as https://github.com/gl-vis/regl-scatter2d). Are the maintainers of these dependencies as motivated to fix this as plotly devs are?
  3. [Related to point 2] is there a performance penalty? If so, how big? According to this comment, plotly is likely not bottlenecked on webgl render. But if I'm making changes to dependencies, other packages might have bottlenecks in different places. Are there benchmarks I can use to quantify the performance penalty?

@justinjhendrick we also maintain gl-vis modules. Let's start by the minimal PR to regl-scatter2d and provide a link to this issue in its description. After that we will could have a look at rendering times on CircleCI, etc. But at this point I'm not too worried about the performance side effect.
Thank you!

@justinjhendrick
Copy link

justinjhendrick commented Dec 27, 2023

In response to point 1 from my earlier comment, I went looking for other affected WebGL trace types in addition to scattergl. I couldn't find any others that are affected by this same bug. I checked the following trace types:

        type: "streamtube",
        type: "cone",
        type: "mesh3d",
        type: "heatmapgl",
        type: "surface",
        type: "parcoords",
        type: "scattergl",
        type: "scatterpolargl",
        type: "scatter3d",
        type: "splom",

I found precision issues generally with x-values near 1e9. Some of them seem like the integers are being converted to float32s and we're losing precision that way (similar to the scatter3d example from this comment). But I couldn't find any issues that were M1 Mac specific.

Somewhat surprisingly, splom was not fixed by this change, despite also depending on regl-scatter2d. But I can repro that on Windows, so I guess that's being caused by something else.

Can anyone else find M1 mac specific precision issues that aren't fixed by gl-vis/regl-scatter2d#36? If not, I recommend closing this ticket and tracking down the non-platform-specific precision bugs in another issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something broken P2 considered for next cycle sev-2 serious problem
Projects
None yet
Development

Successfully merging a pull request may close this issue.