xarray 2022.10.0 much slower than 2022.6.0 #7181


Closed
dschwoerer opened this issue Oct 18, 2022 · 17 comments · Fixed by #7209

Comments

@dschwoerer
Contributor

What is your issue?

xbout's test suite finishes in less than an hour with 2022.6.0; with 2022.10.0 it gets aborted after 6 hours.

I haven't managed to debug what the issue is.

Git bisect will not work, as 2022.9.0 is broken due to #7111

@dschwoerer added the needs triage label Oct 18, 2022
@ZedThree
Contributor

Here's the output of pytest --profile* on a single test:

With xarray 2022.10 (re-read the .prof file with pstats module to print more functions):

         317962521 function calls (279203754 primitive calls) in 148.688 seconds

   Ordered by: cumulative time
   List reduced from 2791 to 42 due to restriction <0.015>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  148.690  148.690 runner.py:108(pytest_runtest_protocol)
    30/11    0.000    0.000  148.690   13.517 _hooks.py:244(__call__)
    30/11    0.000    0.000  148.690   13.517 _manager.py:77(_hookexec)
    30/11    0.000    0.000  148.690   13.517 _callers.py:9(_multicall)
        1    0.000    0.000  148.690  148.690 runner.py:116(runtestprotocol)
        3    0.000    0.000  148.690   49.563 runner.py:216(call_and_report)
        3    0.000    0.000  148.689   49.563 runner.py:244(call_runtest_hook)
        3    0.000    0.000  148.689   49.563 runner.py:315(from_call)
        3    0.000    0.000  148.689   49.563 runner.py:259(<lambda>)
        1    0.000    0.000  148.687  148.687 runner.py:157(pytest_runtest_call)
        1    0.000    0.000  148.687  148.687 python.py:1759(runtest)
        1    0.000    0.000  148.687  148.687 python.py:185(pytest_pyfunc_call)
        1    0.002    0.002  148.687  148.687 test_plot.py:18(test_region_disconnecteddoublenull)
31664481/29029   33.752    0.000  140.450    0.005 copy.py:128(deepcopy)
4099751/20042    9.002    0.000  140.195    0.007 copy.py:227(_deepcopy_dict)
       17    0.003    0.000  139.922    8.231 plotfuncs.py:79(plot2d_wrapper)
166283/166220    0.723    0.000  137.501    0.001 copy.py:259(_reconstruct)
650506/181818    4.208    0.000  136.386    0.001 dataarray.py:1179(_copy)
3330522/909322    7.121    0.000  133.892    0.000 variable.py:985(_copy)
   649848    0.463    0.000  124.588    0.000 dataarray.py:1200(__deepcopy__)
   326/86    0.001    0.000  120.356    1.399 boutdataarray.py:205(from_region)
   326/86    0.138    0.000  120.356    1.399 region.py:1713(_from_region)
       17    0.000    0.000  114.522    6.737 utils.py:65(_decompose_regions)
       17    0.000    0.000  114.520    6.736 utils.py:69(<dictcomp>)
      658    0.001    0.000  102.625    0.156 dataarray.py:1109(copy)
        6    0.000    0.000   47.508    7.918 boutdataarray.py:1062(contourf)
        6    0.000    0.000   46.851    7.808 boutdataarray.py:1068(pcolormesh)
        5    0.000    0.000   45.564    9.113 boutdataarray.py:1056(contour)
  3342076    5.816    0.000   44.786    0.000 variable.py:1018(_replace)
  3355730    4.448    0.000   36.445    0.000 variable.py:297(__init__)
     3642    0.115    0.000   34.953    0.010 variable.py:2897(copy)
  326/242    0.005    0.000   33.692    0.139 region.py:1258(_concat_inner_guards)
  326/242    0.003    0.000   33.212    0.137 region.py:1479(_concat_lower_guards)
  326/242    0.003    0.000   32.887    0.136 region.py:1596(_concat_upper_guards)
  326/242    0.004    0.000   31.750    0.131 region.py:1369(_concat_outer_guards)
  3356435   14.551    0.000   23.840    0.000 variable.py:192(as_compatible_data)
     1936    0.005    0.000   23.714    0.012 alignment.py:567(align)
     1936    0.010    0.000   23.673    0.012 alignment.py:549(align)
     1936    0.007    0.000   23.216    0.012 alignment.py:541(reindex_all)
     4769    0.006    0.000   23.208    0.005 alignment.py:542(<genexpr>)
     2833    0.013    0.000   23.203    0.008 alignment.py:521(_reindex_one)
     2833    0.007    0.000   18.699    0.007 alignment.py:498(_get_indexes_and_vars)

xarray 2022.06:

         11730856 function calls (11602239 primitive calls) in 7.233 seconds

   Ordered by: cumulative time
   List reduced from 2759 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.236    7.236 runner.py:108(pytest_runtest_protocol)
    30/11    0.000    0.000    7.235    0.658 _hooks.py:244(__call__)
    30/11    0.000    0.000    7.235    0.658 _manager.py:77(_hookexec)
    30/11    0.000    0.000    7.235    0.658 _callers.py:9(_multicall)
        1    0.000    0.000    7.235    7.235 runner.py:116(runtestprotocol)
        3    0.000    0.000    7.235    2.412 runner.py:216(call_and_report)
        3    0.000    0.000    7.235    2.412 runner.py:244(call_runtest_hook)
        3    0.000    0.000    7.235    2.412 runner.py:315(from_call)
        3    0.000    0.000    7.235    2.412 runner.py:259(<lambda>)
        1    0.000    0.000    7.233    7.233 runner.py:157(pytest_runtest_call)
        1    0.000    0.000    7.233    7.233 python.py:1759(runtest)
        1    0.000    0.000    7.233    7.233 python.py:185(pytest_pyfunc_call)
        1    0.001    0.001    7.232    7.232 test_plot.py:18(test_region_disconnecteddoublenull)
       17    0.001    0.000    4.878    0.287 plotfuncs.py:79(plot2d_wrapper)
   326/86    0.000    0.000    2.644    0.031 boutdataarray.py:205(from_region)
   326/86    0.009    0.000    2.644    0.031 region.py:1713(_from_region)
       17    0.000    0.000    2.604    0.153 utils.py:65(_decompose_regions)
       17    0.000    0.000    2.602    0.153 utils.py:69(<dictcomp>)
     1249    0.017    0.000    1.807    0.001 merge.py:696(merge_core)
        5    0.000    0.000    1.710    0.342 boutdataarray.py:1056(contour)

It looks like it's basically all coming from copying DataArrays -- has there been some change in expected behaviour that means we need to update our code?


* pytest -k test_region_disconnecteddoublenull[lower-disconnected-double-null-0-False-False-guards0] --profile --long --pstats-dir=prof2022.10
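
For reference, the saved profiles can be re-read with the standard-library pstats module to show more rows than pytest-profiling prints by default; a minimal sketch (the .prof path is illustrative):

import pstats

# Load one of the profiles written via --pstats-dir (filename is a guess)
stats = pstats.Stats("prof2022.10/test_region_disconnecteddoublenull.prof")

# Sort by cumulative time and keep the top 1.5% of entries, matching the
# "restriction <0.015>" shown in the listings above
stats.sort_stats("cumulative").print_stats(0.015)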

@headtr1ck
Collaborator

Yes, the behavior of deep copy was changed.
But I don't see why it should affect runtime, since we now only pass on the memo dict, which is required for recursive structures.

Maybe there is some bug in the logic, or maybe before it was not doing true deep copies?
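
For context, the memo dict is what lets Python's copy protocol reuse objects it has already copied instead of recursing into them forever; a minimal sketch of the pattern (not xarray's actual implementation):

import copy

class Node:
    def __init__(self, value, child=None):
        self.value = value
        self.child = child

    def __deepcopy__(self, memo):
        # Register the new object in memo *before* recursing, so that
        # copy.deepcopy reuses it when it encounters self again.
        new = Node(self.value)
        memo[id(self)] = new
        new.value = copy.deepcopy(self.value, memo)
        new.child = copy.deepcopy(self.child, memo)
        return new

a = Node(1)
a.child = Node(2, child=a)   # recursive structure
b = copy.deepcopy(a)
assert b.child.child is b    # cycle preserved without infinite recursion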

@TomNicholas added the topic-performance and regression labels and removed the needs triage label Oct 18, 2022
@ZedThree
Contributor

Our tests all pass, but there is something like a 20x slowdown, and it's basically entirely due to copies. It's plausible we're already making far too many copies ourselves, but this is obviously still concerning.

I tried adding the following asv benchmark, based on the combine benchmark and the deepcopy tests:

import numpy as np
import xarray as xr
from copy import copy, deepcopy


class Copy:
    def setup(self):
        """Create a single dataset with one 3D variable."""

        t_size, x_size, y_size = 50, 450, 400
        t = np.arange(t_size)
        data = np.random.randn(t_size, x_size, y_size)

        self.ds = xr.Dataset(
            {"A": xr.DataArray(data, coords={"T": t}, dims=("T", "X", "Y"))}
        )

    def time_copy(self) -> None:
        copy(self.ds)

    def time_deepcopy(self) -> None:
        deepcopy(self.ds)

    def time_copy_method(self) -> None:
        self.ds.copy(deep=False)

    def time_copy_method_deep(self) -> None:
        self.ds.copy(deep=True)

But I didn't see any regressions between v2022.06.0 and HEAD, so this simplistic case is clearly not enough.

There are a few differences between our test datasets and the one in the benchmark above (a rough sketch of a closer setup follows this list):

  • 4D vs 3D
  • smaller grid: (2, 5, 4, 3) vs (50, 450, 400)
  • more variables: ~20 vs 1
  • variable attributes vs none
  • multiple files read in via open_mfdataset vs entirely in-memory
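
Something closer to the failing case might look like this; the sizes, names and attributes are made up to roughly match the list above (the open_mfdataset part is not reproduced), so treat it as a sketch rather than a benchmark that was actually run:

import numpy as np
import xarray as xr
from copy import deepcopy


class CopySmall4D:
    def setup(self):
        """~20 small 4D variables, each carrying some attributes."""
        t, x, y, z = 2, 5, 4, 3
        self.ds = xr.Dataset(
            {
                f"var{i}": xr.DataArray(
                    np.random.randn(t, x, y, z),
                    dims=("t", "x", "y", "z"),
                    attrs={"cell_location": "CELL_CENTRE", "metadata": {"NXPE": 3}},
                )
                for i in range(20)
            }
        )

    def time_copy_method_deep(self) -> None:
        self.ds.copy(deep=True)

    def time_deepcopy(self) -> None:
        deepcopy(self.ds)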

@keewis
Collaborator

keewis commented Oct 18, 2022

The only difference you should see is in the deep copy of the attrs and encoding dictionaries. Could you check if the attributes in your tests are complex, big, deeply nested, and/or recursive?

@ZedThree
Contributor

Nope, either single values or flat dicts:

In [6]: ds.n
Out[6]: 
<xarray.DataArray 'n' (t: 6, x: 6, y: 12, z: 7)>
dask.array<concatenate, shape=(6, 6, 12, 7), dtype=float64, chunksize=(6, 2, 4, 7), chunktype=numpy.ndarray>
Coordinates:
    dx       (x, y) float64 dask.array<chunksize=(2, 4), meta=np.ndarray>
    dy       (x, y) float64 dask.array<chunksize=(2, 4), meta=np.ndarray>
    dz       float64 0.8976
  * t        (t) float64 0.0 10.0 20.0 30.0 40.0 50.0
  * x        (x) int64 0 1 2 3 4 5
  * y        (y) float64 1.0 3.0 5.0 7.0 9.0 11.0 13.0 15.0 17.0 19.0 21.0 23.0
  * z        (z) float64 0.0 0.8976 1.795 2.693 3.59 4.488 5.386
Attributes:
    direction_y:    Standard
    cell_location:  CELL_CENTRE
    direction_z:    Standard
    metadata:       {'BOUT_VERSION': 4.3, 'use_metric_3d': 0, 'NXPE': 3, 'NYP...
    options:        None
    geometry:

Our coordinates are dask.array though -- is that likely to affect things?

@headtr1ck
Collaborator

You call DataArray.copy 658 times, which seems fine for a test suite, but I don't understand why it calls DataArray.__deepcopy__ 650k times.
There must be something that deep-copies your arrays all over the place...

@dschwoerer
Contributor Author

@headtr1ck
Collaborator

I see, I think the problem is that Variable.to_index_variable is deep-copying.
Probably the fault of deep=True being the default for copy...
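
To illustrate the cost difference implied here using only the public API (not xarray's internals; the size is arbitrary):

import numpy as np
import xarray as xr

var = xr.Variable(("x",), np.arange(1_000_000))

shallow = var.copy(deep=False)  # new Variable wrapping the same data
deep = var.copy()               # deep=True is the default, so the data is duplicated

# The shallow copy shares the underlying array, the deep copy does not:
assert np.shares_memory(shallow.values, var.values)
assert not np.shares_memory(deep.values, var.values)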

@headtr1ck mentioned this issue Oct 24, 2022
@headtr1ck
Collaborator

headtr1ck commented Oct 24, 2022

@dschwoerer could you test whether you see any improvement with this PR: #7209?

The graph is not super helpful, since most calls to deepcopy come indirectly from deepcopy itself. I tried my best to find some points of improvement.

@ZedThree
Contributor

@headtr1ck It does seem to have helped a bit, but it looks like it's still around 15x slower:

         243192363 function calls (213211086 primitive calls) in 118.014 seconds

   Ordered by: cumulative time
   List reduced from 2819 to 42 due to restriction <0.015>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  118.017  118.017 runner.py:108(pytest_runtest_protocol)
    30/11    0.000    0.000  118.016   10.729 _hooks.py:244(__call__)
    30/11    0.000    0.000  118.016   10.729 _manager.py:77(_hookexec)
    30/11    0.000    0.000  118.016   10.729 _callers.py:9(_multicall)
        1    0.000    0.000  118.016  118.016 runner.py:116(runtestprotocol)
        3    0.000    0.000  118.016   39.339 runner.py:216(call_and_report)
        3    0.000    0.000  118.015   39.338 runner.py:244(call_runtest_hook)
        3    0.000    0.000  118.015   39.338 runner.py:315(from_call)
        3    0.000    0.000  118.015   39.338 runner.py:259(<lambda>)
        1    0.000    0.000  118.013  118.013 runner.py:157(pytest_runtest_call)
        1    0.000    0.000  118.013  118.013 python.py:1759(runtest)
        1    0.000    0.000  118.013  118.013 python.py:185(pytest_pyfunc_call)
        1    0.002    0.002  118.012  118.012 test_plot.py:18(test_region_disconnecteddoublenull)
2534922/113722    5.553    0.000  111.083    0.001 variable.py:985(_copy)
23917714/22342   26.196    0.000  109.780    0.005 copy.py:128(deepcopy)
491386/22698    3.290    0.000  109.750    0.005 dataarray.py:1179(_copy)
3096327/15601    6.894    0.000  109.589    0.007 copy.py:227(_deepcopy_dict)
       17    0.003    0.000  109.229    6.425 plotfuncs.py:79(plot2d_wrapper)
   326/86    0.001    0.000  108.467    1.261 boutdataarray.py:205(from_region)
   326/86    0.047    0.000  108.467    1.261 region.py:1713(_from_region)
   125079    0.576    0.000  107.460    0.001 copy.py:259(_reconstruct)
      658    0.001    0.000  105.804    0.161 dataarray.py:1109(copy)
       17    0.000    0.000  102.369    6.022 utils.py:65(_decompose_regions)
       17    0.000    0.000  102.367    6.022 utils.py:69(<dictcomp>)
   490728    0.375    0.000   97.421    0.000 dataarray.py:1200(__deepcopy__)
        6    0.000    0.000   36.768    6.128 boutdataarray.py:1062(contourf)
        6    0.000    0.000   36.750    6.125 boutdataarray.py:1068(pcolormesh)
        5    0.000    0.000   35.711    7.142 boutdataarray.py:1056(contour)
  2546476    4.140    0.000   34.363    0.000 variable.py:1018(_replace)
  326/242    0.004    0.000   28.475    0.118 region.py:1369(_concat_outer_guards)
  2560130    3.442    0.000   28.335    0.000 variable.py:297(__init__)
  326/242    0.003    0.000   28.184    0.116 region.py:1596(_concat_upper_guards)
  326/242    0.005    0.000   28.105    0.116 region.py:1258(_concat_inner_guards)
  326/242    0.003    0.000   27.939    0.115 region.py:1479(_concat_lower_guards)
  2560835   11.470    0.000   18.445    0.000 variable.py:192(as_compatible_data)
   495517    1.986    0.000    7.439    0.000 indexes.py:1275(copy_indexes)
  6174487    4.905    0.000    6.233    0.000 copy.py:243(_keep_alive)
     1936    0.005    0.000    5.797    0.003 alignment.py:567(align)
     1936    0.010    0.000    5.757    0.003 alignment.py:549(align)
   497497    4.988    0.000    5.341    0.000 dataarray.py:893(xindexes)
     1936    0.007    0.000    5.297    0.003 alignment.py:541(reindex_all)
     4769    0.004    0.000    5.290    0.001 alignment.py:542(<genexpr>)

I think previously dataarray.py:1109(copy) was the main culprit, and now it's dataarray.py:1200(__deepcopy__).

@dschwoerer
Contributor Author

Indeed, it does help: in 6 hours the CI completed 50% of the tests, compared to 17% before.

This is however still very much slower than before, when we finished in around half an hour -- so around 24x slower...

@ZedThree
Contributor

Looking at the call tree (sorry, big image!):

[call-graph image: combined]

there are a couple of recursive calls that account for nearly all the time:

  • the tight one from copy:128:deepcopy to copy:227:_deepcopy_dict
  • the larger one from variable:985:_copy to dataarray:1179:_copy

Does this imply that we actually have some sort of complex or recursive structure hiding in our datasets?

@headtr1ck
Collaborator

Every time I look at this graph I'm getting more confused, haha.
I still don't know why copy.deepcopy is called 24 million times.

@headtr1ck
Collaborator

headtr1ck commented Oct 25, 2022

You call DataArray.copy ~650 times, which sounds reasonable (though you could check whether you really need deep copies!).

But then somehow DataArray._copy is called ~500k times. So somewhere you must have some DataArrays in places where there should be none (in attrs, e.g.).

There are some copy calls from alignments, but I don't see how that adds up to this number...

@ZedThree
Contributor

But then somehow DataArray._copy is called ~500k times. So somewhere you must have some DataArrays in places where there should be none (in attrs, e.g.)

I think it has to be something like that. I'll see if I can dig into what these Datasets look like.

I'm sure we can get away with fewer deep copies here. I think what's happening is that we're caching generated Datasets between different test cases, but some tests need to modify them, so they get deep-copied. I guess previously that was fine, but maybe now it's cheaper to just make new instances from scratch.
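
For what it's worth, the caching pattern described above looks roughly like this (a sketch with made-up fixture names and a toy dataset, not xbout's actual conftest):

from copy import deepcopy

import numpy as np
import pytest
import xarray as xr


@pytest.fixture(scope="session")
def cached_dataset():
    # Built once per session and shared between tests
    return xr.Dataset(
        {"n": (("x", "y"), np.random.randn(5, 4))},
        attrs={"metadata": {"NXPE": 3}},
    )


@pytest.fixture
def dataset(cached_dataset):
    # Each test gets its own deep copy so it can mutate freely;
    # with 2022.10.0 this deep copy is what became expensive.
    return deepcopy(cached_dataset)


@pytest.fixture
def fresh_dataset():
    # Alternative: rebuild from scratch per test, if that is now cheaper
    return xr.Dataset({"n": (("x", "y"), np.random.randn(5, 4))})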

@ZedThree
Contributor

Yes, it turns out we do have Datasets with DataArrays in attributes. They're not nested very deep, and they're not cyclic, but it looks like there might be two or three levels of nesting.

Is this nesting now disallowed?

@headtr1ck
Collaborator

Is this nesting now disallowed?

No, it is allowed.
But a deep-copy will also deep-copy the attrs now, and therefore the DataArrays in there.
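
A small illustration of what that means in practice (names are made up; behaviour as of 2022.10.0):

import copy

import numpy as np
import xarray as xr

grid = xr.DataArray(np.arange(12).reshape(3, 4), dims=("x", "y"))
ds = xr.Dataset({"n": (("x", "y"), np.random.randn(3, 4))}, attrs={"grid": grid})

copied = copy.deepcopy(ds)

# With 2022.10.0 the attrs dict is deep-copied as well, so the nested
# DataArray is duplicated rather than shared with the original:
assert copied.attrs["grid"] is not ds.attrs["grid"]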
