Rewrite interp to use apply_ufunc #9881


Merged: 26 commits, Dec 19, 2024

Conversation

dcherian
Contributor

@dcherian dcherian commented Dec 13, 2024

  1. Removes a bunch of complexity around interpolating dask arrays by using apply_ufunc instead of blockwise directly.
  2. A major improvement: we can now use vectorize=True to get sane dask graphs for vectorized interpolation to chunked arrays (see "interp performance with chunked dimensions" #6799 (comment)).
  3. Added a bunch of typing.
  4. Happily, this fixes "Interpolation with multiple multidimensional arrays sharing dims fails" #4463.
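The vectorize=True point in (2) can be sketched outside xarray: numpy's np.vectorize with a gufunc signature does, conceptually, what apply_ufunc's vectorize=True does, namely loop a 1-D kernel over all broadcast dimensions. This is a minimal sketch under that analogy; interp_1d is a stand-in kernel, not xarray's actual interpolator.

```python
import numpy as np

# A 1-D interpolation kernel that only understands its core dimension.
# This mirrors the per-sample function that apply_ufunc wraps.
def interp_1d(y, x, x_new):
    # y: values on grid x; returns values at the points x_new
    return np.interp(x_new, x, y)

# What vectorize=True does, conceptually: loop the kernel over all
# leading (broadcast) dimensions, so the dask graph stays one task per
# block instead of a quadratic blowup of blockwise dependencies.
vectorized_interp = np.vectorize(interp_1d, signature="(n),(n),(m)->(m)")

x = np.linspace(0, 1, 5)
y = np.stack([x, 2 * x])        # two series on the same grid -> shape (2, 5)
x_new = np.array([0.25, 0.75])

out = vectorized_interp(y, x, x_new)  # shape (2, 2)
```

With real chunked data, apply_ufunc(..., vectorize=True, dask="parallelized") applies the same idea per dask block.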

cc @ks905383 your vectorized interpolation example now has this graph:
[image: dask task graph]

instead of this quadratic monstrosity
[image: dask task graph]

@dcherian dcherian added needs review run-benchmark Run the ASV benchmark workflow labels Dec 13, 2024
@dcherian dcherian requested a review from Illviljan December 13, 2024 06:32
@@ -4127,18 +4119,6 @@ def interp(

coords = either_dict_or_kwargs(coords, coords_kwargs, "interp")
indexers = dict(self._validate_interp_indexers(coords))

if coords:
dcherian (Contributor, Author):

Handled by vectorize=True. This is possibly a perf regression with numpy arrays, but a massive improvement with chunked arrays.

dcherian (Contributor, Author):

For posterity, the bad thing about this approach is that it can greatly expand the number of core dimensions for the problem, limiting the potential for parallelism.

Consider the problem in #6799 (comment). In the following, dimension names are listed in [].

da[time, q, lat, lon].interp(q=bar[lat, lon]) gets rewritten to da[time, q, lat, lon].interp(q=bar[lat, lon], lat=lat[lat], lon=lon[lon]), which, thanks to our automatic rechunking, makes dask merge chunks along lat and lon too, for no benefit.
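A toy calculation (my own sketch, not xarray or dask code) makes the parallelism loss concrete: the number of embarrassingly parallel tasks is roughly the product of chunk counts over the non-core dimensions, so promoting lat and lon to core dimensions collapses it.

```python
from math import prod

# Toy model: apply_ufunc with dask="parallelized" produces one task per
# block of the non-core dimensions; each core dimension is merged to a
# single chunk, so it contributes no parallelism.
def n_parallel_tasks(chunk_counts: dict[str, int], core_dims: set[str]) -> int:
    return prod(n for dim, n in chunk_counts.items() if dim not in core_dims)

# da[time, q, lat, lon] with these chunk counts, interpolating along q
chunks = {"time": 10, "q": 1, "lat": 4, "lon": 4}

only_q = n_parallel_tasks(chunks, {"q"})                   # 10 * 4 * 4 tasks
rewritten = n_parallel_tasks(chunks, {"q", "lat", "lon"})  # only time remains
```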

def _chunked_aware_interpnd(var, *coords, interp_func, interp_kwargs, localize=True):
    """Wrapper for `_interpnd` through `blockwise` for chunked arrays."""

def _interpnd(
dcherian (Contributor, Author):

I merged in two functions to reduce indirection and make it easier to read.

exclude_dims=all_in_core_dims,
dask="parallelized",
kwargs=dict(interp_func=func, interp_kwargs=kwargs),
dask_gufunc_kwargs=dict(output_sizes=output_sizes, allow_rechunk=True),
dcherian (Contributor, Author):

allow_rechunk=True matches the current behaviour where we rechunk along all core dimensions to a single chunk.
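As an illustration of what that rechunking means at the chunk level (a toy helper of mine, not dask's API):

```python
# Toy helper: mimic rechunking every core dimension to a single chunk,
# which is the rechunking that allow_rechunk=True permits. `chunks` is a
# dask-style tuple of per-axis chunk-size tuples.
def merge_core_dim_chunks(chunks, core_axes):
    return tuple(
        (sum(c),) if axis in core_axes else c
        for axis, c in enumerate(chunks)
    )

# a 10x6 array chunked (5, 5) x (3, 3); axis 1 is a core dimension
merged = merge_core_dim_chunks(((5, 5), (3, 3)), core_axes={1})
```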

@dcherian dcherian force-pushed the redo-blockwise-interp branch from 245697e to a5e1854 Compare December 14, 2024 00:06
@dcherian dcherian force-pushed the redo-blockwise-interp branch from 652a239 to 586f638 Compare December 14, 2024 04:02
@Illviljan Illviljan mentioned this pull request Dec 14, 2024
1 task
@dcherian
Contributor Author

Merging on Thursday if there are no comments.

IMO this is a big win for maintainability.

@dcherian dcherian added plan to merge Final call for comments and removed needs review labels Dec 17, 2024
@@ -566,29 +577,30 @@ def _get_valid_fill_mask(arr, dim, limit):
) <= limit


def _localize(var, indexes_coords):
def _localize(obj: T, indexes_coords: SourceDest) -> tuple[T, SourceDest]:
Collaborator:

Probably should use T_Xarray instead of a plain T to get rid of the `type: ignore` at the return.

dcherian (Contributor, Author):

That doesn't have Variable, so I'd have to make a new T_DatasetOrVariable or a protocol with .isel perhaps?
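The protocol idea floated here can be sketched as follows (hypothetical names: SupportsIsel, localize, and FakeVariable are illustrations, not xarray code).

```python
from typing import Protocol, TypeVar

# Hypothetical protocol: anything with an .isel method (Dataset,
# DataArray, Variable) satisfies it structurally, so _localize could be
# typed without a new T_DatasetOrVariable union or a `type: ignore`.
class SupportsIsel(Protocol):
    def isel(self, indexers=None, **indexers_kwargs): ...

T = TypeVar("T", bound=SupportsIsel)

def localize(obj: T) -> T:
    # stub: a real version would subset `obj` to the region needed
    return obj.isel()

class FakeVariable:
    """Minimal stand-in exposing the .isel structural interface."""
    def isel(self, indexers=None, **indexers_kwargs):
        return self
```

Because Protocol matching is structural, no xarray class would need to inherit from SupportsIsel.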

@Illviljan Illviljan left a comment (Contributor):

Benchmarks still look good. Nice work!

Comment on lines +830 to +831
# TODO: narrow interp_func to interpolator here
return _interp1d(var, x_list, new_x_list, interp_func, interp_kwargs) # type: ignore[arg-type]
Contributor:

Mypy is correct to error here, right?
_interp1d calls interp_func(...)(...), and that should crash with an InterpCallable?
Is there a pytest with interp_func: InterpCallable?
Is InterpCallable necessary? It would be nice to just remove it...

dcherian (Contributor, Author):

It depends on whether we end up using get_interpolator or get_interpolator_nd. I'm sure there's a test, but I can't remember which one off the top of my head.

@dcherian dcherian merged commit 29fe679 into pydata:main Dec 19, 2024
29 checks passed
@dcherian dcherian deleted the redo-blockwise-interp branch December 19, 2024 16:30
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 19, 2025
* main: (63 commits)
  Fix zarr upstream tests (pydata#9927)
  Update pre-commit hooks (pydata#9925)
  split out CFDatetimeCoder, deprecate use_cftime as kwarg (pydata#9901)
  dev whats-new (pydata#9923)
  Whats-new 2025.01.0 (pydata#9919)
  Silence upstream Zarr warnings (pydata#9920)
  time coding refactor (pydata#9906)
  fix warning from scipy backend guess_can_open on directory (pydata#9911)
  Enhance and move ISO-8601 parser to coding.times (pydata#9899)
  Edit serialization error message (pydata#9916)
  friendlier error messages for missing chunk managers (pydata#9676)
  Bump codecov/codecov-action from 5.1.1 to 5.1.2 in the actions group (pydata#9915)
  Rewrite interp to use `apply_ufunc` (pydata#9881)
  Skip dask rolling (pydata#9909)
  Explicitly configure ReadTheDocs build to use conf.py (pydata#9908)
  Cache pre-existing Zarr arrays in Zarr backend (pydata#9861)
  Optimize idxmin, idxmax with dask (pydata#9800)
  remove unused "type: ignore" comments in test_plot.py (fixed in matplotlib 3.10.0) (pydata#9904)
  move scalar-handling logic into `possibly_convert_objects` (pydata#9900)
  Add missing DataTree attributes to docs (pydata#9876)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request May 28, 2025
dcherian added a commit to dcherian/xarray that referenced this pull request May 28, 2025
dcherian added a commit to dcherian/xarray that referenced this pull request May 28, 2025
dcherian added a commit to dcherian/xarray that referenced this pull request May 30, 2025
* main:
  Fix performance regression in interp from pydata#9881 (pydata#10370)
  html repr: improve style for dropdown sections (pydata#10354)
  Grouper tweaks. (pydata#10362)
  Docs: Add links to getting help mermaid diagram (pydata#10324)
  Enforce ruff/flynt rules (FLY) (pydata#10375)
  Add missing AbstractWritableDataStore base methods and arguments (pydata#10343)
  Improve html repr in dark mode (Jupyterlab + Xarray docs) (pydata#10353)
  Pin Mypy to 1.15 (pydata#10378)
  use numpy dtype exposed by zarr array instead of metadata.data_type (pydata#10348)
  Fix doc typo for caption "Interoperability" (pydata#10374)
  Implement cftime vectorization as discussed in PR pydata#8322 (pydata#8324)
  Enforce ruff/flake8-pyi rules (PYI) (pydata#10359)
  Apply assorted ruff/Pylint rules (PL) / Enforce PLE rules (pydata#10366)
  (fix): pandas extension array repr for int64[pyarrow] (pydata#10317)
  Enforce ruff/flake8-implicit-str-concat rules (ISC) (pydata#10368)
  Enforce ruff/refurb rules (FURB) (pydata#10367)
  Ignore ruff/Pyflakes rule F401 more precisely (pydata#10369)
  Apply assorted ruff/flake8-simplify rules (SIM) (pydata#10364)
  Apply assorted ruff/flake8-pytest-style rules (PT) (pydata#10363)
  Fix "a array" misspelling (pydata#10365)
Labels
plan to merge Final call for comments run-benchmark Run the ASV benchmark workflow topic-interpolation

Successfully merging this pull request may close these issues.

Interpolation with multiple multidimensional arrays sharing dims fails
3 participants