forked from pydata/xarray

Commit 0679d2b

Merge branch 'main' into groupby-shuffle
* main: (29 commits)
  Release notes for v2024.09.0 (pydata#9480)
  Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451)
  Rename DataTree's "ds" and "data" to "dataset" (pydata#9476)
  Update DataTree repr to indicate inheritance (pydata#9470)
  Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460)
  Repo checker (pydata#9450)
  Add days_in_year and decimal_year to dt accessor (pydata#9105)
  remove parent argument from DataTree.__init__ (pydata#9465)
  Fix inheritance in DataTree.copy() (pydata#9457)
  Implement `DataTree.__delitem__` (pydata#9453)
  Add ASV for datatree.from_dict (pydata#9459)
  Make the first argument in DataTree.from_dict positional only (pydata#9446)
  Fix typos across the code, doc and comments (pydata#9443)
  DataTree should not be "Generic" (pydata#9445)
  Disallow passing a DataArray as data into the DataTree constructor (pydata#9444)
  Support additional dtypes in `resample` (pydata#9413)
  Shallow copy parent and children in DataTree constructor (pydata#9297)
  Bump minimum versions for dependencies (pydata#9434)
  Always include at least one category in random test data (pydata#9436)
  Avoid deep-copy when constructing groupby codes (pydata#9429)
  ...
2 parents 2d48690 + ed0418b

File tree

116 files changed: +2240 −1061 lines


.github/workflows/ci-additional.yaml
+14 −18

@@ -123,11 +123,11 @@ jobs:
           python xarray/util/print_versions.py
       - name: Install mypy
         run: |
-          python -m pip install "mypy<1.9" --force-reinstall
+          python -m pip install "mypy" --force-reinstall

       - name: Run mypy
         run: |
-          python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report xarray/
+          python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report

       - name: Upload mypy coverage to Codecov
         uses: codecov/[email protected]
@@ -138,7 +138,7 @@ jobs:
           name: codecov-umbrella
           fail_ci_if_error: false

-  mypy39:
+  mypy-min:
     name: Mypy 3.10
     runs-on: "ubuntu-latest"
     needs: detect-ci-trigger
@@ -177,32 +177,30 @@ jobs:
           python xarray/util/print_versions.py
       - name: Install mypy
         run: |
-          python -m pip install "mypy<1.9" --force-reinstall
+          python -m pip install "mypy" --force-reinstall

       - name: Run mypy
         run: |
-          python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report xarray/
+          python -m mypy --install-types --non-interactive --cobertura-xml-report mypy_report

       - name: Upload mypy coverage to Codecov
         uses: codecov/[email protected]
         with:
           file: mypy_report/cobertura.xml
-          flags: mypy39
+          flags: mypy-min
           env_vars: PYTHON_VERSION
           name: codecov-umbrella
           fail_ci_if_error: false

-
-
   pyright:
     name: Pyright
     runs-on: "ubuntu-latest"
     needs: detect-ci-trigger
     if: |
-        always()
-        && (
-            contains( github.event.pull_request.labels.*.name, 'run-pyright')
-        )
+      always()
+      && (
+        contains( github.event.pull_request.labels.*.name, 'run-pyright')
+      )
     defaults:
       run:
         shell: bash -l {0}
@@ -258,10 +258,10 @@ jobs:
     runs-on: "ubuntu-latest"
     needs: detect-ci-trigger
     if: |
-        always()
-        && (
-            contains( github.event.pull_request.labels.*.name, 'run-pyright')
-        )
+      always()
+      && (
+        contains( github.event.pull_request.labels.*.name, 'run-pyright')
+      )
     defaults:
       run:
         shell: bash -l {0}
@@ -312,8 +310,6 @@ jobs:
           name: codecov-umbrella
           fail_ci_if_error: false

-
-
   min-version-policy:
     name: Minimum Version Policy
     runs-on: "ubuntu-latest"

.github/workflows/pypi-release.yaml
+2 −2

@@ -88,7 +88,7 @@ jobs:
           path: dist
       - name: Publish package to TestPyPI
         if: github.event_name == 'push'
-        uses: pypa/gh-action-pypi-publish@v1.9.0
+        uses: pypa/gh-action-pypi-publish@v1.10.1
         with:
           repository_url: https://test.pypi.org/legacy/
           verbose: true
@@ -111,6 +111,6 @@ jobs:
           name: releases
           path: dist
       - name: Publish package to PyPI
-        uses: pypa/gh-action-pypi-publish@v1.9.0
+        uses: pypa/gh-action-pypi-publish@v1.10.1
         with:
           verbose: true

.pre-commit-config.yaml
+2 −1

@@ -1,6 +1,7 @@
 # https://pre-commit.com/
 ci:
   autoupdate_schedule: monthly
+  autoupdate_commit_msg: 'Update pre-commit hooks'
 exclude: 'xarray/datatree_.*'
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
@@ -13,7 +14,7 @@ repos:
       - id: mixed-line-ending
   - repo: https://github.com/astral-sh/ruff-pre-commit
     # Ruff version.
-    rev: 'v0.6.2'
+    rev: 'v0.6.3'
     hooks:
       - id: ruff
         args: ["--fix", "--show-fixes"]

asv_bench/benchmarks/dataset_io.py
+1 −1

@@ -724,7 +724,7 @@ class PerformanceBackend(xr.backends.BackendEntrypoint):
     def open_dataset(
         self,
         filename_or_obj: str | os.PathLike | None,
-        drop_variables: tuple[str] = None,
+        drop_variables: tuple[str, ...] = None,
         *,
         mask_and_scale=True,
         decode_times=True,
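The fix matters because, in Python's typing syntax, `tuple[str]` means "a tuple of exactly one string", while `tuple[str, ...]` means "a tuple of any length whose items are all strings". A minimal sketch of the distinction (the function name and body here are hypothetical, for illustration only):

    from __future__ import annotations

    # tuple[str] would only accept a 1-tuple like ("a",);
    # tuple[str, ...] accepts ("a",), ("a", "b"), or even ().
    def drop(names: tuple[str, ...] | None = None) -> list[str]:
        # Return the names that would be dropped, treating None as "drop nothing".
        return list(names or ())

    drop(("temperature", "pressure"))  # OK under tuple[str, ...]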

asv_bench/benchmarks/datatree.py
+15

@@ -0,0 +1,15 @@
+import xarray as xr
+from xarray.core.datatree import DataTree
+
+
+class Datatree:
+    def setup(self):
+        run1 = DataTree.from_dict({"run1": xr.Dataset({"a": 1})})
+        self.d_few = {"run1": run1}
+        self.d_many = {f"run{i}": xr.Dataset({"a": 1}) for i in range(100)}
+
+    def time_from_dict_few(self):
+        DataTree.from_dict(self.d_few)
+
+    def time_from_dict_many(self):
+        DataTree.from_dict(self.d_many)
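For context, `DataTree.from_dict` builds a tree from a mapping of path names to datasets, which is exactly what the new benchmark above times. A minimal sketch (the group names and data are invented; the `xarray.core.datatree` import path mirrors the benchmark and may differ in released versions):

    import xarray as xr
    from xarray.core.datatree import DataTree

    # Slash-separated keys create nested groups; intermediate nodes
    # ("simulations" here) are created automatically.
    tree = DataTree.from_dict(
        {
            "simulations/run1": xr.Dataset({"a": 1}),
            "simulations/run2": xr.Dataset({"a": 2}),
        }
    )
    print(tree["simulations/run1"])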

asv_bench/benchmarks/groupby.py
+17 −2

@@ -1,4 +1,5 @@
 # import flox to avoid the cost of first import
+import cftime
 import flox.xarray # noqa
 import numpy as np
 import pandas as pd
@@ -96,7 +97,7 @@ def setup(self, *args, **kwargs):

         requires_dask()
         super().setup(**kwargs)
-        self.ds1d = self.ds1d.chunk({"dim_0": 50}).to_dataframe()
+        self.ds1d = self.ds1d.chunk({"dim_0": 50}).to_dask_dataframe()
         self.ds1d_mean = self.ds1d.groupby("b").mean().compute()

     def time_binary_op_2d(self):
@@ -169,7 +170,21 @@ class GroupByLongTime:
     def setup(self, use_cftime, use_flox):
         arr = np.random.randn(10, 10, 365 * 30)
         time = xr.date_range("2000", periods=30 * 365, use_cftime=use_cftime)
-        self.da = xr.DataArray(arr, dims=("y", "x", "time"), coords={"time": time})
+
+        # GH9426 - deep-copying CFTime object arrays is weirdly slow
+        asda = xr.DataArray(time)
+        labeled_time = []
+        for year, month in zip(asda.dt.year, asda.dt.month, strict=True):
+            labeled_time.append(cftime.datetime(year, month, 1))
+
+        self.da = xr.DataArray(
+            arr,
+            dims=("y", "x", "time"),
+            coords={"time": time, "time2": ("time", labeled_time)},
+        )
+
+    def time_setup(self, use_cftime, use_flox):
+        self.da.groupby("time.month")

     def time_mean(self, use_cftime, use_flox):
         with xr.set_options(use_flox=use_flox):

asv_bench/benchmarks/rolling.py
+1 −1

@@ -64,7 +64,7 @@ def time_rolling_long(self, func, pandas, use_bottleneck):
     def time_rolling_np(self, window_, min_periods, use_bottleneck):
         with xr.set_options(use_bottleneck=use_bottleneck):
             self.ds.rolling(x=window_, center=False, min_periods=min_periods).reduce(
-                getattr(np, "nansum")
+                np.nansum
             ).load()

     @parameterized(

ci/requirements/bare-minimum.yml
+2 −2

@@ -11,6 +11,6 @@ dependencies:
   - pytest-env
   - pytest-xdist
   - pytest-timeout
-  - numpy=1.23
+  - numpy=1.24
   - packaging=23.1
-  - pandas=2.0
+  - pandas=2.1

ci/requirements/min-all-deps.yml
+12 −12

@@ -9,37 +9,37 @@ dependencies:
   # doc/user-guide/installing.rst, doc/user-guide/plotting.rst and setup.py.
   - python=3.10
   - array-api-strict=1.0  # dependency for testing the array api compat
-  - boto3=1.26
+  - boto3=1.28
   - bottleneck=1.3
-  - cartopy=0.21
+  - cartopy=0.22
   - cftime=1.6
   - coveralls
-  - dask-core=2023.4
-  - distributed=2023.4
+  - dask-core=2023.9
+  - distributed=2023.9
   # Flox > 0.8 has a bug with numbagg versions
   # It will require numbagg > 0.6
   # so we should just skip that series eventually
   # or keep flox pinned for longer than necessary
   - flox=0.7
-  - h5netcdf=1.1
+  - h5netcdf=1.2
   # h5py and hdf5 tend to cause conflicts
   # for e.g. hdf5 1.12 conflicts with h5py=3.1
   # prioritize bumping other packages instead
   - h5py=3.8
   - hdf5=1.12
   - hypothesis
-  - iris=3.4
+  - iris=3.7
   - lxml=4.9  # Optional dep of pydap
   - matplotlib-base=3.7
   - nc-time-axis=1.4
   # netcdf follows a 1.major.minor[.patch] convention
   # (see https://github.com/Unidata/netcdf4-python/issues/1090)
   - netcdf4=1.6.0
-  - numba=0.56
+  - numba=0.57
   - numbagg=0.2.1
-  - numpy=1.23
+  - numpy=1.24
   - packaging=23.1
-  - pandas=2.0
+  - pandas=2.1
   - pint=0.22
   - pip
   - pydap=3.4
@@ -49,9 +49,9 @@ dependencies:
   - pytest-xdist
   - pytest-timeout
   - rasterio=1.3
-  - scipy=1.10
+  - scipy=1.11
   - seaborn=0.12
   - sparse=0.14
   - toolz=0.12
-  - typing_extensions=4.5
-  - zarr=2.14
+  - typing_extensions=4.7
+  - zarr=2.16

design_notes/flexible_indexes_notes.md
+1 −1

@@ -71,7 +71,7 @@ An `XarrayIndex` subclass must/should/may implement the following properties/met
 - a `data` property to access index's data and map it to coordinate data (see [Section 4](#4-indexvariable))
 - a `__getitem__()` implementation to propagate the index through DataArray/Dataset indexing operations
 - `equals()`, `union()` and `intersection()` methods for data alignment (see [Section 2.6](#26-using-indexes-for-data-alignment))
-- Xarray coordinate getters (see [Section 2.2.4](#224-implicit-coodinates))
+- Xarray coordinate getters (see [Section 2.2.4](#224-implicit-coordinates))
 - a method that may return a new index and that will be called when one of the corresponding coordinates is dropped from the Dataset/DataArray (multi-coordinate indexes)
 - `encode()`/`decode()` methods that would allow storage-agnostic serialization and fast-path reconstruction of the underlying index object(s) (see [Section 2.8](#28-index-encoding))
 - one or more "non-standard" methods or properties that could be leveraged in Xarray 3rd-party extensions like Dataset/DataArray accessors (see [Section 2.7](#27-using-indexes-for-other-purposes))

design_notes/grouper_objects.md
+1 −1

@@ -166,7 +166,7 @@ where `|` represents chunk boundaries. A simple rechunking to
 ```
 000|111122|3333
 ```
-would make this resampling reduction an embarassingly parallel blockwise problem.
+would make this resampling reduction an embarrassingly parallel blockwise problem.

 Similarly consider monthly-mean climatologies for which the month numbers might be
 ```

design_notes/named_array_design_doc.md
+1 −1

@@ -258,7 +258,7 @@ Questions:
     Variable.coarsen_reshape
     Variable.rolling_window

-    Variable.set_dims  # split this into broadcas_to and expand_dims
+    Variable.set_dims  # split this into broadcast_to and expand_dims


     # Reordering/Reshaping

doc/api.rst
+2

@@ -530,9 +530,11 @@ Datetimelike properties
    DataArray.dt.quarter
    DataArray.dt.days_in_month
    DataArray.dt.daysinmonth
+   DataArray.dt.days_in_year
    DataArray.dt.season
    DataArray.dt.time
    DataArray.dt.date
+   DataArray.dt.decimal_year
    DataArray.dt.calendar
    DataArray.dt.is_month_start
    DataArray.dt.is_month_end
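Both additions follow the existing `.dt` accessor pattern. A small sketch of the intended usage (the array is invented, and the printed values assume the standard calendar):

    import pandas as pd
    import xarray as xr

    times = xr.DataArray(
        pd.date_range("2000-01-01", periods=3, freq="YS"), dims="time"
    )
    # days_in_year accounts for leap years: 2000 has 366 days.
    print(times.dt.days_in_year.values)  # [366 365 365]
    # decimal_year expresses each timestamp as a fractional year;
    # January 1 at midnight maps exactly onto the integer year.
    print(times.dt.decimal_year.values)  # [2000. 2001. 2002.]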

doc/user-guide/dask.rst
+2 −2

@@ -298,7 +298,7 @@ Automatic parallelization with ``apply_ufunc`` and ``map_blocks``

 .. tip::

-    Some problems can become embarassingly parallel and thus easy to parallelize
+    Some problems can become embarrassingly parallel and thus easy to parallelize
     automatically by rechunking to a frequency, e.g. ``ds.chunk(time=TimeResampler("YE"))``.
     See :py:meth:`Dataset.chunk` for more.

@@ -559,7 +559,7 @@ larger chunksizes.

 .. tip::

-    Many time domain problems become amenable to an embarassingly parallel or blockwise solution
+    Many time domain problems become amenable to an embarrassingly parallel or blockwise solution
     (e.g. using :py:func:`xarray.map_blocks`, :py:func:`dask.array.map_blocks`, or
     :py:func:`dask.array.blockwise`) by rechunking to a frequency along the time dimension.
     Provide :py:class:`xarray.groupers.TimeResampler` objects to :py:meth:`Dataset.chunk` to do so.
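As a concrete illustration of the tip above, here is a sketch of frequency-based rechunking (the dataset contents are invented, and dask must be installed):

    import numpy as np
    import pandas as pd
    import xarray as xr
    from xarray.groupers import TimeResampler

    time = pd.date_range("2001-01-01", "2003-12-31", freq="D")
    ds = xr.Dataset(
        {"temp": ("time", np.random.randn(time.size))},
        coords={"time": time},
    )

    # One dask chunk per calendar year ("YE" = year-end frequency), so a
    # yearly resample/reduction afterwards never crosses chunk boundaries.
    chunked = ds.chunk(time=TimeResampler("YE"))
    print(chunked.chunksizes["time"])  # (365, 365, 365); none of these years is a leap year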

doc/user-guide/data-structures.rst
+1 −1

@@ -289,7 +289,7 @@ pressure that were made under various conditions:
 * the measurements were made on four different days;
 * they were made at two separate locations, which we will represent using
   their latitude and longitude; and
-* they were made using instruments by three different manufacutrers, which we
+* they were made using instruments by three different manufacturers, which we
   will refer to as `'manufac1'`, `'manufac2'`, and `'manufac3'`.

 .. ipython:: python

doc/user-guide/groupby.rst
+6

@@ -305,6 +305,12 @@ Use grouper objects to group by multiple dimensions:

     from xarray.groupers import UniqueGrouper

+    da.groupby(["lat", "lon"]).sum()
+
+The above is sugar for using ``UniqueGrouper`` objects directly:
+
+.. ipython:: python
+
     da.groupby(lat=UniqueGrouper(), lon=UniqueGrouper()).sum()

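A self-contained sketch of the equivalence documented above (the array and coordinate values are invented for illustration):

    import numpy as np
    import xarray as xr
    from xarray.groupers import UniqueGrouper

    da = xr.DataArray(
        np.arange(12.0).reshape(4, 3),
        dims=("lat", "lon"),
        coords={"lat": [10, 10, 20, 20], "lon": [1, 2, 2]},
    )

    # The list form is shorthand...
    da.groupby(["lat", "lon"]).sum()
    # ...for spelling out one UniqueGrouper per grouped variable.
    da.groupby(lat=UniqueGrouper(), lon=UniqueGrouper()).sum()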

doc/user-guide/pandas.rst
+1 −1

@@ -120,7 +120,7 @@ Particularly after a roundtrip, the following deviations are noted:

 - a non-dimension Dataset ``coordinate`` is converted into ``variable``
 - a non-dimension DataArray ``coordinate`` is not converted
-- ``dtype`` is not allways the same (e.g. "str" is converted to "object")
+- ``dtype`` is not always the same (e.g. "str" is converted to "object")
 - ``attrs`` metadata is not conserved

 To avoid these problems, the third-party `ntv-pandas <https://github.com/loco-philippe/ntv-pandas>`__ library offers lossless and reversible conversions between
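A short sketch of the first and third deviations in that list (the dataset is invented for illustration):

    import xarray as xr

    ds = xr.Dataset(
        {"a": ("x", [1, 2])},
        coords={"x": [10, 20], "label": ("x", ["p", "q"])},
    )
    roundtripped = xr.Dataset.from_dataframe(ds.to_dataframe())

    # "label" started as a non-dimension coordinate but comes back as a
    # data variable, and its string dtype comes back as "object".
    print(roundtripped)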

doc/user-guide/testing.rst
+1 −1

@@ -193,7 +193,7 @@ different type:

 .. ipython:: python

-    def sparse_random_arrays(shape: tuple[int]) -> sparse._coo.core.COO:
+    def sparse_random_arrays(shape: tuple[int, ...]) -> sparse._coo.core.COO:
         """Strategy which generates random sparse.COO arrays"""
         if shape is None:
             shape = npst.array_shapes()
