forked from pydata/xarray

Commit 65c658b

Merge branch 'main' into warn-nd-index-var
* main:
  Remove hue_style from plot1d docstring (pydata#7925)
  Add new what's new section (pydata#7986)
  Release summary for v2023.07.0 (pydata#7979)
  Improve explanation in example "Working with Multidimensional Coordinates" (pydata#7984)
  Fix typo in zarr.py (pydata#7983)
  Examples added to docstrings (pydata#7936)
  [pre-commit.ci] pre-commit autoupdate (pydata#7973)
  Skip broken tests on python 3.11 and windows (pydata#7972)
  Use another repository for upstream testing (pydata#7970)
  Move absolute path finder from open_mfdataset to own function (pydata#7968)
  ensure no forward slashes in names for HDF5-based backends (pydata#7953)
  Chunked array docs (pydata#7951)
  [pre-commit.ci] pre-commit autoupdate (pydata#7959)
  manually unshallow the repository on RTD (pydata#7961)
  Update minimum version of typing extensions in pre-commit (pydata#7960)
  Docstring examples (pydata#7881)
2 parents f059e50 + a47ff4e commit 65c658b

22 files changed

+1178 -80 lines changed

.pre-commit-config.yaml

+3-3
Original file line number | Diff line number | Diff line change
@@ -14,9 +14,9 @@ repos:
1414
- id: absolufy-imports
1515
name: absolufy-imports
1616
files: ^xarray/
17-
- repo: https://github.com/charliermarsh/ruff-pre-commit
17+
- repo: https://github.com/astral-sh/ruff-pre-commit
1818
# Ruff version.
19-
rev: 'v0.0.275'
19+
rev: 'v0.0.277'
2020
hooks:
2121
- id: ruff
2222
args: ["--fix"]
@@ -47,7 +47,7 @@ repos:
4747
types-pkg_resources,
4848
types-PyYAML,
4949
types-pytz,
50-
typing-extensions==3.10.0.0,
50+
typing-extensions>=4.1.0,
5151
numpy,
5252
]
5353
- repo: https://github.com/citation-file-format/cff-converter-python

.readthedocs.yaml

+1
@@ -7,6 +7,7 @@ build:
77
jobs:
88
post_checkout:
99
- (git --no-pager log --pretty="tformat:%s" -1 | grep -vqF "[skip-rtd]") || exit 183
10+
- git fetch --unshallow || true
1011
pre_install:
1112
- git update-index --assume-unchanged doc/conf.py ci/requirements/doc.yml
1213

ci/install-upstream-wheels.sh

+1-1
@@ -23,7 +23,7 @@ conda uninstall -y --force \
2323
xarray
2424
# to limit the runtime of Upstream CI
2525
python -m pip install \
26-
-i https://pypi.anaconda.org/scipy-wheels-nightly/simple \
26+
-i https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \
2727
--no-deps \
2828
--pre \
2929
--upgrade \

doc/conf.py

+1
@@ -323,6 +323,7 @@
323323
"dask": ("https://docs.dask.org/en/latest", None),
324324
"cftime": ("https://unidata.github.io/cftime", None),
325325
"sparse": ("https://sparse.pydata.org/en/latest/", None),
326+
"cubed": ("https://tom-e-white.com/cubed/", None),
326327
}
327328

328329

doc/examples/multidimensional-coords.ipynb

+1-1
@@ -56,7 +56,7 @@
5656
"cell_type": "markdown",
5757
"metadata": {},
5858
"source": [
59-
"In this example, the _logical coordinates_ are `x` and `y`, while the _physical coordinates_ are `xc` and `yc`, which represent the latitudes and longitude of the data."
59+
"In this example, the _logical coordinates_ are `x` and `y`, while the _physical coordinates_ are `xc` and `yc`, which represent the longitudes and latitudes of the data."
6060
]
6161
},
6262
{

doc/internals/chunked-arrays.rst

+102
@@ -0,0 +1,102 @@
1+
.. currentmodule:: xarray
2+
3+
.. _internals.chunkedarrays:
4+
5+
Alternative chunked array types
6+
===============================
7+
8+
.. warning::
9+
10+
This is a *highly* experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker <https://github.com/pydata/xarray/issues>`_.
11+
In particular, see the discussion on `xarray issue #6807 <https://github.com/pydata/xarray/issues/6807>`_.
12+
13+
Xarray can wrap chunked dask arrays (see :ref:`dask`), but can also wrap any other chunked array type that exposes the correct interface.
14+
This allows us to support using other frameworks for distributed and out-of-core processing, with user code still written as xarray commands.
15+
In particular xarray also supports wrapping :py:class:`cubed.Array` objects
16+
(see `Cubed's documentation <https://tom-e-white.com/cubed/>`_ and the `cubed-xarray package <https://github.com/xarray-contrib/cubed-xarray>`_).
17+
18+
The basic idea is that by wrapping an array that has an explicit notion of ``.chunks``, xarray can expose control over
19+
the choice of chunking scheme to users via methods like :py:meth:`DataArray.chunk` whilst the wrapped array actually
20+
implements the handling of processing all of the chunks.
21+
22+
Chunked array methods and "core operations"
23+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
24+
25+
A chunked array needs to meet all the :ref:`requirements for normal duck arrays <internals.duckarrays.requirements>`, but must also
26+
implement additional features.
27+
28+
Chunked arrays have additional attributes and methods, such as ``.chunks`` and ``.rechunk``.
29+
Furthermore, Xarray dispatches chunk-aware computations across one or more chunked arrays using special functions known
30+
as "core operations". Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``.
31+
32+
The core operations are generalizations of functions first implemented in :py:mod:`dask.array`.
33+
The implementation of these functions is specific to the type of arrays passed to them. For example, when applying the
34+
``map_blocks`` core operation, :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`,
35+
whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`.
36+
37+
In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the
38+
corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`,
39+
also known as a "Chunk Manager". Therefore **a full list of the operations that need to be defined is set by the
40+
API of the** :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` **abstract base class**. Note that chunked array
41+
methods are also currently dispatched using this class.
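
The dispatch idea described above can be illustrated with a small, self-contained sketch. This is not xarray's implementation: ``ToyChunkManager``, ``ListOfBlocks``, ``ListManager`` and ``get_manager`` are hypothetical names invented for the illustration, and the "chunked array" is just a list of blocks. It only shows the pattern of picking a manager by array type and delegating a core operation to it.

```python
from abc import ABC, abstractmethod


class ToyChunkManager(ABC):
    """Hypothetical stand-in for a chunk manager; not xarray's actual class."""

    array_cls: type  # the chunked array type this manager knows how to handle

    def is_chunked_array(self, data) -> bool:
        return isinstance(data, self.array_cls)

    @abstractmethod
    def map_blocks(self, func, data):
        """Apply ``func`` to every chunk of ``data``."""


class ListOfBlocks(list):
    """Toy "chunked array": a list whose items are the chunks."""


class ListManager(ToyChunkManager):
    array_cls = ListOfBlocks

    def map_blocks(self, func, data):
        # The manager, not the caller, decides how chunks are processed.
        return ListOfBlocks(func(block) for block in data)


def get_manager(data, managers):
    """Return the first manager that recognises the type of ``data``."""
    for manager in managers:
        if manager.is_chunked_array(data):
            return manager
    raise TypeError(f"no chunk manager recognises {type(data).__name__}")


managers = [ListManager()]
blocks = ListOfBlocks([[1, 2], [3, 4]])
manager = get_manager(blocks, managers)
doubled = manager.map_blocks(lambda block: [2 * x for x in block], blocks)
```

The caller never names a concrete implementation of ``map_blocks``. Supporting another array type in this sketch would mean adding another manager subclass to ``managers``.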
42+
43+
Chunked array creation is also handled by this class. As chunked array objects have a one-to-one correspondence with
44+
in-memory numpy arrays, it should be possible to create a chunked array from a numpy array by passing the desired
45+
chunking pattern to an implementation of :py:meth:`~xarray.core.parallelcompat.ChunkManagerEntrypoint.from_array`.
46+
47+
.. note::
48+
49+
The :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class is mostly just acting as a
50+
namespace for containing the chunk-aware function primitives. Ideally in the future we would have an API standard
51+
for chunked array types which codified this structure, making the entrypoint system unnecessary.
52+
53+
.. currentmodule:: xarray.core.parallelcompat
54+
55+
.. autoclass:: xarray.core.parallelcompat.ChunkManagerEntrypoint
56+
:members:
57+
58+
Registering a new ChunkManagerEntrypoint subclass
59+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
60+
61+
Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an
62+
entrypoint system to allow developers of new chunked array implementations to register their corresponding subclass of
63+
:py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`.
64+
65+
66+
To register a new entrypoint you need to add an entry to the ``setup.cfg`` like this::
67+
68+
[options.entry_points]
69+
xarray.chunkmanagers =
70+
dask = xarray.core.daskmanager:DaskManager
71+
72+
See also `cubed-xarray <https://github.com/xarray-contrib/cubed-xarray>`_ for another example.
73+
74+
To check that the entrypoint has worked correctly, you may find it useful to display the available chunkmanagers using
75+
the internal function :py:func:`~xarray.core.parallelcompat.list_chunkmanagers`.
76+
77+
.. autofunction:: list_chunkmanagers
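
Independently of xarray's internal helper, the registration can also be checked with the standard library. The sketch below assumes only the ``xarray.chunkmanagers`` group name shown in the ``setup.cfg`` example; ``list_registered_names`` is a hypothetical helper, not part of xarray.

```python
import sys
from importlib.metadata import entry_points


def list_registered_names(group="xarray.chunkmanagers"):
    """Return the sorted names of entry points registered under ``group``."""
    if sys.version_info >= (3, 10):
        # Selecting by group via keyword is available from Python 3.10.
        eps = entry_points(group=group)
    else:
        # Older versions return a mapping from group name to entry points.
        eps = entry_points().get(group, [])
    return sorted(ep.name for ep in eps)


print(list_registered_names())
```

An empty list means no installed package advertises an entry point in that group. The printed names depend entirely on what is installed in the current environment.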
78+
79+
80+
User interface
81+
~~~~~~~~~~~~~~
82+
83+
Once the chunkmanager subclass has been registered, xarray objects wrapping the desired array type can be created in 3 ways:
84+
85+
#. By manually passing the array type to the :py:class:`~xarray.DataArray` constructor, see the examples for :ref:`numpy-like arrays <userguide.duckarrays>`,
86+
87+
#. Calling :py:meth:`~xarray.DataArray.chunk`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``,
88+
89+
#. Calling :py:func:`~xarray.open_dataset`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``.
90+
91+
The latter two methods ultimately call the chunkmanager's implementation of ``.from_array``, to which they pass the ``from_array_kwargs`` dict.
92+
The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to ``'dask'``
93+
if Dask is installed, otherwise it defaults to whichever chunkmanager is registered if only one is registered.
94+
If multiple chunkmanagers are registered it will raise an error by default.
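
The default-selection rule described above can be written out as a short sketch. ``choose_chunkmanager_name`` is a hypothetical function, not xarray's code. The behaviour when nothing is registered, or when an unknown name is requested, is an assumption, since the text above only covers the remaining cases.

```python
def choose_chunkmanager_name(registered, requested=None):
    """Pick a chunk manager name following the rule described in the text."""
    names = list(registered)
    if requested is not None:
        # Explicit choice via ``chunked_array_type``.
        if requested not in names:
            # Assumption: an unknown name is treated as an error.
            raise ValueError(f"unrecognised chunk manager {requested!r}")
        return requested
    if "dask" in names:
        # Dask is the default whenever it is available.
        return "dask"
    if len(names) == 1:
        # Only one candidate, so the choice is unambiguous.
        return names[0]
    if not names:
        # Assumption: nothing registered is treated as an error.
        raise ValueError("no chunk managers are registered")
    raise ValueError(f"multiple chunk managers registered, choose one of {names}")


print(choose_chunkmanager_name(["cubed", "dask"]))
print(choose_chunkmanager_name(["cubed"]))
```

Under these assumptions, passing ``requested`` plays the role of the ``chunked_array_type`` keyword and removes the ambiguity when several managers are registered and Dask is not among them.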
95+
96+
Parallel processing without chunks
97+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
98+
99+
To use a parallel array type that does not expose a concept of chunks explicitly, none of the information on this page
100+
is theoretically required. Such an array type (e.g. `Ramba <https://github.com/Python-for-HPC/ramba>`_ or
101+
`Arkouda <https://github.com/Bears-R-Us/arkouda>`_) could be wrapped using xarray's existing support for
102+
:ref:`numpy-like "duck" arrays <userguide.duckarrays>`.

doc/internals/duck-arrays-integration.rst

+2
@@ -11,6 +11,8 @@ Integrating with duck arrays
1111
Xarray can wrap custom numpy-like arrays (":term:`duck array`\s") - see the :ref:`user guide documentation <userguide.duckarrays>`.
1212
This page is intended for developers who are interested in wrapping a new custom array type with xarray.
1313

14+
.. _internals.duckarrays.requirements:
15+
1416
Duck array requirements
1517
~~~~~~~~~~~~~~~~~~~~~~~
1618

doc/internals/index.rst

+1
@@ -21,6 +21,7 @@ The pages in this section are intended for:
2121

2222
variable-objects
2323
duck-arrays-integration
24+
chunked-arrays
2425
extending-xarray
2526
zarr-encoding-spec
2627
how-to-add-new-backend

doc/user-guide/duckarrays.rst

+1-1
@@ -27,7 +27,7 @@ Some numpy-like array types that xarray already has some support for:
2727

2828
For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that
2929
described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require
30-
slightly different user code (e.g. calling ``.chunk`` or ``.compute``).
30+
slightly different user code (e.g. calling ``.chunk`` or ``.compute``). See the docs on :ref:`wrapping chunked arrays <internals.chunkedarrays>`.
3131

3232
Why "duck"?
3333
-----------

doc/whats-new.rst

+30-2
@@ -14,9 +14,9 @@ What's New
1414
1515
np.random.seed(123456)
1616
17-
.. _whats-new.2023.06.1:
17+
.. _whats-new.2023.07.1:
1818

19-
v2023.06.1 (unreleased)
19+
v2023.07.1 (unreleased)
2020
-----------------------
2121

2222
New Features
@@ -29,17 +29,45 @@ Breaking changes
2929

3030
Deprecations
3131
~~~~~~~~~~~~
32+
- ``hue_style`` is being deprecated for scatter plots (:issue:`7907`, :pull:`7925`).
33+
By `Jimmy Westling <https://github.com/illviljan>`_.
34+
35+
Bug fixes
36+
~~~~~~~~~
37+
38+
39+
Documentation
40+
~~~~~~~~~~~~~
41+
42+
43+
Internal Changes
44+
~~~~~~~~~~~~~~~~
45+
46+
47+
v2023.07.0 (July 11, 2023)
48+
--------------------------
3249

50+
This release brings improvements to the documentation on wrapping numpy-like arrays, improved docstrings, and bug fixes.
3351

3452
Bug fixes
3553
~~~~~~~~~
3654

55+
- Ensure no forward slashes in variable and dimension names for HDF5-based engines.
56+
(:issue:`7943`, :pull:`7953`) By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
3757

3858
Documentation
3959
~~~~~~~~~~~~~
4060

61+
- Added examples to docstrings of :py:meth:`Dataset.tail`, :py:meth:`Dataset.head`, :py:meth:`Dataset.dropna`,
62+
:py:meth:`Dataset.ffill`, :py:meth:`Dataset.bfill`, :py:meth:`Dataset.set_coords`, :py:meth:`Dataset.reset_coords`
63+
(:issue:`6793`, :pull:`7936`) By `Harshitha <https://github.com/harshitha1201>`_ .
64+
- Added page on wrapping chunked numpy-like arrays as alternatives to dask arrays.
65+
(:pull:`7951`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
4166
- Expanded the page on wrapping numpy-like "duck" arrays.
4267
(:pull:`7911`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
68+
- Added examples to docstrings of :py:meth:`Dataset.isel`, :py:meth:`Dataset.reduce`, :py:meth:`Dataset.argmin`,
69+
:py:meth:`Dataset.argmax` (:issue:`6793`, :pull:`7881`)
70+
By `Harshitha <https://github.com/harshitha1201>`_ .
4371

4472
Internal Changes
4573
~~~~~~~~~~~~~~~~

xarray/backends/api.py

+7-33
@@ -3,7 +3,6 @@
33
import os
44
from collections.abc import Hashable, Iterable, Mapping, MutableMapping, Sequence
55
from functools import partial
6-
from glob import glob
76
from io import BytesIO
87
from numbers import Number
98
from typing import (
@@ -21,7 +20,12 @@
2120

2221
from xarray import backends, conventions
2322
from xarray.backends import plugins
24-
from xarray.backends.common import AbstractDataStore, ArrayWriter, _normalize_path
23+
from xarray.backends.common import (
24+
AbstractDataStore,
25+
ArrayWriter,
26+
_find_absolute_paths,
27+
_normalize_path,
28+
)
2529
from xarray.backends.locks import _get_scheduler
2630
from xarray.core import indexing
2731
from xarray.core.combine import (
@@ -967,37 +971,7 @@ def open_mfdataset(
967971
.. [1] https://docs.xarray.dev/en/stable/dask.html
968972
.. [2] https://docs.xarray.dev/en/stable/dask.html#chunking-and-performance
969973
"""
970-
if isinstance(paths, str):
971-
if is_remote_uri(paths) and engine == "zarr":
972-
try:
973-
from fsspec.core import get_fs_token_paths
974-
except ImportError as e:
975-
raise ImportError(
976-
"The use of remote URLs for opening zarr requires the package fsspec"
977-
) from e
978-
979-
fs, _, _ = get_fs_token_paths(
980-
paths,
981-
mode="rb",
982-
storage_options=kwargs.get("backend_kwargs", {}).get(
983-
"storage_options", {}
984-
),
985-
expand=False,
986-
)
987-
tmp_paths = fs.glob(fs._strip_protocol(paths)) # finds directories
988-
paths = [fs.get_mapper(path) for path in tmp_paths]
989-
elif is_remote_uri(paths):
990-
raise ValueError(
991-
"cannot do wild-card matching for paths that are remote URLs "
992-
f"unless engine='zarr' is specified. Got paths: {paths}. "
993-
"Instead, supply paths as an explicit list of strings."
994-
)
995-
else:
996-
paths = sorted(glob(_normalize_path(paths)))
997-
elif isinstance(paths, os.PathLike):
998-
paths = [os.fspath(paths)]
999-
else:
1000-
paths = [os.fspath(p) if isinstance(p, os.PathLike) else p for p in paths]
974+
paths = _find_absolute_paths(paths, engine=engine, **kwargs)
1001975

1002976
if not paths:
1003977
raise OSError("no files to open")
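
For orientation, the local-file branches of the logic removed from ``open_mfdataset`` can be sketched as a standalone function. ``find_local_paths`` is a hypothetical name, and the sketch deliberately omits the remote-URL and zarr handling. It also assumes that path normalisation means expanding ``~`` and making the path absolute; the body of ``_find_absolute_paths`` itself is not shown in this diff.

```python
import os
import tempfile
from glob import glob
from pathlib import Path


def find_local_paths(paths):
    """Turn a glob string, a single path, or an iterable of paths into a list."""
    if isinstance(paths, str):
        # A string is treated as a glob pattern; sorting makes the order stable.
        pattern = os.path.abspath(os.path.expanduser(paths))
        return sorted(glob(pattern))
    if isinstance(paths, os.PathLike):
        # A single path-like object becomes a one-element list.
        return [os.fspath(paths)]
    # Any other iterable: convert path-like items, keep the rest unchanged.
    return [os.fspath(p) if isinstance(p, os.PathLike) else p for p in paths]


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.nc", "a.nc"):
        Path(tmp, name).touch()
    matched = find_local_paths(os.path.join(tmp, "*.nc"))
    names = [os.path.basename(p) for p in matched]

print(names)
```

An empty result from the glob branch is what triggers the ``OSError("no files to open")`` check that follows the call in ``open_mfdataset``.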
