document the duck array integration status #4530

Merged
merged 22 commits into from
Nov 20, 2020
Changes from 16 commits
59 changes: 59 additions & 0 deletions doc/duckarrays.rst
@@ -0,0 +1,59 @@
.. currentmodule:: xarray

Working with numpy-like arrays
==============================

.. warning::

    This feature should be considered experimental. Please report any bugs you find on
    xarray's GitHub repository.

Numpy-like arrays (:term:`duck array`) extend the :py:class:`numpy.ndarray` with
additional features, like propagating physical units or a different layout in memory.

:py:class:`DataArray` and :py:class:`Dataset` objects can wrap these duck arrays, as
long as they satisfy certain conditions (see :ref:`internals.duck_arrays`).
Comment on lines +14 to +15
Collaborator Author

would it make sense to point to the explicitly tested duck arrays here (pint, sparse)? We could also add a user-maintained list of duck array libraries, just like the current "related projects" list.

I did think about adding usage examples, but maybe it's better to leave that to the extension packages?

Contributor

I think a list would be nice.


.. note::

    For ``dask`` support see :ref:`dask`.


Missing features
----------------
Most of the API supports :term:`duck array` objects, but there are a few areas where
the code will still cast to ``numpy`` arrays:

- dimension coordinates, and thus all indexing operations:

  * :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel`
  * :py:meth:`Dataset.loc` and :py:meth:`DataArray.loc`
  * :py:meth:`Dataset.drop_sel` and :py:meth:`DataArray.drop_sel`
  * :py:meth:`Dataset.reindex`, :py:meth:`Dataset.reindex_like`,
    :py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in
    data variables and non-dimension coordinates won't be cast

- functions and methods that depend on external libraries or features of ``numpy`` not
  covered by ``__array_function__`` / ``__array_ufunc__``:

  * :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``)
  * :py:meth:`Dataset.bfill` and :py:meth:`DataArray.bfill` (uses ``bottleneck``)
  * :py:meth:`Dataset.interp`, :py:meth:`Dataset.interp_like`,
    :py:meth:`DataArray.interp` and :py:meth:`DataArray.interp_like` (uses ``scipy``):
    duck arrays in data variables and non-dimension coordinates will be cast, in
    addition to duck arrays not being supported in dimension coordinates
  * :py:meth:`Dataset.rolling_exp` and :py:meth:`DataArray.rolling_exp` (uses
    ``numbagg``)
  * :py:meth:`Dataset.rolling` and :py:meth:`DataArray.rolling` (uses internal functions
    of ``numpy``)
  * :py:meth:`Dataset.interpolate_na` and :py:meth:`DataArray.interpolate_na` (uses
    :py:class:`numpy.vectorize`)
  * :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)


Extensions using duck arrays
----------------------------
Here's a list of libraries extending ``xarray`` to make working with wrapped duck arrays
easier:

- `pint-xarray <https://github.com/xarray-contrib/pint-xarray>`_
2 changes: 2 additions & 0 deletions doc/index.rst
@@ -60,6 +61,7 @@ Documentation
* :doc:`io`
* :doc:`dask`
* :doc:`plotting`
* :doc:`duckarrays`

.. toctree::
:maxdepth: 1
@@ -80,6 +81,7 @@ Documentation
io
dask
plotting
duckarrays

**Help & reference**

15 changes: 8 additions & 7 deletions doc/internals.rst
@@ -42,21 +42,24 @@ xarray objects via the (readonly) :py:attr:`Dataset.variables
<xarray.Dataset.variables>` and
:py:attr:`DataArray.variable <xarray.DataArray.variable>` attributes.

Duck arrays
-----------

.. _internals.duck_arrays:

Integrating with duck arrays
----------------------------

.. warning::

    This is an experimental feature.

xarray can wrap custom `duck array`_ objects as long as they define numpy's
xarray can wrap custom :term:`duck array` objects as long as they define numpy's
``shape``, ``dtype`` and ``ndim`` properties and the ``__array__``,
``__array_ufunc__`` and ``__array_function__`` methods.
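
As a rough illustration of the requirements listed above, the following sketch checks
whether an object exposes the named properties and methods. The helper names
(``missing_duck_array_attributes``, ``looks_like_duck_array``) and the ``FakeDuckArray``
class are hypothetical and not part of xarray; xarray's own internal check may differ,
and merely defining these names does not make an object behave correctly.

```python
# Hypothetical helpers, not part of xarray's API. They only test that the
# attribute names listed in the text are present, not that they work.
REQUIRED_ATTRIBUTES = (
    "shape",
    "dtype",
    "ndim",
    "__array__",
    "__array_ufunc__",
    "__array_function__",
)


def missing_duck_array_attributes(obj):
    """Return the required attribute names that ``obj`` does not define."""
    return [name for name in REQUIRED_ATTRIBUTES if not hasattr(obj, name)]


def looks_like_duck_array(obj):
    """Return True if ``obj`` defines every attribute listed above."""
    return not missing_duck_array_attributes(obj)


class FakeDuckArray:
    """Illustrative stand-in that defines the names but no real behaviour."""

    shape = (2, 3)
    dtype = "float64"
    ndim = 2

    def __array__(self, dtype=None):
        raise NotImplementedError

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return NotImplemented

    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented


print(looks_like_duck_array(FakeDuckArray()))
print(missing_duck_array_attributes([1, 2, 3]))
```

A plain ``list`` defines none of the six names, so the second call reports all of them
as missing.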

In certain situations (e.g. when printing the collapsed preview of
variables of a ``Dataset``), xarray will display the repr of a `duck array`_
variables of a ``Dataset``), xarray will display the repr of a :term:`duck array`
in a single line, truncating it to a certain number of characters. If that
would drop too much information, the `duck array`_ may define a
would drop too much information, the :term:`duck array` may define a
``_repr_inline_`` method that takes ``max_width`` (number of characters) as an
argument:

@@ -71,8 +74,6 @@ argument:

...
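
One possible way to fill in such a method is sketched below. Only the method name
``_repr_inline_`` and its ``max_width`` argument come from the text; the constructor,
the ``values`` and ``unit`` attributes, and the truncation strategy (keep the unit,
shorten the values) are assumptions made for illustration.

```python
class MyDuckArray:
    """Hypothetical duck array; only the inline repr is sketched here."""

    def __init__(self, values, unit):
        # ``values`` and ``unit`` are illustrative attributes, not an xarray API.
        self.values = list(values)
        self.unit = unit

    def _repr_inline_(self, max_width):
        """Format to a single line with at most ``max_width`` characters.

        Assumes ``max_width`` is a non-negative integer.
        """
        suffix = f" [{self.unit}]"
        body = " ".join(str(v) for v in self.values)
        room = max_width - len(suffix)
        if room <= 0:
            # Not even the unit fits: fall back to a hard cut.
            return (body + suffix)[:max_width]
        if len(body) > room:
            # Shorten the values and mark the cut, keeping the unit visible.
            ellipsis = "..."[:room]
            body = body[: room - len(ellipsis)] + ellipsis
        return body + suffix


print(MyDuckArray(range(100), "m")._repr_inline_(20))
print(MyDuckArray([1, 2], "m")._repr_inline_(40))
```

Keeping the unit and truncating the values is a design choice for this sketch: the
information a generic character cut would drop first is preserved.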

.. _duck array: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html


Extending xarray
----------------
8 changes: 8 additions & 0 deletions doc/terminology.rst
@@ -104,3 +104,11 @@ complete examples, please consult the relevant documentation.*
one, it has 0 dimensions. That means that, e.g., :py:class:`int`,
:py:class:`float`, and :py:class:`str` objects are "scalar" while
:py:class:`list` or :py:class:`tuple` are not.

duck array
`Duck arrays <duck array>`_ are array implementations that behave
like numpy arrays. They have to define the ``shape``, ``dtype`` and
``ndim`` properties. For integration with ``xarray``, the ``__array__``,
``__array_ufunc__`` and ``__array_function__`` protocols are also required.

.. _duck array: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
3 changes: 2 additions & 1 deletion doc/whats-new.rst
@@ -61,7 +61,8 @@ Bug fixes

Documentation
~~~~~~~~~~~~~

- Document the API not supported with duck arrays (:pull:`4530`).
  By `Justus Magin <https://github.com/keewis>`_.
- Raise a more informative error when :py:meth:`DataArray.to_dataframe` is
  called on a scalar (:issue:`4228`). By `Pieter Gijsbers <https://github.com/pgijsbers>`_.
