|
| 1 | +.. currentmodule:: xarray |
| 2 | + |
| 3 | +.. _internals.chunkedarrays: |
| 4 | + |
| 5 | +Alternative chunked array types |
| 6 | +=============================== |
| 7 | + |
| 8 | +.. warning:: |
| 9 | + |
| 10 | + This is a *highly* experimental feature. Please report any bugs or other difficulties on `xarray's issue tracker <https://github.com/pydata/xarray/issues>`_. |
| 11 | + In particular see discussion on `xarray issue #6807 <https://github.com/pydata/xarray/issues/6807>`_ |
| 12 | + |
| 13 | +Xarray can wrap chunked dask arrays (see :ref:`dask`), but can also wrap any other chunked array type that exposes the correct interface. |
| 14 | +This allows us to support using other frameworks for distributed and out-of-core processing, with user code still written as xarray commands. |
| 15 | +In particular xarray also supports wrapping :py:class:`cubed.Array` objects |
| 16 | +(see `Cubed's documentation <https://tom-e-white.com/cubed/>`_ and the `cubed-xarray package <https://github.com/xarray-contrib/cubed-xarray>`_). |
| 17 | + |
| 18 | +The basic idea is that by wrapping an array that has an explicit notion of ``.chunks``, xarray can expose control over |
| 19 | +the choice of chunking scheme to users via methods like :py:meth:`DataArray.chunk` whilst the wrapped array actually |
| 20 | +implements the handling of processing all of the chunks. |
| 21 | + |
| 22 | +Chunked array methods and "core operations" |
| 23 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 24 | + |
| 25 | +A chunked array needs to meet all the :ref:`requirements for normal duck arrays <internals.duckarrays.requirements>`, but must also |
| 26 | +implement additional features. |
| 27 | + |
| 28 | +Chunked arrays have additional attributes and methods, such as ``.chunks`` and ``.rechunk``. |
| 29 | +Furthermore, Xarray dispatches chunk-aware computations across one or more chunked arrays using special functions known |
| 30 | +as "core operations". Examples include ``map_blocks``, ``blockwise``, and ``apply_gufunc``. |
| 31 | + |
| 32 | +The core operations are generalizations of functions first implemented in :py:mod:`dask.array`. |
| 33 | +The implementation of these functions is specific to the type of arrays passed to them. For example, when applying the |
| 34 | +``map_blocks`` core operation, :py:class:`dask.array.Array` objects must be processed by :py:func:`dask.array.map_blocks`, |
| 35 | +whereas :py:class:`cubed.Array` objects must be processed by :py:func:`cubed.map_blocks`. |
| 36 | + |
| 37 | +In order to use the correct implementation of a core operation for the array type encountered, xarray dispatches to the |
| 38 | +corresponding subclass of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`, |
| 39 | +also known as a "Chunk Manager". Therefore **a full list of the operations that need to be defined is set by the |
| 40 | +API of the** :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` **abstract base class**. Note that chunked array |
| 41 | +methods are also currently dispatched using this class. |
| 42 | + |
| 43 | +Chunked array creation is also handled by this class. As chunked array objects have a one-to-one correspondence with |
| 44 | +in-memory numpy arrays, it should be possible to create a chunked array from a numpy array by passing the desired |
| 45 | +chunking pattern to an implementation of :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint.from_array``. |
| 46 | + |
| 47 | +.. note:: |
| 48 | + |
| 49 | + The :py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint` abstract base class is mostly just acting as a |
| 50 | + namespace for containing the chunked-aware function primitives. Ideally in the future we would have an API standard |
| 51 | + for chunked array types which codified this structure, making the entrypoint system unnecessary. |
| 52 | + |
| 53 | +.. currentmodule:: xarray.core.parallelcompat |
| 54 | + |
| 55 | +.. autoclass:: xarray.core.parallelcompat.ChunkManagerEntrypoint |
| 56 | + :members: |
| 57 | + |
| 58 | +Registering a new ChunkManagerEntrypoint subclass |
| 59 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 60 | + |
| 61 | +Rather than hard-coding various chunk managers to deal with specific chunked array implementations, xarray uses an |
| 62 | +entrypoint system to allow developers of new chunked array implementations to register their corresponding subclass of |
| 63 | +:py:class:`~xarray.core.parallelcompat.ChunkManagerEntrypoint`. |
| 64 | + |
| 65 | + |
| 66 | +To register a new entrypoint you need to add an entry to the ``setup.cfg`` like this:: |
| 67 | + |
| 68 | + [options.entry_points] |
| 69 | + xarray.chunkmanagers = |
| 70 | + dask = xarray.core.daskmanager:DaskManager |
| 71 | + |
| 72 | +See also `cubed-xarray <https://github.com/xarray-contrib/cubed-xarray>`_ for another example. |
| 73 | + |
| 74 | +To check that the entrypoint has worked correctly, you may find it useful to display the available chunkmanagers using |
| 75 | +the internal function :py:func:`~xarray.core.parallelcompat.list_chunkmanagers`. |
| 76 | + |
| 77 | +.. autofunction:: list_chunkmanagers |
| 78 | + |
| 79 | + |
| 80 | +User interface |
| 81 | +~~~~~~~~~~~~~~ |
| 82 | + |
| 83 | +Once the chunkmanager subclass has been registered, xarray objects wrapping the desired array type can be created in 3 ways: |
| 84 | + |
| 85 | +#. By manually passing the array type to the :py:class:`~xarray.DataArray` constructor, see the examples for :ref:`numpy-like arrays <userguide.duckarrays>`, |
| 86 | + |
| 87 | +#. Calling :py:meth:`~xarray.DataArray.chunk`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``, |
| 88 | + |
| 89 | +#. Calling :py:func:`~xarray.open_dataset`, passing the keyword arguments ``chunked_array_type`` and ``from_array_kwargs``. |
| 90 | + |
| 91 | +The latter two methods ultimately call the chunkmanager's implementation of ``.from_array``, to which they pass the ``from_array_kwargs`` dict. |
| 92 | +The ``chunked_array_type`` kwarg selects which registered chunkmanager subclass to dispatch to. It defaults to ``'dask'`` |
| 93 | +if Dask is installed, otherwise it defaults to whichever chunkmanager is registered if only one is registered. |
| 94 | +If multiple chunkmanagers are registered it will raise an error by default. |
| 95 | + |
| 96 | +Parallel processing without chunks |
| 97 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 98 | + |
| 99 | +To use a parallel array type that does not expose a concept of chunks explicitly, none of the information on this page |
| 100 | +is theoretically required. Such an array type (e.g. `Ramba <https://github.com/Python-for-HPC/ramba>`_ or |
| 101 | +`Arkouda <https://github.com/Bears-R-Us/arkouda>`_) could be wrapped using xarray's existing support for |
| 102 | +:ref:`numpy-like "duck" arrays <userguide.duckarrays>`. |
0 commit comments