Add glossary to documentation #3352
Merged
Changes from 4 commits
5 commits
b492a21
First draft at terminology glossary.
gwgundersen 1c40460
Made name matching rules more explicit and hopefully clearer.
gwgundersen 3bf9fa7
Amended what's new.
gwgundersen 71ffb19
Changes based on feedback.
gwgundersen 7363c5a
More changed based on feedback.
gwgundersen
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
.. _terminology: | ||
|
||
.. https://github.com/pydata/xarray/issues/2410 | ||
.. https://github.com/pydata/xarray/issues/1295 | ||
|
||
Terminology | ||
=========== | ||
|
||
*Xarray terminology differs slightly from CF, mathematical conventions, and pandas, so using xarray, understanding the documentation, and parsing error messages are all easier once key terminology is defined. This glossary is ordered so that more fundamental concepts come first; new users should read it top to bottom. Throughout the glossary,* ``arr`` *refers to an xarray* :py:class:`DataArray` *in any small examples. For more complete examples, please consult the relevant documentation.* | ||
|
||
---- | ||
|
||
**DataArray:** A multi-dimensional array with labeled or named dimensions. ``DataArray`` objects add metadata such as dimension names, coordinates, and attributes (defined below) to underlying "unlabeled" data structures such as numpy and Dask arrays. If its optional ``name`` property is set, it is a *named DataArray*. | ||
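As a minimal sketch of the definition above, the following wraps a plain NumPy array in a ``DataArray``. The dimension names, coordinate values, and the ``units`` attribute are illustrative assumptions, not taken from the glossary.

```python
import numpy as np
import xarray as xr

# Wrap an "unlabeled" NumPy array with dimension names, coordinates,
# and attributes. All names and values here are illustrative.
data = np.arange(6).reshape(2, 3)
arr = xr.DataArray(
    data,
    dims=("y", "x"),
    coords={"y": [10, 20], "x": ["a", "b", "c"]},
    attrs={"units": "m"},
)

# The optional ``name`` property is unset by default ...
unnamed = arr.name
# ... and setting it makes this a named DataArray.
arr.name = "height"
```

The underlying data keeps its shape; only the metadata around it changes.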
|
||
---- | ||
|
||
**Dataset:** A dict-like collection of ``DataArray`` objects with aligned dimensions. Thus, most operations that can be performed on the dimensions of a single ``DataArray`` can be performed on a dataset. Datasets have data variables (see **Variable** below), dimensions, coordinates, and attributes. | ||
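A small sketch of a ``Dataset`` holding two data variables that share dimensions; the variable names and values are made up for illustration.

```python
import numpy as np
import xarray as xr

# Two data variables that share the dimensions ("y", "x").
ds = xr.Dataset(
    data_vars={
        "temperature": (("y", "x"), np.zeros((2, 3))),
        "pressure": (("y", "x"), np.ones((2, 3))),
    },
    coords={"x": [1, 2, 3]},
    attrs={"title": "toy dataset"},
)

# Dict-like access returns a DataArray.
temperature = ds["temperature"]

# An operation along a dimension is applied to every data variable.
mean_over_x = ds.mean(dim="x")
```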
|
||
---- | ||
|
||
**Variable:** A `NetCDF-like variable <https://www.unidata.ucar.edu/software/netcdf/netcdf/Variables.html>`_ consisting of dimensions, data, and attributes which describe a single array. The main functional difference between variables and numpy arrays is that numerical operations on variables implement array broadcasting by dimension name. Each ``DataArray`` has an underlying variable that can be accessed via ``arr.variable``. However, a variable is not fully described outside of either a ``Dataset`` or a ``DataArray``. | ||
|
||
.. note:: | ||
|
||
    The :py:class:`Variable` class is a low-level interface and can typically be ignored. However, the word "variable" appears often enough in the code and documentation that it is useful to understand. | ||
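To illustrate broadcasting by dimension name, here is a hedged sketch using two one-dimensional variables with different dimension names. The names ``row`` and ``col`` are hypothetical.

```python
import numpy as np
import xarray as xr

row = xr.Variable(("x",), np.array([1, 2, 3]))
col = xr.Variable(("y",), np.array([10, 20]))

# NumPy cannot add arrays of shape (3,) and (2,). Variables broadcast
# by dimension name instead, so the result has both dimensions.
total = row + col

# Every DataArray exposes its underlying variable.
arr = xr.DataArray(np.zeros(3), dims="x")
underlying = arr.variable
```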
|
||
---- | ||
|
||
**Dimension:** In mathematics, the *dimension* of data is loosely the number of degrees of freedom for it. A *dimension axis* is a set of all points in which all but one of these degrees of freedom is fixed. We can think of each dimension axis as having a name, for example the "x dimension". In xarray, a ``DataArray`` object's *dimensions* are its named dimension axes, and the name of the ``i``-th dimension is ``arr.dims[i]``. If an array is created without dimension names, the default names are ``dim_0``, ``dim_1``, and so forth. | ||
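A short sketch of default versus explicit dimension names; ``time`` and ``space`` are illustrative names only.

```python
import numpy as np
import xarray as xr

# No dimension names given: xarray falls back to dim_0, dim_1, ...
arr = xr.DataArray(np.zeros((2, 3)))
default_dims = arr.dims

# Explicit dimension names; the i-th name is dims[i].
named = xr.DataArray(np.zeros((2, 3)), dims=("time", "space"))
second = named.dims[1]
```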
|
||
---- | ||
|
||
**Coordinate:** An array that labels a dimension of another ``DataArray``. Loosely, the coordinate array's values can be thought of as tick labels along a dimension. There are two types of coordinate arrays: *dimension coordinates* and *non-dimension coordinates* (see below). A coordinate named ``x`` can be retrieved from ``arr.coords['x']``. A ``DataArray`` can have more coordinates than dimensions because a single dimension can be assigned multiple coordinate arrays. However, only one coordinate array can be assigned as a particular dimension's dimension coordinate array. As a consequence, ``len(arr.dims) <= len(arr.coords)`` whenever every dimension has a dimension coordinate. | ||
|
||
---- | ||
|
||
**Dimension coordinate:** A coordinate array assigned to ``arr`` with both a name and dimension name in ``arr.dims`` (see **Name matching rules** below). Dimension coordinates are used for label-based indexing and alignment, like the index found on a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact, dimension coordinates use :py:class:`pandas.Index` objects under the hood for efficient computation. Dimension coordinates are marked by ``*`` when printing a ``DataArray`` or ``Dataset``. | ||
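The following sketch shows label-based indexing through a dimension coordinate. The coordinate values are arbitrary.

```python
import xarray as xr

# "x" is both the coordinate's name and its dimension, so it is a
# dimension coordinate and can be used for label-based indexing.
arr = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [10, 20, 30]})

# Select by label (the coordinate value 20), not by integer position.
picked = arr.sel(x=20)

# The coordinate itself is retrieved by name.
x_coord = arr.coords["x"]
```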
|
||
---- | ||
|
||
**Non-dimension coordinate:** A coordinate array assigned to ``arr`` with a name that is *not* in ``arr.dims`` but a dimension name that is in ``arr.dims`` (see **Name matching rules** below). These coordinate arrays are useful for auxiliary labeling. However, non-dimension coordinates are not indexed, and any operation on non-dimension coordinates that leverages indexing will fail. Printing ``arr.coords`` will print all of ``arr``'s coordinate names, with the assigned dimensions in parentheses. For example, ``coord_name (dim_name) 1 2 3 ...``. | ||
|
||
.. note:: | ||
|
||
    **Name matching rules:** Xarray follows simple but important-to-understand name matching rules for dimensions and coordinates. Let ``arr`` be an array with an existing dimension ``x`` that is assigned new coordinates ``new_coords``. If ``new_coords`` is list-like, e.g. ``[1, 2, 3]``, then it must be assigned a name that matches an existing dimension, for example ``arr.assign_coords({'x': [1, 2, 3]})``. | ||
|
||
    However, if ``new_coords`` is a one-dimensional ``DataArray``, then the rules are slightly more complex. In this case, if both the name under which ``new_coords`` is assigned and its only dimension match the same dimension name in ``arr.dims``, it is assigned as a dimension coordinate to ``arr``. If its dimension name is in ``arr.dims`` but the assigned name is not, it is assigned as a non-dimension coordinate along ``new_coords.dims[0]``. If its dimension name is not in ``arr.dims``, an exception is raised. | ||
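A hedged sketch of how coordinate assignment plays out with ``assign_coords``, assuming a recent xarray release. The names ``label`` and ``other`` and the dimension ``z`` are hypothetical, and the exact exception type may differ between versions.

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros(3), dims="x")

# A list-like assigned under an existing dimension name becomes that
# dimension's dimension coordinate.
with_dim_coord = arr.assign_coords(x=[1, 2, 3])

# A one-dimensional DataArray along "x", assigned under a name that is
# not a dimension, becomes a non-dimension coordinate along "x".
labels = xr.DataArray(["a", "b", "c"], dims="x")
with_aux = with_dim_coord.assign_coords(label=labels)

# A coordinate along a dimension that arr does not have is rejected
# (a ValueError in the releases this sketch assumes).
try:
    arr.assign_coords(other=xr.DataArray([1, 2], dims="z"))
    raised = False
except ValueError:
    raised = True
```

Only the dimension coordinate ``x`` is backed by an index here; ``label`` is carried along as auxiliary labeling.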
|
||
---- | ||
|
||
**Index:** An *index* is a data structure optimized for efficient selecting and slicing of an associated array. Xarray creates indexes for dimension coordinates so that operations along dimensions are fast, while non-dimension coordinates are not indexed. Under the hood, indexes are implemented as :py:class:`pandas.Index` objects. The index associated with dimension name ``x`` can be retrieved by ``arr.indexes['x']``. When every dimension has a dimension coordinate, ``len(arr.dims) == len(arr.indexes)``. |
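A brief sketch of retrieving an index and using it for label-based slicing. The last lines illustrate the assumption that a dimension without a coordinate has no index.

```python
import numpy as np
import pandas as pd
import xarray as xr

arr = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": [10, 20, 30]})

# The index for dimension "x" is a pandas.Index.
idx = arr.indexes["x"]

# Label-based slicing uses the index and includes both endpoints.
sliced = arr.sel(x=slice(10, 20))

# A dimension without a coordinate has no index.
bare = xr.DataArray(np.zeros(3), dims="x")
n_bare_indexes = len(bare.indexes)
```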