
Commit 7ad2544

Benoit Bovy, shoyer
Benoit Bovy authored and committed
Add set_index, reset_index and reorder_levels methods (#1028)
* add Dataset.set_index method
* add set_index and reset_index methods for dataarray and dataset
* add reorder_levels method for dataset and dataarray
* add tests
* update doc
* fix tests py27
* review changes
* fix unresolved rebase conflict
* fix reset_index example in docs
* fix docstring
* change signature of reset_index
* add type annotations
* update missing coordinate dims
* fix and update docs
* updated doc
1 parent 352cfd5 commit 7ad2544


7 files changed (+502, -9 lines changed)


doc/api.rst (+6)
@@ -106,6 +106,9 @@ Indexing
 Dataset.squeeze
 Dataset.reindex
 Dataset.reindex_like
+Dataset.set_index
+Dataset.reset_index
+Dataset.reorder_levels

 Computation
 -----------
@@ -239,6 +242,9 @@ Indexing
 DataArray.squeeze
 DataArray.reindex
 DataArray.reindex_like
+DataArray.set_index
+DataArray.reset_index
+DataArray.reorder_levels

 Comparisons
 -----------

doc/reshaping.rst (+61, -5)
@@ -4,7 +4,7 @@
 Reshaping and reorganizing data
 ###############################

-These methods allow you to reorganize
+These methods allow you to reorganize

 .. ipython:: python
 :suppress:
@@ -95,23 +95,79 @@ always succeeds, even if the multi-index being unstacked does not contain all
 possible levels. Missing levels are filled in with ``NaN`` in the resulting object:

 .. ipython:: python
-
+
 stacked2 = stacked[::2]
-stacked2
+stacked2
 stacked2.unstack('z')

 However, xarray's ``stack`` has an important difference from pandas: unlike
 pandas, it does not automatically drop missing values. Compare:

 .. ipython:: python
-
+
 array = xr.DataArray([[np.nan, 1], [2, 3]], dims=['x', 'y'])
-array.stack(z=('x', 'y'))
+array.stack(z=('x', 'y'))
 array.to_pandas().stack()

 We departed from pandas's behavior here because predictable shapes for new
 array dimensions is necessary for :ref:`dask`.

+.. _reshape.set_index:
+
+Set and reset index
+-------------------
+
+Complementary to stack / unstack, xarray's ``.set_index``, ``.reset_index`` and
+``.reorder_levels`` allow easy manipulation of ``DataArray`` or ``Dataset``
+multi-indexes without modifying the data and its dimensions.
+
+You can create a multi-index from several 1-dimensional variables and/or
+coordinates using :py:meth:`~xarray.DataArray.set_index`:
+
+.. ipython:: python
+
+da = xr.DataArray(np.random.rand(4),
+coords={'band': ('x', ['a', 'a', 'b', 'b']),
+'wavenumber': ('x', np.linspace(200, 400, 4))},
+dims='x')
+da
+mda = da.set_index(x=['band', 'wavenumber'])
+mda
+
+These coordinates can now be used for indexing, e.g.,
+
+.. ipython:: python
+
+mda.sel(band='a')
+
+Conversely, you can use :py:meth:`~xarray.DataArray.reset_index`
+to extract multi-index levels as coordinates (this is mainly useful
+for serialization):
+
+.. ipython:: python
+
+mda.reset_index('x')
+
+:py:meth:`~xarray.DataArray.reorder_levels` allows changing the order
+of multi-index levels:
+
+.. ipython:: python
+
+mda.reorder_levels(x=['wavenumber', 'band'])
+
+As of xarray v0.9 coordinate labels for each dimension are optional.
+You can also use ``.set_index`` / ``.reset_index`` to add / remove
+labels for one or several dimensions:
+
+.. ipython:: python
+
+array = xr.DataArray([1, 2, 3], dims='x')
+array
+array['c'] = ('x', ['a', 'b', 'c'])
+array.set_index(x='c')
+array.set_index(x='c', inplace=True)
+array.reset_index('x', drop=True)
+
 Shift and roll
 --------------

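The set / reset / reorder workflow documented in this hunk manipulates the multi-index attached to a dimension. The following is a minimal pandas-only sketch of that index structure, assuming a dimension's multi-index behaves like a ``pandas.MultiIndex`` (the committed ``reorder_levels`` does call ``to_index()`` and ``MultiIndex.reorder_levels``). It is not xarray's code path for ``set_index`` or ``reset_index``; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Two 1-D coordinates along dimension 'x', as in the documentation example.
band = ["a", "a", "b", "b"]
wavenumber = np.linspace(200, 400, 4)

# Sketch of the index that da.set_index(x=['band', 'wavenumber']) attaches
# to 'x': a MultiIndex whose levels are named after the coordinates.
x_index = pd.MultiIndex.from_arrays([band, wavenumber],
                                    names=["band", "wavenumber"])

# Label-based selection on one level (compare mda.sel(band='a')):
# build a boolean mask from the 'band' level and keep matching entries.
is_a = x_index.get_level_values("band") == "a"
selected = x_index[is_a]

# Sketch of what reset_index('x') gives back: one plain array per level.
recovered = {name: x_index.get_level_values(name).to_numpy()
             for name in x_index.names}

# Swapping level order leaves the entries themselves in place
# (compare mda.reorder_levels(x=['wavenumber', 'band'])).
swapped = x_index.reorder_levels(["wavenumber", "band"])

print(list(swapped.names))
```

Because the levels are built from the existing 1-D coordinates, the round trip through ``set_index`` and ``reset_index`` only changes how the labels are organized, which is consistent with the documentation's claim that the data and its dimensions are left untouched.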
doc/whats-new.rst (+3)
@@ -111,6 +111,9 @@ Enhancements
 as keyword arguments, e.g., ``ds.sel(time='2000-01')``
 (see :ref:`multi-level indexing`).
 By `Benoit Bovy <https://github.com/benbovy>`_.
+- Added ``set_index``, ``reset_index`` and ``reorder_levels`` methods to
+easily create and manipulate (multi-)indexes (see :ref:`reshape.set_index`).
+By `Benoit Bovy <https://github.com/benbovy>`_.
 - Added the ``compat`` option ``'no_conflicts'`` to ``merge``, allowing the
 combination of xarray objects with disjoint (:issue:`742`) or
 overlapping (:issue:`835`) coordinates as long as all present data agrees.

xarray/core/dataarray.py (+98, -1)
@@ -18,7 +18,7 @@
 from .common import AbstractArray, BaseDataObject
 from .coordinates import (DataArrayCoordinates, LevelCoordinatesSource,
                           Indexes)
-from .dataset import Dataset
+from .dataset import Dataset, merge_indexes, split_indexes
 from .pycompat import iteritems, basestring, OrderedDict, zip, range
 from .variable import (as_variable, Variable, as_compatible_data,
                        IndexVariable,
@@ -842,6 +842,103 @@ def swap_dims(self, dims_dict):
         ds = self._to_temp_dataset().swap_dims(dims_dict)
         return self._from_temp_dataset(ds)

+    def set_index(self, append=False, inplace=False, **indexes):
+        """Set DataArray (multi-)indexes using one or more existing coordinates.
+
+        Parameters
+        ----------
+        append : bool, optional
+            If True, append the supplied index(es) to the existing index(es).
+            Otherwise replace the existing index(es) (default).
+        inplace : bool, optional
+            If True, set new index(es) in-place. Otherwise, return a new DataArray
+            object.
+        **indexes : {dim: index, ...}
+            Keyword arguments with names matching dimensions and values given
+            by (lists of) the names of existing coordinates or variables to set
+            as new (multi-)index.
+
+        Returns
+        -------
+        obj : DataArray
+            Another dataarray, with this dataarray's data but replaced coordinates.
+
+        See Also
+        --------
+        DataArray.reset_index
+        """
+        coords, _ = merge_indexes(indexes, self._coords, set(), append=append)
+        if inplace:
+            self._coords = coords
+        else:
+            return self._replace(coords=coords)
+
+    def reset_index(self, dims_or_levels, drop=False, inplace=False):
+        """Reset the specified index(es) or multi-index level(s).
+
+        Parameters
+        ----------
+        dims_or_levels : str or list
+            Name(s) of the dimension(s) and/or multi-index level(s) that will
+            be reset.
+        drop : bool, optional
+            If True, remove the specified indexes and/or multi-index levels
+            instead of extracting them as new coordinates (default: False).
+        inplace : bool, optional
+            If True, modify the dataarray in-place. Otherwise, return a new
+            DataArray object.
+
+        Returns
+        -------
+        obj : DataArray
+            Another dataarray, with this dataarray's data but replaced
+            coordinates.
+
+        See Also
+        --------
+        DataArray.set_index
+        """
+        coords, _ = split_indexes(dims_or_levels, self._coords, set(),
+                                  self._level_coords, drop=drop)
+        if inplace:
+            self._coords = coords
+        else:
+            return self._replace(coords=coords)
+
+    def reorder_levels(self, inplace=False, **dim_order):
+        """Rearrange index levels using input order.
+
+        Parameters
+        ----------
+        inplace : bool, optional
+            If True, modify the dataarray in-place. Otherwise, return a new
+            DataArray object.
+        **dim_order : optional
+            Keyword arguments with names matching dimensions and values given
+            by lists representing new level orders. Every given dimension
+            must have a multi-index.
+
+        Returns
+        -------
+        obj : DataArray
+            Another dataarray, with this dataarray's data but replaced
+            coordinates.
+        """
+        replace_coords = {}
+        for dim, order in dim_order.items():
+            coord = self._coords[dim]
+            index = coord.to_index()
+            if not isinstance(index, pd.MultiIndex):
+                raise ValueError("coordinate %r has no MultiIndex" % dim)
+            replace_coords[dim] = IndexVariable(coord.dims,
+                                                index.reorder_levels(order))
+        coords = self._coords.copy()
+        coords.update(replace_coords)
+        if inplace:
+            self._coords = coords
+        else:
+            return self._replace(coords=coords)
+
     def stack(self, **dimensions):
         """
         Stack any number of existing dimensions into a single new dimension.
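The loop in the committed ``reorder_levels`` follows a simple pattern: check that each requested dimension carries a ``pandas.MultiIndex``, reorder its levels, then copy the coordinate mapping and overwrite only the touched entries. Below is a standalone sketch of that pattern, assuming a plain ``dict`` mapping dimension names to pandas indexes stands in for the DataArray's coordinates; ``reorder_index_levels`` is a hypothetical helper, not part of xarray.

```python
import pandas as pd


def reorder_index_levels(indexes, **dim_order):
    """Hypothetical helper mirroring the loop in DataArray.reorder_levels.

    ``indexes`` is assumed to map dimension names to pandas indexes.
    A new dict is returned; the input mapping is not modified.
    """
    replaced = {}
    for dim, order in dim_order.items():
        index = indexes[dim]
        # Same guard as the committed code: only a MultiIndex has levels.
        if not isinstance(index, pd.MultiIndex):
            raise ValueError("coordinate %r has no MultiIndex" % dim)
        replaced[dim] = index.reorder_levels(order)
    # Copy first, then update, so untouched dimensions carry over as-is.
    result = dict(indexes)
    result.update(replaced)
    return result


indexes = {
    "x": pd.MultiIndex.from_arrays([["a", "a", "b", "b"], [1, 2, 1, 2]],
                                   names=["band", "level"]),
    "y": pd.Index([10, 20, 30], name="y"),
}
new = reorder_index_levels(indexes, x=["level", "band"])
print(list(new["x"].names))

# A dimension with a plain (single-level) index is rejected.
try:
    reorder_index_levels(indexes, y=["y"])
except ValueError as err:
    error_message = str(err)
```

Copying the mapping before updating it is what lets the non-``inplace`` branch return a new object while leaving the original coordinates untouched.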
