diff --git a/doc/source/user_guide/advanced.rst b/doc/source/user_guide/advanced.rst
index 280eb05964787..f67b7cfa05c76 100644
--- a/doc/source/user_guide/advanced.rst
+++ b/doc/source/user_guide/advanced.rst
@@ -965,7 +965,8 @@ If you select a label *contained* within an interval, this will also select the
     df.loc[2.5]
     df.loc[[2.5, 3.5]]
 
-``Interval`` and ``IntervalIndex`` are used by ``cut`` and ``qcut``:
+:func:`cut` and :func:`qcut` both return a ``Categorical`` object, and the bins they
+create are stored as an ``IntervalIndex`` in its ``.categories`` attribute.
 
 .. ipython:: python
 
@@ -973,13 +974,17 @@ If you select a label *contained* within an interval, this will also select the
     c
     c.categories
 
-Furthermore, ``IntervalIndex`` allows one to bin *other* data with these same
-bins, with ``NaN`` representing a missing value similar to other dtypes.
+:func:`cut` also accepts an ``IntervalIndex`` for its ``bins`` argument, which enables
+a useful pandas idiom. First, we call :func:`cut` with some data and ``bins`` set to a
+fixed number, to generate the bins. Then, we pass the values of ``.categories`` as the
+``bins`` argument in subsequent calls to :func:`cut`, supplying new data which will be
+binned into the same bins.
 
 .. ipython:: python
 
     pd.cut([0, 3, 5, 1], bins=c.categories)
 
+Any value which falls outside all bins will be assigned a ``NaN`` value.
 
 Generating ranges of intervals
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
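
Note on the documented idiom: the following is a minimal standalone sketch of the two-step binning pattern described in the new text, not part of the patch itself. The definition ``c = pd.cut(range(4), bins=2)`` is assumed (the hunks only show ``c`` and ``c.categories``), and the names ``bins`` and ``new`` are illustrative.

```python
import pandas as pd

# Step 1: let cut() derive two equal-width bins from some initial data.
# (Assumed definition of `c`; the diff only shows `c` and `c.categories`.)
c = pd.cut(range(4), bins=2)
bins = c.categories  # the bins, stored as an IntervalIndex

# Step 2: reuse the same bins for new data by passing the IntervalIndex.
new = pd.cut([0, 3, 5, 1], bins=bins)

print(bins)
print(new)

# The value 5 lies outside every bin, so it is reported as missing.
print(new.isna())
```

Passing the ``IntervalIndex`` rather than a bin count keeps the bin edges fixed across calls, so results from different data sets stay comparable.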