Commit f197aea

DOC: update the pandas.DataFrame.plot.kde and pandas.Series.plot.kde docstrings
Unfortunately, I was not able to compute a kernel density estimate of a two-dimensional random variable. Hence, the DataFrame example analyzes several independent data series, one density curve per column, rather than a joint distribution.
1 parent fb556ed commit f197aea
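
For readers curious about the two-dimensional case mentioned in the commit message, here is a minimal standard-library sketch of a joint Gaussian kernel estimate. It is not part of this commit and not pandas API. The helper names `covariance_2d` and `kde_2d` are hypothetical, and the bandwidth convention (kernel covariance equal to the sample covariance scaled by a squared factor, with Scott's factor `n ** (-1 / (d + 4))` as the default) is an assumption modeled on how `scipy.stats.gaussian_kde` is understood to behave.

```python
import math


def covariance_2d(xs, ys):
    """Return the sample covariance entries (sxx, sxy, syy), using n - 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def kde_2d(xs, ys, px, py, bw_factor=None):
    """Estimate the joint density of (x, y) at the single point (px, py)."""
    n = len(xs)
    if bw_factor is None:
        # Assumed default: Scott's factor for d = 2 dimensions.
        bw_factor = n ** (-1 / 6)
    sxx, sxy, syy = covariance_2d(xs, ys)
    # Kernel covariance H = bw_factor**2 * sample covariance.
    hxx, hxy, hyy = (bw_factor ** 2 * c for c in (sxx, sxy, syy))
    det = hxx * hyy - hxy ** 2
    if det <= 0:
        raise ValueError("sample covariance is singular; KDE is undefined")
    norm = 1 / (n * 2 * math.pi * math.sqrt(det))
    total = 0.0
    for x, y in zip(xs, ys):
        dx, dy = px - x, py - y
        # Quadratic form d^T H^{-1} d for a symmetric 2x2 matrix.
        q = (hyy * dx * dx - 2 * hxy * dx * dy + hxx * dy * dy) / det
        total += math.exp(-0.5 * q)
    return norm * total


# Same values as the 'x' and 'y' columns in the DataFrame example of the diff.
x = [1, 2, 2.5, 3, 3.5, 4, 5]
y = [4, 4, 4.5, 5, 5.5, 6, 6]

density_at_mean = kde_2d(x, y, 3.0, 5.0)    # centre of the point cloud
density_off_ridge = kde_2d(x, y, 5.0, 4.0)  # far from the correlated ridge
```

Unlike `df.plot.kde()`, which draws one curve per column, this sketch treats each row as a single `(x, y)` observation, so the correlation between the two columns shapes the estimate.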

1 file changed, 75 additions and 19 deletions

Diff for: pandas/plotting/_core.py

@@ -2618,13 +2618,16 @@ def hist(self, bins=10, **kwds):

     def kde(self, bw_method=None, ind=None, **kwds):
         """
-        Kernel Density Estimate plot using Gaussian kernels.
+        Generate Kernel Density Estimate plot using Gaussian kernels.

-        In statistics, kernel density estimation (KDE) is a non-parametric way
-        to estimate the probability density function (PDF) of a random
+        In statistics, `kernel density estimation`_ (KDE) is a non-parametric
+        way to estimate the probability density function (PDF) of a random
         variable. This function uses Gaussian kernels and includes automatic
         bandwith determination.

+        .. _kernel density estimation:
+            https://en.wikipedia.org/wiki/Kernel_density_estimation
+
         Parameters
         ----------
         bw_method : str, scalar or callable, optional
@@ -2635,26 +2638,27 @@ def kde(self, bw_method=None, ind=None, **kwds):
         ind : NumPy array or integer, optional
             Evaluation points for the estimated PDF. If None (default),
             1000 equally spaced points are used. If `ind` is a NumPy array, the
-            kde is evaluated at the points passed. If `ind` is an integer,
+            KDE is evaluated at the points passed. If `ind` is an integer,
             `ind` number of equally spaced points are used.
-        kwds : optional
+        **kwds : optional
             Additional keyword arguments are documented in
             :meth:`pandas.Series.plot`.

         Returns
         -------
         axes : matplotlib.AxesSubplot or np.array of them

-        See also
+        See Also
         --------
         scipy.stats.gaussian_kde : Representation of a kernel-density
             estimate using Gaussian kernels. This is the function used
             internally to estimate the PDF.
+        DataFrame.plot.kde : Generate a KDE plot for a DataFrame.

         Examples
         --------
         Given a Series of points randomly sampled from an unknown
-        distribution, estimate this distribution using KDE with automatic
+        distribution, estimate its distribution using KDE with automatic
         bandwidth determination and plot the results, evaluating them at
         1000 equally spaced points (default):

@@ -2664,10 +2668,9 @@ def kde(self, bw_method=None, ind=None, **kwds):
             >>> s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])
             >>> ax = s.plot.kde()

-
-        An scalar fixed bandwidth can be specified. Using a too small bandwidth
-        can lead to overfitting, while a too large bandwidth can result in
-        underfitting:
+        A scalar bandwidth can be specified. Using a small bandwidth value can
+        lead to overfitting, while using a large bandwidth value may result
+        in underfitting:

         .. plot::
             :context: close-figs
@@ -2851,27 +2854,80 @@ def hist(self, by=None, bins=10, **kwds):

     def kde(self, bw_method=None, ind=None, **kwds):
         """
-        Kernel Density Estimate plot
+        Generate Kernel Density Estimate plot using Gaussian kernels.
+
+        In statistics, `kernel density estimation`_ (KDE) is a non-parametric
+        way to estimate the probability density function (PDF) of a random
+        variable. This function uses Gaussian kernels and includes automatic
+        bandwith determination.
+
+        .. _kernel density estimation:
+            https://en.wikipedia.org/wiki/Kernel_density_estimation

         Parameters
         ----------
-        bw_method: str, scalar or callable, optional
-            The method used to calculate the estimator bandwidth. This can be
+        bw_method : str, scalar or callable, optional
+            The method used to calculate the estimator bandwidth. This can be
             'scott', 'silverman', a scalar constant or a callable.
             If None (default), 'scott' is used.
             See :class:`scipy.stats.gaussian_kde` for more information.
         ind : NumPy array or integer, optional
-            Evaluation points. If None (default), 1000 equally spaced points
-            are used. If `ind` is a NumPy array, the kde is evaluated at the
-            points passed. If `ind` is an integer, `ind` number of equally
-            spaced points are used.
-        `**kwds` : optional
+            Evaluation points for the estimated PDF. If None (default),
+            1000 equally spaced points are used. If `ind` is a NumPy array, the
+            KDE is evaluated at the points passed. If `ind` is an integer,
+            `ind` number of equally spaced points are used.
+        **kwds : optional
             Additional keyword arguments are documented in
             :meth:`pandas.DataFrame.plot`.

         Returns
         -------
         axes : matplotlib.AxesSubplot or np.array of them
+
+        See Also
+        --------
+        scipy.stats.gaussian_kde : Representation of a kernel-density
+            estimate using Gaussian kernels. This is the function used
+            internally to estimate the PDF.
+        Series.plot.kde : Generate a KDE plot for a Series.
+
+        Examples
+        --------
+        Given several Series of points randomly sampled from unknown
+        distributions, estimate their distribution using KDE with automatic
+        bandwidth determination and plot the results, evaluating them at
+        1000 equally spaced points (default):
+
+        .. plot::
+            :context: close-figs
+
+            >>> df = pd.DataFrame({
+            ...     'x': [1, 2, 2.5, 3, 3.5, 4, 5],
+            ...     'y': [4, 4, 4.5, 5, 5.5, 6, 6],
+            ... })
+            >>> ax = df.plot.kde()
+
+        A scalar bandwidth can be specified. Using a small bandwidth value can
+        lead to overfitting, while using a large bandwidth value may result
+        in underfitting:
+
+        .. plot::
+            :context: close-figs
+
+            >>> ax = df.plot.kde(bw_method=0.3)
+
+        .. plot::
+            :context: close-figs
+
+            >>> ax = df.plot.kde(bw_method=3)
+
+        Finally, the `ind` parameter determines the evaluation points for the
+        plot of the estimated PDF:
+
+        .. plot::
+            :context: close-figs
+
+            >>> ax = df.plot.kde(ind=[1, 2, 3, 4, 5, 6])
         """
         return self(kind='kde', bw_method=bw_method, ind=ind, **kwds)

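
The docstring's remarks about bandwidth and `ind` can be checked numerically without plotting. The following is a rough standard-library sketch, not the code pandas or SciPy actually run. The names `kde_1d` and `evenly_spaced` are hypothetical. Treating a scalar `bw_method` as a factor that multiplies the sample standard deviation is an assumption based on the documented behaviour of `scipy.stats.gaussian_kde`. The evaluation range `[-1, 7]` is an arbitrary choice for illustration, not the range pandas derives from the data.

```python
import math
import statistics


def kde_1d(samples, points, bw_factor=None):
    """Evaluate a Gaussian kernel density estimate at each value in `points`."""
    n = len(samples)
    if bw_factor is None:
        # Assumed default: Scott's factor for one-dimensional data.
        bw_factor = n ** (-1 / 5)
    # Kernel width: the factor times the sample standard deviation (n - 1).
    h = bw_factor * statistics.stdev(samples)
    norm = 1 / (n * h * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((p - s) / h) ** 2) for s in samples)
        for p in points
    ]


def evenly_spaced(start, stop, count):
    """Return `count` equally spaced values from `start` to `stop` inclusive."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


# Same values as the Series example in the diff.
s = [1, 2, 2.5, 3, 3.5, 4, 5]

# Counterpart of an integer `ind`: 1000 equally spaced evaluation points
# over an illustrative range.
grid = evenly_spaced(-1, 7, 1000)

default_curve = kde_1d(s, grid)                # automatic bandwidth
narrow_curve = kde_1d(s, grid, bw_factor=0.3)  # spiky, follows each sample
wide_curve = kde_1d(s, grid, bw_factor=3)      # flat, hides the structure

# Counterpart of an array-like `ind`: evaluate only at the given points.
explicit_points = kde_1d(s, [1, 2, 3, 4, 5, 6])
```

Comparing `max(narrow_curve)`, `max(default_curve)` and `max(wide_curve)` shows the trade-off described in the docstring: a smaller factor concentrates density around individual samples, while a larger one spreads it out until the curve is nearly flat.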