Commit 2b6dc2a

Add another example to the statistics docs (GH-107904)
1 parent 9b75ada commit 2b6dc2a

2 files changed: +57 -0 lines changed

Doc/library/kde_example.png

324 KB

Doc/library/statistics.rst

Lines changed: 57 additions & 0 deletions
@@ -922,6 +922,10 @@ of applications in statistics.
 :class:`NormalDist` Examples and Recipes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
+
+Classic probability problems
+****************************
+
 :class:`NormalDist` readily solves classic probability problems.
 
 For example, given `historical data for SAT exams
@@ -947,6 +951,10 @@ Find the `quartiles <https://en.wikipedia.org/wiki/Quartile>`_ and `deciles
     >>> list(map(round, sat.quantiles(n=10)))
     [810, 896, 958, 1011, 1060, 1109, 1162, 1224, 1310]
 
+
+Monte Carlo inputs for simulations
+**********************************
+
 To estimate the distribution for a model that isn't easy to solve
 analytically, :class:`NormalDist` can generate input samples for a `Monte
 Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
@@ -963,6 +971,9 @@ Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
     >>> quantiles(map(model, X, Y, Z)) # doctest: +SKIP
     [1.4591308524824727, 1.8035946855390597, 2.175091447274739]
 
+Approximating binomial distributions
+************************************
+
 Normal distributions can be used to approximate `Binomial
 distributions <https://mathworld.wolfram.com/BinomialDistribution.html>`_
 when the sample size is large and when the probability of a successful
@@ -1000,6 +1011,10 @@ probability that the Python room will stay within its capacity limits?
     >>> mean(trial() <= k for i in range(10_000))
     0.8398
 
+
+Naive Bayesian classifier
+*************************
+
 Normal distributions commonly arise in machine learning problems.
 
 Wikipedia has a `nice example of a Naive Bayesian Classifier
@@ -1054,6 +1069,48 @@ The final prediction goes to the largest posterior. This is known as the
     'female'
 
 
+Kernel density estimation
+*************************
+
+It is possible to estimate a continuous probability density function
+from a fixed number of discrete samples.
+
+The basic idea is to smooth the data using `a kernel function such as a
+normal distribution, triangular distribution, or uniform distribution
+<https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use>`_.
+The degree of smoothing is controlled by a single
+parameter, ``h``, representing the variance of the kernel function.
+
+.. testcode::
+
+   import math
+
+   def kde_normal(sample, h):
+       "Create a continuous probability density function from a sample."
+       # Smooth the sample with a normal distribution of variance h.
+       kernel_h = NormalDist(0.0, math.sqrt(h)).pdf
+       n = len(sample)
+       def pdf(x):
+           return sum(kernel_h(x - x_i) for x_i in sample) / n
+       return pdf
+
+`Wikipedia has an example
+<https://en.wikipedia.org/wiki/Kernel_density_estimation#Example>`_
+where we can use the ``kde_normal()`` recipe to generate and plot
+a probability density function estimated from a small sample:
+
+.. doctest::
+
+   >>> sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
+   >>> f_hat = kde_normal(sample, h=2.25)
+   >>> xarr = [i/100 for i in range(-750, 1100)]
+   >>> yarr = [f_hat(x) for x in xarr]
+
+The points in ``xarr`` and ``yarr`` can be used to make a PDF plot:
+
+.. image:: kde_example.png
+   :alt: Scatter plot of the estimated probability density function.
+
 ..
    # These modelines must appear within the last ten lines of the file.
    kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;
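
One way to sanity-check the ``kde_normal()`` recipe added in this commit is to confirm that the estimated density integrates to roughly 1 and that a larger ``h`` flattens the curve. The sketch below is not part of the commit. It repeats the recipe with an explicit ``from statistics import NormalDist`` import so that it runs on its own. The integration range, the step size ``step``, the second bandwidth ``h=9.0``, and the names ``area``, ``f_wide``, ``peak_narrow``, and ``peak_wide`` are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def kde_normal(sample, h):
    "Create a continuous probability density function from a sample."
    # Smooth the sample with a normal distribution of variance h.
    kernel_h = NormalDist(0.0, math.sqrt(h)).pdf
    n = len(sample)
    def pdf(x):
        return sum(kernel_h(x - x_i) for x_i in sample) / n
    return pdf

sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
f_hat = kde_normal(sample, h=2.25)

# Riemann sum over [-15, 20), an assumed range wide enough that every
# sample sits more than 8 kernel standard deviations inside it.
step = 0.01
xarr = [i * step for i in range(-1500, 2000)]
area = sum(f_hat(x) for x in xarr) * step
print(round(area, 3))

# A larger h spreads each kernel out, so the tallest peak should be lower.
f_wide = kde_normal(sample, h=9.0)
peak_narrow = max(f_hat(x) for x in xarr)
peak_wide = max(f_wide(x) for x in xarr)
print(peak_narrow > peak_wide)
```

The printed area should be very close to 1, since each kernel is itself a probability density and the estimate is their average. The peak comparison illustrates how ``h`` controls the degree of smoothing.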
