diff --git a/Doc/library/kde_example.png b/Doc/library/kde_example.png
new file mode 100644
index 00000000000000..f4504895699974
Binary files /dev/null and b/Doc/library/kde_example.png differ
diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index 395b324c860389..483ebea67f0c6d 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -922,6 +922,10 @@ of applications in statistics.
 :class:`NormalDist` Examples and Recipes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+
+Classic probability problems
+****************************
+
 :class:`NormalDist` readily solves classic probability problems.

 For example, given `historical data for SAT exams
@@ -947,6 +951,10 @@ Find the `quartiles <https://en.wikipedia.org/wiki/Quartile>`_ and `deciles
    >>> list(map(round, sat.quantiles(n=10)))
    [810, 896, 958, 1011, 1060, 1109, 1162, 1224, 1310]

+
+Monte Carlo inputs for simulations
+**********************************
+
 To estimate the distribution for a model that isn't easy to solve
 analytically, :class:`NormalDist` can generate input samples for a `Monte
 Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
@@ -963,6 +971,9 @@ Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
    >>> quantiles(map(model, X, Y, Z))       # doctest: +SKIP
    [1.4591308524824727, 1.8035946855390597, 2.175091447274739]

+Approximating binomial distributions
+************************************
+
 Normal distributions can be used to approximate `Binomial
 distributions <https://mathworld.wolfram.com/BinomialDistribution.html>`_
 when the sample size is large and when the probability of a successful
@@ -1000,6 +1011,10 @@ probability that the Python room will stay within its capacity limits?
    >>> mean(trial() <= k for i in range(10_000))
    0.8398

+
+Naive Bayesian classifier
+*************************
+
 Normal distributions commonly arise in machine learning problems.
 Wikipedia has a `nice example of a Naive Bayesian Classifier
 <https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Person_classification>`_
@@ -1054,6 +1069,48 @@ The final prediction goes to the largest posterior. This is known as the
    'female'


+Kernel density estimation
+*************************
+
+It is possible to estimate a continuous probability density function
+from a fixed number of discrete samples.
+
+The basic idea is to smooth the data using `a kernel function such as a
+normal distribution, triangular distribution, or uniform distribution
+<https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use>`_.
+The degree of smoothing is controlled by a single
+parameter, ``h``, representing the variance of the kernel function.
+
+.. testcode::
+
+   import math
+
+   def kde_normal(sample, h):
+       "Create a continuous probability density function from a sample."
+       # Smooth the sample with a normal distribution of variance h.
+       kernel_h = NormalDist(0.0, math.sqrt(h)).pdf
+       n = len(sample)
+       def pdf(x):
+           return sum(kernel_h(x - x_i) for x_i in sample) / n
+       return pdf
+
+`Wikipedia has an example
+<https://en.wikipedia.org/wiki/Kernel_density_estimation#Example>`_
+where we can use the ``kde_normal()`` recipe to generate and plot
+a probability density function estimated from a small sample:
+
+.. doctest::
+
+   >>> sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
+   >>> f_hat = kde_normal(sample, h=2.25)
+   >>> xarr = [i/100 for i in range(-750, 1100)]
+   >>> yarr = [f_hat(x) for x in xarr]
+
+The points in ``xarr`` and ``yarr`` can be used to make a PDF plot:
+
+.. image:: kde_example.png
+   :alt: Scatter plot of the estimated probability density function.
+
 ..
    # This modelines must appear within the last ten lines of the file.
    kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;
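
The new ``kde_example.png`` is checked in as a binary file, and the added text stops at
"can be used to make a PDF plot" without showing how the plot is produced. For readers
who want to reproduce a similar figure locally, here is a minimal sketch of one way to
plot the ``xarr``/``yarr`` points. It assumes matplotlib, which is neither a dependency
of the recipe nor of the documentation build, and the output filename is only
illustrative::

   # Illustrative only: matplotlib and the output filename are assumptions,
   # not part of the documented recipe.
   import math
   from statistics import NormalDist
   import matplotlib.pyplot as plt

   def kde_normal(sample, h):
       "Create a continuous probability density function from a sample."
       # Smooth the sample with a normal distribution of variance h.
       kernel_h = NormalDist(0.0, math.sqrt(h)).pdf
       n = len(sample)
       def pdf(x):
           return sum(kernel_h(x - x_i) for x_i in sample) / n
       return pdf

   sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
   f_hat = kde_normal(sample, h=2.25)
   xarr = [i / 100 for i in range(-750, 1100)]
   yarr = [f_hat(x) for x in xarr]

   plt.plot(xarr, yarr)                        # estimated density curve
   plt.scatter(sample, [0.0] * len(sample))    # mark the observed samples
   plt.xlabel('x')
   plt.ylabel('estimated density')
   plt.savefig('kde_example.png')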