@@ -922,6 +922,10 @@ of applications in statistics.
:class:`NormalDist` Examples and Recipes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+
+Classic probability problems
+****************************
+
:class:`NormalDist` readily solves classic probability problems.

For example, given `historical data for SAT exams
@@ -947,6 +951,10 @@ Find the `quartiles <https://en.wikipedia.org/wiki/Quartile>`_ and `deciles
    >>> list(map(round, sat.quantiles(n=10)))
    [810, 896, 958, 1011, 1060, 1109, 1162, 1224, 1310]

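As a quick illustration of the classic probability use case, a cumulative
distribution query answers questions such as "what fraction of test takers
score above 1200?".  A minimal sketch, assuming the mean of 1060 and
standard deviation of 195 implied by the deciles above (the ``sat``
instance itself is defined in lines outside this hunk):

.. code-block:: python

    from statistics import NormalDist

    # Assumed parameters, consistent with the deciles shown above.
    sat = NormalDist(1060, 195)

    # Fraction of test takers expected to score above 1200.
    print(round((1 - sat.cdf(1200)) * 100, 1))   # 23.6
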
+
+Monte Carlo inputs for simulations
+**********************************
+
To estimate the distribution for a model that isn't easy to solve
analytically, :class:`NormalDist` can generate input samples for a `Monte
Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
@@ -963,6 +971,9 @@ Carlo simulation <https://en.wikipedia.org/wiki/Monte_Carlo_method>`_:
    >>> quantiles(map(model, X, Y, Z))  # doctest: +SKIP
    [1.4591308524824727, 1.8035946855390597, 2.175091447274739]

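The ``model``, ``X``, ``Y``, and ``Z`` names above come from code elided by
this hunk.  A minimal sketch of the kind of setup they imply, using the
:meth:`NormalDist.samples` method; the model function, distribution
parameters, and seeds below are placeholders, not the document's own values:

.. code-block:: python

    from statistics import NormalDist, quantiles

    def model(x, y, z):
        # Placeholder model that is awkward to solve analytically.
        return (3 * x + 7 * y) / z

    n = 100_000
    X = NormalDist(10, 2.5).samples(n, seed=1234)
    Y = NormalDist(15, 1.75).samples(n, seed=5678)
    Z = NormalDist(50, 1.25).samples(n, seed=9012)
    print(quantiles(map(model, X, Y, Z)))
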
+Approximating binomial distributions
+************************************
+
Normal distributions can be used to approximate `Binomial
distributions <https://mathworld.wolfram.com/BinomialDistribution.html>`_
when the sample size is large and when the probability of a successful
@@ -1000,6 +1011,10 @@ probability that the Python room will stay within its capacity limits?
    >>> mean(trial() <= k for i in range(10_000))
    0.8398

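For comparison with the simulated estimate above, the normal approximation
itself is a one-liner.  The ``n``, ``p``, and ``k`` values below are
assumptions for illustration (the document defines its own values in lines
elided by this hunk); the ``k + 0.5`` term applies a continuity correction:

.. code-block:: python

    from math import sqrt
    from statistics import NormalDist

    n = 750     # sample size (assumed)
    p = 0.65    # probability an attendee prefers the Python room (assumed)
    k = 500     # room capacity (assumed)

    # P(at most k successes) under the approximating normal distribution.
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k + 0.5)
    print(round(approx, 4))   # 0.8402 with these assumed inputs
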
+
+Naive Bayesian classifier
+*************************
+
Normal distributions commonly arise in machine learning problems.

Wikipedia has a `nice example of a Naive Bayesian Classifier
@@ -1054,6 +1069,48 @@ The final prediction goes to the largest posterior. This is known as the
    'female'

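The ``'female'`` result above is the tail end of Wikipedia's worked example.
The core of a Gaussian naive Bayes classifier is just a class prior
multiplied by per-feature pdf values; here is a minimal sketch using a
single height feature with illustrative data (not the document's own
numbers):

.. code-block:: python

    from statistics import NormalDist

    # Illustrative per-class feature models fitted from training samples.
    height_male = NormalDist.from_samples([6.0, 5.92, 5.58, 5.92])
    height_female = NormalDist.from_samples([5.5, 5.5, 5.42, 5.75])
    prior_male = prior_female = 0.5

    ht = 6.0   # height of the person being classified
    posterior_male = prior_male * height_male.pdf(ht)
    posterior_female = prior_female * height_female.pdf(ht)

    # The prediction goes to the class with the largest posterior.
    # Prints 'male' here; the full Wikipedia example also factors in
    # weight and foot size, which tips the prediction to 'female'.
    print('male' if posterior_male > posterior_female else 'female')
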
+Kernel density estimation
+*************************
+
+It is possible to estimate a continuous probability density function
+from a fixed number of discrete samples.
+
+The basic idea is to smooth the data using `a kernel function such as a
+normal distribution, triangular distribution, or uniform distribution
+<https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use>`_.
+The degree of smoothing is controlled by a single
+parameter, ``h``, representing the variance of the kernel function.
+
+.. testcode::
+
+    import math
+
+    def kde_normal(sample, h):
+        "Create a continuous probability density function from a sample."
+        # Smooth the sample with a normal distribution of variance h.
+        kernel_h = NormalDist(0.0, math.sqrt(h)).pdf
+        n = len(sample)
+        def pdf(x):
+            return sum(kernel_h(x - x_i) for x_i in sample) / n
+        return pdf
+
+`Wikipedia has an example
+<https://en.wikipedia.org/wiki/Kernel_density_estimation#Example>`_
+where we can use the ``kde_normal()`` recipe to generate and plot
+a probability density function estimated from a small sample:
+
+.. doctest::
+
+    >>> sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
+    >>> f_hat = kde_normal(sample, h=2.25)
+    >>> xarr = [i/100 for i in range(-750, 1100)]
+    >>> yarr = [f_hat(x) for x in xarr]
+
+The points in ``xarr`` and ``yarr`` can be used to make a PDF plot:
+
+.. image:: kde_example.png
+   :alt: Scatter plot of the estimated probability density function.
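A minimal plotting sketch for reproducing a figure like ``kde_example.png``,
assuming ``matplotlib`` is installed (it is not a dependency of the
documentation itself):

.. code-block:: python

    import matplotlib.pyplot as plt

    # Reuses the kde_normal() recipe and sample from the doctest above.
    sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
    f_hat = kde_normal(sample, h=2.25)
    xarr = [i / 100 for i in range(-750, 1100)]
    yarr = [f_hat(x) for x in xarr]

    plt.scatter(xarr, yarr, s=2)
    plt.xlabel('x')
    plt.ylabel('estimated density')
    plt.show()
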
+
..
   # This modeline must appear within the last ten lines of the file.
   kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;