
Commit 25ce4ad

pre pull
1 parent c4d0c2b commit 25ce4ad


2 files changed (+106, -247 lines)

Chapter4_TheGreatestTheoremNeverTold/LawOfLargeNumbers.ipynb

+12 -3
@@ -64,7 +64,7 @@
  "\\end{align}\n",
  "\n",
  "\n",
- "Equality holds in the limit, but we can get closer and closer by using more and more samples in the average. This Law holds for *any distribution*, minus some pathological examples that only mathematicians have fun with. \n",
+ "Equality holds in the limit, but we can get closer and closer by using more and more samples in the average. This Law holds for almost *any distribution*, minus some important cases we will encounter later.\n",
  "\n",
  "##### Example\n",
  "____\n",
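The hunk above rewords the notebook's statement of the Law of Large Numbers. The sketch below is not part of this commit. It draws Poisson samples and tracks the running average, so you can watch the sample mean approach the true expected value as N grows. The rate `lam = 4.5`, the seed, and the helper name `running_means` are assumptions chosen for illustration.

```python
import numpy as np


def running_means(samples):
    """Return the average of the first n samples, for every n = 1..len(samples)."""
    samples = np.asarray(samples, dtype=float)
    return np.cumsum(samples) / np.arange(1, len(samples) + 1)


rng = np.random.default_rng(0)
lam = 4.5  # assumed Poisson rate; the true expected value equals lam
samples = rng.poisson(lam, size=100_000)
means = running_means(samples)

# The gap between the sample mean and lam should shrink as N grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    err = abs(means[n - 1] - lam)
    print(f"N={n:>6}: sample mean = {means[n - 1]:.4f}, error = {err:.4f}")
```

The error at small N can be large, and at N = 100,000 it should be on the order of a hundredth.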
@@ -350,7 +350,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "What do we observe? *Without accounting for population sizes* we run the risk of making an enormous inference error: if we ignored population size, we would say that the county with the shortest and tallest individuals have been correctly circled. But this inference is wrong for the following reason. These two counties do *not* necessarily have the most extreme heights. The error is that the calculated average of the small population is not a good reflection of the true expected value of the population (which should be $\\mu =150$). The sample size/population size/$N$, whatever you wish to call it, is simply too small to invoke the Law of Large Numbers effectively. \n",
+ "What do we observe? *Without accounting for population sizes* we run the risk of making an enormous inference error: if we ignored population size, we would say that the counties with the shortest and tallest individuals have been correctly circled. But this inference is wrong for the following reason. These two counties do *not* necessarily have the most extreme heights. The error arises because the calculated average of a small population is not a good reflection of the population's true expected value (which is $\\mu =150$). The sample size/population size/$N$, whatever you wish to call it, is simply too small to invoke the Law of Large Numbers effectively. \n",
  "\n",
  "We provide more damning evidence against this inference. Recall the population numbers were uniformly distributed over 100 to 1500. Our intuition should tell us that the counties with the most extreme population heights should also be uniformly spread over 100 to 4000, and certainly independent of the county's population. Not so. Below are the population sizes of the counties with the most extreme heights."
  ]
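The second hunk argues that the most extreme county averages come from small populations rather than from genuinely extreme counties. The sketch below reproduces that reasoning under stated assumptions; it is not the notebook's own code. Populations uniform over 100 to 1500 and μ = 150 come from the text. The number of counties (5000), σ = 15, and the seed are assumptions. The sketch relies on the standard fact that the mean of p independent N(μ, σ²) draws is itself distributed N(μ, σ²/p).

```python
import numpy as np

rng = np.random.default_rng(42)

n_counties = 5000        # assumed number of counties
mu, sigma = 150.0, 15.0  # mu is from the text; sigma is an assumption

# Population sizes drawn uniformly from the integers 100..1499.
populations = rng.integers(100, 1500, size=n_counties)

# The mean of p i.i.d. N(mu, sigma^2) heights is N(mu, sigma^2 / p),
# so each county's average height can be drawn directly.
avg_height = rng.normal(mu, sigma / np.sqrt(populations))

# Indices of the 10 counties with the lowest and highest average heights.
order = np.argsort(avg_height)
shortest, tallest = order[:10], order[-10:]

print("Populations of the 10 'shortest' counties:", populations[shortest])
print("Populations of the 10 'tallest' counties: ", populations[tallest])
print("Median population, all counties:", np.median(populations))
```

Every county shares the same true mean here, yet the printed populations of the extreme counties should cluster near the low end of the range. That is the pattern the notebook text describes.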
@@ -983,6 +983,15 @@
  "In the graphic above, you can see why sorting by mean would be sub-optimal."
  ]
  },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "##### Example: Counting GitHub stars\n",
+ "\n",
+ "What is the average number of stars a GitHub repository has? How would you calculate this? There are over 6 million repositories, so there is more than enough data to invoke the Law of Large Numbers. Let's start pulling some data. TODO"
+ ]
+ },
  {
  "cell_type": "markdown",
  "metadata": {},
@@ -1170,4 +1179,4 @@
  "metadata": {}
  }
  ]
- }
+ }
