Section 9.7 Central Limit Theorem

Solving a scientific problem usually requires assumptions about the nature of the underlying setting, and the conclusions rest on those assumptions. Indeed, to use a Binomial Distribution or a Negative Binomial Distribution, an assumption on the value of $p$ is necessary. For Poisson and Exponential Distributions, one must know the mean. For Normal Distributions, one must assume values for both the mean and the standard deviation. Where do these values come from? Often, one performs a preliminary study, obtains a sample statistic such as a sample mean or a relative frequency, and uses that value for $\mu$ or $p$.
But what is the underlying distribution of these sample statistics? The Central Limit Theorem gives the answer...
The results from the previous section illustrate the tendency for bell-shaped distributions. This tendency can be described more mathematically through the following theorem. It is presented here without proof.
Often the Central Limit Theorem is stated more formally using a conversion to standard units. Indeed, the theorem indicates that the random variable $\bar{X}$ has variance $\sigma^2/n$, which approaches 0 as $n$ grows. So, the limiting random variable has zero variance and therefore is no longer a random variable. To avoid this issue, the Central Limit Theorem is often stated as:
For random variables
$$W_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
with corresponding distribution functions $F_n$,
$$\lim_{n \to \infty} F_n(c) = \int_{-\infty}^{c} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz = \Phi(c),$$
that is, the standard normal distribution function.
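The limit statement can be checked informally by simulation. The following is a minimal sketch, not part of the original text: it assumes samples drawn from a Uniform(0, 1) population (so $\mu = 1/2$ and $\sigma = \sqrt{1/12}$), and the function name `empirical_cdf_of_w`, the sample size, the repetition count, and the seed are all illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def empirical_cdf_of_w(n, c, reps=20_000, seed=0):
    """Estimate P(W_n <= c) when each sample is n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    mu, sigma = 0.5, sqrt(1 / 12)  # mean and sd of Uniform(0, 1)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        w = (xbar - mu) / (sigma / sqrt(n))  # convert to standard units
        hits += w <= c
    return hits / reps


c = 1.0
estimate = empirical_cdf_of_w(n=30, c=c)
target = NormalDist().cdf(c)  # Phi(c)
print(f"empirical P(W_30 <= {c}) = {estimate:.4f}, Phi({c}) = {target:.4f}")
```

With a few thousand repetitions, the empirical proportion should sit close to $\Phi(c)$; a larger `n` or a different population can be substituted to see how quickly the agreement sets in.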
While we are at it, we can again "go backwards" and figure out the mean and variance if given some probabilities.
A sample of $n = 11$ observations is drawn from a normal population with $\mu = 920$ and $\sigma = 250$. Find each of the following:
A. $P(\bar{X} > 1055)$
Probability =
B. $P(\bar{X} < 791)$
Probability =
C. $P(\bar{X} > 844)$
Probability =
Answer 1.
0.0366485
Answer 2.
0.043506
Answer 3.
0.843334
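These answers can be reproduced with a short calculation. The sketch below uses Python's standard-library `statistics.NormalDist`; because the population is normal, $\bar{X}$ is normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, mu, sigma = 11, 920, 250

# Sampling distribution of the mean: normal with sd sigma / sqrt(n)
xbar = NormalDist(mu, sigma / sqrt(n))

p_a = 1 - xbar.cdf(1055)  # P(Xbar > 1055)
p_b = xbar.cdf(791)       # P(Xbar < 791)
p_c = 1 - xbar.cdf(844)   # P(Xbar > 844)

print(f"A: {p_a:.6f}  B: {p_b:.6f}  C: {p_c:.6f}")
```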
Consider an exponential variable $X$ with mean time until first success of $\mu = 4$. Then $\sigma = \mu = 4$, since an exponential distribution has standard deviation equal to its mean.
You can use the exponential probability function to compute probabilities dealing with $X$. Indeed,
$$P(X < 3.9) = F(3.9) = 1 - e^{-3.9/4} \approx 0.6228.$$
If instead you plan to sample from this distribution $n = 32$ times, the Central Limit Theorem implies that you will get a random variable $\bar{X}$ which has an approximately normal distribution with the same mean but with new variance $\sigma_{\bar{X}}^2 = \frac{4^2}{32} = \frac{1}{2}$. Therefore
$P(\bar{X} < 3.9) \approx$ normalcdf(0, 3.9, 4, sqrt(1/2)) $\approx 0.4438$.
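One way to judge the quality of this approximation is to compare it with an exact value. The sketch below is an illustration rather than part of the original example. It takes $\sigma = \mu$, the standard relationship for an exponential distribution, and uses the fact that a sum of $n$ independent exponential variables has a Gamma (Erlang) distribution with integer shape $n$, whose distribution function has a closed form.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

mu, n, x = 4.0, 32, 3.9
sigma = mu  # exponential: standard deviation equals the mean

# Single observation: P(X < 3.9) from the exponential distribution function
p_single = 1 - exp(-x / mu)

# Normal approximation for the sample mean, mirroring normalcdf(0, x, mu, sd)
xbar = NormalDist(mu, sigma / sqrt(n))
p_normal = xbar.cdf(x) - xbar.cdf(0)

# Exact value: P(Xbar < x) = P(S < n*x), where S is Erlang with shape n and
# scale mu, so P(S < s) = 1 - sum_{k<n} e^{-t} t^k / k!  with t = s / mu
t = n * x / mu
p_exact = 1 - sum(exp(-t) * t**k / factorial(k) for k in range(n))

print(f"single draw: {p_single:.4f}")
print(f"normal approximation for the mean: {p_normal:.4f}")
print(f"exact value for the mean: {p_exact:.4f}")
```

The gap between the last two numbers reflects the skewness that remains in an average of 32 exponential observations.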
When converting a probability problem for a continuous distribution (such as exponential or uniform), no adjustment to the question is needed, since you are approximating one area with another area. However, when converting a probability problem for a discrete distribution (such as binomial or geometric), you need to consider how the interval should be adjusted so that histogram areas for the discrete problem relate to areas under the normal curve. For an integer-valued variable, you will generally need to expand the stated interval each way by 1/2.
The Central Limit Theorem provides that, regardless of the distribution of $X$, an average of $X$'s is approximately normally distributed. However, it also shows why $X$ itself may be approximated by the normal distribution for some distributions as certain parameters are allowed to increase. Below, you can see how Binomial and Poisson distributions can be approximated directly using the Normal distribution.
Toward that end, for $0 < p < 1$ consider a sequence of Bernoulli trials $Y_1, Y_2, \ldots, Y_n$, each over the space $\{0, 1\}$. Then,
$$X = \sum_{k=1}^{n} Y_k$$
is a Binomial variable.
Using the Bernoulli variables $Y_k$, each with mean $p$ and variance $p(1-p)$, note that the Central Limit Theorem applied to $\bar{X} = \frac{\sum Y_k}{n}$ gives that
$$\frac{\bar{X} - p}{\sqrt{p(1-p)/n}}$$
is approximately standard normal. Multiplying top and bottom by $n$ yields that
$$\frac{\sum Y_k - np}{\sqrt{np(1-p)}}$$
is approximately standard normal. But $\sum Y_k$ is the number of successes in $n$ trials and is therefore a Binomial variable.
Binomial becomes normal as $n \to \infty$. Consider $n = 50$ and $p = 0.3$. Then, $\mu = 15$ and $\sigma^2 = 10.5$.
Using the binomial formulas, for example,
$$P(X = 16) = \binom{50}{16} 0.3^{16} \cdot 0.7^{34} \approx 0.11470$$
Using the normal distribution,
$P(X = 16) = P(15.5 < X < 16.5) \approx$ normalcdf(15.5, 16.5, 15, sqrt(10.5)) $= 0.11697$
Notice that these are very close.
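The comparison can be reproduced with the standard library alone. This is a small sketch with illustrative variable names; `NormalDist.cdf` plays the role of normalcdf, and the interval is widened by 1/2 on each side as described earlier.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 50, 0.3, 16

# Exact binomial probability P(X = 16)
exact = comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal approximation with mean np and variance np(1 - p),
# evaluated over the histogram bar [k - 0.5, k + 0.5]
normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx = normal.cdf(k + 0.5) - normal.cdf(k - 0.5)

print(f"exact = {exact:.5f}, normal approximation = {approx:.5f}")
```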
Note from before that the Poisson distribution function was derived by approximating with Binomial and letting $n$ approach infinity. Therefore, by the previous theorem, the Poisson variable is also approximately Normal using the Poisson mean and variance rather than the binomial's. Indeed, in standard units
$$\frac{Y - \mu}{\sqrt{\mu}}$$
is approximately normal for large $\mu$.
Poisson becomes normal as $\mu \to \infty$. Consider $\mu = 20$. Then, $\sigma^2 = \mu = 20$.
Using the Poisson formulas, for example,
$$P(X = 19) = \frac{20^{19} e^{-20}}{19!} \approx 0.08883$$
Using the normal distribution,
$P(X = 19) = P(18.5 < X < 19.5) \approx$ normalcdf(18.5, 19.5, 20, sqrt(20)) $= 0.08683$
Again, these are very close.
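The same check works for the Poisson example. The sketch below, with illustrative variable names, evaluates the Poisson probability function directly and compares it with the normal area over the corresponding histogram bar.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

mu, k = 20, 19

# Exact Poisson probability P(X = 19)
exact = mu**k * exp(-mu) / factorial(k)

# Normal approximation with mean mu and variance mu,
# evaluated over the histogram bar [k - 0.5, k + 0.5]
normal = NormalDist(mu, sqrt(mu))
approx = normal.cdf(k + 0.5) - normal.cdf(k - 0.5)

print(f"exact = {exact:.5f}, normal approximation = {approx:.5f}")
```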
Gamma becomes normal as $r \to \infty$. Assume that the average time until a first success is 12 minutes and that $r = 8$. Then, the mean for the Gamma distribution is $\mu = 12 \cdot 8 = 96$ and $\sigma^2 = 8 \cdot 12^2 = 1152$, and so $\sigma \approx 33.9411$.
Using the Gamma formulas,
$$P(90 \le X \le 100) = \int_{90}^{100} f(x) \, dx = 0.59252 - 0.47536 = 0.11716.$$
Using the normal distribution,
$P(90 \le X \le 100) \approx$ normalcdf(90, 100, 96, 33.9411) $= 0.11707$.
Amazingly, these are also very close.
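For integer $r$, the Gamma distribution function has a closed form, $F(x) = 1 - \sum_{k=0}^{r-1} e^{-x/\theta} (x/\theta)^k / k!$, where $\theta$ is the mean time to a single success. The sketch below uses that form to reproduce both numbers; the helper name `gamma_cdf` is an illustrative choice, not a library function.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

r, theta = 8, 12.0  # number of successes and mean time to one success


def gamma_cdf(x, r, theta):
    """Distribution function of a Gamma variable with integer shape r."""
    t = x / theta
    return 1 - sum(exp(-t) * t**k / factorial(k) for k in range(r))


exact = gamma_cdf(100, r, theta) - gamma_cdf(90, r, theta)

# Normal approximation with mean r*theta and variance r*theta^2.
# No interval adjustment is needed because the Gamma variable is continuous.
normal = NormalDist(r * theta, sqrt(r) * theta)
approx = normal.cdf(100) - normal.cdf(90)

print(f"exact = {exact:.5f}, normal approximation = {approx:.5f}")
```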
Consider a discrete uniform variable $X$ over $R = \{1, 2, \ldots, 20\}$. Then, $\mu = 10.5$ and $\sigma^2 = \frac{20^2 - 1}{12} = \frac{399}{12} = 33.25$ using the uniform formulas.
You can use the uniform probability function to compute probabilities dealing with $X$. Indeed,
$$P(8 \le X < 12) = P(X \in \{8, 9, 10, 11\}) = \frac{4}{20} = \frac{1}{5}.$$
If instead you plan to sample from this distribution $n = 49$ times, the Central Limit Theorem implies that you will get a random variable $\bar{X}$ which has an approximately normal distribution with the same mean but with new variance $\sigma_{\bar{X}}^2 = \frac{399/12}{49} = \frac{19}{28}$, so $\sigma_{\bar{X}} \approx 0.823754$. The possible values of $\bar{X}$ are spaced $1/49$ apart, so expanding the interval to the boundaries of the corresponding histogram areas moves each endpoint by half a step, $1/98 \approx 0.0102$. Therefore,
$P(8 \le \bar{X} < 12) \approx$ normalcdf(7.9898, 11.9898, 10.5, 0.823754) $\approx 0.9636$.
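Because the sample mean of a discrete variable is itself discrete, a calculation like this can be checked by brute force. The sketch below is an illustration under stated assumptions: it uses the discrete uniform variance $(m^2 - 1)/12$ for values $1, \ldots, m$, treats $\bar{X}$ as taking values spaced $1/n$ apart (so the histogram half-width is $1/(2n)$), and builds the exact distribution of the sum $S = n\bar{X}$ by repeated convolution. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

m, n = 20, 49    # X is uniform on {1, ..., m}; sample size n
lo, hi = 8, 12   # event of interest: lo <= Xbar < hi

# Exact pmf of S = X_1 + ... + X_n by repeated convolution.
# After adding `terms` variables, index j corresponds to S = j + terms.
pmf = [1.0]
for _ in range(n):
    new = [0.0] * (len(pmf) + m - 1)
    for j, pj in enumerate(pmf):
        for d in range(m):
            new[j + d] += pj / m
    pmf = new

# lo <= Xbar < hi  is the same event as  lo*n <= S <= hi*n - 1
exact = sum(pmf[s - n] for s in range(lo * n, hi * n))

# Normal approximation for Xbar with a half-step adjustment of 1/(2n)
mu = (m + 1) / 2
sd_xbar = sqrt((m**2 - 1) / 12 / n)
half = 1 / (2 * n)
normal = NormalDist(mu, sd_xbar)
approx = normal.cdf(hi - half) - normal.cdf(lo - half)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The convolution is cheap here because the sum has fewer than a thousand possible values, so the exact answer is available as a reference for the approximation.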
As these examples illustrate, the approximation to the desired probabilities improves as the distribution's corresponding parameter is allowed to be "sufficiently large". The mathematical reason this is true is not provided here but depends upon the Central Limit Theorem.
The above theorems allow you to use the normal distribution to compute approximate probabilities for the variable $X$ in the stated distributions. This is not possible for all distributions, since some do not have parameters which allow for approaching normality. However, regardless of the distribution, the Central Limit Theorem always allows you to approximate probabilities if they involve an average of repeated attempts, that is, for the variable $\bar{X}$. This usefulness is illustrated in the examples below.