Sometimes a value of p for a Binomial, Geometric, or Negative Binomial distribution problem can be selected using a theoretical value. Indeed, when flipping a coin it is reasonable to assume p = 1/2 is the probability of getting a head on one flip. Similarly, it is reasonable to assume p = 1/6 when you are looking for a particular side of a 6-sided die. However, many times you will want to deal with a problem in which it is not possible to determine the exact likelihood of success, such as your true probability of making a free throw in basketball or the true percentage of the electorate that will vote for your favorite candidate.
In these latter situations, we found in the previous section that the relative frequency $\hat{p} = Y/n$ is generally a good way to estimate p. In this section, you will investigate how to measure how closely the point estimate $\hat{p}$ approximates the actual value of p, and thereby attach some confidence to that estimate.
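To determine the margin of error E carefully, note that from the central limit theorem
\[ \frac{Y - np}{\sqrt{np(1-p)}} = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}, \]
where $\hat{p} = Y/n$ is the relative frequency, is approximately standard normal for large n. Presuming that $p \approx \hat{p}$ and replacing the unknown p terms on the bottom with $\hat{p}$ gives
\[ z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}}, \]
where z is a standard normal distribution variable. So, using the central limit theorem and the standard normal distribution, you can find the value $z_{\alpha/2}$ where
\[ P\left( -z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} < z_{\alpha/2} \right) = 1 - \alpha \]
or, by rearranging the inside inequality,
\[ P\left( \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right) = 1 - \alpha. \]
Setting
\[ E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
gives a way to determine a confidence interval centered on $\hat{p}$ for p with "confidence level" $1 - \alpha$.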
To illustrate this process, enjoy this computational cell.
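As a rough stand-in for such a cell, here is a minimal Python sketch of the interval formed by taking the relative frequency and adding and subtracting E. The function name `normal_interval` and its parameters are illustrative assumptions, and `statistics.NormalDist` from the Python standard library is used to look up the critical value $z_{\alpha/2}$.

```python
from math import sqrt
from statistics import NormalDist


def normal_interval(y, n, confidence=0.95):
    """Two-sided normal-approximation confidence interval for p.

    y is the number of successes, n the sample size (hypothetical names).
    """
    p_hat = y / n                            # relative frequency Y/n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error E
    return p_hat - margin, p_hat + margin


low, high = normal_interval(144, 400)
print(f"{low:.4f} < p < {high:.4f}")
```

For instance, with 144 successes in 400 trials the endpoints come out near 0.313 and 0.407. Changing `confidence` lets you see how the interval widens as the confidence level increases.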
Notice that when computing the confidence intervals above, we chose to replace some of the p terms with $\hat{p}$ so that only one p term was left and could be isolated in the middle. There are other ways to deal with this. The easiest is to take the worst-case scenario for the p terms in the denominator above. Indeed, the confidence interval is made wider (and therefore more likely to contain the actual p) if the square root term is as large as possible. Using basic calculus, it is easy to see that p(1-p) is maximized when p = 1/2. Therefore, a second alternative is to create your confidence interval using
\[ E = z_{\alpha/2}\sqrt{\frac{(1/2)(1/2)}{n}} = \frac{z_{\alpha/2}}{2\sqrt{n}} \]
and therefore
\[ \hat{p} - \frac{z_{\alpha/2}}{2\sqrt{n}} < p < \hat{p} + \frac{z_{\alpha/2}}{2\sqrt{n}}. \]
This method should be used only when trying to create the roughest and "safest" interval.
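The worst-case margin can be sketched in Python as follows. The helper name `maximal_interval` is an assumption made for illustration; the only change from the normal interval is that p(1-p) is replaced by its largest possible value, 1/4, so the margin no longer depends on the observed relative frequency.

```python
from math import sqrt
from statistics import NormalDist


def maximal_interval(y, n, confidence=0.95):
    """Worst-case ("safest") confidence interval for p, using p(1-p) <= 1/4."""
    p_hat = y / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z / (2 * sqrt(n))  # z * sqrt((1/2)(1/2)/n)
    return p_hat - margin, p_hat + margin


low, high = maximal_interval(144, 400)
print(f"{low:.4f} < p < {high:.4f}")
```

Because the margin depends only on n and the confidence level, this form is also convenient for deciding in advance how large a sample is needed to reach a desired margin.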
The methods for determining a confidence interval for p above depend upon a good approximation with the Central Limit Theorem. This approximation will be fine if n is relatively large. To consider a confidence interval for p when n is small, note that the binomial random variable is discrete and so expanding the interval by a continuity correction of $\frac{1}{2n}$ on each side might be in order. Indeed, replace
\[ \hat{p} \pm E \]
by
\[ \hat{p} \pm \left( E + \frac{1}{2n} \right) \]
and continue otherwise.
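Another more elaborate mechanism when n is relatively large is given by the Wilson Score. This confidence interval is more complicated than just taking $\hat{p}$ and adding and subtracting E. This approach notes that the possible extreme values for p must satisfy (before replacing some of the p terms with $\hat{p}$)
\[ \frac{|\hat{p} - p|}{\sqrt{p(1-p)/n}} = z_{\alpha/2}, \]
which leads to the interval
\[ \frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}} < p < \frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}}. \]
To relate the Wilson Score with the standard approach for creating a confidence interval for p seen above, note that
\[ \frac{Y/n - p}{\sqrt{p(1-p)/n}} = \pm z_{\alpha/2} \]
can be simplified by squaring both sides to get
\[ \left( \frac{Y}{n} - p \right)^2 = z_{\alpha/2}^2 \, \frac{p(1-p)}{n}. \]
Replacing $Y/n$ with the relative frequency $\hat{p}$ and collecting powers of p gives
\[ \left( 1 + \frac{z_{\alpha/2}^2}{n} \right) p^2 - \left( 2\hat{p} + \frac{z_{\alpha/2}^2}{n} \right) p + \hat{p}^2 = 0. \]
Solving for p using the quadratic formula and simplifying ultimately results in the described interval.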
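The closed form that comes out of the quadratic formula can be sketched in Python as shown here. The function name `wilson_interval` is a hypothetical choice for illustration. The loop at the end is a sanity check that each endpoint really does satisfy the squared relation between the relative frequency and p before any p terms are replaced.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(y, n, confidence=0.95):
    """Wilson score confidence interval for p (illustrative sketch)."""
    p_hat = y / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


y, n = 144, 400
low, high = wilson_interval(y, n)
z = NormalDist().inv_cdf(0.975)
for p in (low, high):
    # Residual of (p_hat - p)^2 = z^2 p (1 - p) / n; should be essentially zero.
    residual = (y / n - p) ** 2 - z**2 * p * (1 - p) / n
    print(f"p = {p:.6f}, residual = {residual:.2e}")
```

Note that the interval is centered slightly away from the relative frequency, pulled toward 1/2, which is one visible difference from the simpler approach.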
Presume that from a sample of size n = 400 you get Y = 144 successes. Determine 95% two-sided confidence intervals for the actual p using all three of the methods above. Note that for each you will utilize $\hat{p} = 144/400 = 0.36$ and $z_{0.025} = 1.96$.
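Normal Interval:
\[ 0.36 - 1.96\sqrt{\frac{0.36 \cdot 0.64}{400}} < p < 0.36 + 1.96\sqrt{\frac{0.36 \cdot 0.64}{400}} \]
or
\[ 0.36 - 1.96(0.024) < p < 0.36 + 1.96(0.024) \]
or
\[ 0.36 - 0.04704 < p < 0.36 + 0.04704 \]
or
\[ 0.31296 < p < 0.40704. \]
So, you can be 95% confident that the actual value for p lies inside the interval $(0.31296, 0.40704)$.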
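Maximal Interval:
\[ 0.36 - \frac{1.96}{2\sqrt{400}} < p < 0.36 + \frac{1.96}{2\sqrt{400}} \]
or
\[ 0.36 - 0.049 < p < 0.36 + 0.049 \]
or
\[ 0.311 < p < 0.409. \]
Notice the interval is only slightly wider than when using $\hat{p}$ to estimate p in the first case.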
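Wilson Score Interval: Let's do this in parts:
\[ \hat{p} + \frac{z_{0.025}^2}{2n} = 0.36 + \frac{3.8416}{800} = 0.364802, \]
\[ 1 + \frac{z_{0.025}^2}{n} = 1 + \frac{3.8416}{400} = 1.009604, \]
\[ z_{0.025}\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z_{0.025}^2}{4n^2}} = 1.96\sqrt{0.000576 + 0.0000060025} \approx 0.04728. \]
Therefore,
\[ \frac{0.364802 - 0.04728}{1.009604} < p < \frac{0.364802 + 0.04728}{1.009604} \]
or
\[ 0.3145 < p < 0.4082, \]
which is slightly different from the first and slightly narrower than the second.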
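To double-check the arithmetic in this example, the three intervals can be recomputed side by side. The following is a small Python sketch using the rounded critical value 1.96 from the example; the variable names and the `intervals` dictionary are illustrative assumptions.

```python
from math import sqrt

n, y, z = 400, 144, 1.96  # sample size, successes, rounded z_{0.025}
p_hat = y / n

# Margins of error for the normal and maximal (worst-case) intervals.
normal_margin = z * sqrt(p_hat * (1 - p_hat) / n)
maximal_margin = z / (2 * sqrt(n))

# Center and half-width of the Wilson score interval.
denom = 1 + z**2 / n
wilson_center = (p_hat + z**2 / (2 * n)) / denom
wilson_margin = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom

intervals = {
    "normal": (p_hat - normal_margin, p_hat + normal_margin),
    "maximal": (p_hat - maximal_margin, p_hat + maximal_margin),
    "wilson": (wilson_center - wilson_margin, wilson_center + wilson_margin),
}

for name, (low, high) in intervals.items():
    print(f"{name:8s} {low:.4f} < p < {high:.4f}  (width {high - low:.4f})")
```

Printing the widths makes the comparison concrete: the maximal interval is the widest of the three, while the Wilson interval is shifted slightly toward 1/2 relative to the normal interval.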
Note that the maximum for $p(1-p)$ occurs at $p = 1/2$. Therefore, replacing $p(1-p)$ with $\frac{1}{4}$ gives the result.