Section 10.4 Interval Estimates - Confidence Interval for p
Sometimes a value of p for a Binomial, Geometric, or Negative Binomial distribution problem can be selected on theoretical grounds. Indeed, when flipping a coin it is reasonable to assume p = 1/2 is the probability of getting a head on one flip. Similarly, it is reasonable to assume p = 1/6 when you are looking for a particular side of a 6-sided die. However, many times you will want to deal with a problem in which it is not possible to determine the exact likelihood of success, such as your true probability of making a free throw in basketball or the true percentage of the electorate that will vote for your favorite candidate.
In these latter situations, we found in the previous section that the relative frequency \(\frac{Y}{n}\) is generally a good way to estimate p. In this section, you will investigate how to measure how closely the point estimate approximates the actual value of p, and thereby attach some confidence to that estimate.
Definition 10.4.1. Confidence Intervals for p.
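Given a point estimate \(\tilde{p}\) for p, a confidence interval for p is a range of values which contains the actual value of p with high probability. In notation, a two-sided confidence interval for p is of the form
\begin{equation*}
\tilde{p} - E < p < \tilde{p} + E
\end{equation*}
where E is the margin of error.
To determine E, recall from the central limit theorem that when Y is binomial with n trials and \(\tilde{p} = \frac{Y}{n}\text{,}\)
\begin{equation*}
\frac{\tilde{p} - p}{\sqrt{p(1-p)/n}}
\end{equation*}
is approximately standard normal for large n. Presuming that \(\tilde{p} \approx p\) and replacing the unknown p terms on the bottom with \(\tilde{p}\) gives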
\begin{equation*}
z = \frac{\tilde{p} - p}{\sqrt{\tilde{p}(1-\tilde{p})/n}}
\end{equation*}
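where z is a standard normal variable. So, using the central limit theorem and the standard normal distribution, you can find the value \(z_{\alpha/2}\) where
\begin{equation*}
P(-z_{\alpha/2} < z < z_{\alpha/2}) = 1 - \alpha.
\end{equation*}
Solving the inequality \(-z_{\alpha/2} < z < z_{\alpha/2}\) for the p in the numerator gives
\begin{equation*}
\tilde{p} - z_{\alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/n} < p < \tilde{p} + z_{\alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/n}.
\end{equation*}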
Setting \(E = z_{ \alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/n}\) gives a way to determine a confidence interval centered on \(\tilde{p} = \frac{Y}{n}\) for p with "confidence level" \(1-\alpha\text{.}\)
To complete the interval, one needs a specific value for \(z_{\alpha/2}\text{,}\) which can be found using an inverse normal distribution calculator. Generally, one chooses confidence levels on the order of 90%, 95%, or 99%, with 95% being the usual choice. Fortunately, this value is easily computed using graphing calculators or other automatic methods, although your ancient teacher might have been required to use tables. On a TI calculator, use InvNorm with the cumulative probability \(1-\alpha/2\text{.}\) For example, for a 99% confidence level, InvNorm(0.995) gives \(z_{0.005} \approx 2.576\text{.}\)
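For readers without a TI calculator, the same critical values can be obtained in software. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` (Python 3.8 or later) as the inverse normal calculator; the helper name `z_critical` is introduced here only for illustration.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence
    # inv_cdf(q) returns the value with cumulative probability q,
    # which plays the same role as InvNorm(q) on a TI calculator.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```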
The work above can be summarized with the following theorem.
Theorem 10.4.2. Standard Confidence interval for p.
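Given a sample of size n with relative frequency \(\tilde{p}\text{,}\) the standard two-sided confidence interval at confidence level \(1-\alpha\) for the unknown proportion p is given by
\begin{equation*}
\tilde{p} - z_{\alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/n} < p < \tilde{p} + z_{\alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/n}
\end{equation*}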
where \(z_{\alpha/2}\) satisfies \(P(Z>z_{\alpha/2})=\alpha/2\) in the standard normal distribution.
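The standard interval can be computed directly from the number of successes Y and the sample size n. The sketch below is one possible way to do so in Python; the function name `standard_interval` and the sample data (60 successes in 150 trials) are hypothetical and chosen only to make the arithmetic easy to check.

```python
from math import sqrt
from statistics import NormalDist


def standard_interval(successes, n, confidence=0.95):
    """Standard two-sided confidence interval for p, centered on Y/n."""
    p_hat = successes / n
    # z_{alpha/2}: the value with upper-tail probability alpha/2.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Margin of error E = z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n).
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 60 successes in 150 trials, so p_hat = 0.4.
low, high = standard_interval(60, 150)
print(f"{low:.4f} < p < {high:.4f}")  # 0.3216 < p < 0.4784
```

Here the square root term is exactly 0.04, so the margin of error is roughly 1.96 times 0.04.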
Notice that when computing the confidence intervals above, we chose to replace the p terms in the denominator with \(\tilde{p}\) so that only one p term was left and could be isolated in the middle. There are other ways to deal with this. The easiest is to take the worst case scenario for the p terms in the denominator above. Indeed, the confidence interval is made wider (and therefore more likely to contain the actual p) if the square root term is as large as possible. Using basic calculus, it is easy to see that p(1-p) is maximized when p = 1/2, where \(\sqrt{p(1-p)/n} = \frac{1}{2\sqrt{n}}\text{.}\) Therefore, a second alternative is to create your confidence interval using
\begin{equation*}
z = \frac{\tilde{p} - p}{\frac{1}{2\sqrt{n}}}
\end{equation*}
and therefore \(E = \frac{z_{ \alpha/2}}{2\sqrt{n}}\text{.}\) This method should be used only when trying to create the roughest and "safest" interval.
Theorem 10.4.3. Widest Confidence interval for p.
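Given a sample of size n with relative frequency \(\tilde{p}\text{,}\) the two-sided confidence interval at confidence level \(1-\alpha\) that presumes the largest standard deviation for the unknown proportion p is given by
\begin{equation*}
\tilde{p} - \frac{z_{\alpha/2}}{2\sqrt{n}} < p < \tilde{p} + \frac{z_{\alpha/2}}{2\sqrt{n}}
\end{equation*}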
where \(z_{\alpha/2}\) satisfies \(P(Z>z_{\alpha/2})=\alpha/2\) in the standard normal distribution.
The methods for determining a confidence interval for p above depend upon a good approximation with the Central Limit Theorem. This approximation will be fine if n is relatively large. To consider a confidence interval for p when n is small, note that the binomial random variable is discrete, so widening the interval by a continuity correction of \(\frac{1}{2n}\) on each side might be in order. One may also replace \(z_{\alpha/2}\) by \(t_{\alpha/2}(n-1)\) and continue otherwise.
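Another more elaborate mechanism when n is relatively large is given by the Wilson Score. This confidence interval is more complicated than just taking \(\tilde{p}\) and adding and subtracting E. This approach notes that the possible extreme values for p must satisfy (before replacing some of the p terms with \(\tilde{p}\))
\begin{equation*}
\left| \tilde{p} - p \right| = z_{\alpha/2}\sqrt{p(1-p)/n}.
\end{equation*}
Squaring both sides gives a quadratic equation in p whose two roots are the endpoints of the interval.
Definition 10.4.4. Wilson Score Confidence Interval for p.
Given a sample of size n with relative frequency \(\tilde{p}\text{,}\) the Wilson score confidence interval at confidence level \(1-\alpha\) is
\begin{equation*}
\frac{\tilde{p} + \frac{z_{\alpha/2}^2}{2n} - z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}} < p < \frac{\tilde{p} + \frac{z_{\alpha/2}^2}{2n} + z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}}.
\end{equation*}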
Presume that from a sample of size n = 400 you get Y = 144 successes. Determine 95% two-sided confidence intervals for the actual p using all three of the methods above. Note that for each you will utilize \(z_{\alpha/2} = z_{0.025} = 1.960\) and \(\tilde{p} = \frac{144}{400} = 0.36\text{.}\)
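The three computations requested here can be sketched in a few lines of Python. This is an illustrative sketch, not the text's own solution: the function names are introduced here, the value \(z_{0.025} = 1.960\) is taken from the problem statement, and the Wilson function assumes the usual closed form obtained by solving \((\tilde{p} - p)^2 = z_{\alpha/2}^2\, p(1-p)/n\) for p.

```python
from math import sqrt


def standard_interval(p_hat, n, z):
    """Interval with E = z * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def widest_interval(p_hat, n, z):
    """Interval with the worst-case E = z / (2 * sqrt(n))."""
    margin = z / (2 * sqrt(n))
    return p_hat - margin, p_hat + margin


def wilson_interval(p_hat, n, z):
    """Roots of the quadratic (p_hat - p)^2 = z^2 * p * (1 - p) / n."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


n, y, z = 400, 144, 1.960  # values given in the problem statement
p_hat = y / n              # 0.36

for name, method in [("standard", standard_interval),
                     ("widest", widest_interval),
                     ("Wilson", wilson_interval)]:
    low, high = method(p_hat, n, z)
    print(f"{name:8s} {low:.4f} < p < {high:.4f}")
```

For these data the standard margin of error is E = 1.960 × 0.024 = 0.04704 and the widest is E = 1.960/40 = 0.049. The Wilson interval is not centered on 0.36; its center is pulled slightly toward 1/2.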
Theorem 10.4.7. Determining Sample Size for proportions with a preliminary estimate.
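Given a margin of error E and preliminary relative frequency estimate \(\tilde{p_0}\text{,}\) the sample size needed to create the corresponding confidence interval is given by
\begin{equation*}
n \ge \frac{z_{\alpha/2}^2 \, \tilde{p_0}(1-\tilde{p_0})}{E^2}.
\end{equation*}
To see this, recall that, with \(\tilde{p_0}\) playing the role of \(\tilde{p}\text{,}\) the margin of error is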
\begin{equation*}
E = z_{ \alpha/2}\sqrt{\tilde{p}(1-\tilde{p})/n}.
\end{equation*}
Presuming E is given and n is unknown, simply solve for n (noting that n is an integer and therefore you will likely need to replace the equality with an appropriate inequality).
Theorem 10.4.8. Determining Sample Size for proportions with no preliminary estimate.
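Given only a margin of error E, the sample size needed to create the corresponding confidence interval is given by
\begin{equation*}
n \ge \frac{z_{\alpha/2}^2}{4E^2}.
\end{equation*}
Note that the maximum for \(y = x(1-x)\) occurs at \(x = 1/2\text{,}\) where \(y = 1/4\text{.}\) Therefore \(\tilde{p_0}(1-\tilde{p_0}) \le \frac{1}{4}\text{,}\) and replacing \(\tilde{p_0}(1-\tilde{p_0})\) with \(\frac{1}{4}\) in the previous theorem gives the result.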
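Given a 99% confidence level, margin of error E=0.03, and preliminary estimate \(\tilde{p_0} = 0.35\text{,}\) notice that \(z_{\alpha / 2} = 2.58\) gives
\begin{equation*}
n \ge \frac{2.58^2 \cdot 0.35 \cdot 0.65}{0.03^2} \approx 1682.59,
\end{equation*}
so a sample of size n = 1683 is needed.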
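Both sample-size theorems can be sketched as a single Python helper. The name `sample_size` and its optional `p0` argument are assumptions made here for illustration: when no preliminary estimate is supplied, the worst case \(\tilde{p_0}(1-\tilde{p_0}) = 1/4\) is used.

```python
from math import ceil


def sample_size(margin, z, p0=None):
    """Smallest integer n with z * sqrt(p0 * (1 - p0) / n) <= margin.

    If p0 is None, the worst case p0 * (1 - p0) = 1/4 is assumed.
    """
    variance_bound = 0.25 if p0 is None else p0 * (1 - p0)
    # n must be an integer, so round the bound up.
    return ceil(z**2 * variance_bound / margin**2)


# Values from the example above: 99% level, E = 0.03, preliminary estimate 0.35.
print(sample_size(0.03, 2.58, p0=0.35))  # 1683

# No preliminary estimate, with hypothetical inputs E = 0.05 and z = 1.96.
print(sample_size(0.05, 1.96))           # 385
```

Because the arithmetic is done in floating point, a bound that lands exactly on an integer could be rounded up by one; when that matters, checking the margin of error for the candidate n directly is a safe fallback.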
An epidemiologist is worried about the prevalence of the flu in East Vancouver and the potential shortage of vaccines for the area. She will need to provide a recommendation for how to allocate the vaccines appropriately across the city. She takes a simple random sample of 338 people living in East Vancouver and finds that 35 have recently had the flu.
Suppose that the epidemiologist wants to re-estimate the population proportion and wishes for her 95% confidence interval to have a margin of error no larger than 0.04. How large a sample should she take to achieve this? Please carry answers to at least six decimal places in intermediate steps.