Section 10.2 Interval Estimates - Chebyshev
Chebyshev's Theorem provides an interval centered on the mean in which at least a certain proportion of the data from any distribution must lie.
Theorem 10.2.1 Chebyshev's Theorem
Given a random variable \(X\) with mean \(\mu\) and standard deviation \(\sigma\text{,}\) for any \(a \in \mathbb{R}^+\text{,}\)
\begin{equation*}
P( \big | X - \mu \big | \lt a ) \ge 1 - \frac{\sigma^2}{a^2}
\end{equation*}
Proof
Notice that the variance of a continuous random variable \(X\) with density \(f\) is given by
\begin{align*}
\sigma^2 & = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) dx\\
& \ge \int_{-\infty}^{\mu-a} (x - \mu)^2 f(x) dx + \int_{\mu + a}^{\infty} (x - \mu)^2 f(x) dx\\
& \ge \int_{-\infty}^{\mu-a} a^2 f(x) dx + \int_{\mu + a}^{\infty} a^2 f(x) dx\\
& = a^2 \big ( \int_{-\infty}^{\mu-a} f(x) dx + \int_{\mu + a}^{\infty} f(x) dx \big )\\
& = a^2 P( X \le \mu - a \text{ or } X \ge \mu + a )\\
& = a^2 P( \big | X - \mu \big | \ge a )
\end{align*}
Dividing by \(a^2\) gives \(P( \big | X - \mu \big | \ge a ) \le \frac{\sigma^2}{a^2}\text{,}\) and taking the complement gives the result.
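As a numerical sanity check (not part of the text), the bound can be tested by simulation. The sketch below uses an Exponential(1) distribution, for which \(\mu = \sigma = 1\text{,}\) and a few illustrative values of \(a\) chosen here for demonstration:

```python
# Monte Carlo check of Chebyshev's inequality for an Exponential(1)
# random variable, which has mean 1 and standard deviation 1.
import math
import random

random.seed(0)
mu = sigma = 1.0
n = 100_000
samples = [random.expovariate(1.0) for _ in range(n)]

for a in (1.25, 1.5, 2.0, 3.0):
    # empirical estimate of P(|X - mu| < a)
    p_hat = sum(abs(x - mu) < a for x in samples) / n
    bound = 1 - sigma**2 / a**2
    print(f"a = {a}: P(|X - mu| < a) ~ {p_hat:.4f} >= bound {bound:.4f}")
```

For this distribution the empirical probabilities sit well above the bound, which is typical: Chebyshev's inequality is distribution-free and therefore conservative.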
Corollary 10.2.2 Alternate Form for Chebyshev's Theorem
For positive \(k\text{,}\) setting \(a = k \sigma\) gives
\begin{equation*}
P( \big | X - \mu \big | \lt k \sigma ) \ge 1 - \frac{1}{k^2}
\end{equation*}
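The alternate form reads directly as a guaranteed minimum proportion for each choice of \(k\text{;}\) a minimal sketch (the values of \(k\) are illustrative, not from the text):

```python
# Minimum proportion of any distribution guaranteed by Chebyshev's
# Theorem to lie within k standard deviations of the mean: 1 - 1/k^2.
def chebyshev_min_proportion(k: float) -> float:
    """Lower bound on P(|X - mu| < k*sigma) for k > 0."""
    return 1 - 1 / k**2

for k in (1.5, 2, 2.5, 3, 4):
    print(f"k = {k}: at least {chebyshev_min_proportion(k):.1%} of the data")
```

Note that for \(k \le 1\) the bound is zero or negative, so the theorem gives no information within one standard deviation.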
Corollary 10.2.3 Special Cases for Chebyshev's Theorem
For a continuous distribution, the density \(f\) cannot vanish everywhere within one standard deviation of the mean. Also, at least 75% of the data for any distribution must lie within two standard deviations of the mean, and at least 88.9% must lie within three.
Proof
Apply Chebyshev's Theorem with \(a = \sigma\) to get
\begin{equation*}
P(\mu - \sigma \lt X \lt \mu + \sigma) \ge 1 - \frac{\sigma^2}{\sigma^2} = 0\text{.}
\end{equation*}
This bound alone is trivial, but suppose \(f(x) = 0\) throughout \((\mu - \sigma, \mu + \sigma)\text{.}\) Then \((x - \mu)^2 \gt \sigma^2\) wherever \(f(x) \gt 0\text{,}\) except possibly at \(x = \mu \pm \sigma\text{,}\) which carry no probability for a continuous distribution, so
\begin{equation*}
\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx \gt \sigma^2 \int_{-\infty}^{\infty} f(x) \, dx = \sigma^2\text{,}
\end{equation*}
a contradiction. Apply Chebyshev's Theorem with \(a = 2 \sigma\) to get \(1 - \frac{1}{2^2} = 0.75\) and with \(a = 3 \sigma\) to get \(1 - \frac{1}{3^2} = \frac{8}{9} \approx 0.8889\text{.}\)
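These guarantees hold for every distribution, so for any particular distribution the true proportions are at least this large. A small sketch (assuming a normal distribution purely for illustration, using the identity \(P(|X - \mu| \lt k\sigma) = \operatorname{erf}(k/\sqrt{2})\)):

```python
# Contrast the distribution-free Chebyshev guarantee with the exact
# probabilities for a normal distribution, where
# P(|X - mu| < k*sigma) = erf(k / sqrt(2)).
import math

for k in (2, 3):
    chebyshev = 1 - 1 / k**2
    normal = math.erf(k / math.sqrt(2))
    print(f"k = {k}: Chebyshev guarantees {chebyshev:.4f}, "
          f"normal gives exactly {normal:.4f}")
```

The normal distribution places roughly 95% and 99.7% of its mass within two and three standard deviations, comfortably above the 75% and 88.9% that Chebyshev guarantees.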
Example 10.2.4 Comparing a known distribution to Chebyshev