
Section 10.2 Interval Estimates - Chebyshev

An interval centered on the mean in which at least a certain proportion of the actual data must lie.
First, notice that if \(x > \mu + a\) with \(a > 0\text{,}\) then \(x - \mu > a\) and so \((x-\mu)^2 > a^2\text{.}\) Similarly, if \(x < \mu - a\text{,}\) then \(x - \mu < -a\) and again \((x-\mu)^2 > a^2\text{.}\)
Starting with the definition of variance for a continuous variable X,
\begin{align*} \sigma^2 & = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) dx\\ & \ge \int_{-\infty}^{\mu-a} (x - \mu)^2 f(x) dx + \int_{\mu + a}^{\infty} (x - \mu)^2 f(x) dx\\ & \ge \int_{-\infty}^{\mu-a} a^2 f(x) dx + \int_{\mu + a}^{\infty} a^2 f(x) dx\\ & = a^2 \left ( \int_{-\infty}^{\mu-a} f(x) dx + \int_{\mu + a}^{\infty} f(x) dx \right )\\ & = a^2 P( X \le \mu - a \text{ or } X \ge \mu + a )\\ & = a^2 P( \big | X - \mu \big | \ge a) \end{align*}
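Dividing by \(a^2\) gives \(P( \big | X - \mu \big | \ge a) \le \frac{\sigma^2}{a^2}\text{,}\) and taking the complement gives the result, \(P( \big | X - \mu \big | \lt a) \ge 1 - \frac{\sigma^2}{a^2}\text{.}\)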
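The inequality can also be checked numerically. The following is a minimal sketch, not part of the original derivation: it draws an illustrative sample (exponential with mean 7, an arbitrary choice), treats the sample as the whole population, and compares the observed proportion within \(a\) of the mean with the bound \(1 - \frac{\sigma^2}{a^2}\text{.}\) The helper name `tail_fraction`, the seed, and the sample size are assumptions made only for this illustration.

```python
import random
import statistics

def tail_fraction(data, a):
    """Fraction of observations with |x - mean| >= a."""
    mu = statistics.fmean(data)
    return sum(abs(x - mu) >= a for x in data) / len(data)

random.seed(0)  # fixed seed so the run is repeatable
# Illustrative data only: 10,000 exponential draws with mean 7 (rate 1/7).
sample = [random.expovariate(1 / 7) for _ in range(10_000)]

mu = statistics.fmean(sample)
sigma = statistics.pstdev(sample)  # population standard deviation of the sample

for k in (1.5, 2, 3):
    a = k * sigma
    inside = 1 - tail_fraction(sample, a)   # observed P(|X - mu| < a)
    bound = 1 - sigma**2 / a**2             # Chebyshev lower bound
    print(f"a = {a:6.2f}: observed {inside:.4f}, bound {bound:.4f}")
```

Because the argument above uses nothing about the distribution beyond its mean and variance, the observed proportion should never fall below the bound, whatever data are used.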
Set \(a = k\sigma\) and plug into Chebyshev's Theorem to get \(P( \big | X - \mu \big | \lt k\sigma) \ge 1 - \frac{1}{k^2}\text{.}\)
Apply the Chebyshev Theorem with \(a = \sigma\) to get
\begin{equation*} P(\mu - \sigma \lt X \lt \mu + \sigma) \ge 1 - \frac{\sigma^2}{\sigma^2} = 0 \end{equation*}
Apply the Chebyshev Theorem with \(a = 2 \sigma\) to get a lower bound of \(1 - \frac{1}{2^2} = 0.75\) and with \(a = 3 \sigma\) to get \(1 - \frac{1}{3^2} = \frac{8}{9} \approx 0.8889\text{.}\)
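These three bounds can be reproduced with a few lines of Python. This is a small sketch only; the function name `chebyshev_lower_bound` is a hypothetical helper introduced here, and exact fractions are used so that \(\frac{8}{9}\) is not rounded prematurely.

```python
from fractions import Fraction

def chebyshev_lower_bound(k):
    """Lower bound 1 - 1/k^2 on P(|X - mu| < k*sigma), for k > 0."""
    k = Fraction(k)
    if k <= 0:
        raise ValueError("k must be positive")
    return 1 - 1 / k**2

for k in (1, 2, 3):
    bound = chebyshev_lower_bound(k)
    print(f"k = {k}: at least {bound} (about {float(bound):.4f})")
```

For \(k = 1\) the bound is 0, which carries no information, so the theorem only becomes useful for \(k > 1\text{.}\)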
Chebyshev's Theorem requires you to know the mean and standard deviation of the variable if you are seeking the lower bound. On the other hand, you can "go backward" and find the mean and standard deviation from a given interval and its lower bound if you presume that the unknown mean is actually the midpoint of the interval and that \(a\) is the (equal) distance from that midpoint to either endpoint. Use this when working the exercise below.
A statistician uses Chebyshev's Theorem to estimate that at least 20% of a population lies between the values 4 and 18. Use this information to find the values of the population mean, \(\mu\text{,}\) and the population standard deviation, \(\sigma\text{.}\)
a) \(\mu =\)
b) \(\sigma =\)
Answer 1.
\(11\)
Answer 2.
\(6.26099033699941\)
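The "go backward" computation behind these answers can be sketched as follows. The function name `mean_and_sd_from_interval` is hypothetical, and the sketch assumes, as the text does, that the interval is \(\mu \pm a\) and that the stated proportion equals the bound \(1 - \frac{\sigma^2}{a^2}\text{,}\) so that \(\sigma = a \sqrt{1 - p}\text{.}\)

```python
import math

def mean_and_sd_from_interval(lower, upper, proportion):
    """Recover (mu, sigma), assuming the interval is mu +/- a and the
    stated proportion equals the Chebyshev bound 1 - sigma^2 / a^2."""
    if not lower < upper:
        raise ValueError("lower must be less than upper")
    if not 0 <= proportion < 1:
        raise ValueError("proportion must lie in [0, 1)")
    mu = (lower + upper) / 2               # midpoint of the interval
    a = (upper - lower) / 2                # distance from midpoint to endpoint
    sigma = a * math.sqrt(1 - proportion)  # solve 1 - sigma^2/a^2 = proportion
    return mu, sigma

mu, sigma = mean_and_sd_from_interval(4, 18, 0.20)
print(f"mu = {mu}, sigma = {sigma:.6f}")
```

Here \(a = 7\text{,}\) so \(\sigma = 7\sqrt{0.8}\text{,}\) which agrees with Answer 2.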
Suppose that the blood pressure of the human inhabitants of a certain Pacific island is distributed with mean \(\mu = 82\) mmHg and standard deviation \(\sigma = 10\) mmHg. According to Chebyshev's Theorem, at least what percentage of the islanders have blood pressure in the range from 49 mmHg to 115 mmHg?
answer: %
Answer.
\(90.8172635445363\)
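The arithmetic for this exercise can be sketched in Python. The helper `chebyshev_percent_within` is a hypothetical name, and it assumes the interval is symmetric about the mean, which holds here because both endpoints are 33 mmHg from 82 mmHg.

```python
def chebyshev_percent_within(lower, upper, mu, sigma):
    """Chebyshev lower bound, as a percentage, on P(lower < X < upper).

    Assumes the interval is symmetric about mu, i.e. it is mu +/- a.
    """
    a = upper - mu
    if a <= 0 or abs((mu - lower) - a) > 1e-9:
        raise ValueError("interval must be symmetric about mu with a > 0")
    # A negative bound says nothing, since probabilities are at least 0.
    return max(0.0, 1 - sigma**2 / a**2) * 100

pct = chebyshev_percent_within(49, 115, mu=82, sigma=10)
print(f"at least {pct:.4f}%")
```

With \(a = 33\) the bound is \(1 - \frac{100}{1089}\text{,}\) which matches the answer above.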
Suppose that you have an exponential random variable \(X\) with mean 7. Since the standard deviation of an exponential distribution equals its mean, the standard deviation is also 7. Note also that an exponential random variable typically represents a waiting time and thus can never be smaller than 0. It follows then that
\begin{align*} P( \mu - 1.8 \sigma \le X \le \mu + 1.8 \sigma) & = P( 7 - 1.8 \cdot 7 \le X \le 7 + 1.8 \cdot 7)\\ & = P( 0 \le X \le 19.6) = F(19.6) = 1 - e^{-19.6/7} \approx 0.939 \end{align*}
since the exponential distribution has a known distribution function.
However, using Chebyshev's Theorem,
\begin{equation*} P( \mu - 1.8 \sigma \le X \le \mu + 1.8 \sigma) \ge P( \big | X - \mu \big | \lt 1.8 \cdot \sigma ) \ge 1 - \frac{1}{{1.8}^2} \approx 0.691. \end{equation*}
The difference between these two results is not a problem. The first calculation gives a precise answer because it uses the known distribution function of \(X\text{,}\) whereas the second uses only the mean and standard deviation of \(X\text{.}\) With less information you get a less precise lower bound, but since the lower bound \(0.691\) is less than the exact value \(0.939\text{,}\) there is no conflict.
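Both numbers in this comparison can be recomputed directly. The sketch below uses only the standard library; the helper `exponential_cdf` is a hypothetical name and encodes the usual exponential distribution function \(F(x) = 1 - e^{-x/\mu}\) for \(x > 0\text{.}\)

```python
import math

mean = 7.0     # mean of the exponential variable
sigma = mean   # for an exponential distribution, sigma equals the mean
k = 1.8

def exponential_cdf(x, mean):
    """F(x) = 1 - exp(-x / mean) for x > 0, and 0 otherwise."""
    return 1 - math.exp(-x / mean) if x > 0 else 0.0

lower = mean - k * sigma   # negative here, so F(lower) = 0
upper = mean + k * sigma   # 19.6

exact = exponential_cdf(upper, mean) - exponential_cdf(lower, mean)
bound = 1 - 1 / k**2       # Chebyshev lower bound

print(f"exact probability: {exact:.3f}")
print(f"Chebyshev bound:   {bound:.3f}")
```

The exact value relies on knowing the distribution, while the bound would be the same for any variable with this mean and standard deviation.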