
Section 7.3 Geometric Distribution

Consider a sequence of independent trials in which the likelihood of a success on each individual trial stays constant from trial to trial. Call this likelihood the probability of "success" and denote its value by \(p\) where \(0 \lt p \lt 1 \text{.}\) If we let the variable \(X\) measure the number of trials needed in order to obtain the first success, with \(R = \{1, 2, 3, ... \}\text{,}\) then the resulting distribution of probabilities is called a Geometric Distribution.

Since successive trials are independent, the first success occurring on the \(x\)th trial requires that the previous \(x-1\) trials were all failures. Therefore the desired probability is given by

\begin{equation*} f(x) = P(X = x) = P(FF...FS) = (1-p)^{x-1}p \end{equation*}
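As a quick sanity check on this formula, here is a minimal Python sketch (the function name geometric_pmf is ours, chosen for illustration):

def geometric_pmf(x, p):
    # P(X = x) = (1-p)^(x-1) * p for x = 1, 2, 3, ...
    if x < 1:
        return 0.0
    return (1 - p) ** (x - 1) * p

# With p = 1/2, a first success on trial 3 means two failures then a success:
print(geometric_pmf(3, 0.5))  # (1/2)^2 * (1/2) = 0.125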

Example 7.3.2. Rolling a die until getting a success.

Let's consider a simple example: rolling a 24-sided die until you get a multiple of 9, that is, either a 9 or an 18. Successive rolls of a die would appear to be independent events, and the probability of getting a 9 or an 18 on any given roll is \(p = \frac{2}{24} = \frac{1}{12}\text{.}\) What, then, is the likelihood that it takes more than three rolls for you to get your first success?

This is easily modeled by a geometric distribution, and you are looking for

\begin{align*} P(X \gt 3) & = 1 - P(X \le 3) = 1 - F(3) = 1 - f(1) - f(2) - f(3)\\ & = 1 - \frac{1}{12} - \frac{11}{12} \cdot \frac{1}{12} - \left(\frac{11}{12}\right)^2 \cdot \frac{1}{12} \approx 0.77025. \end{align*}
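You can confirm this arithmetic numerically; a short Python sketch, assuming the same setup:

p = 1 / 12  # chance of rolling a 9 or an 18 on a 24-sided die

# P(X > 3) = 1 - f(1) - f(2) - f(3)
print(1 - sum((1 - p) ** (x - 1) * p for x in (1, 2, 3)))  # ~0.770255

# Equivalently, X > 3 happens exactly when the first three rolls all fail:
print((11 / 12) ** 3)  # same value

Notice that \(P(X \gt 3)\) is simply the probability of three consecutive failures; this shortcut is formalized below when we compute \(P(X \le x)\) in general.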
Example 7.3.3. Testing a critical component until failure.

Often one will test a critical system component until it fails to see how long the component works. Suppose you have a particular component that on any given trial has probability \(p = 0.01\) of breaking, so that a "success" in the geometric model corresponds to the component breaking. You might also find it reasonable to presume that successive trials are independent, which could be the case if the component shows no wear from trial to trial. So, you can model this situation with a geometric distribution, and the probability that the component fails on the \(x\)th trial is given by

\begin{equation*} f(x) = 0.99^{x-1} \cdot 0.01. \end{equation*}

You might be interested in whether the component fails on exactly the 5th trial. The probability of this outcome is given by

\begin{equation*} f(5) = 0.99^4 \cdot 0.01 \approx 0.0096 \end{equation*}

so it is unlikely that the component will fail on exactly the 5th trial. However, what about failing on one of the first five trials? Then, you would need

\begin{align*} F(5) & = f(1)+f(2)+f(3)+f(4)+f(5)\\ & = 0.01 + 0.99 \cdot 0.01 + 0.99^2 \cdot 0.01 + 0.99^3 \cdot 0.01 + 0.99^4 \cdot 0.01 \approx 0.049 \end{align*}

which is still relatively small. With such a small probability of failure, you might expect the component to last for some time. Indeed, we will derive a formula below for the number of trials you should expect, on average, before failure.
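Both of these values are easy to check with a few lines of Python (a minimal sketch; the helper name f is ours):

p = 0.01
f = lambda x: (1 - p) ** (x - 1) * p  # geometric pmf

print(f(5))                            # ~0.0096, fails on exactly the 5th trial
print(sum(f(x) for x in range(1, 6)))  # ~0.049, fails within the first 5 trials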

Let's verify that this probability function is legitimate and investigate some of the geometric distribution's properties. Summing over all possible values of \(x\text{,}\)

\begin{align*} \sum_{x=1}^{\infty} f(x) & = \sum_{x=1}^{\infty} (1-p)^{x-1} p\\ & = p \sum_{j=0}^{\infty} (1-p)^j = p \cdot \frac{1}{1-(1-p)} = 1 \end{align*}

using the change of variables \(j = x-1\) and the known value for the sum of a geometric series.

For the mean, recall that differentiating the geometric series \(\sum_{j=0}^{\infty} t^j = \frac{1}{1-t}\) term by term gives \(\sum_{x=1}^{\infty} x t^{x-1} = \frac{1}{(1-t)^2}\) for \(|t| \lt 1\text{.}\) Applying this with \(t = 1-p\text{,}\)

\begin{align*} \mu & = E[X] = \sum_{x=1}^{\infty} {x(1-p)^{x-1}p}\\ & = p \sum_{x=1}^{\infty} {x(1-p)^{x-1}}\\ & = p \frac{1}{(1-(1-p))^2}\\ & = p \frac{1}{p^2} = \frac{1}{p} \end{align*}
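Since the terms of this series decay geometrically, truncating it provides a quick numerical check of the formula; a sketch with an arbitrarily chosen \(p\text{:}\)

p = 0.2  # arbitrary value for illustration
approx_mean = sum(x * (1 - p) ** (x - 1) * p for x in range(1, 1001))
print(approx_mean, 1 / p)  # both ~5.0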

For the variance, note that \(\sigma^2 = E[X^2] - \mu^2 = E[X(X-1)] + \mu - \mu^2\text{,}\) and differentiating the geometric series a second time gives \(\sum_{x=2}^{\infty} x(x-1) t^{x-2} = \frac{2}{(1-t)^3}\text{.}\) Then

\begin{align*} \sigma^2 & = E[X(X-1)] + \mu - \mu^2 \\ & = \sum_{x=1}^{\infty} {x(x-1)(1-p)^{x-1}p} + \mu - \mu^2 \\ & = (1-p)p \sum_{x=2}^{\infty} {x(x-1)(1-p)^{x-2}} + \frac{1}{p} - \frac{1}{p^2}\\ & = (1-p)p \frac{2}{(1-(1-p))^3} + \frac{1}{p} - \frac{1}{p^2}\\ & = \frac{1-p}{p^2} \end{align*}
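Both formulas can also be checked by simulation. Here is a minimal Python sketch (the helper name geometric_trial is ours):

import random

def geometric_trial(p):
    # Run independent trials, each succeeding with probability p,
    # and return the index of the first successful trial.
    count = 1
    while random.random() >= p:
        count += 1
    return count

p, n = 0.25, 100_000
samples = [geometric_trial(p) for _ in range(n)]
mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n
print(mean, 1 / p)            # both ~4.0
print(var, (1 - p) / p ** 2)  # both ~12.0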

Next, consider the accumulated probabilities over a range of values, that is, the cumulative distribution function. For \(x \in R\text{,}\)

\begin{align*} P(X \le x) & = 1 - P(X \gt x)\\ & = 1- \sum_{k={x+1}}^{\infty} {(1-p)^{k-1}p}\\ & = 1- p \frac{(1-p)^{x}}{1-(1-p)}\\ & = 1- (1-p)^{x} \end{align*}
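You can watch the accumulated pmf values agree with this closed form; a small Python sketch using \(p = \frac{1}{12}\) from Example 7.3.2:

p = 1 / 12
running = 0.0
for x in range(1, 6):
    running += (1 - p) ** (x - 1) * p  # F(x) by accumulating the pmf
    closed = 1 - (1 - p) ** x          # F(x) by the closed form
    print(x, round(running, 6), round(closed, 6))

In particular, \(F(3) = 1 - \left(\frac{11}{12}\right)^3 \approx 0.22975\text{,}\) matching \(1 - 0.77025\) from Example 7.3.2.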


This closed form for \(P(X \gt x) = (1-p)^x\) also reveals the memoryless property of the geometric distribution: having already waited \(b\) trials without a success does not change the distribution of the additional number of trials needed. Using the definition of conditional probability,

\begin{align*} P( X \gt a + b \mid X \gt b ) & = \frac{P( \{X \gt a + b\} \cap \{X \gt b\} )}{P( X \gt b)}\\ & = \frac{P( X \gt a + b )}{P( X \gt b)}\\ & = \frac{(1-p)^{a+b}}{(1-p)^b}\\ & = (1-p)^a\\ & = P(X \gt a) \end{align*}
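since \(\{X \gt a + b\} \subseteq \{X \gt b\}\text{,}\) the intersection collapses in the second step. The memoryless property can also be observed in simulation; a sketch reusing the geometric_trial helper from above (repeated so the snippet is self-contained) with arbitrary choices of \(p\text{,}\) \(a\text{,}\) and \(b\text{:}\)

import random

def geometric_trial(p):
    count = 1
    while random.random() >= p:
        count += 1
    return count

p, n, a, b = 0.1, 200_000, 4, 7
samples = [geometric_trial(p) for _ in range(n)]
beyond_b = [s for s in samples if s > b]

# Conditional frequency of X > a + b given X > b, versus P(X > a):
print(sum(s > a + b for s in beyond_b) / len(beyond_b))
print((1 - p) ** a)  # (0.9)^4 = 0.6561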