Section 7.4 Negative Binomial
Consider the situation where one can observe a sequence of independent trials where the likelihood of a success on each individual trial stays constant from trial to trial. Call this likelihood the probability of "success" and denote its value by p, where 0<p<1. If we let the variable X measure the number of trials needed in order to obtain the rth success, r≥1, with R={r,r+1,r+2,...}, then the resulting distribution of probabilities is called a Negative Binomial Distribution. Note that r=1 gives the Geometric Distribution.

Theorem 7.4.1. Negative Binomial Series.

For \(|a| < |b|\) and any positive integer n,

\(\frac{1}{(a+b)^n} = \sum_{k=0}^{\infty} (-1)^k \binom{n+k-1}{k} \frac{a^k}{b^{n+k}}.\)
Proof.
First, convert the problem to a slightly different form: \(\frac{1}{(a+b)^n} = \frac{1}{b^n} \frac{1}{(\frac{a}{b}+1)^n} = \frac{1}{b^n} \sum_{k=0}^{\infty} {(-1)^k \binom{n + k - 1}{k} \left ( \frac{a}{b} \right ) ^k}\)
So, let's replace \(\frac{a}{b}\) with x and ignore for a while the term factored out. Then, we only need to show

\(\frac{1}{(1+x)^n} = \sum_{k=0}^{\infty} (-1)^k \binom{n+k-1}{k} x^k.\)

However, by the geometric series,

\(\frac{1}{(1+x)^n} = \left ( \frac{1}{1+x} \right )^n = \left ( \sum_{m=0}^{\infty} (-1)^m x^m \right )^n.\)
This infinite sum raised to a power can be expanded by distributing terms in the standard way. In doing so, the various powers of x multiplied together will create a series in powers of x involving \(x^0, x^1, x^2, ...\text{.}\) To determine the final coefficients, notice that the number of times \(x^k\) will appear in this product depends upon the number of ways one can write k as a sum of n nonnegative integers.
For example, the coefficient of \(x^3\) will come from the ways of multiplying the terms \(x^3, x^0, ..., x^0\) and \(x^2, x^1, x^0, ..., x^0\) and \(x^1, x^1, x^1, x^0,..., x^0\text{.}\) This is equivalent to finding the number of ways to write the number k as a sum of n nonnegative integers. The possible set of nonnegative integers is {0,1,2,...,k} and one way to count the combinations is to separate k *'s by n-1 |'s. For example, if n = 3 and k = 3, then *||** means \(x^1 x^0 x^2 = x^3\text{.}\) Similarly, for n = 5 and k = 5, |**|*|**| implies \(x^0 x^2 x^1 x^2 x^0 = x^5\text{.}\) The number of ways to interchange the identical *'s among the identical |'s is \(\binom{n+k-1}{k}\text{.}\)
Furthermore, each factor \((-1)^m x^m\) carries the sign \((-1)^m\), so any product contributing to \(x^k\) carries the sign \((-1)^{m_1 + \cdots + m_n} = (-1)^k\text{.}\) That is, the coefficient of each even power of x is positive and the coefficient of each odd power of x is negative. Therefore,

\(\frac{1}{(1+x)^n} = \sum_{k=0}^{\infty} (-1)^k \binom{n+k-1}{k} x^k.\)
Similarly, replacing x with -x,

\(\frac{1}{(1-x)^n} = \sum_{k=0}^{\infty} \binom{n+k-1}{k} x^k.\)
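The stars-and-bars count used above can be spot-checked by brute force. Below is a small sketch in Python (the helper name `compositions` is mine, not from the text) that enumerates all ordered ways to write k as a sum of n nonnegative integers and compares the count to \(\binom{n+k-1}{k}\text{:}\)

```python
from itertools import product
from math import comb

def compositions(n, k):
    """Count ordered n-tuples of nonnegative integers that sum to k."""
    return sum(1 for t in product(range(k + 1), repeat=n) if sum(t) == k)

# The brute-force count matches the stars-and-bars formula C(n+k-1, k)
for n in range(1, 5):
    for k in range(0, 5):
        assert compositions(n, k) == comb(n + k - 1, k)
```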
Theorem 7.4.2. Negative Binomial Probability Function.
\(f(x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}\)

for x∈R={r,r+1,...}.
Proof.
Since successive trials are independent, the probability that the rth success occurs on the xth trial presumes that the previous x-1 trials contained r-1 successes and x-r failures. You can arrange these indistinguishable successes (and failures) in \(\binom{x-1}{r-1}\) unique ways. Therefore the desired probability is given by

\(f(x) = \binom{x-1}{r-1} p^{r-1} (1-p)^{x-r} \cdot p = \binom{x-1}{r-1} p^r (1-p)^{x-r}.\)
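As a quick sketch, the probability function just derived can be written directly in Python (the function name `neg_binom_pmf` is mine, not from the text):

```python
from math import comb

def neg_binom_pmf(x, r, p):
    """P(rth success occurs on trial x) = C(x-1, r-1) p^r (1-p)^(x-r)."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# With r = 1 this reduces to the geometric probability p(1-p)^(x-1):
print(neg_binom_pmf(3, 1, 0.5))  # 0.125
```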
Example 7.4.3. Rolling a die until getting a third success.
Once again, consider rolling our 24-sided die until you get a multiple of 9...that is, either a 9 or an 18...for the third time. Once again, the probability of getting a 9 or 18 on any given roll is p=1/12, but since we will continue rolling until we get a success for the third time, this is modeled by a negative binomial distribution and you are looking for

\(f(x) = \binom{x-1}{2} \left ( \frac{1}{12} \right )^3 \left ( \frac{11}{12} \right )^{x-3}, \quad x \in \{3, 4, 5, ...\}.\)
Computing f(x) for any given x is relatively painless but computing F(x) could take some effort. There is generally not a graphing calculator distribution option for negative binomial, but the interactive cells below can be utilized to help with the tedious computations. For example, if you were interested in P(X<20)=P(X≤19)=F(19), then the interactive calculator below using r=3 and p=1/12 gives F(19)≈0.20737.
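The value F(19) quoted above is easy to verify by summing the probability function directly; a sketch in Python:

```python
from math import comb

r, p = 3, 1/12
# F(19) = P(X <= 19): sum f(x) = C(x-1, r-1) p^r (1-p)^(x-r) for x = 3..19
F19 = sum(comb(x - 1, r - 1) * p**r * (1 - p)**(x - r) for x in range(r, 20))
print(round(F19, 5))  # approximately 0.20737
```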
Example 7.4.4. Testing a critical component until failure.
Let's once again test a series of critical system components until you find two that fail. Again, suppose a particular component has a p=0.01 probability of breaking on any given trial. Since you will stop when you encounter the second failure, you can model this situation with a negative binomial distribution, and the probability that the 2nd component fails on the xth trial is given by

\(f(x) = \binom{x-1}{1} (0.01)^2 (0.99)^{x-2}, \quad x \in \{2, 3, 4, ...\}.\)
Once again, let's compute the likelihood that you get the second failure on one of the first five trials. Then,

\(F(5) = \sum_{x=2}^{5} \binom{x-1}{1} (0.01)^2 (0.99)^{x-2} \approx 0.00098,\)

which is still relatively small.
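A quick numerical check of this probability (a sketch; the variable names are mine):

```python
from math import comb

r, p = 2, 0.01
# P(second failure occurs within the first five trials) = F(5)
F5 = sum(comb(x - 1, r - 1) * p**r * (1 - p)**(x - r) for x in range(r, 6))
print(F5)  # roughly 0.00098
```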
Theorem 7.4.5. Negative Binomial Distribution Sums to 1.

\(\sum_{x=r}^{\infty} \binom{x-1}{r-1} p^r (1-p)^{x-r} = 1.\)
Proof.
Consider

\(\sum_{x=r}^{\infty} \binom{x-1}{r-1} p^r (1-p)^{x-r}\)

and by using \(k = x-r\) this becomes

\(p^r \sum_{k=0}^{\infty} \binom{k+r-1}{k} (1-p)^{k} = p^r \cdot \frac{1}{(1-(1-p))^r} = \frac{p^r}{p^r} = 1\)

by the Negative Binomial Series, Theorem 7.4.1.
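You can also see the probabilities accumulate to 1 numerically; below is a sketch that truncates the infinite sum after enough terms (the helper name `total_prob` is mine):

```python
from math import comb

def total_prob(r, p, terms=3000):
    """Partial sum of the negative binomial probabilities from x = r on."""
    return sum(comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)
               for x in range(r, r + terms))

# The truncated sum equals 1 up to a negligible tail
assert abs(total_prob(3, 1/12) - 1) < 1e-9
```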
## Negative Binomial calculator
@interact
def _(p=input_box(1/12,width=15), r=slider(1,20,1,2)):
    n = 4*(floor(r/p)+1)
    np1 = n+1
    R = range(r,np1)
    f(x) = ((factorial(x-1)/(factorial(r-1)*factorial(x-r)))
            *(1-p)^(x-r)*p^r)
    acc = 0
    for k in R:
        prob = f(x=k)
        acc = acc+prob
        pretty_print('f(%s) = '%k, ' %.8f'%prob,
                     ' and F(%s) = '%k, ' %.8f'%acc)
Checkpoint 7.4.6.
Theorem 7.4.7. Statistics for Negative Binomial Distribution.
For the Negative Binomial Distribution,

\(\mu = \frac{r}{p} \qquad \text{and} \qquad \sigma^2 = \frac{r(1-p)}{p^2},\)

with

\(\text{Skewness} = \frac{2-p}{\sqrt{r(1-p)}} \qquad \text{and} \qquad \text{Kurtosis} = \frac{6}{r} + \frac{p^2}{r(1-p)} + 3.\)
Proof.
# Negative Binomial
var('x,n,p,r,alpha')
assume(x,'integer')
assume(alpha,'integer')
assume(alpha > 2)
assume(0 < p < 1)
@interact
def _(r=[2,5,10,15,alpha]):
    f(x) = binomial(x-1,r-1)*p^r*(1-p)^(x-r)
    mu = sum(x*f(x),x,r,oo).full_simplify()
    M2 = sum(x^2*f(x),x,r,oo).full_simplify()
    M3 = sum(x^3*f(x),x,r,oo).full_simplify()
    M4 = sum(x^4*f(x),x,r,oo).full_simplify()
    pretty_print('Mean = ',mu)
    v = (M2-mu^2).full_simplify()
    pretty_print('Variance = ',v)
    stand = sqrt(v)
    sk = ((M3 - 3*M2*mu + 2*mu^3).full_simplify()/stand^3).factor()
    pretty_print('Skewness = ',sk)
    kurt = ((M4 - 4*M3*mu + 6*M2*mu^2 - 3*mu^4)/v^2).full_simplify()
    pretty_print('Kurtosis = ',(kurt-3).factor(),'+3')
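The closed forms for the mean and variance can also be checked numerically by truncating the moment sums; a sketch in plain Python (the helper name `moment` is mine), assuming \(\mu = r/p\) and \(\sigma^2 = r(1-p)/p^2\text{:}\)

```python
from math import comb

def moment(k, r, p, terms=3000):
    """Approximate E[X^k] for the negative binomial by a truncated sum."""
    return sum(x**k * comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)
               for x in range(r, r + terms))

r, p = 3, 0.4
mu = moment(1, r, p)
var = moment(2, r, p) - mu**2
assert abs(mu - r / p) < 1e-8                 # mean = r/p = 7.5
assert abs(var - r * (1 - p) / p**2) < 1e-6   # variance = r(1-p)/p^2 = 11.25
```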
Theorem 7.4.8. Negative Binomial Limiting Distribution.
For the negative binomial distribution 7.4.2, as r→∞,

\(\text{Skewness} = \frac{2-p}{\sqrt{r(1-p)}} \to 0\)

and

\(\text{Kurtosis} = \frac{6}{r} + \frac{p^2}{r(1-p)} + 3 \to 3.\)
Proof.
For skewness, take the limit of the skewness result above:

\(\lim_{r \to \infty} \frac{2-p}{\sqrt{r(1-p)}} = 0.\)

Similarly for kurtosis,

\(\lim_{r \to \infty} \left ( \frac{6}{r} + \frac{p^2}{r(1-p)} + 3 \right ) = 3,\)

which matches the skewness and kurtosis of the normal distribution.
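These limits can be illustrated numerically. Below is a sketch using the closed forms for skewness and kurtosis, \((2-p)/\sqrt{r(1-p)}\) and \(6/r + p^2/(r(1-p)) + 3\) (the function names are mine):

```python
from math import sqrt

def skewness(r, p):
    """Closed-form negative binomial skewness."""
    return (2 - p) / sqrt(r * (1 - p))

def kurtosis(r, p):
    """Closed-form negative binomial kurtosis (not excess)."""
    return 6 / r + p**2 / (r * (1 - p)) + 3

p = 0.4
for r in (1, 10, 100, 10000):
    print(r, round(skewness(r, p), 4), round(kurtosis(r, p), 4))
# skewness shrinks toward 0 and kurtosis approaches 3 as r grows
```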
r = 3
p = 0.4
M = 20*r
X = r:M                  # the space R should be infinite
Xshift = 0:(M-r)         # R does dnbinom, etc. by letting
                         # x measure the number of failures
                         # rather than the number of total trials.
                         # So, x = r would be 0 failures and
                         # R would count that as x = 0.
mu = r/p                 # the formula for the mean of the Negative Binomial Distribution
sdev = sqrt(r*(1-p)/p^2) # the formula for the standard deviation
if(M < 101){
  dnbinom( Xshift, r, p )          # let's print out a bunch of actual probs
}
Pnbinom = dnbinom(Xshift, r, p )   # the probability function over X

Psample = rnbinom(10^6, r, p)      # create a histogram, sample a lot

Xtop = max(Psample)      # for scaling the x-axis. Shift by 1/2 below.
hist(Psample, prob=TRUE, br=(-1:Xtop)+0.5, col="skyblue2", xlab="X - r",
     main="Negative Binomial pdf vs Approximating 'Bell Curve'")

points(Xshift, Pnbinom, pch=19, col="darkgreen")   # actual (x, f(x))

Pnormal <- function(X){dnorm(X, mean=mu-r, sd=sdev)}
# overlay a bell curve, shifting our mean by r to fit the R protocol
curve(Pnormal, col="red", lwd=2, add=TRUE)