Section 4.3 Definition of Probability
Relative frequency gives a way to measure the proportion of "successful" outcomes when taking an experimental approach. From the interactive applications above, it appears that the relative frequency does jump around as the experiment is repeated but that the amount of variation decreases as the number of experiments increases. This behavior holds in general and is known as the "Law of Large Numbers".
We would like to formalize what these relative frequencies are approaching and will call this theoretical limit the "probability" of the outcome. In doing so, we will do our best to model our definition so that it follows the behavior of relative frequency.
To generate a general definition for probability, we need to know what it is that we are measuring. In general, we will be finding the probability of sets of possible outcomes...that is, a subset of the Sample Space S. Toward that end, it is important to briefly look at some properties of sets.
Definition 4.3.1. Pairwise Disjoint Sets.
\(\{ A_1, A_2, ... , A_n \}\) are pairwise disjoint provided \(A_k \cap A_j = \emptyset\) so long as \(k \ne j\text{.}\) Disjoint sets are also often called mutually exclusive.
Play around with the interactive cell below by adding and removing items in each of the three sets. Find elements so that the intersection of all three sets is empty but at least one pair of sets is not disjoint. Then see if you can make every pair of sets not disjoint while keeping the intersection of all three empty. This is why we need to consider "pairwise" disjoint sets.
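If you would like to experiment outside the interactive cell, the following short Python sketch contrasts the two conditions; the particular sets A, B, and C are just illustrative choices, not part of the cell above.

from itertools import combinations

def pairwise_disjoint(sets):
    """Return True exactly when every pair of sets has an empty intersection."""
    return all(not (a & b) for a, b in combinations(sets, 2))

A = {1, 2}
B = {2, 3}
C = {1, 3}

print(A & B & C)                     # set(): the triple intersection is empty
print(pairwise_disjoint([A, B, C]))  # False: yet no pair of these sets is disjoint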
Consider how we might create a definition for the expectation of a given outcome. To do so, first consider a desired collection of outcomes A. If each outcome in S is equally likely to be chosen, then we might consider using a formula similar to relative frequency and set a measure of expectation to be |A|/|S|. For example, on a standard 6-sided die, the expectation of the outcome A = {2} from the collection S = {1,2,3,4,5,6} could be |A|/|S| = 1/6.
From our example where we take the sum of two dice, consider the event that the sum lands in A = {4, 5}. The sums in the collection S = {2,3,4,...,12} are not themselves equally likely, so we instead count the 36 equally likely ordered rolls. The rolls whose sum is 4 or 5 are
\((1,3), (2,2), (3,1), (1,4), (2,3), (3,2), (4,1)\)
and so the expected relative frequency would be 7/36. Compare this theoretical value with the sum of the relative frequencies of the outcomes 4 and 5 from your experiment above.
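To see where the 7/36 comes from, here is a small Python sketch (not one of the interactive cells above) that enumerates the 36 equally likely ordered rolls and also simulates the experiment; the trial count is an arbitrary choice.

from itertools import product
from random import randint

# Enumerate the 36 equally likely ordered rolls of two dice.
S = list(product(range(1, 7), repeat=2))
A = [(i, j) for (i, j) in S if i + j in (4, 5)]   # rolls whose sum is 4 or 5
print(len(A), len(S), len(A) / len(S))            # 7 36 0.1944... = 7/36

# Simulate rolling two dice to compare the relative frequency with 7/36.
trials = 10_000
hits = sum(1 for _ in range(trials) if randint(1, 6) + randint(1, 6) in (4, 5))
print(hits / trials)                              # typically close to 0.194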
We are now ready to formally give a name to the theoretical measure of expectation for outcomes from an experiment. Taking our cue from our examples, let's make our definition agree with the following relative frequency properties:
- Relative frequency cannot be negative, since cardinality cannot be negative
- The relative frequency of the entire collection of outcomes S is one
- Relative frequencies for collections of disjoint outcomes should equal the sum of the individual relative frequencies
which leads us to the following formal definition...
Definition 4.3.2. Probability.
The probability P(A) of a given outcome A is a set function that satisfies:
- (Nonnegativity) P(A) \(\ge 0\)
- (Totality) P(S) = 1
- (Additivity) If A \(\cap\) B = \(\emptyset\text{,}\) then P(A \(\cup\) B) = P(A) + P(B). In general, if {\(A_k\)} are pairwise disjoint then \(P( \cup_k A_k) = \sum_k P(A_k)\text{.}\)
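As a quick sanity check, the sketch below assigns probability 1/6 to each face of a fair die and verifies the three requirements of the definition numerically on a couple of events; the particular events chosen are arbitrary.

# Check the three requirements of Definition 4.3.2 for a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}
weight = {outcome: 1 / 6 for outcome in S}

def P(event):
    """Probability of an event (a subset of S) as the sum of its outcome weights."""
    return sum(weight[x] for x in event)

assert all(P({x}) >= 0 for x in S)               # nonnegativity
assert abs(P(S) - 1) < 1e-12                     # totality: P(S) = 1

A, B = {1, 2}, {5, 6}                            # two disjoint events
assert abs(P(A | B) - (P(A) + P(B))) < 1e-12     # additivity for disjoint events
print("All three requirements hold for these events.")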
Checkpoint 4.3.3. WeBWorK - Using the definition.
Notice that when you are given complete information regarding the entire data set, probabilities for events can be relatively easy to compute.
Based upon this definition we can immediately establish a number of results.
Theorem 4.3.4. Probability of Complements.
For any event A, \(P(A^c) = 1 - P(A)\text{.}\)
Proof.
Let A be any event and note that \(A \cap A^c = \emptyset\text{.}\)
But \(A \cup A^c = S\text{.}\) So, by additivity, \(1 = P(S) = P(A \cup A^c) = P(A) + P(A^c)\) and therefore \(P(A^c) = 1 - P(A)\text{,}\)
as desired.
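Here is a quick numerical illustration of the complement rule, using a fair die where each face is equally likely (a fact made precise in Theorem 4.3.11 below); the event A is an arbitrary choice.

# Illustrate P(A^c) = 1 - P(A) with the equally likely outcomes of one fair die.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    return len(event) / len(S)     # equally likely outcomes

A = {2, 4, 6}                      # "roll an even number"
print(P(S - A))                    # 0.5, the probability of the complement
print(1 - P(A))                    # also 0.5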
Theorem 4.3.5.
\(P(\emptyset) = 0\text{.}\)
Proof.
Note that \(\emptyset^c = S\text{.}\) So, by the theorem above, \(1 = P(S) = P(\emptyset^c) = 1 - P(\emptyset)\text{.}\)
Cancelling the 1 on both sides gives \(P(\emptyset) = 0\text{.}\)
Theorem 4.3.6.
For events A and B with \(A \subset B\text{,}\) \(P(A) \le P(B)\text{.}\)
Proof.
Assume sets A and B satisfy \(A \subset B\text{.}\) Then, notice that \(B = A \cup (B \cap A^c)\)
and \(A \cap (B \cap A^c) = \emptyset\text{.}\)
Therefore, by additivity and nonnegativity, \(P(B) = P(A) + P(B \cap A^c) \ge P(A)\text{.}\)
Theorem 4.3.7.
For any event A, \(P(A) \le 1\text{.}\)
Proof.
Notice \(A \subset S\text{.}\) By the theorem above, \(P(A) \le P(S) = 1\text{.}\)
Theorem 4.3.8.
For any sets A and B, \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\text{.}\)
Proof.
Notice that we can write \(A \cup B\) as the disjoint union \(A \cup B = A \cup (B \cap A^c)\text{.}\)
We can also write disjointly \(B = (A \cap B) \cup (B \cap A^c)\) so that \(P(B \cap A^c) = P(B) - P(A \cap B)\text{.}\)
Hence, \(P(A \cup B) = P(A) + P(B \cap A^c) = P(A) + P(B) - P(A \cap B)\text{.}\)
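The sketch below checks this formula by direct enumeration with two dice; the overlapping events A and B are arbitrary choices.

# Check P(A ∪ B) = P(A) + P(B) - P(A ∩ B) by enumerating two dice.
from itertools import product

S = set(product(range(1, 7), repeat=2))
P = lambda E: len(E) / len(S)               # 36 equally likely ordered rolls

A = {(i, j) for (i, j) in S if i + j == 7}  # "the sum is 7"
B = {(i, j) for (i, j) in S if i == 1}      # "the first die shows 1"

print(P(A | B))                             # 11/36 directly
print(P(A) + P(B) - P(A & B))               # 11/36 via the theorem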
This result can be extended to more than two sets using a property known as inclusion-exclusion. The following two corollaries illustrate this property and are presented without proof.
Corollary 4.3.9.
For any sets A, B and C,
\(P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)\text{.}\)
Corollary 4.3.10.
For any sets A, B, C and D,
\(P(A \cup B \cup C \cup D) = P(A) + P(B) + P(C) + P(D) - P(A \cap B) - P(A \cap C) - P(A \cap D) - P(B \cap C) - P(B \cap D) - P(C \cap D) + P(A \cap B \cap C) + P(A \cap B \cap D) + P(A \cap C \cap D) + P(B \cap C \cap D) - P(A \cap B \cap C \cap D)\text{.}\)
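Although the corollaries are stated without proof, the alternating pattern can be checked numerically. The sketch below implements the general inclusion-exclusion sum and compares it with a direct computation for four events on two dice; the helper name inclusion_exclusion and the particular events are assumptions made only for illustration.

from functools import reduce
from itertools import combinations, product

def inclusion_exclusion(P, events):
    """Alternating sum of P over all k-fold intersections of the given events."""
    total = 0.0
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)
        for combo in combinations(events, k):
            total += sign * P(reduce(set.intersection, combo))
    return total

# Compare with a direct computation using two dice (36 equally likely rolls).
S = set(product(range(1, 7), repeat=2))
P = lambda E: len(E) / len(S)

A = {s for s in S if s[0] == 1}         # first die shows 1
B = {s for s in S if s[1] == 1}         # second die shows 1
C = {s for s in S if sum(s) == 7}       # the sum is 7
D = {s for s in S if s[0] == s[1]}      # doubles

print(P(A | B | C | D))                       # direct
print(inclusion_exclusion(P, [A, B, C, D]))   # same value via inclusion-exclusion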
Many times, you will be dealing with making selections from a sample space where each item in the space has an equal chance of being selected. This may happen, for example, when items in the sample space are of equal size, when selecting a card from a completely shuffled deck, when coins are flipped, or when a normal fair die is rolled.
It is important to notice that not all outcomes are equally likely, even when there are only two of them. Indeed, it is generally not an equally likely situation when picking the winner of a football game that pits, say, the professional New Orleans Saints against the New Orleans Home School Saints. Even though there are only two options, the probability of the professional team winning in most years ought to be much greater than the chance that the high school team will prevail.
When items are equally likely (sometimes also called "randomly selected"), each individual outcome has the same chance of being selected as any other. In this instance, determining the probability of a collection of outcomes is relatively simple.
Theorem 4.3.11. Probability of Equally Likely Events.
If outcomes in S are equally likely, then for \(A \subset S\text{,}\) \(P(A) = \frac{|A|}{|S|}\text{.}\)
Proof.
Enumerate S = {\(x_1, x_2, ..., x_{|S|}\)} and note \(P( \{ x_k \} ) = c\) for some constant c since each item is equally likely. However, using each outcome as a disjoint event and the definition of probability, \(1 = P(S) = P(\{x_1\}) + P(\{x_2\}) + \cdots + P(\{x_{|S|}\}) = |S| \cdot c\)
and so \(c = \frac{1}{|S|}\text{.}\) Therefore, \(P( \{ x_k \} ) = \frac{1}{|S|}\text{.}\)
Hence, with A = {\(a_1, a_2, ..., a_{|A|}\)}, breaking up the disjoint probabilities as above gives \(P(A) = P(\{a_1\}) + P(\{a_2\}) + \cdots + P(\{a_{|A|}\}) = \frac{|A|}{|S|}\text{,}\)
as desired.
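As a concrete use of this theorem, the sketch below builds a standard 52-card deck as the sample space and computes the probability of drawing a face card by counting; the rank and suit labels are just one way to encode the cards.

# Probability of equally likely events: P(A) = |A| / |S| for a shuffled deck.
from itertools import product

ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
S = set(product(ranks, suits))                            # 52 equally likely cards

A = {card for card in S if card[0] in ('J', 'Q', 'K')}    # "draw a face card"
print(len(A), len(S), len(A) / len(S))                    # 12 52 0.2307... = 12/52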
Let's see if you understand the relationship between frequency and relative frequency. In this exercise, presume "Probability" to be the fraction of outcomes you might logically expect.
Checkpoint 4.3.12. WeBWorK - Equally Likely.
So, by counting actual "equally likely" outcomes, these probabilities are easy to compute.
Checkpoint 4.3.13. WeBWorK - Easy Probabilities.
Notice how the probabilities look similar to relative frequencies. It's just the case that you are counting ALL of the individual simple possibilities that lead to a success.