Definition of Probability

Section 4.3 Definition of Probability

Relative frequency measures the proportion of "successful" outcomes observed when an experiment is repeated. From the interactive applications above, it appears that the relative frequency jumps around as the experiment is repeated, but that the amount of variation decreases as the number of repetitions increases. This holds in general and is known as the "Law of Large Numbers".
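The settling-down behavior described above can be sketched with a small simulation. This is only an illustration under stated assumptions: the function name `relative_frequency`, the choice of a fair six-sided die, the target face 2, and the fixed seed are all hypothetical and are not taken from the interactive applications.

```python
import random

def relative_frequency(target, num_rolls, seed=0):
    """Roll a fair six-sided die num_rolls times and return the
    fraction of rolls that equal target."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    hits = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) == target)
    return hits / num_rolls

# Relative frequency of rolling a 2 for increasing numbers of rolls.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency(2, n))
```

For small numbers of rolls the printed values can differ noticeably from one another, while for large numbers of rolls they should sit close to 1/6.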

We would like to formalize what these relative frequencies are approaching and will call this theoretical limit the "probability" of the outcome. In doing so, we will do our best to model our definition so that it follows the behavior of relative frequency.

To generate a general definition for probability, we need to know what it is that we are measuring. In general, we will be finding the probability of sets of possible outcomes, that is, subsets of the Sample Space S. Toward that end, it is important to briefly look at some properties of sets.

Definition 4.3.1. Pairwise Disjoint Sets.

\(\{ A_1, A_2, ... , A_n \}\) are pairwise disjoint provided \(A_k \cap A_j = \emptyset\) so long as \(k \ne j\text{.}\) Disjoint sets are also often called mutually exclusive.

Experiment with the interactive cell below by adding and removing items in each of the three sets. Find elements so that the intersection of all three sets is empty but at least one pair of sets is not disjoint. See if you can make every pair of sets not disjoint while the intersection of all three is still empty. This is why we need to consider "pairwise" disjoint sets.
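The distinction between an empty overall intersection and pairwise disjointness can also be checked in a few lines of code. The following is a minimal sketch; the helper name `pairwise_disjoint` and the particular sets are assumptions chosen for illustration.

```python
from itertools import combinations

def pairwise_disjoint(sets):
    """Return True when every pair of distinct sets has an empty intersection."""
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))

# Every pair overlaps, yet no element belongs to all three sets.
A = {1, 2}
B = {2, 3}
C = {3, 1}

print(A & B & C)                           # set()
print(pairwise_disjoint([A, B, C]))        # False
print(pairwise_disjoint([{1}, {2}, {3}]))  # True
```

The first collection has an empty three-way intersection but is not pairwise disjoint, which is exactly the situation the definition is designed to rule out.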

Consider how we might create a definition for the expectation of a given outcome. To do so, first consider a desired collection of outcomes A. If each outcome in A is chosen randomly then we might consider using a formula similar to relative frequency and set a measure of expectation to be |A|/|S|. For example, on a standard 6-sided die, the expectation of the outcome A={2} from the collection S = {1,2,3,4,5,6} could be |A|/|S| = 1/6.

From our example where we take the sum of two dice, the outcome A = { 4,5 } from the collection S = {2,3,4,...,12} would be

\begin{gather*} |A| = | \{ (1,3),(2,2),(3,1),(1,4),(2,3),(3,2),(4,1) \}| = 7\\ |S| = | \{ (1,1),...,(1,6),(2,1),...,(2,6),...,(6,1),...,(6,6) \}| = 36 \end{gather*}

and so the expected relative frequency would be |A|/|S| = 7/36. Compare this theoretical value with the sum of the two outcomes from your experiment above.
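The count for the two-dice example can be reproduced by enumerating all ordered rolls. This is a small sketch that assumes two fair six-sided dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered rolls of two six-sided dice.
S = list(product(range(1, 7), repeat=2))

# Rolls whose sum is 4 or 5.
A = [roll for roll in S if sum(roll) in {4, 5}]

print(len(A), len(S), Fraction(len(A), len(S)))  # 7 36 7/36
```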

We are ready to now formally give a name to the theoretical measure of expectation for outcomes from an experiment. Taking our cue from our examples, let's make our definition agree with the following relative frequency properties:

Relative frequency cannot be negative, since cardinality cannot be negative
Relative frequency of the entire sample space should equal one
Relative frequencies for collections of disjoint outcomes should equal the sum of the individual relative frequencies

which leads us to the following formal definition...

Definition 4.3.2. Probability.

The probability P(A) of a given outcome A is a set function that satisfies:

(Nonnegativity) P(A) \(\ge 0\)
(Totality) P(S) = 1
(Subadditivity) If A \(\cap\) B = \(\emptyset\text{,}\) then P(A \(\cup\) B) = P(A) + P(B). In general, if {\(A_k\)} are pairwise disjoint then \(P( \cup_k A_k) = \sum_k P(A_k)\text{.}\)
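To see that the three properties do not require equally likely outcomes, here is a minimal sketch that builds a set function on a finite sample space from per-outcome weights and checks each property on one example. The loaded-die weights are a hypothetical choice made only for illustration.

```python
from fractions import Fraction

# Hypothetical loaded die: the face 6 is assumed to come up half the time.
weights = {1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
           4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2)}
S = set(weights)

def P(event):
    """Probability of an event, computed as the sum of its outcome weights."""
    return sum((weights[x] for x in event), Fraction(0))

A, B = {1, 2}, {6}

print(P(A) >= 0)                 # first property holds for A
print(P(S) == 1)                 # second property
print(A.isdisjoint(B))           # A and B share no outcomes
print(P(A | B) == P(A) + P(B))   # third property for this disjoint pair
```

Checking one pair of events is not a proof, but it shows how the definition can be tested against any candidate assignment of weights.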

Checkpoint 4.3.3. WeBWorK - Using the definition.

Suppose you select a letter at random from the word MISSISSIPPI.

The probability of selecting the letter M is

The probability of selecting the letter S is

The probability of selecting the letters P or I is

The probability of not selecting the letter I is

Hint.

Count the number of letters in the word. When computing each probability, this is the number that goes on the bottom.

Answer 1.

\({\textstyle\frac{1}{11}}\)

Answer 2.

\({\textstyle\frac{4}{11}}\)

Answer 3.

\({\textstyle\frac{2}{11}}+{\textstyle\frac{4}{11}}\)

Answer 4.

\(1-{\textstyle\frac{4}{11}}\)

Notice that when you are given complete information regarding the entire data set, probabilities for events can be relatively easy to compute.
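The counting in the MISSISSIPPI checkpoint can be sketched as follows. The helper `P` is a hypothetical name; it assumes each of the 11 letter positions is equally likely to be selected.

```python
from collections import Counter
from fractions import Fraction

word = "MISSISSIPPI"
counts = Counter(word)  # how many times each letter appears
n = len(word)           # 11 letters in total

def P(letters):
    """Probability that the selected letter is one of the given letters."""
    return Fraction(sum(counts[c] for c in letters), n)

print(P("M"))      # 1/11
print(P("S"))      # 4/11
print(P("PI"))     # 6/11
print(1 - P("I"))  # 7/11
```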

Based upon this definition we can immediately establish a number of results.

Theorem 4.3.4. Probability of Complements.

For any event A,

\begin{equation*} P(A) + P(A^c) = 1 \end{equation*}

Proof.

Let A be any event and note that

\begin{equation*} A \cap A^c = \emptyset. \end{equation*}

But \(A \cup A^c = S\text{.}\) So, by subadditivity 4.3.2

\begin{equation*} 1 = P(S) = P(A \cup A^c) = P(A) + P(A^c) \end{equation*}

as desired.

Theorem 4.3.5.

\begin{equation*} P(\emptyset) = 0 \end{equation*}

Proof.

Note that \(\emptyset^c = S\text{.}\) So, by the theorem above 4.3.4,

\begin{equation*} 1 = P(S) + P(\emptyset) \Rightarrow 1 = 1 + P(\emptyset). \end{equation*}

Cancelling the 1 on both sides gives \(P(\emptyset) = 0\text{.}\)

Theorem 4.3.6.

For events A and B

\begin{equation*} A \subseteq B \Rightarrow P(A) \le P(B)\text{.} \end{equation*}

Proof.

Assume sets A and B satisfy \(A \subseteq B\text{.}\) Then, notice that

\begin{equation*} A \cap (B-A) = \emptyset \end{equation*}

and

\begin{equation*} B = A \cup (B-A). \end{equation*}

Therefore, by subadditivity and nonnegativity 4.3.2

\begin{gather*} 0 \le P(B-A)\\ P(A) \le P(A) + P(B-A) \\ P(A) \le P(B) \end{gather*}

Theorem 4.3.7.

For any event A,

\begin{equation*} P(A) \le 1 \end{equation*}

Proof.

Notice \(A \subseteq S\text{.}\) By the previous theorem 4.3.6, \(P(A) \le P(S) = 1\text{.}\)
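The four results above can be spot-checked on a concrete example. The sketch below assumes a fair six-sided die and uses the ratio |A|/|S| from the earlier discussion as the probability; the events A and B are illustrative choices.

```python
from fractions import Fraction

S = frozenset(range(1, 7))

def P(event):
    """Ratio |A|/|S| for a fair six-sided die."""
    return Fraction(len(event), len(S))

A = frozenset({2, 4})
B = frozenset({2, 4, 6})  # A is a subset of B

print(P(A) + P(S - A))  # 1     (complements)
print(P(frozenset()))   # 0     (empty set)
print(P(A) <= P(B))     # True  (subset has no larger probability)
print(P(B) <= 1)        # True  (bounded above by one)
```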

Theorem 4.3.8.

For any sets A and B,

\begin{equation*} P(A \cup B) = P(A) + P(B) - P(A \cap B) \end{equation*}

Proof.

Notice that we can write \(A \cup B\) as the disjoint union

\begin{equation*} A \cup B = (A-B) \cup (A \cap B) \cup (B-A). \end{equation*}

We can also write disjointly

\begin{gather*} A = (A-B) \cup (A \cap B)\\ B = (A \cap B) \cup (B-A) \end{gather*}

Hence,

\begin{align*} P(A) & + P(B) - P(A \cap B) \\ & = [P(A-B) + P(A \cap B)] \\ & + [P(A \cap B) + P(B-A)] - P(A \cap B)\\ & = P(A-B) + P(A \cap B) + P(B-A)\\ & = P(A \cup B) \end{align*}

This result can be extended to more than two sets using a property known as inclusion-exclusion. The following two corollaries illustrate this property and are presented without proof.

Corollary 4.3.9.

For any sets A, B and C,

\begin{align*} P(A \cup B \cup C) & = P(A) + P(B) + P(C)\\ & - P(A \cap B) - P(A \cap C) - P(B \cap C) \\ & + P(A \cap B \cap C) \end{align*}

Corollary 4.3.10.

For any sets A, B, C and D,

\begin{align*} P(A \cup B \cup C \cup D) & = P(A) + P(B) + P(C) + P(D)\\ & - P(A \cap B) - P(A \cap C) - P(A \cap D) \\ & - P(B \cap C) - P(B \cap D) - P(C \cap D)\\ & + P(A \cap B \cap C) + P(A \cap B \cap D) \\ & + P(A \cap C \cap D) + P(B \cap C \cap D)\\ & - P(A \cap B \cap C \cap D) \end{align*}
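The alternating pattern of signs in these formulas can be written as a single loop over all sub-collections of the events. The following is a minimal sketch, assuming a finite sample space with equally weighted outcomes; the sample space 1 through 12, the events, and the function name `union_by_inclusion_exclusion` are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 13))  # hypothetical sample space: integers 1..12

def P(event):
    """Ratio |A|/|S| on the sample space S."""
    return Fraction(len(event), len(S))

def union_by_inclusion_exclusion(events):
    """Add intersections of odd-sized groups, subtract even-sized ones."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1
        for group in combinations(events, k):
            total += sign * P(frozenset.intersection(*group))
    return total

A = frozenset(x for x in S if x % 2 == 0)  # even numbers
B = frozenset(x for x in S if x % 3 == 0)  # multiples of 3
C = frozenset(x for x in S if x > 8)       # 9, 10, 11, 12
D = frozenset({1, 2, 3})

# Compare against the probability of the union computed directly.
for events in ([A, B], [A, B, C], [A, B, C, D]):
    direct = P(frozenset.union(*events))
    print(len(events), direct == union_by_inclusion_exclusion(events))
```

With two events the loop reduces to Theorem 4.3.8, and with three or four events it produces the terms of the two corollaries.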

Many times, you will be dealing with making selections from a sample space where each item in the space has an equal chance of being selected. This may happen (for example) when items in the sample space are of equal size or when selecting a card from a completely shuffled deck or when coins are flipped or when a normal fair die is rolled.

It is important to notice that not all outcomes are equally likely, even when there are only two of them. Indeed, picking the winner of a football game is generally not an equally likely situation when the game pits, say, the New Orleans Saints professional football team against the New Orleans Home School Saints. Even though there are only two options, the probability of the professional team winning in most years ought to be much greater than the chance that the high school team will prevail.

When items are equally likely (sometimes also called "randomly selected") then each individual outcome has the same chance of being selected as any other. In this instance, determining the probability of a collection of outcomes is relatively simple.

Theorem 4.3.11. Probability of Equally Likely Events.

If outcomes in S are equally likely, then for \(A \subseteq S,\)

\begin{equation*} P(A) = \frac{|A|}{|S|}. \end{equation*}

Proof.

Enumerate S = {\(x_1, x_2, ..., x_{|S|}\)} and note \(P( \{ x_k \} ) = c\) for some constant c since each item is equally likely. However, using each outcome as a disjoint event and the definition of probability,

\begin{align*} 1 = P(S) & = P( \{ x_1 \} \cup \{x_2 \} \cup ... \cup \{x_{|S|} \} )\\ & = P(\{ x_1 \}) + P(\{ x_2 \} ) + ... + P(\{ x_{|S|} \} )\\ & = c + c + ... + c = {|S|} \times c \end{align*}

and so \(c = \frac{1}{{|S|}}\text{.}\) Therefore, \(P( \{ x_k \} ) = \frac{1}{|S|}\) .

Hence, with A = {\(a_1, a_2, ..., a_{|A|}\)}, breaking up the disjoint probabilities as above gives

\begin{align*} P(A) & = P( \{ a_1 \} \cup \{ a_2 \} \cup ... \cup \{ a_{|A|} \} )\\ & = P(\{ a_1 \}) + P(\{ a_2 \} ) + ... + P(\{ a_{|A|} \} )\\ & = \frac{1}{{|S|}} + \frac{1}{{|S|}} + ... + \frac{1}{{|S|}}\\ & = \frac{|A|}{{|S|}} \end{align*}

as desired.
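The two steps of the proof can be mirrored numerically: first assign the common value c = 1/|S| to each single outcome, then add up c over the outcomes in A. The sketch below assumes a standard 52-card deck in which every card is equally likely; the event "hearts" is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # sample space S with 52 outcomes

c = Fraction(1, len(deck))          # probability of each single card

hearts = [card for card in deck if card[1] == "hearts"]

p_by_sum = sum((c for _ in hearts), Fraction(0))  # c + c + ... + c
p_by_count = Fraction(len(hearts), len(deck))     # |A| / |S|

print(p_by_sum, p_by_count)  # 1/4 1/4
```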

Let's see if you understand the relationship between frequency and relative frequency. In this exercise, presume "Probability" to be the expected fraction of outcomes you might logically expect.

Checkpoint 4.3.12. WeBWorK - Equally Likely.

A fun size bag of M&Ms has about 15 candies. You open one of the bags and discover:

2 Blues, 2 Yellows, 5 Browns, 3 Reds and 3 Greens.

The probability of choosing a brown is

The odds in favor of choosing a yellow are

The probability of choosing either a blue or a red is

The odds against a green being chosen are

Hint.

Odds in favor of an event = number of favorable outcomes / number of unfavorable outcomes.

Odds against an event = number of unfavorable outcomes / number of favorable outcomes.

Answer 1.

\(\frac{5}{15}\)

Answer 2.

\(\frac{2}{15-2}\)

Answer 3.

\(\frac{2+3}{15}\)

Answer 4.

\(\frac{15-3}{3}\)

So, by counting actual "equally likely" outcomes these probabilities are easy to compute.
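The difference between a probability and odds in the candy checkpoint can be sketched in code. The function names below are hypothetical; they follow the definitions given in the hint and assume each candy in the bag is equally likely to be chosen.

```python
from fractions import Fraction

bag = {"blue": 2, "yellow": 2, "brown": 5, "red": 3, "green": 3}
total = sum(bag.values())  # 15 candies

def probability(*colors):
    """Favorable outcomes divided by all outcomes."""
    return Fraction(sum(bag[c] for c in colors), total)

def odds_in_favor(color):
    """Favorable outcomes divided by unfavorable outcomes."""
    return Fraction(bag[color], total - bag[color])

def odds_against(color):
    """Unfavorable outcomes divided by favorable outcomes."""
    return Fraction(total - bag[color], bag[color])

print(probability("brown"))        # 1/3
print(odds_in_favor("yellow"))     # 2/13
print(probability("blue", "red"))  # 1/3
print(odds_against("green"))       # 4
```

Note that a probability always lies between 0 and 1, while odds compare two counts and can exceed 1.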

Checkpoint 4.3.13. WeBWorK - Easy Probabilities.

(a) Count the number of ways to arrange a sample of \(2\) elements from a population of \(11\) elements. NOTE: Order is not important.

answer:

(b) If random sampling is to be employed, the probability that any particular sample will be selected is

Answer 1.

\(55\)

Answer 2.

\(0.0181818181818182\)

Notice how the probabilities look similar to relative frequencies. It's just the case that you are counting ALL of the individual simple possibilities that lead to a success.
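The sample-counting checkpoint can be reproduced with the standard library. This sketch assumes that every unordered sample of size 2 is equally likely to be selected; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Number of unordered samples of size 2 from a population of 11.
num_samples = comb(11, 2)

# With random sampling, each sample is assumed equally likely.
p_sample = Fraction(1, num_samples)

print(num_samples)      # 55
print(p_sample)         # 1/55
print(float(p_sample))  # decimal form of 1/55
```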

Essentials of Mathematical Probability and Statistics
