Section 4.2 Relative Frequency
When attempting to measure uncertainty precisely, one often resorts to examples or experiments that model the theoretical question of interest. Before we investigate statistical experiments, we need to establish some notation that we will use throughout the rest of this text.
- S = Universal Set, Sample Space, or Outcome Space of an experiment. This is the collection of all possibilities.
- Random Experiment. A random experiment is a repeatable activity with more than one possible outcome; all of the outcomes can be specified in advance, but the outcome of any particular trial cannot be known in advance with certainty.
- Trial. Performing a Random Experiment one time and measuring the result.
- A = Event. A collection of outcomes. Generally denoted by an upper case letter such as A, B, C, etc.
- Success/Failure. When recording the result of a trial, a success for event A occurs when the outcome lies in A; otherwise, the trial is a failure for A. There is no qualitative meaning attached to these terms.
- Mutually Exclusive Events. Two events that share no common outcomes. Also known as disjoint events.
- |A| = Frequency. In a sequence of n trials, the frequency of an event A is the number of trials that resulted in a success for A.
- |A| / n = Relative Frequency. The proportion of successes for A out of the total number of trials. (A short sketch after this list gives a concrete example.)
- Histogram. A bar chart representation of data where area corresponds to the value being described.
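As a quick concrete illustration of frequency and relative frequency, the short standalone sketch below uses a made-up list of ten die rolls and the event A = "the roll is even."
# Ten made-up die rolls; the event A is "the roll is even".
rolls = [2, 5, 2, 6, 3, 2, 1, 4, 2, 6]
n = len(rolls)                                   # number of trials
freq_A = len([r for r in rolls if r % 2 == 0])   # |A|, the frequency of A
print("Frequency |A| =", freq_A)                 # 7 of the trials were successes
print("Relative frequency |A|/n =", freq_A/n)    # 0.7, i.e. 7/10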
To investigate these terms and to motivate our discussion of probability, consider flipping coins using the interactive cell below. Notice that in this case the sample space is S = \{ Heads, Tails \} and the random experiment consists of flipping a fair coin one time. Each trial results in either a Head or a Tail. Since we are recording both Heads and Tails, we will not worry about which one counts as a success. Further, on each flip the outcomes Heads and Tails are mutually exclusive events. We count the frequencies and compute the relative frequencies for a varying number of trials, which you select by moving the slider bar. Results are displayed using a histogram.
coin = ["Heads", "Tails"]
def _(num_rolls = slider([5..5000],label="Number of Flips")):
rolls = [choice(coin) for roll in range(num_rolls)]
pretty_print(rolls)
freq = [0,0]
for outcome in rolls:
if (outcome=='Tails'):
freq[0] = freq[0]+1
else:
freq[1] = freq[1]+1
print("The frequency of tails = %s"%str(freq[0])
+" and heads = %s"%str(freq[1]))
rel = [freq[0]/num_rolls,freq[1]/num_rolls]
print("\nThe relative frequencies for Tails and Heads:"+str(rel))
show(bar_chart(freq,axes=False,ymin=0,xmin=-1/2),figsize=[5,4])
Question 1: What do you notice as the number of flips increases?
Question 2: Why do you rarely (if ever) get exactly the same number of Heads and Tails? Would you not "expect" that to happen?
You should notice that as the number of flips increases, the relative frequency of Heads (and of Tails) stabilizes around 0.5. This makes sense intuitively, since each individual flip has two possible outcomes and one of the two is Heads while the other is Tails.
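To see this stabilization another way, here is a short standalone sketch (separate from the interactive cell above, with the number of flips fixed at 2000 for the example) that tracks the running relative frequency of Heads after each flip and plots it against the horizontal line at 1/2.
# Simulate 2000 flips and record the running relative frequency of Heads.
coin = ["Heads", "Tails"]
flips = [choice(coin) for trial in range(2000)]
heads_so_far = 0
running = []    # pairs (flip number, relative frequency of Heads so far)
for k in range(1, 2001):
    if flips[k-1] == "Heads":
        heads_so_far = heads_so_far + 1
    running.append((k, heads_so_far/k))
# The running relative frequency should settle near the line at 1/2.
show(line(running) + line([(1, 1/2), (2000, 1/2)], color='red'), figsize=[5, 4])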
Let's try again with a random experiment consisting of rolling a single die one time. Note that the sample space in this case is S = \{ 1, 2, 3, 4, 5, 6 \}.
@interact
def _(num_rolls = slider([20..5000], label='Number of rolls'),
      Number_of_Sides = [4, 6, 8, 12, 20]):
    # Roll a fair die with the chosen number of sides num_rolls times.
    die = list((1..Number_of_Sides))
    rolls = [choice(die) for roll in range(num_rolls)]
    print(rolls)

    # Frequency of each face, in order 1, 2, ..., Number_of_Sides.
    freq = [rolls.count(outcome) for outcome in die]
    print('The frequencies of each outcome are %s'%str(freq))

    print('The relative frequencies of each outcome:')
    rel_freq = [freq[outcome-1]/num_rolls for outcome in die]
    print(rel_freq)
    # Also display the relative frequencies as 4-digit decimals.
    fs = [f.n(digits=4) for f in rel_freq]
    print(fs)
    show(bar_chart(freq, axes=False, ymin=0, xmin=-1/2), figsize=[5,4])
Notice that for a single die there are more possible outcomes (for example, 6 on a standard die), but once again the relative frequency of each outcome was close to 1 divided by the number of outcomes (i.e., 1/6 for the standard die) as the number of rolls increased.
In general, this suggests a rule: if an experiment has several outcomes, each with the same chance of occurring on a given trial, then over a large number of trials the relative frequency of each outcome is, on average, about 1 divided by the number of outcomes. Whenever the outcomes are "equally likely," this is a good model for the proportion of trials expected to produce any given outcome. However, outcomes are not always equally likely. Consider rolling two dice and measuring their sum:
@interact
def _(num_rolls = slider([10..5000], label='Number of rolls'),
      num_sides = slider(4, 20, 1, 6, label='Number of sides')):
    # Possible faces of one die and possible sums of two dice.
    die = list((1..num_sides))
    dice = list((2..num_sides*2))
    # Roll a pair of dice num_rolls times and record each sum.
    rolls = [(choice(die), choice(die)) for roll in range(num_rolls)]
    sums = [sum(rolls[roll]) for roll in range(num_rolls)]
    pretty_print(rolls)

    freq = [sums.count(outcome) for outcome in dice]
    print('The frequencies of each outcome are %s'%str(freq))
    print('The relative frequencies of each outcome:')
    rel_freq = [freq[outcome-2]/num_rolls for outcome in dice]
    print(rel_freq)
    show(bar_chart(freq, axes=False, ymin=0, xmin=-1), figsize=[5,4])
    print("Relative frequency of %s"%str(dice[0])
          + " is about %s"%str(rel_freq[0].n(digits=4)))
    print("Relative frequency of %s"%str(dice[num_sides-1])
          + " is about %s"%str(rel_freq[num_sides-1].n(digits=4)))
Question 1: What do you notice as the number of rolls increases?
Question 2: What do you expect for the relative frequencies and why are they not all exactly the same?
Notice that not only are the answers not the same, they are not even close. To understand why this differs from the earlier examples, consider the possible outcomes from each pair of dice. Since we are measuring the sum of the dice, the possible sums (for a pair of standard 6-sided dice) run from 2 to 12. However, there is only one way to get a 2 (the pair (1,1)), while there are 6 ways to get a 7 (the pairs (1,6), (2,5), (3,4), (4,3), (5,2), and (6,1)). So it makes some sense that getting a 7 should be about 6 times as likely as getting a 2. Check whether that is the case with your experiment above.
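To make the counting explicit, the following standalone sketch (assuming a pair of standard 6-sided dice) enumerates all 36 equally likely pairs and tallies how many of them produce each possible sum.
# All 36 equally likely (first die, second die) pairs for two standard dice.
pairs = [(a, b) for a in (1..6) for b in (1..6)]
ways = {s: 0 for s in (2..12)}
for a, b in pairs:
    ways[a + b] = ways[a + b] + 1
for s in (2..12):
    print("The sum %s can occur %s way(s) out of 36"%(s, ways[s]))
# The sum 7 occurs 6 ways while the sum 2 occurs only 1 way, so over many
# rolls a 7 should appear roughly 6 times as often as a 2.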
Play with the following cell several times to investigate what you might expect when you repeatedly receive a "hand" of 5 standard playing cards. Can you imagine how you might possibly enumerate the entire list of possible outcomes by hand? Using this interactive cell, however, you can shuffle and deal 5-card hands over and over with ease and then count the number of special poker outcomes.
# Symbols for the card values and suits that are not numbers.
var('A C D H J K Q S')

suits = [S, D, C, H]
values = [2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A]

# A full deck has one card of each value in each suit.
full_deck = [(value, suit) for suit in suits for value in values]

@interact
def _(num_hands = slider([50..5000])):  # Set up the number of hands to create
    hands = []  # Start with a blank list.
    for i in range(num_hands):
        deck = copy(full_deck)  # start over with a fresh deck
        shuffle(deck)
        hands.append([deck.pop() for card in range(5)])
    # Tally how many of the simulated hands contain each special outcome.
    one_pair = 0
    two_pair = 0
    three_kind = 0
    full_house = 0
    four_kind = 0
    for i in range(num_hands):
        hand = hands[i]
        hand_values = [hand[k][0] for k in range(5)]
        # How many times each value occurs in the hand, largest count first.
        freq_values = [hand_values.count(value) for value in set(values)]
        freq_values.sort(reverse=True)
        if freq_values[0] == 4:
            four_kind = four_kind + 1
        if freq_values[0] == 3:
            if freq_values[1] == 2:
                full_house = full_house + 1
            if freq_values[1] == 1:
                three_kind = three_kind + 1
        if freq_values[0] == 2:
            if freq_values[1] == 2:
                two_pair = two_pair + 1
            if freq_values[1] == 1:
                one_pair = one_pair + 1
    print("       One Pair frequency = %s"%str(one_pair)
          + " with relative frequency %s"%str(one_pair/num_hands))
    print("       Two Pair frequency = %s"%str(two_pair)
          + " with relative frequency %s"%str(two_pair/num_hands))
    print("Three of a Kind frequency = %s"%str(three_kind)
          + " with relative frequency %s"%str(three_kind/num_hands))
    print("     Full House frequency = %s"%str(full_house)
          + " with relative frequency %s"%str(full_house/num_hands))
    print(" Four of a Kind frequency = %s"%str(four_kind)
          + " with relative frequency %s"%str(four_kind/num_hands))
Sometimes you will find it useful to keep a running total of the relative frequencies. Such a cumulative approach is often called a distribution function.
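For instance, the short standalone sketch below (using made-up relative frequencies, not output from the cells above) keeps such a running total; notice that the last cumulative value is always 1.
# Made-up relative frequencies for five outcomes (they sum to 1).
rel_freq = [1/10, 2/10, 4/10, 2/10, 1/10]
cumulative = []
total = 0
for r in rel_freq:
    total = total + r          # running total so far
    cumulative.append(total)
print(cumulative)              # [1/10, 3/10, 7/10, 9/10, 1]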
Let's consider the cumulative relative frequency for the sum of dice example seen above.
@interact
def _(num_rolls = slider([10..5000], label='Number of rolls'),
      num_sides = slider(4, 20, 1, 6, label='Number of sides')):
    die = list((1..num_sides))
    dice = list((2..num_sides*2))
    # Roll a pair of dice num_rolls times and record each sum.
    rolls = [(choice(die), choice(die)) for roll in range(num_rolls)]
    sums = [sum(rolls[roll]) for roll in range(num_rolls)]
    pretty_print(rolls)

    freq = [sums.count(outcome) for outcome in dice]
    # A running total of the frequencies gives the cumulative frequencies.
    n = len(freq)
    CF = copy(freq)
    for k in range(1, n):
        CF[k] = freq[k] + CF[k-1]
    print('The cumulative relative frequencies of each outcome:')
    Crel_freq = [CF[outcome-2]/num_rolls for outcome in dice]
    print(Crel_freq)
    show(bar_chart(CF, axes=False, ymin=0, xmin=-1), figsize=[5,4])
    print("Cumulative relative frequency of %s"%str(dice[0])
          + " is about %s"%str(Crel_freq[0].n(digits=4)))
    print("Cumulative relative frequency of %s"%str(dice[num_sides-1])
          + " is about %s"%str(Crel_freq[num_sides-1].n(digits=4)))