When attempting to precisely measure uncertainty, one often resorts to examples or experiments that model the theoretical question of interest. Before we investigate statistical experiments, we need to introduce some notation that we will use throughout the rest of this text.
To investigate these terms and to motivate our discussion of probability, consider flipping coins using the interactive cell below. In this case the sample space is S = \{ Heads, Tails \} and the random experiment consists of flipping a fair coin one time. Each trial results in either a Head or a Tail. Since we count both Heads and Tails, we will not worry about which is a success and which is a failure. Further, on each flip the outcomes Heads and Tails are mutually exclusive events. We count the frequencies and compute the relative frequencies for a number of trials that you select by moving the slider bar. Results are displayed using a bar chart.
coin = ["Heads", "Tails"]

@interact    # Sage decorator that turns the function into an interactive cell
def _(num_rolls = slider([5..5000], label="Number of Flips")):
    rolls = [choice(coin) for roll in range(num_rolls)]
    show(rolls)
    freq = [0, 0]    # freq[0] counts Tails, freq[1] counts Heads
    for outcome in rolls:
        if outcome == 'Tails':
            freq[0] = freq[0] + 1
        else:
            freq[1] = freq[1] + 1
    print("\nThe frequency of tails = " + str(freq[0]) + " and heads = " + str(freq[1]) + ".")
    rel = [freq[0]/num_rolls, freq[1]/num_rolls]
    print("\nThe relative frequencies for Tails and Heads:" + str(rel))
    show(bar_chart(freq, axes=False, ymin=0))    # A bar chart of the results
Question 1: What do you notice as the number of flips increases?
Question 2: Why do you rarely (if ever) get exactly the same number of Heads and Tails? Would you not "expect" that to happen?
You should have noticed that as the number of flips increases, the relative frequency of Heads (and of Tails) stabilizes around 0.5. This makes sense intuitively, since there are two options for each individual flip: 1/2 of those options is Heads and the other 1/2 is Tails.
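The stabilization described above can also be checked outside the interactive cell. What follows is a minimal sketch in plain Python rather than Sage; the function name `flip_relative_frequencies` and the fixed seed are assumptions made only for this illustration.

```python
import random

def flip_relative_frequencies(num_flips, seed=None):
    """Flip a fair coin num_flips times and return the relative
    frequency of each outcome as a dictionary."""
    rng = random.Random(seed)    # private generator, so a seed gives repeatable runs
    flips = [rng.choice(["Heads", "Tails"]) for _ in range(num_flips)]
    return {side: flips.count(side) / num_flips for side in ("Heads", "Tails")}

# Relative frequencies for an increasing number of flips
for n in (10, 100, 1000, 10000):
    print(n, flip_relative_frequencies(n, seed=1))
```

For small numbers of flips the two relative frequencies can differ noticeably; for the larger runs both should sit close to 0.5, although they will rarely equal it exactly.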
Let's try again with a random experiment consisting of rolling a single die one time. For a standard six-sided die, the sample space in this case will be the outcomes S = \{ 1, 2, 3, 4, 5, 6 \}.
@interact    # Sage decorator that turns the function into an interactive cell
def _(num_rolls = slider([20..5000], label='Number of rolls'), Number_of_Sides = [4,6,8,12,20]):
    die = list((1..Number_of_Sides))
    rolls = [choice(die) for roll in range(num_rolls)]
    show(rolls)
    freq = [rolls.count(outcome) for outcome in die]    # count the occurrences of each outcome
    print('The frequencies of each outcome are ' + str(freq))
    print('The relative frequencies of each outcome:')
    rel_freq = [f/num_rolls for f in freq]    # make frequencies relative
    print(rel_freq)
    fs = []
    for f in rel_freq:
        fs.append(f.n(digits=4))    # decimal approximations to 4 digits
    print(fs)
    show(bar_chart(freq, axes=False, ymin=0))    # A bar chart of the results
Notice that a single die has a larger number of possible outcomes (for example, 6 on a regular die), but once again the relative frequency of each outcome was close to 1/n (i.e. 1/6 for the regular die) as the number of rolls increased.
In general, this suggests a rule: if there are n outcomes and each one has the same chance of occurring on a given trial, then over a large number of trials the relative frequency of each outcome is approximately 1/n. So when outcomes are "equally likely", this is a good model for the proportion of trials that would be expected to produce any given outcome. However, it is not always true that outcomes are equally likely. Consider rolling two dice and measuring their sum:
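The comparison with 1/n can be made explicit with a short plain-Python sketch. The function name `roll_relative_frequencies`, the seed, and the choice of 60000 rolls are assumptions for illustration only, not part of the interactive cell above.

```python
import random
from collections import Counter

def roll_relative_frequencies(num_rolls, num_sides=6, seed=None):
    """Roll a fair num_sides-sided die num_rolls times and return the
    relative frequency of each face, in the order 1, 2, ..., num_sides."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, num_sides) for _ in range(num_rolls))
    return [counts[face] / num_rolls for face in range(1, num_sides + 1)]

rel = roll_relative_frequencies(60000, num_sides=6, seed=2)
print([round(r, 4) for r in rel])    # observed relative frequencies
print("1/n =", round(1 / 6, 4))      # value suggested by equally likely outcomes
```

Changing `num_sides` to 4, 8, 12, or 20 gives the corresponding comparison for the other dice offered by the cell.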
@interact    # Sage decorator that turns the function into an interactive cell
def _(num_rolls = slider([20..5000], label='Number of rolls'), num_sides = slider(4,20,1,6,label='Number of sides')):
    die = list((1..num_sides))
    dice = list((2..num_sides*2))    # the possible sums of two dice
    rolls = [(choice(die), choice(die)) for roll in range(num_rolls)]
    sums = [sum(pair) for pair in rolls]
    show(rolls)
    freq = [sums.count(outcome) for outcome in dice]    # count the occurrences of each sum
    print('The frequencies of each outcome are ' + str(freq))
    print('The relative frequencies of each outcome:')
    rel_freq = [f/num_rolls for f in freq]    # make frequencies relative
    print(rel_freq)
    show(bar_chart(freq, axes=False, ymin=0))    # A bar chart of the results
    # Compare the smallest sum with the middle sum (num_sides + 1)
    print("Relative Frequency of ", dice[0], " is about ", rel_freq[0].n(digits=4))
    print("Relative Frequency of ", dice[num_sides-1], " is about ", rel_freq[num_sides-1].n(digits=4))
Question 1: What do you notice as the number of rolls increases?
Question 2: What do you expect for the relative frequencies and why are they not all exactly the same?
Notice that not only are the relative frequencies not the same, they are not even close. To understand why this differs from the earlier examples, consider the possible outcomes for each pair of dice. Since we are measuring the sum of the dice, the possible sums (for a pair of standard 6-sided dice) run from 2 to 12. However, there is only one way to get a 2, namely from the pair (1,1), while there are 6 ways to get a 7, namely from the pairs (1,6), (2,5), (3,4), (4,3), (5,2), and (6,1). So it makes sense that the likelihood of getting a 7 is 6 times that of getting a 2. Check whether that is the case in your experiment above.
Run the following cell several times to investigate what you might expect to get when you are repeatedly dealt a "hand" of 5 standard playing cards. Can you imagine how you might possibly enumerate the entire list of possible outcomes by hand? Using this interactive cell, however, you can easily shuffle and deal 5-card hands over and over and then count the number of special poker outcomes.
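The counting argument for the pair of dice can be carried out exactly by listing every ordered pair. The sketch below is plain Python with an illustrative function name, `sum_distribution`, that is not part of the original cells; it counts the ways to obtain each sum and divides by the number of equally likely pairs.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sum_distribution(num_sides=6):
    """Exact relative frequency of each sum of two fair dice,
    obtained by enumerating all ordered pairs of faces."""
    faces = range(1, num_sides + 1)
    ways = Counter(a + b for a, b in product(faces, repeat=2))
    total = num_sides ** 2    # number of equally likely ordered pairs
    return {s: Fraction(ways[s], total) for s in sorted(ways)}

dist = sum_distribution(6)
for s, p in dist.items():
    print(s, p)
print("ratio of 7 to 2:", dist[7] / dist[2])
```

The individual pairs are equally likely, but the sums are not, which is why the simulated relative frequencies are far from uniform.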
var('A C D H J K Q S')
suits = [S, D, C, H]
values = [2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A]
full_deck = [(value, suit) for suit in suits for value in values]

@interact    # Sage decorator that turns the function into an interactive cell
def _(num_hands = slider([50..5000], label='Number of hands')):    # Set up the number of hands to create
    hands = []    # Start with a blank list.
    for i in range(num_hands):    # This loops the following operation num_hands times.
        deck = copy(full_deck)    # start over with a full deck
        shuffle(deck)
        hands.append([deck.pop() for card in range(5)])
    freq_values = []
    one_pair = 0
    two_pair = 0
    three_kind = 0
    full_house = 0
    four_kind = 0
    for i in range(num_hands):
        hand = hands[i]
        hand_values = [hand[k][0] for k in range(5)]    # keep the values, ignore the suits
        # how many times each value appears in the hand, largest count first
        freq_values = [hand_values.count(value) for value in set(values)]
        freq_values.sort(reverse=True)
        if freq_values[0] == 4:
            four_kind = four_kind + 1
        if freq_values[0] == 3:
            if freq_values[1] == 2:
                full_house = full_house + 1
            if freq_values[1] == 1:
                three_kind = three_kind + 1
        if freq_values[0] == 2:
            if freq_values[1] == 2:
                two_pair = two_pair + 1
            if freq_values[1] == 1:
                one_pair = one_pair + 1
    print("       One Pair frequency = ", one_pair, " with relative frequency ", one_pair/num_hands)
    print("       Two Pair frequency = ", two_pair, " with relative frequency ", two_pair/num_hands)
    print("Three of a Kind frequency = ", three_kind, " with relative frequency ", three_kind/num_hands)
    print("     Full House frequency = ", full_house, " with relative frequency ", full_house/num_hands)
    print(" Four of a Kind frequency = ", four_kind, " with relative frequency ", four_kind/num_hands)
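For comparison with the simulated relative frequencies, the number of 5-card hands of each type can be counted directly. The sketch below is plain Python using the standard counting formulas for these hands; it is offered as a check on the experiment rather than as part of the original cell, and the names `total_hands` and `counts` are illustrative. As in the cell above, a hand is classified only by how many times each value appears, so suits play no role in the classification.

```python
from math import comb

total_hands = comb(52, 5)    # all possible 5-card hands

counts = {
    # value of the pair, suits of the pair, three other distinct values, their suits
    "One Pair": 13 * comb(4, 2) * comb(12, 3) * 4**3,
    # two values for the pairs, suits for each pair, one remaining card
    "Two Pair": comb(13, 2) * comb(4, 2)**2 * 11 * 4,
    # value of the triple, its suits, two other distinct values, their suits
    "Three of a Kind": 13 * comb(4, 3) * comb(12, 2) * 4**2,
    # value and suits of the triple, value and suits of the pair
    "Full House": 13 * comb(4, 3) * 12 * comb(4, 2),
    # value of the four, any one of the remaining 48 cards
    "Four of a Kind": 13 * comb(4, 4) * 48,
}

for name, count in counts.items():
    print(f"{name:>16}: {count:>9} hands, relative frequency about {count / total_hands:.5f}")
```

With several thousand simulated hands, the observed relative frequencies for the common hands should land near these values, while Full House and Four of a Kind may not appear at all in a given run.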
Sometimes you will find it useful to keep a running total of the relative frequencies. Such a cumulative approach is often called a distribution function.
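Definition 4.2.1 Cumulative relative frequency
For a collection of ordered events x_1 < x_2 < ... < x_s with corresponding frequencies f_1, f_2, ..., f_s, the cumulative relative frequency is the function F(x) = \sum_{x_k \le x} f_k / n, where n = f_1 + f_2 + ... + f_s is the total number of observations.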
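As a small worked illustration of this running total, the plain-Python sketch below accumulates a list of frequencies and divides each partial sum by the overall total. The function name `cumulative_relative_frequency` and the sample frequencies are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from fractions import Fraction
from itertools import accumulate

def cumulative_relative_frequency(freqs):
    """Given frequencies f_1, ..., f_s listed in the order of their events,
    return the running totals divided by the overall total."""
    n = sum(freqs)
    return [Fraction(running, n) for running in accumulate(freqs)]

freqs = [3, 5, 2]    # hypothetical frequencies for three ordered events
print(cumulative_relative_frequency(freqs))
```

The values never decrease, and the last one is always 1 because by then every observation has been counted.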
Let's consider the cumulative relative frequency with the sum of dice example seen at the beginning of this chapter.
@interact    # Sage decorator that turns the function into an interactive cell
def _(num_rolls = slider([20..5000], label='Number of rolls'), num_sides = slider(4,20,1,6,label='Number of sides')):
    die = list((1..num_sides))
    dice = list((2..num_sides*2))    # the possible sums of two dice
    rolls = [(choice(die), choice(die)) for roll in range(num_rolls)]
    sums = [sum(pair) for pair in rolls]
    show(rolls)
    freq = [sums.count(outcome) for outcome in dice]    # count the occurrences of each sum
    n = len(freq)
    CF = list(freq)    # copy, so the running totals do not overwrite freq
    for k in range(1, n):
        CF[k] = freq[k] + CF[k-1]    # cumulative frequency up to and including outcome k
    print('The cumulative relative frequencies of each outcome:')
    Crel_freq = [c/num_rolls for c in CF]    # make cumulative frequencies relative
    print(Crel_freq)
    show(bar_chart(CF, axes=False, ymin=0))    # A bar chart of the cumulative frequencies
    print("Cumulative Relative Frequency of ", dice[0], " is about ", Crel_freq[0].n(digits=4))
    print("Cumulative Relative Frequency of ", dice[num_sides-1], " is about ", Crel_freq[num_sides-1].n(digits=4))