Section 1.4 Statistical Measures of the Middle
Definition 1.4.1. Arithmetic Mean.
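Suppose \(X\) is a discrete random variable with range \(R = \{x_1, x_2, \ldots, x_n\}\). The arithmetic mean is given by
\begin{equation*}
\frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{k=1}^{n} x_k.
\end{equation*}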
If the data come from a sample, then we call this value a sample mean and denote it by \(\overline{x}\). If the data come from the entire universe of possibilities, then we call it a population mean and denote it by \(\mu\). When presented with raw data, it is generally reasonable to presume that the data come from a sample and to use \(\overline{x}\).
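As a quick illustration of the definition, the following is a minimal Python sketch that computes the mean by summing the values and dividing by the count. The variable names `data` and `x_bar` are chosen here for illustration, and the values are the default inputs used in the interactive cell.

```python
from statistics import mean

# Illustrative data: the default values from the interactive cell.
data = [2, 5, 8, 10, 11]

n = len(data)
x_bar = sum(data) / n   # add the values, then divide by how many there are

# The standard library's mean() performs the same computation.
library_mean = mean(data)

print("n =", n)
print("mean by formula =", x_bar)
print("mean by library =", library_mean)
```

Both approaches should agree; the explicit formula is shown only to mirror the definition.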
@interact
def _(x = input_box(default=[2, 5, 8, 10, 11], width=40)):
    x.sort()
    mu = mean(x)              # balance point of the dot plot
    n = len(x)
    pts = [(x[0], 0.05)]
    M = 0.5                   # top of the plotting window
    for k in range(1, n):
        if x[k] == x[k-1]:
            # stack a repeated value above the previous dot
            pts.append((x[k], pts[k-1][1] + 0.1))
            M += 0.1
        else:
            pts.append((x[k], 0.05))
    G = points(pts, size=100, ymin=-0.5, ymax=M,
               xmin=min(x)-2, xmax=max(x)+2, ticks=[[], []], figsize=[5,4])
    # triangular fulcrum drawn under the mean
    G += polygon([(mu,0), (mu+0.2,-0.5), (mu-0.2,-0.5)], color='brown')
    G.show(figsize=[5,4])
# Weighted mean: each x[k] carries the weight w[k]
x = [2, 5, 8, 10, 10]     # x values
w = [1, 2.5, 2.5, 4, 2]   # weights for each x
wsum = sum(w)

n = len(x)
pts = [(x[0], 0.05)]
M = 0.2
mu = x[0]*w[0]            # start with the k = 0 term so that no value is skipped
for k in range(1, n):
    mu += x[k]*w[k]
    if x[k] == x[k-1]:
        # stack a repeated value above the previous dot
        pts.append((x[k], pts[k-1][1] + 0.2*w[k]))
        M += 0.2
    else:
        pts.append((x[k], 0.05))
mu = mu/wsum              # weighted mean = sum(x*w) / sum(w)
G = Graphics()
for k in range(n):
    G += point(pts[k], size=100*w[k])   # dot size scales with the weight
P = polygon([(mu,0), (mu+0.2,-0.5),
             (mu-0.2,-0.5)], color='brown')
(G+P).show(xmin=min(x)-0.5, xmax=max(x)+0.5,
           ymin=-2*M, ymax=2*M, ticks=[[], []], figsize=[5,3])
Example 1.4.2. Computing class final grade.
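Suppose that in a given class you have a daily grade of 92, an exam 1 grade of 85, an exam 2 grade of 87, and a final exam grade of 93. If the daily grade counts 10 percent, the first two exams count 25 percent each, and the final counts 40 percent, then your final grade would be
\begin{equation*}
0.10 \cdot 92 + 0.25 \cdot 85 + 0.25 \cdot 87 + 0.40 \cdot 93 = 9.2 + 21.25 + 21.75 + 37.2 = 89.4.
\end{equation*}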
It would then appear that you might want to do some bargaining with your teacher about how nice it would be to round that up.
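The grade computation in this example is a weighted mean, and it can be checked with a short Python sketch. The names `grades`, `weights`, and `final_grade` are illustrative choices; dividing by the sum of the weights is included so the same code would also work if the weights did not add up to 1.

```python
# Grades and weights from the example: daily, exam 1, exam 2, final exam.
grades = [92, 85, 87, 93]
weights = [0.10, 0.25, 0.25, 0.40]

# Weighted mean: sum of grade * weight, divided by the total weight.
weighted_sum = sum(g * w for g, w in zip(grades, weights))
final_grade = weighted_sum / sum(weights)

print("final grade =", round(final_grade, 2))
```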
Definition 1.4.3. Mode.
The mode for a given data set is the data value that repeats the greatest number of times. If there are two or more such data values, then each is a mode. If all of the data values are unique, then there is no mode.
Example 1.4.4. Computing mode.
Consider the data set
Notice that the number 2 appears 3 times in this set, as does the number 5. Hence both are modes, and we say that this data set is "bimodal".
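The definition of the mode translates directly into a count of repetitions. The sketch below uses a hypothetical data set (not the one from the example) in which 2 and 5 each appear three times, and a helper named `modes` that is introduced here only for illustration. It returns every value tied for the highest count, and an empty list when all values are unique, matching the definition above.

```python
from collections import Counter

def modes(data):
    """Return all modes of data, or an empty list if no value repeats."""
    if not data:
        return []
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []          # every value is unique, so there is no mode
    return sorted(value for value, count in counts.items() if count == top)

# Hypothetical data: 2 and 5 each occur three times.
sample = [1, 2, 2, 2, 3, 5, 5, 5, 7]
print("modes:", modes(sample))
print("modes of all-unique data:", modes([1, 2, 3]))
```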
Definition 1.4.5. Median.
A positional measure of the middle is often utilized by finding the location of the 50th percentile. This value is also called the median and indicates the value at which approximately half the sorted data lies below and half lies above.
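One common way to locate the median is to sort the data and take the middle position. The sketch below assumes the usual convention of averaging the two middle values when the number of data items is even; that convention is an assumption here, since the definition only describes the median as the 50th percentile. The helper name `median_of` is illustrative.

```python
def median_of(data):
    """Median of a non-empty list: the middle value of the sorted data."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                         # odd n: single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n: average the middle pair

odd_case = median_of([2, 5, 8, 10, 11])
even_case = median_of([1, 2, 5, 7, 7, -1, 3, 2])
print("median (odd n)  =", odd_case)
print("median (even n) =", even_case)
```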
Checkpoint 1.4.6. WeBWorK - Computing Mean, Median and Mode.
Definition 1.4.7. Midrange.
The midrange is a mixture of the mean and median where one takes the simple average of the maximum and minimum values in the data set. Using the order statistics \(y_1 \le y_2 \le \cdots \le y_n\), this equals
\begin{equation*}
\frac{y_1 + y_n}{2}.
\end{equation*}
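Since the midrange depends only on the smallest and largest values, a minimal Python sketch needs just `min` and `max`. The helper name `midrange` and the sample values are illustrative.

```python
def midrange(data):
    """Average of the minimum and maximum of a non-empty data set."""
    return (min(data) + max(data)) / 2

values = [2, 5, 8, 10, 11]
mid = midrange(values)
print("midrange =", mid)
```

Because only the two extremes are used, a single unusually large or small value moves the midrange substantially.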
data <- c(1, 2, 5, 7, 7, -1, 3, 2)   # concatenate the values into a vector
print(paste("Mean =", mean(data)))
print(paste("Median =", median(data)))
Example 1.4.8. USA State Population Measures of the Middle.
The US Census Bureau reported the following state populations (in millions) for 2013:
State | Population (millions) |
Wyoming | 0.6 |
Vermont | 0.6 |
District of Columbia | 0.6 |
North Dakota | 0.7 |
Alaska | 0.7 |
South Dakota | 0.8 |
Delaware | 0.9 |
Montana | 1 |
Rhode Island | 1.1 |
New Hampshire | 1.3 |
Maine | 1.3 |
Hawaii | 1.4 |
Idaho | 1.6 |
West Virginia | 1.9 |
Nebraska | 1.9 |
New Mexico | 2.1 |
Nevada | 2.8 |
Kansas | 2.9 |
Utah | 2.9 |
Arkansas | 3 |
Mississippi | 3 |
Iowa | 3.1 |
Connecticut | 3.6 |
Oklahoma | 3.9 |
Oregon | 3.9 |
Kentucky | 4.4 |
Louisiana | 4.6 |
South Carolina | 4.8 |
Alabama | 4.8 |
Colorado | 5.3 |
Minnesota | 5.4 |
Wisconsin | 5.7 |
Maryland | 5.9 |
Missouri | 6 |
Tennessee | 6.5 |
Indiana | 6.6 |
Arizona | 6.6 |
Massachusetts | 6.7 |
Washington | 7 |
Virginia | 8.3 |
New Jersey | 8.9 |
North Carolina | 9.8 |
Michigan | 9.9 |
Georgia | 10 |
Ohio | 11.6 |
Pennsylvania | 12.8 |
Illinois | 12.9 |
Florida | 19.6 |
New York | 19.7 |
Texas | 26.4 |
California | 38.3 |
Notice that these are already in order, so \(y_1 = 0.6\) million is the minimum and \(y_{51} = 38.3\) million is the maximum. Therefore, the midrange is given by
\begin{equation*}
\frac{y_1 + y_{51}}{2} = \frac{0.6 + 38.3}{2} = 19.45
\end{equation*}
million residents.
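In this collection of "states" data the District of Columbia is included, so the number of data items is \(n = 51\). The mean of this data takes a bit of arithmetic but gives
\begin{equation*}
\frac{1}{51}\sum_{k=1}^{51} y_k = \frac{316.1}{51} \approx 6.20
\end{equation*}
million residents.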
Since the number of states is odd, the median is found by looking at the 26th order statistic. In this case, that is the 4.4 million residents of Kentucky, i.e. \(y_{26} = 4.4\text{.}\)
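The three measures for this example can be checked with a short Python sketch. The list `populations` is copied from the table in table order; the variable names are illustrative, and `statistics.mean` and `statistics.median` come from the Python standard library.

```python
from statistics import mean, median

# State populations in millions, copied in order from the table (n = 51).
populations = [
    0.6, 0.6, 0.6, 0.7, 0.7, 0.8, 0.9, 1.0, 1.1, 1.3, 1.3, 1.4, 1.6,
    1.9, 1.9, 2.1, 2.8, 2.9, 2.9, 3.0, 3.0, 3.1, 3.6, 3.9, 3.9, 4.4,
    4.6, 4.8, 4.8, 5.3, 5.4, 5.7, 5.9, 6.0, 6.5, 6.6, 6.6, 6.7, 7.0,
    8.3, 8.9, 9.8, 9.9, 10.0, 11.6, 12.8, 12.9, 19.6, 19.7, 26.4, 38.3,
]

n = len(populations)
pop_midrange = (min(populations) + max(populations)) / 2
pop_mean = mean(populations)
pop_median = median(populations)   # for odd n, the middle order statistic

print("n        =", n)
print("midrange =", round(pop_midrange, 2))
print("mean     =", round(pop_mean, 2))
print("median   =", pop_median)
```

The midrange sits far above both the mean and the median here because it is determined entirely by the two extreme values, one of which (California) is much larger than the rest.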