Section 1.3 Statistical Measures of Position
Definition 1.3.1. Order Statistic.
From the data set
\[x_1, x_2, \ldots, x_n,\]
assume that when sorted it is denoted
\[y_1, y_2, \ldots, y_n,\]
where
\[y_1 \le y_2 \le \cdots \le y_n.\]
Then $y_k$ is known as the $k$th order statistic.
Example 1.3.2. Age of Presidents - order statistics.
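The ages at inauguration for the presidents from 1981-2019 (Reagan, Bush, Clinton, Bush, Obama, Trump) give the data
\[69,\ 64,\ 46,\ 54,\ 47,\ 70.\]
For this data, the order statistics are
\[y_1 = 46,\quad y_2 = 47,\quad y_3 = 54,\quad y_4 = 64,\quad y_5 = 69,\quad y_6 = 70.\]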
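A minimal R sketch of how order statistics arise in practice: `sort()` returns the values in increasing order, so the $k$th order statistic is the $k$th element of the sorted vector. The ages 69, 64, 46, 54, 47, 70 are the inauguration ages of Example 1.3.2 in chronological order; the variable names `ages` and `y` are illustrative choices.

```r
# Ages at inauguration in chronological order (Reagan through Trump)
ages <- c(69, 64, 46, 54, 47, 70)

# sort() arranges the values so that y[1] <= y[2] <= ... <= y[n]
y <- sort(ages)
print(y)

# The kth order statistic is the kth element of the sorted vector
k <- 3
print(y[k])
```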
Definition 1.3.3. Minimum/Maximum.
For a given data set, the smallest and largest values are known as the minimum and maximum, respectively. In our notation, for a data set of size $n$, the minimum is $y_1$ and the maximum is $y_n$.
Example 1.3.4. Age of Presidents - Minimum/Maximum.
Using the presidential inauguration data from Example 1.3.2, the minimum is $y_1 = 46$ and the maximum is $y_6 = 70$.
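In R the minimum and maximum can be read off the sorted vector as its first and last entries, or obtained directly with `range()`, which returns the pair (min, max). This is a small sketch using the inauguration ages from Example 1.3.2.

```r
ages <- c(69, 64, 46, 54, 47, 70)
y <- sort(ages)
n <- length(y)

# Minimum = y[1] and maximum = y[n] in order-statistic notation
extremes <- c(minimum = y[1], maximum = y[n])
print(extremes)

# range() returns the same pair, c(min, max), without an explicit sort
print(range(ages))
```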
Definition 1.3.5. Percentiles.
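For $0 < s < 1$ and for order statistics $y_1, y_2, \ldots, y_n$, define the $100s$-th percentile to be
\[P_s = y_m + r\,(y_{m+1} - y_m),\]
where $m$ is the integer part of $(n+1)s$, namely
\[m = \lfloor (n+1)s \rfloor,\]
and
\[r = (n+1)s - m\]
is the fractional part of $(n+1)s$.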
In Excel, this is PERCENTILE.EXC.
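Definition 1.3.5 translates almost line for line into a short R function. The name `percentile_exc` is hypothetical (chosen to echo the Excel name), and the sketch assumes the function should refuse values of $s$ for which $y_{m+1}$ or $y_m$ would not exist. As a cross-check, R's built-in `quantile()` with `type = 6` uses the same positions $(n+1)s$.

```r
# Hypothetical helper: the 100s-th percentile from Definition 1.3.5
percentile_exc <- function(x, s) {
  y <- sort(x)
  n <- length(y)
  h <- (n + 1) * s          # position (n+1)s along the order statistics
  m <- floor(h)             # integer part of (n+1)s
  r <- h - m                # fractional part of (n+1)s
  # The definition needs y_m and (unless r = 0) y_(m+1) to exist
  if (m < 1 || m > n || (m == n && r > 0)) {
    stop("s must lie in [1/(n+1), n/(n+1)] for this definition")
  }
  if (r == 0) return(y[m])  # position lands exactly on an order statistic
  y[m] + r * (y[m + 1] - y[m])
}

ages <- c(69, 64, 46, 54, 47, 70)
p60 <- percentile_exc(ages, 0.60)
print(p60)
print(quantile(ages, 0.60, type = 6))   # same value from base R
```

Requiring $1 \le m$ mirrors the fact that PERCENTILE.EXC reports an error for very small or very large $s$, whereas `quantile(type = 6)` instead returns the minimum or maximum there.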
Definition 1.3.6. Alternate Percentile Definition.
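For $0 < s < 1$ and for order statistics $y_1, y_2, \ldots, y_n$, define the $100s$-th percentile to be
\[P_s = y_m + r\,(y_{m+1} - y_m),\]
where $m$ is the integer part of $(n-1)s + 1$, namely
\[m = \lfloor (n-1)s + 1 \rfloor,\]
and
\[r = (n-1)s + 1 - m\]
is the fractional part of $(n-1)s + 1$.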
In Excel, this is PERCENTILE.INC or just PERCENTILE.
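The alternate definition differs only in where the position along the order statistics is placed. A minimal R sketch, with the hypothetical name `percentile_inc`; R's `quantile()` uses exactly this rule by default (`type = 7`), which makes for an easy check. Because $(n-1)s + 1$ always lies between $1$ and $n$, no range check on $m$ is needed here.

```r
# Hypothetical helper: the 100s-th percentile from Definition 1.3.6
percentile_inc <- function(x, s) {
  stopifnot(s >= 0, s <= 1)
  y <- sort(x)
  n <- length(y)
  h <- (n - 1) * s + 1      # position (n-1)s + 1, always between 1 and n
  m <- floor(h)             # integer part
  r <- h - m                # fractional part
  if (r == 0) return(y[m])  # covers s = 1, where y_(m+1) does not exist
  y[m] + r * (y[m + 1] - y[m])
}

ages <- c(69, 64, 46, 54, 47, 70)
p42_inc <- percentile_inc(ages, 0.42)
print(p42_inc)
print(quantile(ages, 0.42))   # default type = 7 gives the same value
```

For the same data and the same $s = 0.42$, the two definitions disagree: this rule gives 55 while Definition 1.3.5 gives 53.58, so it is worth stating which one is in use.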
Example 1.3.8. Presidential Percentile.
To compute, say, the 42nd percentile of the presidential inauguration data from Example 1.3.2 using Definition 1.3.5, take $s = 0.42$. Since there are $n = 6$ numbers in the data set,
\[(n+1)s = 7(0.42) = 2.94,\]
and so $m = 2$ and $r = 0.94$. Thus the percentile lies between $y_2 = 47$ and $y_3 = 54$, and much closer to 54 than to 47. Numerically,
\[P_{0.42} = y_2 + 0.94\,(y_3 - y_2) = 47 + 0.94\,(54 - 47) = 47 + 6.58 = 53.58.\]
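The arithmetic of this example can be replayed step by step in R; this sketch simply mirrors the hand computation above, starting from the order statistics of Example 1.3.2.

```r
y <- c(46, 47, 54, 64, 69, 70)   # order statistics from Example 1.3.2
n <- length(y)
s <- 0.42

h <- (n + 1) * s   # 7 * 0.42 = 2.94
m <- floor(h)      # integer part: 2
r <- h - m         # fractional part: 0.94, up to floating-point rounding

p42 <- y[m] + r * (y[m + 1] - y[m])
print(p42)         # 53.58
```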
Definition 1.3.9. Quartiles.
Given a sorted data set, the first, second, and third quartiles are the values
\[Q_1 = P_{0.25},\quad Q_2 = P_{0.50},\quad \text{and} \quad Q_3 = P_{0.75}.\]
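In R, all three quartiles can be requested in one call to `quantile()`. Passing `type = 6` reproduces Definition 1.3.5, while the default (`type = 7`) reproduces the alternate Definition 1.3.6; the sketch below computes both for the inauguration ages so the difference is visible.

```r
ages <- c(69, 64, 46, 54, 47, 70)
probs <- c(0.25, 0.50, 0.75)

# type = 6: positions (n+1)s, as in Definition 1.3.5
q_def <- quantile(ages, probs = probs, type = 6)
print(q_def)

# Default type = 7: positions (n-1)s + 1, as in Definition 1.3.6
q_alt <- quantile(ages, probs = probs)
print(q_alt)
```

The medians agree (59) but the first and third quartiles do not, which is typical for small data sets.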
Example 1.3.10. Q1 and Q3 vs Calculators.
Suppose $n = 22 = 5(4) + 2$. Computing the first quartile as defined above gives $(n+1)s = 23(0.25) = 5.75 = 5 + 0.75 = m + r$. Therefore,
\[Q_1 = y_5 + 0.75\,(y_6 - y_5),\]
which is a value closer to $y_6$. Many graphing calculators, however, quickly approximate this with
\[Q_1 \approx y_6,\]
so you should be aware of this possible difference. Notice also that in this case $s = 0.25$ but $r = 0.75$, so these two values need not be the same.
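To see the size of the difference, here is a sketch with hypothetical sorted data $y_k = k$ for $k = 1, \ldots, 22$, so that each order statistic equals its index. It assumes the calculator rule is "take the median of the lower half of the data", a common convention on graphing calculators; check your own device's documentation.

```r
y <- 1:22           # hypothetical sorted data with n = 22 and y_k = k
n <- length(y)
s <- 0.25

# First quartile from the definition: (n+1)s = 5.75, so m = 5, r = 0.75
h <- (n + 1) * s
m <- floor(h)
r <- h - m
q1_def <- y[m] + r * (y[m + 1] - y[m])
print(q1_def)       # 5.75

# Assumed calculator rule: median of the lower half y_1, ..., y_11
lower_half <- y[1:(n %/% 2)]
q1_calc <- median(lower_half)
print(q1_calc)      # 6, i.e. y_6
```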
Definition 1.3.11. Deciles.
Given a sorted data set, the first, second, ..., ninth deciles are the values
\[P_{0.1},\ P_{0.2},\ \ldots,\ P_{0.9}.\]
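Deciles are just nine percentiles at once, so one `quantile()` call with `probs = seq(0.1, 0.9, by = 0.1)` produces all of them. This sketch uses the Old Faithful waiting times from R's built-in `faithful` data set (272 observations, large enough that every position $(n+1)s$ falls inside the data) with `type = 6` to match Definition 1.3.5.

```r
# Old Faithful waiting times (minutes) from R's built-in datasets package
waiting <- faithful$waiting

# First through ninth deciles using (n+1)s positions (Definition 1.3.5)
deciles <- quantile(waiting, probs = seq(0.1, 0.9, by = 0.1), type = 6)
print(deciles)
```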
Example 1.3.12. Small Example - Quartiles.
Consider the data set
The 50th percentile should be a numerical value for which approximately 50% of the data is smaller. In this case, that would be some number between 5 and 8. For now, let's just take 6.5, so that two numbers in the set lie below 6.5 and two lie above: a perfect 50% for the 50th percentile. In a similar manner, the 25th percentile would be some number between 2 and 5, say 2.75, so that one number lies below 2.75 and three numbers lie above.
Using Definition 1.3.5, the 25th percentile is computed by considering
\[(n+1)s = 5(0.25) = 1.25.\]
So $m = 1$ and $r = 0.25$. Therefore
\[P_{0.25} = y_1 + 0.25\,(y_2 - y_1) = 2 + 0.25\,(5 - 2) = 2.75,\]
as noted above.
Similarly, the 75th percentile is given by
\[(n+1)s = 5(0.75) = 3.75.\]
So $m = 3$ and $r = 0.75$. Therefore
\[P_{0.75} = y_3 + 0.75\,(y_4 - y_3) = 8 + 0.75\,(y_4 - 8).\]
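It is interesting to note that 3 also lies between 2 and 5, as does 2.75, and has the same percentages of the data above (75 percent) and below (25 percent). However, it should designate a slightly larger percentile. Indeed, working backward, solving
\[3 = y_1 + r\,(y_2 - y_1) = 2 + 3r\]
gives $r = 1/3$, so $(n+1)s = 5s = 1 + \tfrac{1}{3} = \tfrac{4}{3}$ and
\[s = \frac{4}{15} \approx 0.267,\]
and so 3 would actually be at approximately the 26.7th percentile.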
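The backward computation is short enough to check numerically. This sketch uses only the quantities from the example: the bracketing order statistics $y_1 = 2$ and $y_2 = 5$, the index $m = 1$, and the sample size $n = 4$.

```r
y1 <- 2; y2 <- 5   # order statistics that bracket the value 3
m <- 1             # index of the lower bracketing order statistic
n <- 4             # size of the small data set
value <- 3

r <- (value - y1) / (y2 - y1)   # fractional part: 1/3
s <- (m + r) / (n + 1)          # since (n+1)s = m + r, s = 4/15
print(100 * s)                  # about 26.67, i.e. the 26.7th percentile
```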
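Theorem 1.3.13.
Given any value $y$ with $y_1 < y < y_n$, choose $m$ so that $y_m \le y < y_{m+1}$. Then $y = P_s$ with
\[s = \frac{1}{n+1}\left(m + \frac{y - y_m}{y_{m+1} - y_m}\right).\]
Proof. Writing $y = y_m + r\,(y_{m+1} - y_m)$ as in Definition 1.3.5 gives $r = (y - y_m)/(y_{m+1} - y_m)$ with $0 \le r < 1$, and $(n+1)s = m + r$. Solving for $s$ gives the result.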
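Theorem 1.3.13 runs Definition 1.3.5 in reverse, which suggests a small "percentile rank" helper. The function name `percentile_rank` is hypothetical. It picks $m$ as the largest index with $y_m \le y$, so $y_{m+1} > y$ and the denominator is never zero; a round trip through `quantile(type = 6)` should return the original value.

```r
# Hypothetical helper: the s with P_s = value, following Theorem 1.3.13
percentile_rank <- function(x, value) {
  y <- sort(x)
  n <- length(y)
  stopifnot(value > y[1], value < y[n])     # theorem assumes y_1 < value < y_n
  m <- max(which(y <= value))               # y_m <= value < y_(m+1)
  r <- (value - y[m]) / (y[m + 1] - y[m])   # fractional part
  (m + r) / (n + 1)                         # from (n+1)s = m + r
}

ages <- c(69, 64, 46, 54, 47, 70)
s50 <- percentile_rank(ages, 50)
print(s50)                                  # 17/49, about 0.347

# Round trip: the 100*s50-th percentile should be 50 again
print(quantile(ages, s50, type = 6, names = FALSE))
```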
Definition 1.3.14. 5-number summary.
Given a set of data, the 5-number summary is the vector of values
\[(y_1,\ Q_1,\ Q_2,\ Q_3,\ y_n) = (y_1,\ P_{0.25},\ P_{0.50},\ P_{0.75},\ y_n).\]
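A sketch of the 5-number summary built from Definition 1.3.5 (via `quantile(type = 6)`), applied to the inauguration ages. The helper name `five_number` is hypothetical. Base R's own `fivenum()` uses Tukey's hinges rather than these percentiles, so for small data sets its middle entries can differ, as the last line shows.

```r
# Hypothetical helper: (min, Q1, Q2, Q3, max) using Definition 1.3.5
five_number <- function(x) {
  y <- sort(x)
  n <- length(y)
  q <- quantile(y, probs = c(0.25, 0.50, 0.75), type = 6, names = FALSE)
  c(min = y[1], Q1 = q[1], Q2 = q[2], Q3 = q[3], max = y[n])
}

ages <- c(69, 64, 46, 54, 47, 70)
summary5 <- five_number(ages)
print(summary5)       # 46, 46.75, 59, 69.25, 70

# For comparison: Tukey's hinges from base R
print(fivenum(ages))  # 46, 47, 59, 69, 70
```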
# Quartiles, selected percentiles, and a box plot for a small data set
data <- c(1, 2, 5, 7, 7, -1, 3, 2)           # combine the values into a numeric vector
print("Quartiles:")
print(quantile(data))                         # default type = 7, i.e. Definition 1.3.6 (PERCENTILE.INC)
print("Specific Percentiles:")
print(quantile(data, c(0.32, 0.57, 0.98)))    # 32nd, 57th, and 98th percentiles
print("Box and Whisker Diagram:")
boxplot(data, horizontal = TRUE)
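Because `quantile()` defaults to `type = 7`, the quartiles printed above follow the alternate Definition 1.3.6. The sketch below puts them side by side with `type = 6`, which follows Definition 1.3.5, for the same eight values.

```r
data <- c(1, 2, 5, 7, 7, -1, 3, 2)
probs <- c(0.25, 0.50, 0.75)

q7 <- quantile(data, probs = probs)             # default type 7: Definition 1.3.6
q6 <- quantile(data, probs = probs, type = 6)   # type 6: Definition 1.3.5

print(rbind(type7 = q7, type6 = q6))
```

Only the median agrees here (2.5); the first quartile is 1.75 versus 1.25 and the third is 5.5 versus 6.5.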
Example 1.3.15. Small example - 5 number summary.
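Returning to Example 1.3.12, the five-number summary (Definition 1.3.14) would be
\[(y_1,\ P_{0.25},\ P_{0.50},\ P_{0.75},\ y_4) = \left(2,\ 2.75,\ 6.5,\ y_3 + 0.75\,(y_4 - y_3),\ y_4\right).\]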
data()                 # list the data sets that ship with R
# data(faithful)
# head(faithful, 10)   # the first 10 observations
# nrow(faithful)       # the number of observations (rows)
# ncol(faithful)       # the number of variables per observation (columns)
# ?faithful            # help page with a more exhaustive description of the data
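data(faithful)          # Old Faithful geyser data
x <- faithful
x1 <- x[, 1]            # first outcome: eruptions (eruption time, minutes)
x2 <- x[, 2]            # second outcome: waiting (time to next eruption, minutes)
m <- min(x2)            # min(faithful, 2) would scan every column plus the number 2
M <- max(x2)
print(paste("Minimum of the second outcome =", m))
print(paste("Maximum of the second outcome =", M))
cat("\n\n")             # a couple of blank lines in the displayed output
print(quantile(x1, c(0.25, 0.29, 0.57)))   # 25th, 29th, and 57th percentiles of eruption times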
cat("\n\n") # a couple of blank lines in the displayed output
# mu1 <- mean(x1)
# mu2 <- mean(x2)
# med1 <- median(x1)
# med2 <- median(x2)
# print(paste("Mean of the first outcome = ",mu1))
# print(paste("Mean of the second outcome = ",mu2))
# print(paste("Median of the first outcome = ",med1))
# print(paste("Median of the second outcome = ",med2))