If the data come from a sample, we call this value a sample mean and denote it by \(\overline{x}\text{.}\) If the data come from the entire universe of possibilities, we call it a population mean and denote it by \(\mu\text{.}\) When presented with raw data, it is generally reasonable to presume that the data come from a sample and to use \(\overline{x}\text{.}\)
To illustrate, consider the previous data set: {2,5,8,10}. The arithmetic mean is given by
\begin{equation*}
\overline{x} = \frac{2+5+8+10}{4} = \frac{25}{4} = 6.25\text{.}
\end{equation*}
The mean is often called the centroid in the sense that if the \(x\) values were locations of objects of equal weight, then the centroid would be the point where this system of \(n\) equal masses would balance. Play around with the interactive cell below by entering your own data values into the first list.
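The balancing property can be checked numerically: the signed distances of the data values from the mean cancel exactly. The following is a minimal sketch in Python (an assumption, since the text itself works in R), using the data set {2,5,8,10} from above; the variable names are illustrative only.

```python
# Data set from the text; the mean is the balance point of equal masses.
data = [2, 5, 8, 10]

mean = sum(data) / len(data)

# Signed distance of each value from the mean (its "lever arm").
deviations = [x - mean for x in data]

print(mean)             # 6.25
print(deviations)       # [-4.25, -1.25, 1.75, 3.75]
print(sum(deviations))  # 0.0
```

The values to the left of the mean pull with total distance 5.5 and the values to the right pull with the same total, which is what it means for the system to balance.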
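The values can also be given varying weights if desired. The result is called the weighted arithmetic mean and, for weights \(w_1, \ldots, w_n\) attached to the values \(x_1, \ldots, x_n\text{,}\) is given by
\begin{equation*}
\frac{\sum_{k=1}^n w_k x_k}{\sum_{k=1}^n w_k}\text{.}
\end{equation*}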
Suppose in a given class you have a daily grade of 92, an exam 1 grade of 85, an exam 2 grade of 87, and a final exam grade of 93. If the daily grade counts 10 percent, the first two exams count 25 percent each, and the final counts 40 percent, then your final grade would be
\begin{equation*}
0.10(92) + 0.25(85) + 0.25(87) + 0.40(93) = 89.4\text{.}
\end{equation*}
It would then appear that you might want to do some bargaining with your teacher about how nice it would be to round that up.
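The grade computation above can be reproduced with a short Python sketch. The helper name `weighted_mean` is hypothetical and introduced only for illustration; it divides by the total weight so that the weights need not sum to 1.

```python
# Grades and weights from the example in the text.
grades = [92, 85, 87, 93]           # daily, exam 1, exam 2, final
weights = [0.10, 0.25, 0.25, 0.40]  # fractions of the final grade

def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w); the weights need not sum to 1."""
    total_weight = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / total_weight

final_grade = weighted_mean(grades, weights)
print(round(final_grade, 2))  # 89.4
```

With equal weights the same function reduces to the ordinary arithmetic mean, for example `weighted_mean([2, 5, 8, 10], [1, 1, 1, 1])` gives 6.25.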
Definition 1.4.3 Median:
A positional measure of the middle is often utilized by finding the location of the 50th percentile. This value is also called the median and indicates the value at which approximately half the sorted data lies below and half lies above.
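For data sets with an odd number of values, this is the "middle" data value left if one were to successively cross off pairs from the two ends of the sorted data. For data sets with an even number of values, this is the average of the two data values left after crossing off all other pairs. Using the order statistics \(y_1 \le y_2 \le \cdots \le y_n\text{,}\) the median equals
\begin{equation*}
\begin{cases} y_{(n+1)/2}, & n \text{ odd} \\ \dfrac{y_{n/2} + y_{n/2+1}}{2}, & n \text{ even.} \end{cases}
\end{equation*}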
From the Presidential data 1.3.2, note that you are considering an even number of data values and so the median is given by (54+64)/2 = 59.
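The odd and even cases of the median can be sketched in Python as follows. The function name `median` is illustrative; note that Python lists are indexed from 0, while the order statistics in the text are indexed from 1.

```python
def median(data):
    """Median via the order statistics y_1 <= ... <= y_n (1-indexed in the text)."""
    y = sorted(data)
    n = len(y)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value y_{(n+1)/2}.
        return y[mid]
    # Even n: average of y_{n/2} and y_{n/2 + 1}.
    return (y[mid - 1] + y[mid]) / 2

print(median([2, 5, 8, 10]))  # 6.5
print(median([8, 2, 10]))     # 8
```

Applied to just the two middle Presidential values quoted above, `median([54, 64])` returns 59.0, which agrees with the hand computation.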
Definition 1.4.4 Midrange:
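The midrange is a mixture of the mean and median in which one takes the simple average of the maximum and minimum values in the data set. Using the order statistics \(y_1 \le y_2 \le \cdots \le y_n\text{,}\) this equals
\begin{equation*}
\frac{y_1 + y_n}{2}\text{.}
\end{equation*}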
From the Presidential data 1.3.2, the maximum is 70 and the minimum is 46 so the midrange is 58, the average of these two.
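Since the midrange depends only on the two extremes, a sketch in Python is a one-liner. The function name `midrange` is hypothetical; the second call uses only the minimum and maximum quoted above, not the full Presidential data set.

```python
def midrange(data):
    """Average of the smallest and largest values, (y_1 + y_n) / 2."""
    return (min(data) + max(data)) / 2

print(midrange([2, 5, 8, 10]))  # 6.0
print(midrange([46, 70]))       # 58.0
```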
There are several advantages and disadvantages associated with each of these measures. The mean utilizes all of the data values, so each term matters, but it does so even if some of the data values suffer from collection errors. The median ignores outliers (which might be a result of collection errors) but does not account for the relative differences between terms. The midrange is very easy to compute but ignores the relative differences for all terms but the two extremes. A similar collection of features and drawbacks is associated with all descriptive statistics.
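These trade-offs can be seen by corrupting a single value. The sketch below, in Python with the standard `statistics` module, compares the three measures on the data set {2,5,8,10} and on a hypothetical faulty copy in which the value 10 has been mis-recorded as 100; the faulty data and the helper name `summarize` are assumptions made purely for illustration.

```python
import statistics

def summarize(data):
    """Return (mean, median, midrange) for a list of numbers."""
    midrange = (min(data) + max(data)) / 2
    return statistics.mean(data), statistics.median(data), midrange

clean = [2, 5, 8, 10]    # data set from the text
faulty = [2, 5, 8, 100]  # hypothetical: 10 mis-recorded as 100

print(summarize(clean))   # (6.25, 6.5, 6.0)
print(summarize(faulty))  # (28.75, 6.5, 51.0)
```

The single bad value moves the mean from 6.25 to 28.75 and the midrange from 6 to 51, while the median stays at 6.5.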
You can again compute many statistics automatically using R...
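Notice that these are already in order, so you can presume \(y_1 = 0.6\) million is the minimum and \(y_{51} = 38.3\) million is the maximum. Therefore, the midrange, in millions of residents, is given by
\begin{equation*}
\frac{y_1 + y_{51}}{2} = \frac{0.6 + 38.3}{2} = 19.45\text{.}
\end{equation*}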
In this collection of "states" data the District of Columbia is included so that the number of data items is n=51. The mean of this data takes a bit of arithmetic but gives
Since the number of states is odd, the median is found by looking at the 26th order statistic. In this case, that is the 4.4 million residents of Kentucky, i.e. \(y_{26} = 4.4\text{.}\)