You can plot the data points along with the best-fit line determined in the previous section, but the question remains whether the line is any good. In particular, the line is often used to predict a y-value for a given x-value. However, the best-fit line frequently does not pass through any of the provided data points. So how can a line that misses every data point still be considered a good fit? To quantify this, we first need a way to measure how two variables vary with each other.
Definition 2.3.1 Covariance
Given paired (sample) data \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) with corresponding means \(\overline{x}\) and \(\overline{y}\text{,}\) the covariance is given by
\[ s_{xy} = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \overline{x})(y_k - \overline{y}) \]
and similarly, if using population data, you would instead use the mean \(\mu_x\) of the x-values and the mean \(\mu_y\) of the y-values:
\[ \sigma_{xy} = \frac{1}{n} \sum_{k=1}^{n} (x_k - \mu_x)(y_k - \mu_y)\text{.} \]
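The two covariance formulas translate directly into code. The following is a minimal sketch in Python, assuming the paired data are stored as two plain lists of equal length; the function names and the data values are hypothetical and used only for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def population_covariance(xs, ys):
    """Population covariance: same cross-deviations, divided by n."""
    n = len(xs)
    if n != len(ys) or n < 1:
        raise ValueError("need two non-empty lists of equal length")
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n


# Hypothetical paired data, used only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]
print(sample_covariance(xs, ys))      # 11/3, about 3.667
print(population_covariance(xs, ys))  # 2.75
```

On Python 3.10 or later, the standard library function statistics.covariance(xs, ys) computes the sample version and can serve as a cross-check.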
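An equivalent formula for the sample covariance follows by expanding the product:
\[ \sum_{k=1}^{n} (x_k - \overline{x})(y_k - \overline{y}) = \sum_{k=1}^{n} x_k y_k - \overline{y} \sum_{k=1}^{n} x_k - \overline{x} \sum_{k=1}^{n} y_k + n\,\overline{x}\,\overline{y}\text{,} \]
which simplifies to \(\sum_{k=1}^{n} x_k y_k - n\,\overline{x}\,\overline{y}\) using the definition of the mean (\(\sum x_k = n\overline{x}\) and \(\sum y_k = n\overline{y}\)). Hence \(s_{xy} = \frac{1}{n-1}\left(\sum_{k=1}^{n} x_k y_k - n\,\overline{x}\,\overline{y}\right)\text{.}\)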
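The identity \(\sum (x_k - \overline{x})(y_k - \overline{y}) = \sum x_k y_k - n\,\overline{x}\,\overline{y}\) can be checked numerically. This is a minimal Python sketch, assuming hypothetical data and hypothetical function names; it computes the sample covariance both from deviations and from the expanded form and compares them.

```python
import math


def covariance_by_definition(xs, ys):
    """Sample covariance computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def covariance_shortcut(xs, ys):
    """Sample covariance via (sum of x*y - n * x_bar * y_bar) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - n * x_bar * y_bar) / (n - 1)


# Hypothetical data; both routes should agree up to rounding.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]
print(math.isclose(covariance_by_definition(xs, ys), covariance_shortcut(xs, ys)))  # True
```

The expanded form is convenient by hand, but in floating-point arithmetic it subtracts two nearly equal quantities when the means are large compared with the spread of the data, which can lose precision; the deviation form is usually the safer choice in software.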
The covariance is a second-order quantity (like the variance), but it also carries units: its units are the units of the x-values multiplied by the units of the y-values, so rescaling either variable rescales the covariance. To obtain a unit-less measure, consider the following.
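To see the dependence on units concretely, here is a minimal Python sketch that measures the same hypothetical times first in hours and then in minutes. The data and the function name are assumptions made for illustration only.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: elapsed time and distance travelled (km).
hours = [1, 2, 3, 4]
km = [2, 4, 5, 9]
minutes = [60 * h for h in hours]  # the same times, measured in minutes

cov_hours = sample_covariance(hours, km)      # units: hour * km
cov_minutes = sample_covariance(minutes, km)  # units: minute * km
print(cov_hours, cov_minutes)  # the second value is 60 times the first
```

Nothing about the relationship between time and distance has changed, yet the covariance grows by a factor of 60, which is why a unit-free measure is useful.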
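Given a collection of data points, the correlation coefficient is given by
\[ r = \frac{s_{xy}}{s_x s_y} = \frac{1}{n-1} \sum_{k=1}^{n} \left(\frac{x_k - \overline{x}}{s_x}\right)\left(\frac{y_k - \overline{y}}{s_y}\right) \]
where \(s_{xy}\) is the sample covariance, \(s_x\) is the standard deviation of the x-values only, and \(s_y\) is the standard deviation of the y-values only. The corresponding statistic for population data would instead use the population covariance \(\sigma_{xy}\) together with \(\sigma_x\) and \(\sigma_y\) as the respective standard deviations of the x-values and y-values: \(\rho = \sigma_{xy} / (\sigma_x \sigma_y)\text{.}\)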
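The correlation coefficient divides out the units that the covariance carries. The following minimal Python sketch computes \(r = s_{xy}/(s_x s_y)\) using the standard library's statistics.stdev for the sample standard deviations; the helper names and the time/distance data are hypothetical.

```python
import statistics


def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """r = s_xy / (s_x * s_y), using sample standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical data: times in hours (and minutes) against distance in km.
hours = [1, 2, 3, 4]
km = [2, 4, 5, 9]
minutes = [60 * h for h in hours]

print(correlation(hours, km))    # about 0.965
print(correlation(minutes, km))  # same value: r has no units
```

For the same data, the population version \(\rho = \sigma_{xy}/(\sigma_x \sigma_y)\) gives the same number, since the factors of \(1/n\) (or \(1/(n-1)\)) cancel between numerator and denominator. On Python 3.10 or later, statistics.correlation(xs, ys) offers a library cross-check.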
If the points are collinear with a positive slope, then \(r = 1\text{;}\) if the points are collinear with a negative slope, then \(r = -1\text{.}\)
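Assume the data points are collinear with a positive slope. Then \(TSE(m_0,b_0) = 0\) for some \(m_0 > 0\) and \(b_0\text{,}\) so for this line \(f(x_k) = m_0 x_k + b_0 = y_k\) exactly for all data points. Averaging these equations gives \(\overline{y} = m_0 \overline{x} + b_0\text{,}\) so \(y_k - \overline{y} = m_0 (x_k - \overline{x})\) for every \(k\text{,}\) and hence \(s_y = | m_0 | s_x\text{.}\) The covariance is then
\[ s_{xy} = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \overline{x})\, m_0 (x_k - \overline{x}) = m_0 s_x^2\text{.} \]
Putting these together gives the correlation coefficient
\[ r = \frac{s_{xy}}{s_x s_y} = \frac{m_0 s_x^2}{s_x \cdot | m_0 | s_x} = \frac{m_0}{| m_0 |} = 1\text{.} \]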
The negative-slope case follows in the same way by noting that then \(m_0 < 0\text{,}\) so \(m_0 / | m_0 | = -1\text{.}\)
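The result can be spot-checked numerically. This minimal Python sketch builds two hypothetical data sets that lie exactly on the lines \(y = 3x + 1\) and \(y = -2x + 5\) and computes \(r\) for each; the helper name is illustrative, and the values match \(\pm 1\) only up to floating-point rounding.

```python
import statistics


def correlation(xs, ys):
    """r = s_xy / (s_x * s_y) with sample statistics."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return s_xy / (statistics.stdev(xs) * statistics.stdev(ys))


xs = [0, 1, 2, 5]
rising = [3 * x + 1 for x in xs]    # collinear, slope m0 = 3 > 0
falling = [-2 * x + 5 for x in xs]  # collinear, slope m0 = -2 < 0

print(correlation(xs, rising))   # 1, up to floating-point rounding
print(correlation(xs, falling))  # -1, up to floating-point rounding
```

A numerical check like this only illustrates the theorem for particular data; the proof above is what establishes it in general.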
Interpreting correlation coefficients.
Consider the data points \((1,1)\text{,}\) \((1,2)\text{,}\) \((2,1)\text{,}\) and \((2,2)\text{.}\) Plot these points and consider the nature of the best-fit line. Show using software that the correlation coefficient is zero. Justify why the minimum value of \(TSE(m,b)\) must be 1.
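One way to carry out the software part of this exercise is sketched below in Python. The helper tse is a hypothetical name for the total squared error from the previous section, assumed here to be \(\sum_k (y_k - (m x_k + b))^2\text{;}\) the slope and intercept use the standard least-squares formulas \(m = s_{xy}/s_x^2\) and \(b = \overline{y} - m\overline{x}\text{,}\) which may be written in a different but equivalent form in the previous section.

```python
import statistics


def tse(m, b, xs, ys):
    """Total squared error of the line y = m*x + b (hypothetical helper)."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [1, 1, 2, 2]
ys = [1, 2, 1, 2]

n = len(xs)
x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
r = s_xy / (statistics.stdev(xs) * statistics.stdev(ys))

# Standard least-squares slope and intercept: m = s_xy / s_x^2, b = y_bar - m * x_bar.
m = s_xy / statistics.variance(xs)
b = y_bar - m * x_bar

print(r)                  # 0.0
print(m, b)               # 0.0 1.5, i.e. the horizontal line y = 1.5
print(tse(m, b, xs, ys))  # 1.0

# A coarse grid search illustrates (but does not prove) that no line does better.
grid = [k / 4 for k in range(-8, 13)]  # -2.0, -1.75, ..., 3.0
print(min(tse(mm, bb, xs, ys) for mm in grid for bb in grid))  # 1.0
```

The grid search only samples finitely many lines, so it supports, rather than replaces, the justification the exercise asks for.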