Section 2.5 Multi-variable Linear Regression
The regression models that we have looked at so far presume a single independent variable. When investigating cause and effect relationships, it is much more likely that several independent variables contribute.
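Let's consider a linear model with two independent variables. A basic two-variable linear model of the form
\begin{equation*}
z = \alpha_1 x_1 + \alpha_2 x_2 + \beta
\end{equation*}
can be used to approximate data points
\begin{equation*}
(x_{1,1}, x_{2,1}, z_1), (x_{1,2}, x_{2,2}, z_2), \ldots, (x_{1,n}, x_{2,n}, z_n)\text{.}
\end{equation*}
Using a linear systems approach similar to the previous section, evaluating the model at these data points and appending an error term to each equation gives, in matrix form,
\begin{equation*}
\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix}
=
\begin{bmatrix} x_{1,1} & x_{2,1} & 1 \\ x_{1,2} & x_{2,2} & 1 \\ \vdots & \vdots & \vdots \\ x_{1,n} & x_{2,n} & 1 \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \beta \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
\end{equation*}
where the \(\epsilon_k\) terms are the deviations between the exact data points and the approximations of those points on some plane. Symbolically,
\begin{equation*}
Z = X A + \epsilon\text{.}
\end{equation*}
If all of the points lie on the same plane (unlikely), then \(\epsilon = 0\text{.}\) Otherwise, applying a least squares approach once again amounts to minimizing \(\epsilon^t \epsilon\) and eventually gives
\begin{equation*}
A = (X^t X)^{-1} X^t Z
\end{equation*}
in general. Evaluating this with X and Z as above gives the needed coefficients \(\alpha_1\text{,}\) \(\alpha_2\text{,}\) and \(\beta\text{.}\)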
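As a rough numerical check of the least squares approach, the following sketch forms a matrix X with one row \((x_1, x_2, 1)\) per data point and a column Z of observed values, then solves the normal equations \(X^t X A = X^t Z\text{.}\) It is written in Python with NumPy rather than R, purely for illustration. The data values, the variable names, and the ordering of the columns of X are assumptions made for this example.

```python
import numpy as np

# Hypothetical data: five points scattered near the plane z = 2*x1 - x2 + 3.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
z = np.array([2.1, 4.9, 5.0, 8.1, 7.9])

# Design matrix: one row (x1, x2, 1) per data point (column order is an assumption).
X = np.column_stack([x1, x2, np.ones_like(x1)])

# Solve the normal equations (X^t X) A = X^t Z instead of forming an explicit inverse.
A = np.linalg.solve(X.T @ X, X.T @ z)

# Cross-check against NumPy's built-in least squares routine.
A_check, *_ = np.linalg.lstsq(X, z, rcond=None)

# The error vector epsilon and the quantity being minimized, epsilon^t epsilon.
residual = z - X @ A
print("coefficients (alpha_1, alpha_2, beta):", A)
print("sum of squared errors:", residual @ residual)
```

Solving the linear system directly is usually preferred over computing \((X^t X)^{-1}\) explicitly, since it gives the same coefficients with less round-off error.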
Let's first see how this is done automatically in R using one of the built-in data sets.
A good example of the usefulness and limitations of multiple linear regression is the calculation of the "Heat Index". This index measures discomfort as a function of the ambient temperature and the relative humidity. Indeed, in warm climates a high temperature is more difficult to bear if the humidity is also high. One reason is that with high humidity the body is less effective at shedding heat through the evaporation of sweat.
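The National Weather Service in 1990 published the following multiple regression equation for Heat Index (HI) relative to the ambient temperature (T, in degrees Fahrenheit) and the relative humidity (RH, in percent):
\begin{align*}
HI = {} & -42.379 + 2.04901523\, T + 10.14333127\, RH - 0.22475541\, T \cdot RH\\
& - 0.00683783\, T^2 - 0.05481717\, RH^2 + 0.00122874\, T^2 \cdot RH\\
& + 0.00085282\, T \cdot RH^2 - 0.00000199\, T^2 \cdot RH^2
\end{align*}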
Since this model is a linear combination of terms involving T and RH, its derivation could also be carried out using a generalization of the linear regression method presented above. Details on how this equation was determined are available at https://www.wpc.ncep.noaa.gov/html/heatindex_equation.shtml .
Below one can compute a table of heat index values for various ambient temperature readings at one fixed value of relative humidity. Notice what happens for a relatively high humidity and a relatively high temperature.
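One way to build such a table is sketched here in Python (not the R used elsewhere in this section). The coefficients are those of the regression equation listed on the National Weather Service page cited above. The function names `heat_index` and `heat_index_table`, the temperature range, and the 95% humidity value are assumptions chosen for illustration. The sketch applies the bare regression only and ignores the adjustments that the NWS page describes for certain ranges of T and RH.

```python
def heat_index(T, RH):
    """Heat index (deg F) from temperature T (deg F) and relative humidity RH (percent)."""
    return (-42.379
            + 2.04901523 * T
            + 10.14333127 * RH
            - 0.22475541 * T * RH
            - 0.00683783 * T ** 2
            - 0.05481717 * RH ** 2
            + 0.00122874 * T ** 2 * RH
            + 0.00085282 * T * RH ** 2
            - 0.00000199 * T ** 2 * RH ** 2)


def heat_index_table(temps, RH):
    """Return (T, HI) pairs for each temperature at one fixed relative humidity."""
    return [(T, heat_index(T, RH)) for T in temps]


# Hypothetical settings: temperatures from 80 to 120 deg F at 95% humidity.
for T, HI in heat_index_table(range(80, 121, 5), 95):
    print(f"T = {T:3d} F   RH = 95%   HI = {HI:6.1f} F")
```

At the top of this range the formula returns a heat index well above 300 degrees, which is far outside anything a person could have reported and a sign that the inputs lie beyond the data the regression was fitted to.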
Indeed, you cannot roast a turkey by simply setting the oven to 120 degrees and pumping in a lot of humidity, since the turkey is not trying to cool itself anymore. Any discomfort measured on the turkey's behalf would certainly be matched by the human, since the bird would be a rare bird and remain very much uncooked. The issue is that this model was not built to handle a combination such as 120F and 95% humidity. In places where the temperature can reach that level, such as a desert, the relative humidity is usually correspondingly low. Using a model to predict values beyond the range of the measured data is called extrapolation and should be done with care. Interpolation, which estimates values within the confines of the measured data, is generally a safer bet.