Section 2.5 Multi-variable Regression
The regression models that we have looked at so far have presumed a single independent variable and a model that is "linear" in its unknown coefficients. When investigating cause-and-effect relationships, it is much more likely that several variables contribute, or that the model depends non-linearly on the unknown coefficients. To tease you into taking another course that covers multi-variate regression, in this section we briefly consider a two-variable model. We also consider an interesting example that illustrates the danger of using a model to estimate values well beyond the range of the data that was used to create it. Consider, then, a model of the form
$$z = \alpha_1 x + \alpha_2 y + \beta$$
and the data points
$$(x_1, y_1, z_1), (x_2, y_2, z_2), \ldots, (x_n, y_n, z_n).$$
Evaluating at these data points gives, in matrix form
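$$\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} = \begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ \vdots & \vdots & \vdots \\ x_n & y_n & 1 \end{bmatrix} \cdot \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \beta \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$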
where each $\epsilon_k$ is the deviation between the observed value $z_k$ and the value that the plane predicts at $(x_k, y_k)$. Symbolically,
$$Z = XA + \epsilon.$$
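Unless all of the points lie on the same plane (unlikely), in which case $\epsilon = 0$, the system $Z = XA$ is overdetermined, with more independent equations than unknowns, and has no exact solution. Applying a least squares approach means choosing $A$ to minimize $\epsilon^t\epsilon = (Z - XA)^t(Z - XA)$. Setting the gradient of this expression with respect to $A$ equal to zero gives the normal equations $X^tXA = X^tZ$, and therefore
$$A = (X^tX)^{-1}X^tZ$$
whenever $X^tX$ is invertible. Evaluating this with $X$ and $Z$ as above gives the matrix
$$A = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \beta \end{bmatrix}.$$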
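To make the formula concrete, here is a minimal sketch in plain Python. It builds the products of the transposed design matrix with itself and with the observations for a small data set, then solves the normal equations by Gaussian elimination rather than forming the inverse explicitly. The function name `fit_plane` and the sample points are assumptions made for illustration and do not come from the text. The points were made up to lie close to the plane z = 2x - y + 3.

```python
def fit_plane(points):
    """Least-squares fit of z = a1*x + a2*y + b using the normal equations."""
    # Design matrix X has one row [x, y, 1] per data point; Z holds the z values.
    X = [[x, y, 1.0] for x, y, _ in points]
    Z = [z for _, _, z in points]

    # Form X^t X (3 by 3) and X^t Z (length 3).
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    XtZ = [sum(row[i] * z for row, z in zip(X, Z)) for i in range(3)]

    # Solve (X^t X) A = X^t Z by Gaussian elimination with partial pivoting
    # on the augmented matrix M = [X^t X | X^t Z].
    M = [XtX[i] + [XtZ[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]

    # Back substitution.
    A = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        s = sum(M[i][j] * A[j] for j in range(i + 1, 3))
        A[i] = (M[i][3] - s) / M[i][i]
    return A  # [alpha_1, alpha_2, beta]


# Hypothetical sample data scattered near the plane z = 2x - y + 3.
sample = [(0, 0, 3.1), (1, 0, 4.9), (0, 1, 2.0), (1, 1, 4.1), (2, 1, 5.9), (2, 3, 4.0)]
alpha1, alpha2, beta = fit_plane(sample)
print(alpha1, alpha2, beta)
```

The fitted coefficients should come out near 2, -1 and 3. If the data lay exactly on one plane, the fit would reproduce that plane and every deviation would be zero.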
A good example of the usefulness and limitations of multi-variate regression is the calculation of the "Heat Index". This index measures discomfort as a function of the ambient temperature and the relative humidity. Indeed, in warm climates a high temperature is more difficult to bear if the humidity is also high. One reason is that with high humidity the body is less effective at shedding heat through the evaporation of sweat.
The National Weather Service in 1990 published the following multiple regression equation for the Heat Index ($H$) in terms of the ambient temperature ($T$, in degrees Fahrenheit) and the relative humidity ($R$, in percent):
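$$\begin{aligned} H = {} & -42.379 + 2.04901523\,T + 10.14333127\,R - 0.22475541\,TR \\ & - 6.83783\times 10^{-3}\,T^2 - 5.481717\times 10^{-2}\,R^2 + 1.22874\times 10^{-3}\,T^2R \\ & + 8.5282\times 10^{-4}\,TR^2 - 1.99\times 10^{-6}\,T^2R^2. \end{aligned}$$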
Notice that their model includes quadratic and product terms in $T$ and $R$. It is still linear in its unknown coefficients, however, so it is a generalization of the linear result presented above. Details on how this equation was determined are available at https://www.wpc.ncep.noaa.gov/html/heatindex_equation.shtml .
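One way to see why this is a generalization of the linear case is to treat each term of the equation as its own column of the design matrix. The sketch below, in plain Python, evaluates the published equation as a dot product of a coefficient vector with a row of such terms. The names `COEFFS`, `features` and `heat_index` are illustrative assumptions; only the coefficients come from the equation above.

```python
# Coefficients of the Heat Index regression, in the same order as the
# terms returned by features() below.
COEFFS = [-42.379, 2.04901523, 10.14333127, -0.22475541,
          -6.83783e-3, -5.481717e-2, 1.22874e-3, 8.5282e-4, -1.99e-6]


def features(T, R):
    """One design-matrix row for temperature T and relative humidity R."""
    return [1.0, T, R, T * R, T**2, R**2, T**2 * R, T * R**2, T**2 * R**2]


def heat_index(T, R):
    """Dot product of the feature row with the coefficient vector."""
    return sum(c * f for c, f in zip(COEFFS, features(T, R)))


print(round(heat_index(90, 95), 1))  # approximately 126.6
```

With nine columns in place of three, the same least squares formula would apply unchanged. The model is non-linear in $T$ and $R$ but linear in the nine coefficients being fitted.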
# Sage interact cell: Heat Index for one temperature T and relative humidity R.
# Note that ^ is exponentiation in Sage.
@interact
def _(T = (90), R = (95)):
    H = (-42.379 + 2.04901523*T + 10.14333127*R - 0.22475541*T*R
         - 6.83783*10^(-3)*T^2 - 5.481717*10^(-2)*R^2 + 1.22874*10^(-3)*T^2*R
         + 8.5282*10^(-4)*T*R^2 - 1.99*10^(-6)*T^2*R^2)
    pretty_print(html("At %s"%str(T) + "$ ^o $ with %s"%str(R)
                      + " percent relative humidity,"))
    pretty_print(html("Heat Index = %s"%str(H)))
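The danger of extrapolation mentioned at the start of this section can be probed by evaluating the formula at a temperature far from hot-weather conditions. The following plain Python sketch does this for a fixed relative humidity of 50 percent. The function name `heat_index` and the choice of 40 degrees as an "out of range" input are assumptions made for illustration. The text does not state the range of data behind the fit.

```python
def heat_index(T, R):
    """Heat Index regression from the text (Python uses ** where Sage uses ^)."""
    return (-42.379 + 2.04901523*T + 10.14333127*R - 0.22475541*T*R
            - 6.83783e-3*T**2 - 5.481717e-2*R**2 + 1.22874e-3*T**2*R
            + 8.5282e-4*T*R**2 - 1.99e-6*T**2*R**2)


# Compare a hot day with a cool day at the same relative humidity.
for T in (90, 40):
    print(T, round(heat_index(T, 50), 1))
```

At 90 degrees the formula reports a Heat Index of about 95 degrees, which is plausible. At 40 degrees it reports about 125 degrees, hotter than the 90-degree day, which is clearly nonsensical.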
# Sage interact cell: tabulate the Heat Index for T = 80, ..., 120 degrees
# at a fixed relative humidity R.
@interact
def _(R = input_box(default=95, width=10, label="Relative Humidity")):
    for T in range(80, 121):
        H = (-42.379 + 2.04901523*T + 10.14333127*R - 0.22475541*T*R
             - 6.83783*10^(-3)*T^2 - 5.481717*10^(-2)*R^2 + 1.22874*10^(-3)*T^2*R
             + 8.5282*10^(-4)*T*R^2 - 1.99*10^(-6)*T^2*R^2)
        H = H.n(digits=4)  # keep 4 significant digits for display
        pretty_print(html("At %s"%str(T) + "$ ^o $ with relative humidity" +
                          " %s "%str(R) + "the Heat Index = %s "%str(H) + " degrees."))