
Section 2.4 Higher Degree Linear Regression

Continuing in a similar fashion to the previous section, consider now an approximation using a quadratic function \(f(x) = a x^2 + b x + c\text{.}\) In this case, the total squared error would be of the form

\begin{equation*} TSE(a,b,c) = \sum_{k=0}^n (a x_k^2 + b x_k + c - y_k)^2. \end{equation*}

Taking all three partials gives

\begin{equation*} TSE_a = \sum_{k=0}^n 2(a x_k^2 + b x_k + c - y_k) \cdot x_k^2 \end{equation*}
\begin{equation*} TSE_b = \sum_{k=0}^n 2(a x_k^2 + b x_k + c - y_k) \cdot x_k \end{equation*}
\begin{equation*} TSE_c = \sum_{k=0}^n 2(a x_k^2 + b x_k + c - y_k) \cdot 1 . \end{equation*}

Once again, setting each of these partial derivatives equal to zero and rearranging gives the normal equations for the best-fit quadratic

\begin{equation*} a \sum_{k=0}^n x_k^4 + b \sum_{k=0}^n x_k^3 + c \sum_{k=0}^n x_k^2 = \sum_{k=0}^n x_k^2 y_k \end{equation*}
\begin{equation*} a \sum_{k=0}^n x_k^3 + b \sum_{k=0}^n x_k^2 + c \sum_{k=0}^n x_k = \sum_{k=0}^n x_k y_k \end{equation*}
\begin{equation*} a \sum_{k=0}^n x_k^2 + b \sum_{k=0}^n x_k + c \sum_{k=0}^n 1 = \sum_{k=0}^n y_k. \end{equation*}
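
Written out as a single matrix equation, these normal equations read

\begin{equation*} \begin{bmatrix} \sum_{k=0}^n x_k^4 \amp \sum_{k=0}^n x_k^3 \amp \sum_{k=0}^n x_k^2 \\ \sum_{k=0}^n x_k^3 \amp \sum_{k=0}^n x_k^2 \amp \sum_{k=0}^n x_k \\ \sum_{k=0}^n x_k^2 \amp \sum_{k=0}^n x_k \amp \sum_{k=0}^n 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} \sum_{k=0}^n x_k^2 y_k \\ \sum_{k=0}^n x_k y_k \\ \sum_{k=0}^n y_k \end{bmatrix}. \end{equation*}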

Notice that even though you are creating the best-fit quadratic, finding that quadratic boils down to solving a (slightly larger) linear system. In other words, linear regression again. Indeed, we can also approach the derivation of the regression formulas directly using a linear algebra approach. To do this, consider the equations generated by plugging the data points \((x_0,y_0), (x_1, y_1), \ldots, (x_n, y_n)\) into the quadratic model. This yields a (likely overdetermined) system of equations. Appending an error term \(\epsilon_k\) to each equation gives the following matrix form:

\begin{equation*} \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_0^2 \amp x_0 \amp 1 \\ x_1^2 \amp x_1 \amp 1 \\ x_2^2 \amp x_2 \amp 1 \\ \vdots \amp \vdots \amp \vdots \\ x_n^2 \amp x_n \amp 1 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} + \begin{bmatrix} \epsilon_0 \\ \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \end{equation*}

which can be written more compactly as

\begin{equation*} Y = XA + \epsilon. \end{equation*}

Solving for \(\epsilon\) and then minimizing \(\epsilon^t \epsilon\) yields the same solution as above. After some work (outlined below), in matrix form this becomes

\begin{equation*} A = (X^t X)^{-1} X^t Y \end{equation*}

with the column vector \(A\) containing the three unknowns \(a\text{,}\) \(b\text{,}\) and \(c\text{.}\)
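
To see where this formula comes from: since \(\epsilon = Y - XA\text{,}\) we are minimizing

\begin{equation*} \epsilon^t \epsilon = (Y - XA)^t (Y - XA). \end{equation*}

Setting the gradient with respect to \(A\) equal to zero gives \(-2 X^t (Y - XA) = 0\text{,}\) that is,

\begin{equation*} X^t X A = X^t Y, \end{equation*}

and multiplying both sides by \((X^t X)^{-1}\) (which exists as long as the columns of \(X\) are linearly independent) produces the formula above.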

We can also use MATLAB (or the open-source alternative Octave) to do this linear algebra for us. The graph produced here by the Sage cell is a rudimentary text graph, but plugging this code into MATLAB or a desktop version of Octave should produce a very nice graph.

Cutting and pasting this code into, for example, http://octave-online.net gives a nice, non-ASCII graph. Below, we do the same thing using Sage.
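
The Sage cell itself is not reproduced here, but a minimal sketch of what such a quadratic-fit cell could look like (reusing the sample data from the cubic solution below) is:

var('x')
xpts = vector((-1, 0, 1, 3, 5, 5))
ypts = vector((5, 3, 0, 4, 3, -1))
ones = vector((1, 1, 1, 1, 1, 1))

# column of squares x_k^2 and the (x, y) pairs for plotting
xpts2 = vector([t^2 for t in xpts])
pts = [(xpts[k], ypts[k]) for k in range(len(xpts))]

# design matrix X with columns x^2, x, 1, and column vector Y of data values
X = matrix([xpts2]).stack(xpts).stack(ones).transpose()
Y = matrix(ypts).transpose()
Xt = X.transpose()

# least-squares coefficients A = (X^t X)^{-1} X^t Y
A = (Xt*X).inverse()*Xt*Y
[a, b, c] = [A[0][0], A[1][0], A[2][0]]

# plot the data together with the best-fit quadratic
f = a*x^2 + b*x + c
banner = "The best-fit quadratic is given by $%s$" % str(latex(f))
G = points(pts, size=20)
H = plot(f, x, min(xpts)-0.2, max(xpts)+0.2, title=banner)
show(G+H)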

Modify the Sage code above to give a best-fit cubic.

Solution
var('x')

# sample data points (x_k, y_k)
xpts = vector((-1, 0, 1, 3, 5, 5))
ypts = vector((5, 3, 0, 4, 3, -1))
ones = vector((1, 1, 1, 1, 1, 1))

# build the columns x_k^3 and x_k^2, and collect the (x, y) pairs for plotting
xpts3 = []
xpts2 = []
pts = []
for k in range(len(xpts)):
    xpts3.append(xpts[k]^3)
    xpts2.append(xpts[k]^2)
    pts.append((xpts[k], ypts[k]))
xpts3 = vector(xpts3)
xpts2 = vector(xpts2)

# design matrix X with columns x^3, x^2, x, 1, and column vector Y of data values
X = matrix([xpts3]).stack(xpts2).stack(xpts).stack(ones).transpose()
Y = matrix(ypts).transpose()
Xt = X.transpose()

# least-squares coefficients A = (X^t X)^{-1} X^t Y
A = (Xt*X).inverse()*Xt*Y
[a, b, c, d] = [A[0][0], A[1][0], A[2][0], A[3][0]]

# plot the data together with the best-fit cubic
f = a*x^3 + b*x^2 + c*x + d
banner = "The best-fit cubic is given by $%s$" % str(latex(f))
G = points(pts, size=20)
H = plot(f, x, min(xpts)-0.2, max(xpts)+0.2, title=banner)
show(G+H)
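
As an aside, the explicit inverse \((X^t X)^{-1}\) is used here to mirror the formula derived above; in Sage one could instead solve the normal equations directly, for example with (Xt*X).solve_right(Xt*Y), which avoids forming the inverse.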

Doing Cubic Linear Regression...use your Sage work from above.