
Section 2.4 Higher Degree Linear Regression

Continuing in a similar fashion to the previous section, consider now an approximation using a quadratic function \(f(x) = a x^2 + b x + c\text{.}\) In this case, the total squared error would be of the form

\begin{equation*} TSE(a,b,c) = \sum_{k=0}^n (a x_k^2 + b x_k + c - y_k)^2. \end{equation*}

Taking all three partials gives

\begin{equation*} TSE_a = \sum_{k=0}^n 2(a x_k^2 + b x_k + c - y_k) \cdot x_k^2 \end{equation*}
\begin{equation*} TSE_b = \sum_{k=0}^n 2(a x_k^2 + b x_k + c - y_k) \cdot x_k \end{equation*}
\begin{equation*} TSE_c = \sum_{k=0}^n 2(a x_k^2 + b x_k + c - y_k) \cdot 1 . \end{equation*}

Once again, setting each of these partial derivatives equal to zero and rearranging gives the normal equations for the best-fit quadratic

\begin{equation*} a \sum_{k=0}^n x_k^4 + b \sum_{k=0}^n x_k^3 + c \sum_{k=0}^n x_k^2 = \sum_{k=0}^n x_k^2 y_k \end{equation*}
\begin{equation*} a \sum_{k=0}^n x_k^3 + b \sum_{k=0}^n x_k^2 + c \sum_{k=0}^n x_k = \sum_{k=0}^n x_k y_k \end{equation*}
\begin{equation*} a \sum_{k=0}^n x_k^2 + b \sum_{k=0}^n x_k + c \sum_{k=0}^n 1 = \sum_{k=0}^n y_k. \end{equation*}
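
Written out as a single matrix equation, these normal equations read

\begin{equation*} \begin{bmatrix} \sum_{k=0}^n x_k^4 \amp \sum_{k=0}^n x_k^3 \amp \sum_{k=0}^n x_k^2 \\ \sum_{k=0}^n x_k^3 \amp \sum_{k=0}^n x_k^2 \amp \sum_{k=0}^n x_k \\ \sum_{k=0}^n x_k^2 \amp \sum_{k=0}^n x_k \amp \sum_{k=0}^n 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} \sum_{k=0}^n x_k^2 y_k \\ \sum_{k=0}^n x_k y_k \\ \sum_{k=0}^n y_k \end{bmatrix}. \end{equation*}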

Notice that even though you are creating the best-fit quadratic, finding that quadratic boils down to solving a (slightly larger) linear system. In other words, linear regression again. Indeed, we can also approach the derivation of the regression formulas directly using a linear algebra approach. To do this, consider the equations generated by plugging the data points \((x_0,y_0), (x_1, y_1), \ldots, (x_n, y_n)\) into the quadratic model. This yields a (likely overdetermined) system of equations. Appending an error term \(\epsilon_k\) to each equation gives the following matrix form:

\begin{equation*} \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_0^2 \amp x_0 \amp 1 \\ x_1^2 \amp x_1 \amp 1 \\ x_2^2 \amp x_2 \amp 1 \\ \vdots \amp \vdots \amp \vdots \\ x_n^2 \amp x_n \amp 1 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} + \begin{bmatrix} \epsilon_0 \\ \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \end{equation*}

which can be written more compactly as

\begin{equation*} Y = XA + \epsilon. \end{equation*}

Solving for \(\epsilon\) and then minimizing \(\epsilon^t \epsilon\) yields the same solution as above. After some work (outlined below), in matrix form this becomes

\begin{equation*} A = (X^t X)^{-1} X^t Y \end{equation*}

with the column vector \(A\) containing the three unknowns \(a\text{,}\) \(b\text{,}\) and \(c\text{.}\)
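
To see where this formula comes from: since \(\epsilon = Y - XA\text{,}\) we are minimizing

\begin{equation*} \epsilon^t \epsilon = (Y - XA)^t (Y - XA). \end{equation*}

Setting the gradient with respect to \(A\) equal to zero gives \(-2 X^t (Y - XA) = 0\text{,}\) that is,

\begin{equation*} X^t X A = X^t Y, \end{equation*}

and multiplying both sides by \((X^t X)^{-1}\) (which exists as long as the columns of \(X\) are linearly independent) produces the formula above.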

We can also use MATLAB (or the open-source alternative Octave) to do this linear algebra for us. The graph produced here by the Sage cell is a rudimentary text graph, but plugging this code into MATLAB or a desktop version of Octave should produce a very nice graph.

Cutting and pasting this code into, for example, http://octave-online.net gives a nice, non-ASCII graph. Below, we do the same thing using Sage.
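
The Sage cell itself is not reproduced here, but a minimal sketch of what such a quadratic-fit cell could look like (reusing the sample data from the cubic solution below) is:

var('x')
xpts = vector((-1, 0, 1, 3, 5, 5))
ypts = vector((5, 3, 0, 4, 3, -1))
ones = vector((1, 1, 1, 1, 1, 1))

# column of squares x_k^2 and the (x, y) pairs for plotting
xpts2 = vector([t^2 for t in xpts])
pts = [(xpts[k], ypts[k]) for k in range(len(xpts))]

# design matrix X with columns x^2, x, 1, and column vector Y of data values
X = matrix([xpts2]).stack(xpts).stack(ones).transpose()
Y = matrix(ypts).transpose()
Xt = X.transpose()

# least-squares coefficients A = (X^t X)^{-1} X^t Y
A = (Xt*X).inverse()*Xt*Y
[a, b, c] = [A[0][0], A[1][0], A[2][0]]

# plot the data together with the best-fit quadratic
f = a*x^2 + b*x + c
banner = "The best-fit quadratic is given by $%s$" % str(latex(f))
G = points(pts, size=20)
H = plot(f, x, min(xpts)-0.2, max(xpts)+0.2, title=banner)
show(G+H)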

Modify the Sage code above to give a best-fit cubic.

Solution
var('x')

# sample data points (x_k, y_k)
xpts = vector((-1, 0, 1, 3, 5, 5))
ypts = vector((5, 3, 0, 4, 3, -1))
ones = vector((1, 1, 1, 1, 1, 1))

# build the columns x_k^3 and x_k^2, and collect the (x, y) pairs for plotting
xpts3 = []
xpts2 = []
pts = []
for k in range(len(xpts)):
    xpts3.append(xpts[k]^3)
    xpts2.append(xpts[k]^2)
    pts.append((xpts[k], ypts[k]))
xpts3 = vector(xpts3)
xpts2 = vector(xpts2)

# design matrix X with columns x^3, x^2, x, 1, and column vector Y of data values
X = matrix([xpts3]).stack(xpts2).stack(xpts).stack(ones).transpose()
Y = matrix(ypts).transpose()
Xt = X.transpose()

# least-squares coefficients A = (X^t X)^{-1} X^t Y
A = (Xt*X).inverse()*Xt*Y
[a, b, c, d] = [A[0][0], A[1][0], A[2][0], A[3][0]]

# plot the data together with the best-fit cubic
f = a*x^3 + b*x^2 + c*x + d
banner = "The best-fit cubic is given by $%s$" % str(latex(f))
G = points(pts, size=20)
H = plot(f, x, min(xpts)-0.2, max(xpts)+0.2, title=banner)
show(G+H)
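
As an aside, the explicit inverse \((X^t X)^{-1}\) is used here to mirror the formula derived above; in Sage one could instead solve the normal equations directly, for example with (Xt*X).solve_right(Xt*Y), which avoids forming the inverse.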

Doing Cubic Linear Regression...use your Sage work from above.