
4.2 Estimating the Multiple Regression Model

As we did in the simple linear regression case in Section 1.3.2, we want to fit model (4.1)
\begin{align*} Y_{i}= & \beta_{0}+\sum_{k=1}^{p-1}\beta_{k}X_{ik}+\varepsilon_{i}\\ & \varepsilon_{i}\overset{iid}{\sim}N\left(0,\sigma^{2}\right)\qquad\qquad\qquad(4.1) \end{align*}
to the observed data.

The fitted regression equation in the multiple regression case is $$ \hat{Y}_i = b_0 + b_1 X_{i1} + b_2 X_{i2}+\cdots +b_{p-1}X_{i,p-1}\qquad\qquad(4.2) $$ The estimates $b_0,b_1,\ldots,b_{p-1}$ are found by minimizing the squared distances between the observed values $Y_i$ and the fitted values $\hat{Y}_i$. The sum of the squared distances in (1.2)
$$ Q=\sum \left(Y_i-\left(b_0+b_1 X_i\right)\right)^2 \qquad\qquad\qquad (1.2) $$
for the simple regression case is now $$ Q=\sum \left(Y_i-\left(b_0 + b_1 X_{i1} + b_2 X_{i2}+\cdots +b_{p-1}X_{i,p-1}\right)\right)^2 \qquad (4.3) $$ in the multiple regression case.
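To make the criterion concrete, here is a minimal numerical sketch, assuming NumPy and a small made-up data set (the values and names are purely illustrative), of evaluating $Q$ in (4.3) for candidate coefficients:

```python
import numpy as np

# Hypothetical data: n = 5 observations and two predictors (so p = 3).
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y  = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

def Q(b0, b1, b2):
    """Sum of squared distances (4.3) between Y_i and the fitted values (4.2)."""
    fitted = b0 + b1 * X1 + b2 * X2
    return ((Y - fitted) ** 2).sum()

# The least squares estimates are the values of (b0, b1, b2) that minimize Q.
print(Q(0.0, 1.0, 1.0))  # Q at one candidate set of coefficients
print(Q(0.5, 1.0, 1.2))  # Q at another candidate; the smaller value fits better
```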

Note that model (4.1)
\begin{align*} Y_{i}= & \beta_{0}+\sum_{k=1}^{p-1}\beta_{k}X_{ik}+\varepsilon_{i}\\ & \varepsilon_{i}\overset{iid}{\sim}N\left(0,\sigma^{2}\right)\qquad\qquad\qquad(4.1) \end{align*}
is no longer a line. It is a plane when $p=3$ and a hyperplane when $p>3$.
When there are two predictor variables ($p=3$), we take three partial derivatives of (4.3) with respect to $b_0$, $b_1$, and $b_2$. This leads us to \begin{align*} \frac{\partial Q}{\partial b_{0}} & =-2\sum\left(Y_{i}-b_{0}-b_{1}X_{i1}-b_{2}X_{i2}\right)\\ \frac{\partial Q}{\partial b_{1}} & =-2\sum X_{i1}\left(Y_{i}-b_{0}-b_{1}X_{i1}-b_{2}X_{i2}\right)\qquad(4.4)\\ \frac{\partial Q}{\partial b_{2}} & =-2\sum X_{i2}\left(Y_{i}-b_{0}-b_{1}X_{i1}-b_{2}X_{i2}\right) \end{align*}
Setting the partial derivatives equal to zero and rearranging the terms lead us to the normal equations \begin{align*} \sum Y_{i} & =nb_{0}+b_{1}\sum X_{i1}+b_{2}\sum X_{i2}\\ \sum X_{i1}Y_{i} & =b_{0}\sum X_{i1}+b_{1}\sum X_{i1}^{2}+b_{2}\sum X_{i1}X_{i2}\qquad(4.5)\\ \sum X_{i2}Y_{i} & =b_{0}\sum X_{i2}+b_{1}\sum X_{i1}X_{i2}+b_{2}\sum X_{i2}^{2} \end{align*}
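The normal equations (4.5) are three linear equations in the three unknowns $b_0$, $b_1$, and $b_2$, so they can be solved numerically as a $3\times 3$ linear system. A sketch, again assuming NumPy and the same made-up data as above:

```python
import numpy as np

# Same hypothetical data as in the earlier sketch (n = 5, two predictors).
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y  = np.array([3.1, 3.9, 7.2, 7.8, 10.9])
n  = len(Y)

# Coefficient matrix and right-hand side of the normal equations (4.5).
A = np.array([
    [n,         X1.sum(),       X2.sum()],
    [X1.sum(),  (X1**2).sum(),  (X1*X2).sum()],
    [X2.sum(),  (X1*X2).sum(),  (X2**2).sum()],
])
rhs = np.array([Y.sum(), (X1*Y).sum(), (X2*Y).sum()])

b0, b1, b2 = np.linalg.solve(A, rhs)  # least squares estimates of b_0, b_1, b_2
print(b0, b1, b2)
```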
Solving the normal equations for $b_0$, $b_1$, and $b_2$ gives us the least squares estimators \begin{align*} b_{1} & =\frac{\left(\sum x_{i2}^{2}\right)\left(\sum x_{i1}y_{i}\right)-\left(\sum x_{i1}x_{i2}\right)\left(\sum x_{i2}y_{i}\right)}{\left(\sum x_{i1}^{2}\right)\left(\sum x_{i2}^{2}\right)-\left(\sum x_{i1}x_{i2}\right)^{2}}\\ b_{2} & =\frac{\left(\sum x_{i1}^{2}\right)\left(\sum x_{i2}y_{i}\right)-\left(\sum x_{i1}x_{i2}\right)\left(\sum x_{i1}y_{i}\right)}{\left(\sum x_{i1}^{2}\right)\left(\sum x_{i2}^{2}\right)-\left(\sum x_{i1}x_{i2}\right)^{2}}\qquad(4.6)\\ b_{0} & =\bar{Y}-b_{1}\bar{X}_{1}-b_{2}\bar{X}_{2} \end{align*} where $x_{i1}=X_{i1}-\bar{X}_{1}$, $x_{i2}=X_{i2}-\bar{X}_{2}$, and $y_{i}=Y_{i}-\bar{Y}$ denote deviations from the sample means. We see that the expressions for the least squares estimators become cumbersome even for $p=3$. As more variables are added to the model, the equations become even more cumbersome.
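As a check, the closed-form expressions in (4.6) can be evaluated directly from the deviations; with the same hypothetical data, this sketch should reproduce the solution of the normal equations above:

```python
import numpy as np

# Same hypothetical data as above.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y  = np.array([3.1, 3.9, 7.2, 7.8, 10.9])

# Deviations from the sample means, as used in (4.6).
x1 = X1 - X1.mean()
x2 = X2 - X2.mean()
y  = Y - Y.mean()

den = (x1**2).sum() * (x2**2).sum() - (x1*x2).sum()**2
b1  = ((x2**2).sum() * (x1*y).sum() - (x1*x2).sum() * (x2*y).sum()) / den
b2  = ((x1**2).sum() * (x2*y).sum() - (x1*x2).sum() * (x1*y).sum()) / den
b0  = Y.mean() - b1 * X1.mean() - b2 * X2.mean()

print(b0, b1, b2)  # should match the solution of the normal equations above
```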

We can simplify notation by utilizing matrices to represent the model. We will present some basic notation and operations for matrices in the next section and then present the model using matrices in Section 4.4.