4.5 Least Squares and Inferences Using Matrices
"Statistics is the grammar of science." - Karl Pearson
For model (4.18)
\begin{align*}
{\textbf{Y}}= & {\textbf{X}}{\boldsymbol{\beta}}+{\boldsymbol{\varepsilon}}\\
& \boldsymbol{\varepsilon} \sim N_{n}\left({\bf 0},\sigma^{2}{\bf I}\right) \qquad (4.18)
\end{align*}
we want to minimize the sum of the squared distances between the observed $Y_i$ and the fitted $\hat{Y}_i$.
We will express the sum of the squared distances in (4.3)
$$
Q=\sum \left(Y_i-\left(b_0 + b_1 X_{i1} + b_2 X_{i2}+\cdots +b_{p-1}X_{i,p-1}\right)\right)^2 \qquad (4.3)
$$
in matrix terms as
\begin{align*}
Q & =\left({\bf Y}-{\bf X}{\bf b}\right)^{\prime}\left({\bf Y}-{\bf X}{\bf b}\right)\qquad(4.19)
\end{align*}
where ${\bf Y}$ is the response vector, ${\bf X}$ is the design
matrix, and ${\bf b}$ is the vector of estimators
\begin{align*}
{\bf b} & =\left[\begin{array}{c}
b_{0}\\
b_{1}\\
\vdots\\
b_{p-1}
\end{array}\right]
\end{align*}
We will minimize $Q$ in (4.19) by taking the derivative with respect
to ${\bf b}$ and setting it equal to the vector of zeros. The derivative
is obtained using the properties listed in Section 4.3.6 along with the
chain rule:
\begin{align*}
\frac{\partial\left({\bf Y}-{\bf X}{\bf b}\right)^{\prime}\left({\bf Y}-{\bf X}{\bf b}\right)}{\partial{\bf b}} & =-2{\bf X}^{\prime}\left({\bf Y}-{\bf X}{\bf b}\right)
\end{align*}
Setting this derivative equal to the vector of zeros gives us
\begin{align*}
-2{\bf X}^{\prime}\left({\bf Y}-{\bf X}{\bf b}\right)={\bf 0} & \Longrightarrow{\bf X}^{\prime}\left({\bf Y}-{\bf X}{\bf b}\right)={\bf 0}
\end{align*}
We can distribute ${\bf X}^{\prime}$ through the parentheses
\begin{align*}
{\bf X}^{\prime}\left({\bf Y}-{\bf X}{\bf b}\right)={\bf 0} & \Longrightarrow{\bf X}^{\prime}{\bf Y}-{\bf X}^{\prime}{\bf X}{\bf b}={\bf 0}\\
& \Longrightarrow{\bf X}^{\prime}{\bf X}{\bf b}={\bf X}^{\prime}{\bf Y}\qquad\qquad(4.20)
\end{align*}
Equation (4.20) represents the normal equations in matrix format.
Solving the normal equations for ${\bf b}$ gives us the least squares
estimators:
\begin{align*}
{\bf X}^{\prime}{\bf X}{\bf b}={\bf X}^{\prime}{\bf Y} & \Longrightarrow\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}^{\prime}{\bf X}{\bf b}=\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}^{\prime}{\bf Y}\\
& \Longrightarrow{\bf b}=\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}^{\prime}{\bf Y}\qquad\qquad\qquad\qquad(4.21)
\end{align*}
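Before turning to a real dataset, here is a minimal sketch of (4.21) in R using simulated data; the object names (X.sim, Y.sim, b.sim) and the simulated coefficients are illustrative only, not part of the example that follows later.
#a minimal sketch of (4.21) using simulated data
set.seed(1)
n = 20
X.sim = cbind(1, x1 = rnorm(n), x2 = rnorm(n))  #design matrix with a column of 1s
Y.sim = X.sim %*% c(2, 1, -0.5) + rnorm(n)      #response generated from assumed coefficients
b.sim = solve(t(X.sim) %*% X.sim) %*% t(X.sim) %*% Y.sim  #least squares estimates (4.21)
b.sim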
The fitted values are
\begin{align*}
\hat{{\bf Y}} & ={\bf X}{\bf b}\qquad(4.22)
\end{align*}
Substituting (4.21) for ${\bf b}$ gives us
\begin{align*}
\hat{{\bf Y}}={\bf X}\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}^{\prime}{\bf Y}\qquad(4.23)
\end{align*}
The $n\times n$ matrix ${\bf X}\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}^{\prime}$
pops up a number of times in multiple regression analysis. We call
this matrix the hat matrix and denote it as
\begin{align*}
{\bf H} & ={\bf X}\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}^{\prime}\qquad(4.24)
\end{align*}
Thus, the fitted values can be expressed as
\begin{align*}
\hat{{\bf Y}} & ={\bf H}{\bf Y}\qquad(4.25)
\end{align*}
The residuals can be expressed in matrix terms as
\begin{align*}
{\bf e} & ={\bf Y}-\hat{{\bf Y}}\\
& ={\bf Y}-{\bf X}{\bf b}\qquad(4.26)
\end{align*}
Using (4.25), the residuals can be expressed in terms of the hat matrix
as well
\begin{align*}
{\bf e} & ={\bf Y}-{\bf H}{\bf Y}\\
& =\left({\bf I}-{\bf H}\right){\bf Y}\qquad(4.27)
\end{align*}
We have seen in (1.17)
$$
s^2 = \frac{SSE}{n-2}\qquad\qquad\qquad(1.17)
$$
that $s^{2}$ is an estimate of the variance
$\sigma^{2}$. Recall that we call $s^{2}$ the MSE, which is just
the SSE divided by its degrees of freedom ($n-2$ in simple linear
regression).
In matrix terms, we can express SSE as
\begin{align*}
SSE & ={\bf e}^{\prime}{\bf e}\\
 & ={\bf Y}^{\prime}\left({\bf I}-{\bf H}\right)^{\prime}\left({\bf I}-{\bf H}\right){\bf Y}\\
 & ={\bf Y}^{\prime}\left({\bf I}-{\bf H}\right){\bf Y}\qquad\qquad\qquad(4.28)
\end{align*}
The term $\left({\bf I}-{\bf H}\right)^{\prime}\left({\bf I}-{\bf H}\right)$ simplifies to $\left({\bf I}-{\bf H}\right)$ since $\left({\bf I}-{\bf H}\right)$ is symmetric and idempotent.
The MSE is
\begin{align*}
MSE & =\frac{SSE}{n-p}
\end{align*}
Here the degrees of freedom are $n-p$ since there are $p$ parameters $\left(\beta_{0},\beta_{1},\ldots,\beta_{p-1}\right)$ that need to be estimated in order to get $SSE$. As before, MSE is an estimator of $\sigma^{2}$.
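Continuing the simulated sketch, SSE and MSE could be obtained directly from (4.28); here $p$ is the number of columns of the design matrix (3 in the simulated example).
#SSE (4.28) and MSE for the simulated sketch
p.sim = ncol(X.sim)                                #number of parameters
SSE.sim = t(Y.sim) %*% (diag(n) - H.sim) %*% Y.sim #quadratic form (4.28)
MSE.sim = as.numeric(SSE.sim) / (n - p.sim)        #estimate of sigma^2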
As they were in the simple regression case, the least squares estimators
${\bf b}$ are unbiased
\begin{align*}
{\bf E}\left[{\bf b}\right] & =\boldsymbol{\beta}\qquad(4.29)
\end{align*}
The $p\times p$ covariance matrix of ${\bf b}$ is
\begin{align*}
{\bf Cov}\left[{\bf b}\right] & =\sigma^{2}\left({\bf X}^{\prime}{\bf X}\right)^{-1}\qquad(4.30)
\end{align*}
We can estimate the covariance matrix by using MSE as an estimate
for $\sigma^{2}$. We will denote this estimated covariance matrix
as
\begin{align*}
{\bf s}^{2}\left[{\bf b}\right] & =MSE\left({\bf X}^{\prime}{\bf X}\right)^{-1}\qquad(4.31)
\end{align*}
From ${\bf s}^{2}\left[{\bf b}\right]$, we can obtain $s^{2}\left[b_{0}\right]$,
$s^{2}\left[b_{1}\right]$, or the variance for any other coefficient
to use in the confidence intervals and hypothesis tests.
A $\left(1-\alpha\right)100\%$ confidence interval for the parameter
$\beta_{k}$ is
\begin{align*}
b_{k} & \pm t_{\alpha/2,n-p}s\left[b_{k}\right]\qquad(4.32)
\end{align*}
We can test the hypothesis
\begin{align*}
H_{0}: & \beta_{k}=0\\
H_{a}: & \beta_{k}\ne0
\end{align*}
with the test statistic
\begin{align*}
t^{*} & =\frac{b_{k}}{s\left[b_{k}\right]}\qquad(4.33)
\end{align*}
where $t^{*}\sim t\left(n-p\right)$ under $H_{0}$.
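To close out the simulated sketch, (4.31) through (4.33) could be computed as below; the 0.975 quantile corresponds to a 95% confidence interval.
#estimated covariance matrix (4.31), confidence intervals (4.32), and t statistics (4.33)
s2.b.sim = MSE.sim * solve(t(X.sim) %*% X.sim)           #estimated covariance matrix of b
se.b.sim = sqrt(diag(s2.b.sim))                          #standard errors s[b_k]
ci.sim = cbind(b.sim - qt(0.975, n - p.sim) * se.b.sim,
               b.sim + qt(0.975, n - p.sim) * se.b.sim)  #95% confidence intervals
t.star.sim = b.sim / se.b.sim                            #test statistics for H0: beta_k = 0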
The bodyfat data from Kutner et al. will be used to illustrate these calculations.
Kutner, M. H., Nachtsheim, C. J., Neter, J., & Li, W. (2005). Applied Linear Statistical Models (5th ed.). Boston: McGraw-Hill Irwin.
In this dataset, the bodyfat, triceps skinfold thickness, thigh circumference, and midarm circumference are recorded for 20 healthy females aged 25 to 34. We want to model bodyfat based on the other three variables.
library(tidyverse)
dat = read.table("http://www.jpstats.org/Regression/data/BodyFat.txt", header=T)
#turn the data frame into a matrix
dat.mat = as.matrix(dat)
dat.mat
tri thigh midarm bfat
[1,] 19.5 43.1 29.1 11.9
[2,] 24.7 49.8 28.2 22.8
[3,] 30.7 51.9 37.0 18.7
[4,] 29.8 54.3 31.1 20.1
[5,] 19.1 42.2 30.9 12.9
[6,] 25.6 53.9 23.7 21.7
[7,] 31.4 58.5 27.6 27.1
[8,] 27.9 52.1 30.6 25.4
[9,] 22.1 49.9 23.2 21.3
[10,] 25.5 53.5 24.8 19.3
[11,] 31.1 56.6 30.0 25.4
[12,] 30.4 56.7 28.3 27.2
[13,] 18.7 46.5 23.0 11.7
[14,] 19.7 44.2 28.6 17.8
[15,] 14.6 42.7 21.3 12.8
[16,] 29.5 54.4 30.1 23.9
[17,] 27.7 55.3 25.7 22.6
[18,] 30.2 58.6 24.6 25.4
[19,] 22.7 48.2 27.1 14.8
[20,] 25.2 51.0 27.5 21.1
#get the fit of the data
fit = lm(bfat~tri+thigh+midarm, data=dat)
fit %>% summary
Call:
lm(formula = bfat ~ tri + thigh + midarm, data = dat)
Residuals:
Min 1Q Median 3Q Max
-3.7263 -1.6111 0.3923 1.4656 4.1277
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 117.085 99.782 1.173 0.258
tri 4.334 3.016 1.437 0.170
thigh -2.857 2.582 -1.106 0.285
midarm -2.186 1.595 -1.370 0.190
Residual standard error: 2.48 on 16 degrees of freedom
Multiple R-squared: 0.8014, Adjusted R-squared: 0.7641
F-statistic: 21.52 on 3 and 16 DF, p-value: 7.343e-06
#to get the confidence intervals for all the coefficients
confint(fit)
2.5 % 97.5 %
(Intercept) -94.444550 328.613940
tri -2.058507 10.726691
thigh -8.330476 2.616780
midarm -5.568367 1.196247
#to get the covariance matrix of the coefficients
vcov(fit)
(Intercept) tri thigh midarm
(Intercept) 9956.5279 300.197963 -257.382315 -158.670413
tri 300.1980 9.093309 -7.779145 -4.788026
thigh -257.3823 -7.779145 6.666803 4.094616
midarm -158.6704 -4.788026 4.094616 2.545617
# obtain the least squares estimates using
# matrices
# get the Y and X matrices
Y = dat.mat[,4]
X = cbind(1, dat.mat[,1:3])
X
tri thigh midarm
[1,] 1 19.5 43.1 29.1
[2,] 1 24.7 49.8 28.2
[3,] 1 30.7 51.9 37.0
[4,] 1 29.8 54.3 31.1
[5,] 1 19.1 42.2 30.9
[6,] 1 25.6 53.9 23.7
[7,] 1 31.4 58.5 27.6
[8,] 1 27.9 52.1 30.6
[9,] 1 22.1 49.9 23.2
[10,] 1 25.5 53.5 24.8
[11,] 1 31.1 56.6 30.0
[12,] 1 30.4 56.7 28.3
[13,] 1 18.7 46.5 23.0
[14,] 1 19.7 44.2 28.6
[15,] 1 14.6 42.7 21.3
[16,] 1 29.5 54.4 30.1
[17,] 1 27.7 55.3 25.7
[18,] 1 30.2 58.6 24.6
[19,] 1 22.7 48.2 27.1
[20,] 1 25.2 51.0 27.5
#now get the least squares estimates
b = solve(t(X)%*%X)%*%t(X)%*%Y
b
[,1]
117.084695
tri 4.334092
thigh -2.856848
midarm -2.186060
#note that the solve function finds the inverse of a matrix
#compare this to the estimates from lm
fit
Coefficients:
(Intercept) tri thigh midarm
117.085 4.334 -2.857 -2.186
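As one way to check the remaining formulas against lm() for the bodyfat data, we might reuse the X and Y matrices created above; the result of the last line should agree with vcov(fit).
#check the remaining matrix formulas on the bodyfat data
H = X %*% solve(t(X) %*% X) %*% t(X)  #hat matrix (4.24)
Y.hat = H %*% Y                       #fitted values (4.25); compare with fitted(fit)
e = Y - Y.hat                         #residuals (4.26); compare with resid(fit)
SSE = as.numeric(t(e) %*% e)          #(4.28)
MSE = SSE / (nrow(X) - ncol(X))       #degrees of freedom n - p = 20 - 4 = 16
MSE * solve(t(X) %*% X)               #estimated covariance matrix (4.31); compare with vcov(fit)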