4.7 Estimation and Prediction of the Response
"Those who ignore Statistics are condemned to reinvent it." - Brad Efron
We estimate the mean response as we did in the simple linear case (see Section 2.5), except now we estimate it at a vector of predictor values:
\begin{align*}
{\bf X}_{h} & =\left[\begin{array}{c}
1\\
X_{h1}\\
X_{h2}\\
\vdots\\
X_{h,p-1}
\end{array}\right]
\end{align*}
So the estimated mean response will be the regression function evaluated
at ${\bf X}_{h}$:
\begin{align*}
\hat{Y}_{h} & ={\bf X}_{h}^{\prime}{\bf b}\qquad(4.45)
\end{align*}
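Since ${\bf X}_{h}^{\prime}{\bf b}$ is an inner product, this is just the fitted equation evaluated at the new predictor values:
\begin{align*}
\hat{Y}_{h} & =b_{0}+b_{1}X_{h1}+b_{2}X_{h2}+\cdots+b_{p-1}X_{h,p-1}
\end{align*}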
The variance of the estimated mean response is
\begin{align*}
Var\left[\hat{Y}_{h}\right] & =\sigma^{2}{\bf X}_{h}^{\prime}\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}_{h}\qquad(4.46)
\end{align*}
We can estimate the variance as
\begin{align*}
s^{2}\left[\hat{Y}_{h}\right] & =MSE\,{\bf X}_{h}^{\prime}\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}_{h}\qquad(4.47)
\end{align*}
We can then obtain a $\left(1-\alpha\right)100\%$ confidence interval
for the mean response at ${\bf X}_{h}$ as
\begin{align*}
\hat{Y}_{h} & \pm t_{\alpha/2}s\left[\hat{Y}_{h}\right]\qquad(4.48)
\end{align*}
where $t_{\alpha/2}$ has $n-p$ degrees of freedom.
We can predict a new response $Y_{h\left(new\right)}$ at some ${\bf X}_{h}$
with a $\left(1-\alpha\right)100\%$ prediction interval
\begin{align*}
\hat{Y}_{h} & \pm t_{\alpha/2}s\left[Y_{h\left(pred\right)}\right]\qquad(4.49)
\end{align*}
where
\begin{align*}
s^{2}\left[Y_{h\left(pred\right)}\right] & =MSE\left(1+{\bf X}_{h}^{\prime}\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}_{h}\right)\qquad(4.50)
\end{align*}
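Comparing (4.47) and (4.50), the prediction variance is the estimated variance of the mean response plus $MSE$; the extra $MSE$ term accounts for the variability of a single new observation about its mean:
\begin{align*}
s^{2}\left[Y_{h\left(pred\right)}\right] & =MSE+s^{2}\left[\hat{Y}_{h}\right]
\end{align*}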
For the bodyfat data from Example 4.5.1, suppose we want to estimate the mean response and predict a new response at tri=25, thigh=51.2, and midarm=24.9. We will use the predict function as we did in the simple linear case.
library(tidyverse)

# read in the body fat data and fit the multiple regression model
dat = read.table("http://www.jpstats.org/Regression/data/BodyFat.txt", header=T)
fit = lm(bfat ~ tri + thigh + midarm, data=dat)

# new values of the predictors at which to estimate and predict
xnew = data.frame(tri = 25, thigh = 51.2, midarm = 24.9)

# 90% confidence interval for the mean response at xnew
predict(fit, xnew, interval="confidence", level = 0.90)
       fit      lwr      upr
1 24.73348 18.80705 30.65991
predict(fit, xnew, interval="prediction", level = 0.90)
       fit     lwr      upr
1 24.73348 17.3939 32.07306
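As a check, we can reproduce both intervals directly from formulas (4.45)-(4.50) using matrix operations. This is a minimal sketch, assuming the fit object from above is still in the workspace; it uses only base R extractors (model.matrix, coef, resid, df.residual).

# design matrix X and the new vector X_h (order matches the coefficients)
X  = model.matrix(fit)
Xh = c(1, 25, 51.2, 24.9)

# MSE with n - p degrees of freedom, and the quadratic form X_h'(X'X)^{-1}X_h
mse  = sum(resid(fit)^2) / df.residual(fit)
quad = drop(t(Xh) %*% solve(t(X) %*% X) %*% Xh)

yhat    = sum(Xh * coef(fit))        # (4.45): estimated mean response
se.mean = sqrt(mse * quad)           # (4.47): s[Yhat_h]
se.pred = sqrt(mse * (1 + quad))     # (4.50): s[Y_h(pred)]
tcrit   = qt(0.95, df.residual(fit)) # t_{alpha/2} for a 90% interval

yhat + c(-1, 1) * tcrit * se.mean    # matches the confidence interval above
yhat + c(-1, 1) * tcrit * se.pred    # matches the prediction interval above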