4.7 Estimation and Prediction of the Response
"Those who ignore Statistics are condemned to reinvent it." - Brad Efron
We estimate the mean response as we did in the simple linear case (see Section 2.5), except now we estimate it at a vector of predictor values:
\begin{align*}
{\bf X}_{h} & =\left[\begin{array}{c}
1\\
X_{h1}\\
X_{h2}\\
\vdots\\
X_{h,p-1}
\end{array}\right]
\end{align*}
So the estimated mean response will be the regression function evaluated
at ${\bf X}_{h}$:
\begin{align*}
\hat{Y}_{h} & ={\bf X}_{h}^{\prime}{\bf b}\qquad(4.45)
\end{align*}
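Since ${\bf X}_{h}^{\prime}{\bf b}$ is an inner product, this is just the fitted equation evaluated at the new predictor values:
\begin{align*}
\hat{Y}_{h} & =b_{0}+b_{1}X_{h1}+b_{2}X_{h2}+\cdots+b_{p-1}X_{h,p-1}
\end{align*}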
The variance of the estimated mean response is
\begin{align*}
Var\left[\hat{Y}_{h}\right] & =\sigma^{2}{\bf X}_{h}^{\prime}\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}_{h}\qquad(4.46)
\end{align*}
We can estimate the variance as
\begin{align*}
s^{2}\left[\hat{Y}_{h}\right] & =MSE\,{\bf X}_{h}^{\prime}\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}_{h}\qquad(4.47)
\end{align*}
We can then obtain a $\left(1-\alpha\right)100\%$ confidence interval
for the mean response at ${\bf X}_{h}$ as
\begin{align*}
\hat{Y}_{h} & \pm t_{\alpha/2}s\left[\hat{Y}_{h}\right]\qquad(4.48)
\end{align*}
where $t_{\alpha/2}$ has $n-p$ degrees of freedom.
We can predict a new response $Y_{h\left(new\right)}$ at some ${\bf X}_{h}$
with a $\left(1-\alpha\right)100\%$ prediction interval
\begin{align*}
\hat{Y}_{h} & \pm t_{\alpha/2}s\left[Y_{h\left(pred\right)}\right]\qquad(4.49)
\end{align*}
where
\begin{align*}
s^{2}\left[Y_{h\left(pred\right)}\right] & =MSE\left(1+{\bf X}_{h}^{\prime}\left({\bf X}^{\prime}{\bf X}\right)^{-1}{\bf X}_{h}\right)\qquad(4.50)
\end{align*}
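Comparing (4.47) and (4.50), the prediction variance is the estimated variance of the mean response plus $MSE$; the extra $MSE$ term accounts for the variability of a single new observation about its mean:
\begin{align*}
s^{2}\left[Y_{h\left(pred\right)}\right] & =MSE+s^{2}\left[\hat{Y}_{h}\right]
\end{align*}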
For the bodyfat data from Example 4.5.1, suppose we want to estimate the mean response and predict a new response at tri=25, thigh=51.2, and midarm=24.9. We will use the predict function as we did in the simple linear case.
library(tidyverse)

# read in the body fat data and fit the multiple regression model
dat = read.table("http://www.jpstats.org/Regression/data/BodyFat.txt", header=T)
fit = lm(bfat ~ tri + thigh + midarm, data=dat)

# new values of the predictors at which to estimate and predict
xnew = data.frame(tri = 25, thigh = 51.2, midarm = 24.9)

# 90% confidence interval for the mean response at xnew
predict(fit, xnew, interval="confidence", level = 0.90)
       fit      lwr      upr
1 24.73348 18.80705 30.65991
predict(fit, xnew, interval="prediction", level = 0.90)
       fit     lwr      upr
1 24.73348 17.3939 32.07306
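As a check, we can reproduce both intervals directly from formulas (4.45)-(4.50) using matrix operations. This is a minimal sketch, assuming the fit object from above is still in the workspace; it uses only base R extractors (model.matrix, coef, resid, df.residual).

# design matrix X and the new vector X_h (order matches the coefficients)
X  = model.matrix(fit)
Xh = c(1, 25, 51.2, 24.9)

# MSE with n - p degrees of freedom, and the quadratic form X_h'(X'X)^{-1}X_h
mse  = sum(resid(fit)^2) / df.residual(fit)
quad = drop(t(Xh) %*% solve(t(X) %*% X) %*% Xh)

yhat    = sum(Xh * coef(fit))        # (4.45): estimated mean response
se.mean = sqrt(mse * quad)           # (4.47): s[Yhat_h]
se.pred = sqrt(mse * (1 + quad))     # (4.50): s[Y_h(pred)]
tcrit   = qt(0.95, df.residual(fit)) # t_{alpha/2} for a 90% interval

yhat + c(-1, 1) * tcrit * se.mean    # matches the confidence interval above
yhat + c(-1, 1) * tcrit * se.pred    # matches the prediction interval above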