
1.5 Estimation of $\sigma^2$

"In manufacturing, we try to stamp out variance. With people, variance is everything."
- Jack Welch
Recall from Section 1.3.2 that the least squares estimates minimize $$ \sum \left(Y_i - \hat{Y}_i\right)^2 $$ When the least squares estimates are used to compute $\hat{Y}_i$ and the sum is calculated for the sample, we call this quantity the sum of squares error (SSE): $$ SSE = \sum \left(Y_i - \hat{Y}_i\right)^2\qquad\qquad\qquad(1.16) $$ We call the values $\hat{Y}_i$, $i=1,\ldots,n$, the fitted values.

Note that SSE is the sum of the squared vertical distances from the points to the least squares line. That is, it is the sum of the squared differences between the observed values $Y_i$ and the corresponding fitted values $\hat{Y}_i$.
SSE is therefore a measure of variability: it describes how the points are spread about the fitted line.

If we divide SSE by $n-2$, we obtain an estimate of the variance of $\varepsilon$, which we denote $s^2$: $$ s^2 = \frac{SSE}{n-2}\qquad\qquad\qquad(1.17) $$ Note that $s^2$ is sometimes called the mean squared error (MSE).

The reason for dividing by $n-2$ instead of $n$ will become evident when we compute the expected value of SSE below.

If we take the square root of $s^2$, we obtain the standard error of the line: $$ s = \sqrt{\frac{SSE}{n-2}}\qquad\qquad\qquad(1.18) $$
If we knew the true line $\beta_{0}+\beta_{1}X_{i}$, then we could observe the values of the random errors, since \begin{align*} \varepsilon_{i} & =Y_{i}-\left(\beta_{0}+\beta_{1}X_{i}\right) \end{align*} Since we do not know the true line but do have the fitted line, we can instead calculate the difference between the observed $Y_{i}$ and the fitted value $\hat{Y}_{i}$. These differences are called the residuals and are denoted $e_{i}$: \begin{align*} e_{i} & =Y_{i}-\hat{Y}_{i}\qquad\qquad\qquad(1.19) \end{align*} Thus, we can express SSE as \begin{align*} SSE & =\sum e_{i}^2 \end{align*}
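As a quick numerical illustration of (1.16)-(1.19), the sketch below uses a small simulated dataset (the x values, true coefficients, and error standard deviation are arbitrary choices, not the data from the example later in this section) and checks the hand computation of $s$ against summary():

# simulate data from a known line (arbitrary true values)
set.seed(1)
n     = 20
x_sim = runif(n, 0, 10)
beta0 = 3; beta1 = 0.5; sigma = 1
eps   = rnorm(n, sd = sigma)     # the (normally unobservable) errors
y_sim = beta0 + beta1 * x_sim + eps

fit_sim = lm(y_sim ~ x_sim)

e   = y_sim - fitted(fit_sim)    # residuals e_i = Y_i - Yhat_i, as in (1.19)
sse = sum(e^2)                   # SSE, as in (1.16)
s2  = sse / (n - 2)              # s^2 = MSE, as in (1.17)
s   = sqrt(s2)                   # s, as in (1.18)

all.equal(s, summary(fit_sim)$sigma)   # TRUE
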
Since the least squares fitted line minimizes $Q$ in (1.2), $$ Q=\sum \left(Y_i-\left(b_0+b_1 X_i\right)\right)^2 \qquad\qquad\qquad (1.2) $$ it should be clear that SSE is the smallest value this sum of squares can take for any choice of intercept and slope.

We also note that \begin{align*} \sum e_{i} & =0 & \qquad\qquad\qquad(1.20)\\ \sum X_{i}e_{i} & =0 & \qquad\qquad\qquad(1.21)\\ \sum\hat{Y}_{i}e_{i} & =0 & \qquad\qquad\qquad(1.22)\\ \sum Y_{i} & =\sum\hat{Y}_{i} & \qquad\qquad\qquad(1.23) \end{align*} Proof of (1.20):
\begin{align*} \sum e_{i} & =\sum\left(Y_{i}-\hat{Y}_{i}\right)\\ & =\sum Y_{i}-\sum\hat{Y}_{i}\\ & =\sum Y_{i}-\sum\left(b_{0}+b_{1}X_{i}\right)\\ & =\sum Y_{i}-nb_{0}-b_{1}\sum X_{i}\\ & =\sum Y_{i}-\underbrace{\sum Y_{i}}_{(1.3)}\\ & =0 \end{align*}
Proof of (1.21):
\begin{align*} \sum X_{i}e_{i} & =\sum X_{i}\left(Y_{i}-\hat{Y}_{i}\right)\\ & =\sum X_{i}Y_{i}-\sum X_{i}\hat{Y}_{i}\\ & =\underbrace{b_{0}\sum X_{i}+b_{1}\sum X_{i}^{2}}_{(1.3)}-\sum X_{i}\left(b_{0}+b_{1}X_{i}\right)\\ & =b_{0}\sum X_{i}+b_{1}\sum X_{i}^{2}-b_{0}\sum X_{i}-b_{1}\sum X_{i}^{2}\\ & =0 \end{align*}
Proof of (1.22):
\begin{align*} \sum\hat{Y}_{i}e_{i} & =\sum\left(b_{0}+b_{1}X_{i}\right)\left(e_{i}\right)\\ & =b_{0}\underbrace{\sum e_{i}}_{(1.20)}+b_{1}\underbrace{\sum X_{i}e_{i}}_{(1.21)}\\ & =0 \end{align*}
Proof of (1.23):
\begin{align*} \sum\hat{Y}_{i} & =\sum\left(b_{0}+b_{1}X_{i}\right)\\ & =\underbrace{nb_{0}+b_{1}\sum X_{i}}_{(1.3)}\\ & =\sum Y_{i} \end{align*}
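Continuing with the simulated fit from the sketch above, properties (1.20) through (1.23) are easy to verify numerically (each quantity below is zero up to floating-point rounding):

sum(e)                              # (1.20): essentially zero
sum(x_sim * e)                      # (1.21): essentially zero
sum(fitted(fit_sim) * e)            # (1.22): essentially zero
sum(y_sim) - sum(fitted(fit_sim))   # (1.23): essentially zero
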
Another property of the least squares line is that it will always go through the point $\left(\bar{X},\bar{Y}\right)$.

This can be seen by evaluating $\hat{Y}_{i}$ at $X=\bar{X}$: \begin{align*} b_{0}+b_{1}\bar{X} & =\underbrace{\left(\bar{Y}-b_{1}\bar{X}\right)}_{(1.4)}+b_{1}\bar{X}\\ & =\bar{Y} \end{align*}
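Again using the simulated fit from above, evaluating the fitted line at $\bar{X}$ returns $\bar{Y}$:

coef(fit_sim)[1] + coef(fit_sim)[2] * mean(x_sim)   # fitted line at x = xbar
mean(y_sim)                                         # same value
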
Recall from Section 1.2.3, that $E\left[\varepsilon_{i}\right]=0$ and $Var\left[\varepsilon_{i}\right]=E\left[\left(\varepsilon_{i}-E\left[\varepsilon_{i}\right]\right)^{2}\right]=E\left[\varepsilon_{i}^{2}\right]=\sigma^{2}$.

We next see that the sample mean of the $Y$s can be expressed as \begin{align*} \bar{Y} & =\frac{1}{n}\sum Y_{i}\\ & =\frac{1}{n}\sum\left(\beta_{0}+\beta_{1}X_{i}+\varepsilon_{i}\right)\\ & =\beta_{0}+\beta_{1}\bar{X}+\overline{\varepsilon}\qquad\qquad\qquad(1.24) \end{align*} Subtracting (1.24) from both sides of model (1.1)
gives us \begin{align*} Y_{i}-\bar{Y} & =\beta_{1}\left(X_{i}-\bar{X}\right)+\left(\varepsilon_{i}-\bar{\varepsilon}\right)\qquad\qquad\qquad(1.25) \end{align*} We also note that the residuals can be expressed as \begin{align*} e_{i} & =Y_{i}-\hat{Y}_{i}\\ & =Y_{i}-\underbrace{b_{0}}_{(1.4)}-b_{1}X_{i}\\ & =Y_{i}-\left(\bar{Y}-b_{1}\overline{X}\right)-b_{1}X_{i}\\ & =\left(Y_{i}-\overline{Y}\right)-b_{1}\left(X_{i}-\bar{X}\right)\qquad\qquad\qquad(1.26) \end{align*} Substituting (1.25) into (1.26) gives us \begin{align*} e_{i} & =\beta_{1}\left(X_{i}-\bar{X}\right)+\left(\varepsilon_{i}-\bar{\varepsilon}\right)-b_{1}\left(X_{i}-\bar{X}\right)\\ & =\left(\varepsilon_{i}-\bar{\varepsilon}\right)-\left(b_{1}-\beta_{1}\right)\left(X_{i}-\bar{X}\right)\qquad\qquad\qquad(1.27) \end{align*} Squaring (1.27) and summing over $i$, we can express the sum of the squared residuals as \begin{align*} \sum e_{i}^{2} & =\sum\left(\varepsilon_{i}-\bar{\varepsilon}\right)^{2}-2\left(b_{1}-\beta_{1}\right)\sum\left(X_{i}-\bar{X}\right)\left(\varepsilon_{i}-\bar{\varepsilon}\right)\\ &\quad+\left(b_{1}-\beta_{1}\right)^{2}\sum\left(X_{i}-\bar{X}\right)^{2}\qquad\qquad\qquad\qquad(1.28) \end{align*} We will now take the expectation of (1.28), distributing it across the three terms, which we examine individually.
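Before doing so, we note that (1.28) can be checked numerically. Since it involves the true slope $\beta_{1}$ and the unobservable errors $\varepsilon_{i}$, this is only possible in a simulation where both are known. Using the simulated data from the earlier sketch (where beta1 and eps were chosen by us), the two sides of (1.28) agree:

b1  = unname(coef(fit_sim)[2])
lhs = sum(e^2)
rhs = sum((eps - mean(eps))^2) -
  2 * (b1 - beta1) * sum((x_sim - mean(x_sim)) * (eps - mean(eps))) +
  (b1 - beta1)^2 * sum((x_sim - mean(x_sim))^2)
all.equal(lhs, rhs)   # TRUE
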

We first examine $E\left[\sum\left(\varepsilon_{i}-\bar{\varepsilon}\right)^{2}\right]$ in (1.28) : \begin{align*} E\left[\sum\left(\varepsilon_{i}-\bar{\varepsilon}\right)^{2}\right] & =E\left[\sum\varepsilon_{i}^{2}-2\overline{\varepsilon}\sum\varepsilon_{i}+\sum\left(\overline{\varepsilon}\right)^{2}\right]\\ & =\sum\underbrace{E\left[\varepsilon_{i}^{2}\right]}_{=\sigma^{2}}-2nE\left[\overline{\varepsilon}\frac{1}{n}\sum\varepsilon_{i}\right]+nE\left[\left(\overline{\varepsilon}\right)^{2}\right]\\ & =n\sigma^{2}-nE\left[\left(\overline{\varepsilon}\right)^{2}\right]\\ & =n\sigma^{2}-n\left(\frac{Var\left(\varepsilon_{i}\right)}{n}\right)\\ & =n\sigma^{2}-\sigma^{2}\\ & =\left(n-1\right)\sigma^{2} \end{align*} Below, we will use the expectation of a quantity squared: \begin{align*} E\left[Y^{2}\right] & =Var\left[Y\right]+\left(E\left[Y\right]\right)^{2} \end{align*} where $Y$ is a random variable. We now examine the expectation of the second term in (1.28). \begin{align*} & E\left[-2\left(b_{1}-\beta_{1}\right)\sum\left(X_{i}-\bar{X}\right)\left(\varepsilon_{i}-\bar{\varepsilon}\right)\right]\\ & =E\left[-2\left(\underbrace{\sum k_{i}Y_{i}}_{(1.5)}-\beta_{1}\right)\sum\left(X_{i}-\bar{X}\right)\left(\varepsilon_{i}-\bar{\varepsilon}\right)\right]\\ & =E\left[-2\left(\sum k_{i}\left(\beta_{0}+\beta_{1}X_{i}+\varepsilon_{i}\right)-\beta_{1}\right)\sum\left(X_{i}-\bar{X}\right)\left(\varepsilon_{i}-\bar{\varepsilon}\right)\right]\\ & =-2E\left[\left(\beta_{0}\sum k_{i}+\beta_{1}\underbrace{\sum k_{i}X_{i}}_{(1.8)}+\sum k_{i}\varepsilon_{i}-\beta_{1}\right)\sum\left(X_{i}-\bar{X}\right)\left(\varepsilon_{i}-\bar{\varepsilon}\right)\right]\\ & =-2E\left[\left(\sum k_{i}\varepsilon_{i}\right)\sum\left(X_{i}-\bar{X}\right)\left(\varepsilon_{i}-\bar{\varepsilon}\right)\right]\\ & =-2E\left[\left(\sum\underbrace{k_{i}}_{(1.5)}\varepsilon_{i}\right)\sum\left(X_{i}-\bar{X}\right)\varepsilon_{i}-\bar{\varepsilon}\left(\sum k_{i}\varepsilon_{i}\right)\underbrace{\sum\left(X_{i}-\bar{X}\right)}_{=0}\right]\\ & =-2E\left[\left(\frac{\sum\left(X_{i}-\overline{X}\right)\varepsilon_{i}}{\sum\left(X_{i}-\overline{X}\right)^{2}}\right)\sum\left(X_{i}-\bar{X}\right)\varepsilon_{i}\right]\\ & =-2E\left[\frac{\left(\sum\left(X_{i}-\overline{X}\right)\varepsilon_{i}\right)^{2}}{\sum\left(X_{i}-\overline{X}\right)^{2}}\right]\\ & =-2\frac{1}{\sum\left(X_{i}-\overline{X}\right)^{2}}E\left[\left(\sum\left(X_{i}-\overline{X}\right)\varepsilon_{i}\right)^{2}\right]\\ & =-2\frac{1}{\sum\left(X_{i}-\overline{X}\right)^{2}}\left(Var\left[\sum\left(X_{i}-\overline{X}\right)\varepsilon_{i}\right]+\left(E\left[\sum\left(X_{i}-\overline{X}\right)\varepsilon_{i}\right]\right)^{2}\right)\\ & =-2\frac{1}{\sum\left(X_{i}-\overline{X}\right)^{2}}\left(\sum\left(X_{i}-\overline{X}\right)^{2}Var\left[\varepsilon_{i}\right]+\left(\sum\left(X_{i}-\overline{X}\right)\underbrace{E\left[\varepsilon_{i}\right]}_{=0}\right)^{2}\right)\\ & =-2\frac{1}{\sum\left(X_{i}-\overline{X}\right)^{2}}\sum\left(X_{i}-\overline{X}\right)^{2}\sigma^{2}\\ & =-2\sigma^{2} \end{align*} For the third term in (1.28), we have the expectation as \begin{align*} E\left[\left(b_{1}-\beta_{1}\right)^{2}\sum\left(X_{i}-\bar{X}\right)^{2}\right] & =\sum\left(X_{i}-\bar{X}\right)^{2}E\left[\left(b_{1}-\beta_{1}\right)^{2}\right]\\ & =\sum\left(X_{i}-\bar{X}\right)^{2}\underbrace{Var\left[b_{1}\right]}_{(1.14)}\\ & =\sum\left(X_{i}-\bar{X}\right)^{2}\frac{\sigma^{2}}{\sum\left(X_{i}-\bar{X}\right)^{2}}\\ & =\sigma^{2} \end{align*} Now, we have the expectation of 
(1.28) as \begin{align*} E\left[\sum e_{i}^{2}\right] & =\left(n-1\right)\sigma^{2}-2\sigma^{2}+\sigma^{2}\\ & =\left(n-2\right)\sigma^{2} \end{align*} Therefore, the expectation of $s^{2}$ is \begin{align*} E\left[s^{2}\right] & =E\left[MSE\right]\\ & =E\left[\frac{\sum e_{i}^{2}}{n-2}\right]\\ & =\frac{E\left[\sum e_{i}^{2}\right]}{n-2}\\ & =\sigma^{2}\qquad\qquad\qquad(1.29) \end{align*} Therefore, dividing SSE by $n-2$ in (1.17) makes $s^{2}$ an unbiased estimator for $\sigma^{2}$.
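A small Monte Carlo sketch (with arbitrarily chosen true values) illustrates (1.29): fitting the model to many simulated samples and averaging the resulting values of $s^{2}$ gives a value close to $\sigma^{2}$.

# Monte Carlo illustration of E[s^2] = sigma^2 (true values are arbitrary)
set.seed(2)
x_mc     = seq(1, 10, length.out = 10)
sigma_mc = 0.8

mse_vals = replicate(10000, {
  y_mc = 2 - 0.5 * x_mc + rnorm(length(x_mc), sd = sigma_mc)
  sum(resid(lm(y_mc ~ x_mc))^2) / (length(x_mc) - 2)
})

mean(mse_vals)   # close to sigma_mc^2 = 0.64
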
We now estimate the variance for the data in Table 1.3.1. Recall that we fit the least squares line to these data in Example 1.3.1.

library(tidyverse)

x = c(1, 2, 2.75, 4, 6, 7, 8, 10)
y = c(2, 1.4, 1.6, 1.25, 1, 0.5, 0.5, 0.4)

dat = tibble(x, y)

# the least squares fit
fit = lm(y ~ x, data = dat)
fit
                                
Call:
lm(formula = y ~ x, data = dat)

Coefficients:
(Intercept)            x  
     1.9808      -0.1766  
                                
#to get the residuals
fit$residuals
                                
          1           2           3           4           5           6           7           8 
 0.19577520 -0.22762027  0.10483313 -0.02441121  0.07879786 -0.24459761 -0.06799308  0.18521598 
                                


We can sum the squares of the residuals to obtain SSE:

sse = fit$residuals^2 %>% sum()
sse
                                
[1] 0.2066899
                                


And then divide by $n-2$ to obtain an estimate of $\sigma^2$. Taking the square root of this will give us $s$.

#s^2
sse / (8-2)
                                
[1] 0.03444832
                                
#s
sqrt( sse / (8-2) )
                                
[1] 0.1856026
                                


We can also pass the lm object to the summary() function.

fit %>% summary()
                                
Call:
lm(formula = y ~ x, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.24460 -0.10790  0.02719  0.12493  0.19578 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.98083    0.13068  15.158  5.2e-06 ***
x           -0.17660    0.02218  -7.961 0.000209 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1856 on 6 degrees of freedom
Multiple R-squared:  0.9135,	Adjusted R-squared:  0.8991 
F-statistic: 63.37 on 1 and 6 DF,  p-value: 0.0002091                 
                                


We see that $s$ is given as Residual standard error.

We can obtain just $s$ with

summary(fit)$sigma
                                
[1] 0.1856026
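
If we want $s^2$ (the MSE) rather than $s$, we can square this value, or use deviance(), which returns SSE for an lm fit, together with df.residual():

# s^2 = SSE / (n - 2): deviance() gives SSE and df.residual() gives n - 2 here
deviance(fit) / df.residual(fit)

summary(fit)$sigma^2

Both return the value of $s^2$ computed above.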