
1.4 Properties of the Least Squares Estimators

"The most important questions of life are, for the most part, really only problems of probability."
- Pierre Simon, Marquis de Laplace
We first note that the least squares estimators in (1.4)
\begin{align*} b_{0} & =\bar{Y}-b_{1}\bar{X}\\ b_{1} & =\frac{\sum \left(X_{i}-\bar{X}\right)\left(Y_{i}-\bar{Y}\right)}{\sum \left(X_{i}-\bar{X}\right)^{2}}\qquad\qquad\qquad(1.4) \end{align*}
are linear functions of the observations $Y_{1},\ldots,Y_{n}$. That is, both $b_{0}$ and $b_{1}$ can be written as a linear combination of the $Y$'s.

Since $Y$ is the variable that we want to model in (1.1),
$$ Y_i=\beta_0+\beta_1X_i+\varepsilon_i\qquad\qquad\qquad(1.1) $$
we call an estimator of a parameter that can be written as a linear combination of the $Y$'s a linear estimator.
We next note that $\sum\left(X_{i}-\bar{X}\right)=0$, since
\begin{align*} \sum \left(X_{i}-\bar{X}\right) & =\sum X_{i}-\sum \bar{X}\\ & =\sum X_{i}-n\bar{X}\\ & =\sum X_{i}-n\frac{1}{n}\sum X_{i}\\ & =\sum X_{i}-\sum X_{i}\\ & =0. \end{align*}

We now rewrite $b_{1}$ as \begin{align*} b_{1} & =\frac{\sum\left(X_{i}-\bar{X}\right)\left(Y_{i}-\bar{Y}\right)}{\sum\left(X_{i}-\bar{X}\right)^{2}}\\ & =\frac{\sum\left(X_{i}-\bar{X}\right)Y_{i}-\bar{Y}\sum\left(X_{i}-\bar{X}\right)}{\sum\left(X_{i}-\bar{X}\right)^{2}}\\ & =\frac{\sum\left(X_{i}-\bar{X}\right)Y_{i}}{\sum\left(X_{i}-\bar{X}\right)^{2}}\\ & =\left(\frac{1}{\sum\left(X_{i}-\bar{X}\right)^{2}}\right)\sum\left(X_{i}-\bar{X}\right)Y_{i} \end{align*}

Thus, we can write $b_{1}$ as \begin{align*} b_{1} & =\sum k_{i}Y_{i}\qquad\qquad\qquad(1.5) \end{align*} where \begin{align*} k_{i} & =\frac{X_{i}-\bar{X}}{\sum \left(X_{i}-\bar{X}\right)^{2}} \end{align*} From (1.5), we see that $b_{1}$ is a linear combination of the $Y$'s since the $k_{i}$ are known constants (recall that the $X_{i}$ are treated as known constants).
We can rewrite $b_{0}$ as \begin{align*} b_{0} & =\bar{Y}-b_{1}\bar{X}\\ & =\frac{1}{n}\sum Y_{i}-\bar{X}\sum k_{i}Y_{i}\\ & =\sum\left(\frac{1}{n}-\bar{X}k_{i}\right)Y_{i} \end{align*}

Thus, we can write $b_{0}$ as \begin{align*} b_{0} & =\sum c_{i}Y_{i}\qquad\qquad\qquad(1.6) \end{align*} where \begin{align*} c_{i} & =\frac{1}{n}-\bar{X}k_{i} \end{align*} Therefore, $b_{0}$ is also a linear combination of the $Y$'s.
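To see this concretely, here is a minimal sketch (Python with NumPy; the data are made up, not from the text) that computes $b_0$ and $b_1$ from the formulas in (1.4) and again as the linear combinations in (1.5) and (1.6), confirming that the two agree.

```python
# Sketch (made-up data): the least squares estimates from (1.4) equal the
# linear combinations sum(k_i * Y_i) and sum(c_i * Y_i) in (1.5) and (1.6).
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = rng.uniform(10, 30, size=n)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=n)

Xbar, Ybar = X.mean(), Y.mean()
Sxx = np.sum((X - Xbar) ** 2)

# Closed-form estimates from (1.4)
b1 = np.sum((X - Xbar) * (Y - Ybar)) / Sxx
b0 = Ybar - b1 * Xbar

# The same estimates as linear combinations of the Y's
k = (X - Xbar) / Sxx              # coefficients in (1.5)
c = 1 / n - Xbar * k              # coefficients in (1.6)

print(np.isclose(b1, np.sum(k * Y)))   # True
print(np.isclose(b0, np.sum(c * Y)))   # True
```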
The coefficients $k_{i}$ have the following properties: \begin{align*} \sum k_{i} & =0 & \qquad\qquad\qquad(1.7)\\ \sum k_{i}X_{i} & =1 & \qquad\qquad\qquad(1.8)\\ \sum k_{i}^{2} & =\frac{1}{\sum \left(X_{i}-\bar{X}\right)^{2}} & \qquad\qquad\qquad(1.9) \end{align*} The proof of (1.7) is as follows:
\begin{align*} \sum k_{i} & =\sum \frac{\left(X_{i}-\bar{X}\right)}{\sum \left(X_{i}-\bar{X}\right)^{2}}\\ & =\frac{1}{\sum \left(X_{i}-\bar{X}\right)^{2}}\underbrace{\sum \left(X_{i}-\bar{X}\right)}_{=0}\\ & =0 \end{align*}
The proof of (1.8) is as follows:
\begin{align*} \sum k_{i}X_{i} & =\sum \frac{\left(X_{i}-\bar{X}\right)}{\sum \left(X_{i}-\bar{X}\right)^{2}}X_{i}\\ & =\frac{1}{\sum \left(X_{i}-\bar{X}\right)^{2}}\sum \left(X_{i}-\bar{X}\right)X_{i}\\ & =\frac{1}{\sum \left(X_{i}-\bar{X}\right)^{2}}\left[\sum X_{i}^{2}-\bar{X}\sum X_{i}\right]\\ & =\frac{1}{\sum \left(X_{i}-\bar{X}\right)^{2}}\left[\sum X_{i}^{2}-\bar{X}\sum X_{i}\underbrace{-\bar{X}\sum X_{i}+\bar{X}\sum X_{i}}_{\text{completing the square}}\right]\\ & =\frac{1}{\sum \left(X_{i}-\bar{X}\right)^{2}}\left[\sum X_{i}^{2}-2\bar{X}\sum X_{i}+\bar{X}n\left(\frac{1}{n}\sum X_{i}\right)\right]\\ & =\frac{1}{\sum \left(X_{i}-\bar{X}\right)^{2}}\left[\sum X_{i}^{2}-2\bar{X}\sum X_{i}+n\left(\bar{X}\right)^{2}\right]\\ & =\frac{1}{\sum \left(X_{i}-\bar{X}\right)^{2}}\left[\sum \left(X_{i}-\bar{X}\right)^{2}\right]\\ & =1 \end{align*}
The proof of (1.9) is as follows:
\begin{align*} \sum k_{i}^{2} & =\sum \left(\frac{\left(X_{i}-\bar{X}\right)}{\sum \left(X_{i}-\bar{X}\right)^{2}}\right)^{2}\\ & =\frac{1}{\left(\sum \left(X_{i}-\bar{X}\right)^{2}\right)^{2}}\sum \left(X_{i}-\bar{X}\right)^{2}\\ & =\frac{1}{\sum \left(X_{i}-\bar{X}\right)^{2}} \end{align*}

Likewise, the coefficients $c_{i}$ have the following properties: \begin{align*} \sum c_{i} & =1 & \qquad\qquad\qquad(1.10)\\ \sum c_{i}X_{i} & =0 & \qquad\qquad\qquad(1.11)\\ \sum c_{i}^{2} & =\frac{1}{n}+\frac{\left(\bar{X}\right)^{2}}{\sum \left(X_{i}-\bar{X}\right)^{2}} & \qquad\qquad\qquad(1.12) \end{align*} The proof of (1.10) is as follows:
\begin{align*} \sum c_{i} & =\sum \left(\frac{1}{n}-\bar{X}k_{i}\right)\\ & =\sum \frac{1}{n}-\bar{X}\underbrace{\sum k_{i}}_{(1.7)}\\ & =\frac{n}{n}\\ & =1 \end{align*}
The proof of (1.11) is as follows:
\begin{align*} \sum c_{i}X_{i} & =\sum \left(\frac{1}{n}-\bar{X}k_{i}\right)X_{i}\\ & =\frac{1}{n}\sum X_{i}-\bar{X}\underbrace{\sum k_{i}X_{i}}_{(1.8)}\\ & =\bar{X}-\bar{X}\\ & =0 \end{align*}
The proof of (1.12) is as follows:
\begin{align*} \sum c_{i}^{2} & =\sum \left(\frac{1}{n}-\bar{X}k_{i}\right)^{2}\\ & =\sum \frac{1}{n^{2}}-2\frac{1}{n}\bar{X}\underbrace{\sum k_{i}}_{(1.7)}+\left(\bar{X}\right)^{2}\underbrace{\sum k_{i}^{2}}_{(1.9)}\\ & =\frac{1}{n}+\frac{\left(\bar{X}\right)^{2}}{\sum \left(X_{i}-\bar{X}\right)^{2}} \end{align*}
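These six coefficient properties can be verified numerically for any set of $X$ values (with at least two distinct values so that $\sum\left(X_{i}-\bar{X}\right)^{2}>0$). A minimal sketch (Python with NumPy; the $X$ values are made up):

```python
# Sketch: check properties (1.7)-(1.12) of the coefficients k_i and c_i.
import numpy as np

X = np.array([61.0, 62.5, 64.0, 65.5, 67.0, 68.5, 70.0, 71.5, 73.0, 74.5])
n = len(X)
Xbar = X.mean()
Sxx = np.sum((X - Xbar) ** 2)

k = (X - Xbar) / Sxx          # coefficients in (1.5)
c = 1 / n - Xbar * k          # coefficients in (1.6)

print(np.isclose(np.sum(k), 0))                             # (1.7)
print(np.isclose(np.sum(k * X), 1))                         # (1.8)
print(np.isclose(np.sum(k ** 2), 1 / Sxx))                  # (1.9)
print(np.isclose(np.sum(c), 1))                             # (1.10)
print(np.isclose(np.sum(c * X), 0))                         # (1.11)
print(np.isclose(np.sum(c ** 2), 1 / n + Xbar ** 2 / Sxx))  # (1.12)
```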

These properties will be used to find the expectations and variances of $b_{1}$ and $b_{0}$.
Before finding the expectations, recall $E\left[Y_{i}\right]=\beta_{0}+\beta_{1}X_{i}$ from Section 1.2.3.
The expected value of $b_{1}$ is \begin{align*} E\left[b_{1}\right] & =E\left[\underbrace{\sum k_{i}Y_{i}}_{(1.5)}\right]\\ & =\sum k_{i}E\left[Y_{i}\right]\\ & =\sum k_{i}\left(\beta_{0}+\beta_{1}X_{i}\right)\\ & =\beta_{0}\underbrace{\sum k_{i}}_{(1.7)}+\beta_{1}\underbrace{\sum k_{i}X_{i}}_{(1.8)}\\ & =\beta_{1} \end{align*}
The expected value of $b_{0}$ is \begin{align*} E\left[b_{0}\right] & =E\left[\underbrace{\sum c_{i}Y_{i}}_{(1.6)}\right]\\ & =\sum c_{i}E\left[Y_{i}\right]\\ & =\sum c_{i}\left(\beta_{0}+\beta_{1}X_{i}\right)\\ & =\beta_{0}\underbrace{\sum c_{i}}_{(1.10)}+\beta_{1}\underbrace{\sum c_{i}X_{i}}_{(1.11)}\\ & =\beta_{0} \end{align*}
To find the variances, we will use a result from mathematical statistics:
Let $Y_{1},\ldots,Y_{n}$ be uncorrelated random variables and let $a_{1},\ldots,a_{n}$ be constants. Then \begin{align*} Var\left[\sum a_{i}Y_{i}\right] & =\sum a_{i}^{2}Var\left[Y_{i}\right]\qquad\qquad\qquad(1.13) \end{align*} Recall that in model (1.1) we assume the response variables $Y_{i}$ are uncorrelated.

Also, recall that $Var[Y_i]=\sigma^2$ from Section 1.2.3.
The variance of $b_{1}$ is \begin{align*} Var\left[b_{1}\right] & =Var\left[\underbrace{\sum k_{i}Y_{i}}_{(1.5)}\right]\\ & =\underbrace{\sum k_{i}^{2}}_{(1.9)}Var\left[Y_{i}\right]\\ & =\frac{\sigma^{2}}{\sum \left(X_{i}-\bar{X}\right)^{2}}\qquad\qquad\qquad(1.14) \end{align*}
The variance of $b_{0}$ is \begin{align*} Var\left[b_{0}\right] & =Var\left[\underbrace{\sum c_{i}Y_{i}}_{(1.6)}\right]\\ & =\underbrace{\sum c_{i}^{2}}_{(1.12)}Var\left[Y_{i}\right]\\ & =\sigma^{2}\left[\frac{1}{n}+\frac{\left(\bar{X}\right)^{2}}{\sum \left(X_{i}-\bar{X}\right)^{2}}\right]\qquad\qquad\qquad(1.15) \end{align*}
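A small simulation can illustrate these four results. The sketch below (Python with NumPy) uses made-up values of $\beta_0$, $\beta_1$, $\sigma$, and the $X$'s, and draws normal errors purely for convenience; (1.14) and (1.15) do not require any particular error distribution.

```python
# Sketch: simulate model (1.1) many times and compare the empirical means and
# variances of b0 and b1 with E[b0], E[b1], (1.14), and (1.15).
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 2.0, 0.5, 1.5        # made-up parameter values
X = np.linspace(10, 30, 30)                # fixed X's, treated as constants
n = len(X)
Xbar = X.mean()
Sxx = np.sum((X - Xbar) ** 2)

reps = 50_000
b0s, b1s = np.empty(reps), np.empty(reps)
for r in range(reps):
    Y = beta0 + beta1 * X + rng.normal(0, sigma, size=n)
    b1s[r] = np.sum((X - Xbar) * (Y - Y.mean())) / Sxx
    b0s[r] = Y.mean() - b1s[r] * Xbar

print(b1s.mean(), beta1)                                  # E[b1] = beta1
print(b0s.mean(), beta0)                                  # E[b0] = beta0
print(b1s.var(), sigma ** 2 / Sxx)                        # (1.14)
print(b0s.var(), sigma ** 2 * (1 / n + Xbar ** 2 / Sxx))  # (1.15)
```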
We see from (1.5) and (1.6) that $b_{0}$ and $b_{1}$ are linear estimators.

Any estimator of $\beta_{1}$, which we will denote as $\tilde{\beta}_{1}$, that takes the form \begin{align*} \tilde{\beta}_{1} & =\sum a_{i}Y_{i} \end{align*} where the $a_{i}$ are constants, is called a linear estimator of $\beta_{1}$.

For a linear estimator to be unbiased, we must have \begin{align*} E\left[\tilde{\beta}_{1}\right] & =E\left[\sum a_{i}Y_{i}\right]\\ & =\sum a_{i}E\left[Y_{i}\right]\\ & =\beta_{1} \end{align*} Since $E\left[Y_{i}\right]=\beta_{0}+\beta_{1}X_{i}$ from Section 1.2.3, we must have \begin{align*} E\left[\tilde{\beta}_{1}\right] & =\sum a_{i}\left(\beta_{0}+\beta_{1}X_{i}\right)\\ & =\beta_{0}\sum a_{i}+\beta_{1}\sum a_{i}X_{i}\\ & =\beta_{1} \end{align*} Since this must hold for all possible values of $\beta_{0}$ and $\beta_{1}$, it follows that \begin{align*} \sum a_{i} & =0\\ \sum a_{i}X_{i} & =1 \end{align*} We now examine the variance of $\tilde{\beta}_{1}$: \begin{align*} Var\left[\tilde{\beta}_{1}\right] & =\sum a_{i}^{2}Var\left[Y_{i}\right]\\ & =\sigma^{2}\sum a_{i}^{2} \end{align*} Let's now write $a_{i}=k_{i}+d_{i}$, where $k_{i}$ is defined in (1.5) and the $d_{i}$ are arbitrary constants.

We will show that perturbing the $k_{i}$ in this way cannot make the variance smaller; that is, the variance of the unbiased linear estimator $\tilde{\beta}_1$ is smallest when $a_i=k_i$.

The variance of $\tilde{\beta}_{1}$ can now be written as \begin{align*} Var\left[\tilde{\beta}_{1}\right] & =\sigma^{2}\sum a_{i}^{2}\\ & =\sigma^{2}\sum\left(k_{i}+d_{i}\right)^{2}\\ & =\sigma^{2}\sum\left(k_{i}^{2}+2k_{i}d_{i}+d_{i}^{2}\right)\\ & =Var\left[b_{1}\right]+2\sigma^{2}\sum k_{i}d_{i}+\sigma^{2}\sum d_{i}^{2} \end{align*} Examining the second term and using the expression for $k_{i}$ in (1.5), we see that \begin{align*} \sum k_{i}d_{i} & =\sum k_{i}\left(a_{i}-k_{i}\right)\\ & =\sum a_{i}k_{i}-\underbrace{\sum k_{i}^{2}}_{(1.9)}\\ & =\sum a_{i}\frac{X_{i}-\bar{X}}{\sum\left(X_{i}-\bar{X}\right)^{2}}-\frac{1}{\sum\left(X_{i}-\bar{X}\right)^{2}}\\ & =\frac{\sum a_{i}X_{i}-\bar{X}\sum a_{i}}{\sum\left(X_{i}-\bar{X}\right)^{2}}-\frac{1}{\sum\left(X_{i}-\bar{X}\right)^{2}}\\ & =\frac{1-\bar{X}\left(0\right)}{\sum\left(X_{i}-\bar{X}\right)^{2}}-\frac{1}{\sum\left(X_{i}-\bar{X}\right)^{2}}\\ & =0 \end{align*} We now have the variance of $\tilde{\beta}_{1}$ as \begin{align*} Var\left[\tilde{\beta}_{1}\right] & =Var\left[b_{1}\right]+\sigma^{2}\sum d_{i}^{2} \end{align*} This variance is minimized when $\sum d_{i}^{2}=0$, which happens only when $d_{i}=0$ for all $i$.

Thus, the unbiased linear estimator with the smallest variance is obtained when $a_{i}=k_{i}$. That is, the least squares estimator $b_{1}$ in (1.5) has the smallest variance of all unbiased linear estimators of $\beta_{1}$.
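The sketch below (Python with NumPy; all values are made up) illustrates this numerically: it constructs another coefficient vector $a_i=k_i+d_i$ that still satisfies $\sum a_i=0$ and $\sum a_iX_i=1$, and shows that $\sigma^2\sum a_i^2$ exceeds $\sigma^2\sum k_i^2=Var[b_1]$.

```python
# Sketch: any unbiased linear estimator with a_i != k_i has a larger variance.
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(10, 30, 30)
n = len(X)
Xbar = X.mean()
Sxx = np.sum((X - Xbar) ** 2)
k = (X - Xbar) / Sxx                       # coefficients of b1 in (1.5)

# Build d with sum(d) = 0 and sum(d * X) = 0 by removing the components of a
# random vector along the constant vector and X (so unbiasedness is preserved).
Z = np.column_stack([np.ones(n), X])
d = rng.normal(size=n)
d = d - Z @ np.linalg.lstsq(Z, d, rcond=None)[0]

a = k + d
sigma2 = 1.0                               # made-up error variance
print(np.isclose(np.sum(a), 0), np.isclose(np.sum(a * X), 1))  # still unbiased
print(sigma2 * np.sum(k ** 2))             # Var[b1]
print(sigma2 * np.sum(a ** 2))             # larger by sigma2 * sum(d_i^2)
```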

A similar argument can be used to show that $b_{0}$ in (1.6) has the smallest variance of all unbiased linear estimators of $\beta_{0}$.

These arguments lead us to the following theorem:

Theorem 1.1 (Gauss-Markov theorem):
For the model in (1.1), the least squares estimators $b_0$ and $b_1$ in (1.4) are unbiased and have minimum variance among all unbiased linear estimators.

An estimator that is linear, unbiased, and has the smallest variance of all unbiased linear estimators is called the best linear unbiased estimator (BLUE).
We now have the expectations and variances of the least squares estimators $b_{0}$ and $b_{1}$. We next examine the sampling distribution of these estimators.

In model (1.1), we did not make any assumptions about the distribution of the errors $\varepsilon_i$ other than specifying their mean and variance and assuming they are uncorrelated. In particular, we did not assume anything about the shape of their distribution.

In Section 2.1, we will make an assumption about the shape which will allow us to determine the shape of the sampling distributions of $b_{0}$ and $b_{1}$.

For model (1.1), the shape of the distribution of $\varepsilon$ is not specified, so we cannot determine the shape of the sampling distributions of $b_{0}$ and $b_{1}$. We can, however, approximate these sampling distributions either by repeated sampling from the population of interest or by using a technique such as the bootstrap.
If we were to repeatedly take a sample of size $n$ from the population of interest and then find the least squares estimates each time, we could plot these estimates to estimate their sampling distributions. This is known as repeated sampling.
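A minimal sketch of this repeated-sampling idea in code (Python with NumPy; the finite population below is made up purely for illustration):

```python
# Sketch of repeated sampling: draw many samples of size n from a made-up
# finite population and collect the least squares estimates from each sample.
import numpy as np

rng = np.random.default_rng(3)

# Made-up finite population of (X, Y) pairs
N = 1000
X_pop = rng.uniform(10, 30, size=N)
Y_pop = 2.0 + 0.5 * X_pop + rng.normal(0, 1, size=N)

def least_squares(x, y):
    """Return (b0, b1) computed from (1.4)."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

n, reps = 30, 20_000
b0s, b1s = np.empty(reps), np.empty(reps)
for r in range(reps):
    idx = rng.choice(N, size=n, replace=False)   # a simple random sample
    b0s[r], b1s[r] = least_squares(X_pop[idx], Y_pop[idx])

# Histograms of b0s and b1s estimate the sampling distributions; their means
# should be close to the least squares line fit to the whole population.
print(b0s.mean(), b1s.mean())
print(*least_squares(X_pop, Y_pop))
```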

For example, let's consider the handspan and height measurements from Section 1.1.1. These measurements were from 834 college students. Suppose these students are now the population of interest.

We will take a random sample of $n=30$ from this population of 834 college students. We could look at all possible samples of size $n=30$ and the resulting least squares estimates $b_0$ and $b_1$. This would give us the exact sampling distribution for each. In this example with a relatively small population size of 834, the number of possible samples of size 30 is \begin{align*} {834 \choose 30} & =9.596\times10^{54} \end{align*}
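As a quick arithmetic check, this binomial coefficient can be computed exactly (a short Python sketch):

```python
# Number of possible samples of size 30 from a population of 834
from math import comb

print(f"{comb(834, 30):.3e}")   # about 9.6e+54, as quoted above
```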

It would be infeasible to examine this many samples; however, we could look at enough samples (tens of thousands) to get a good estimate of the sampling distributions.

In Figure 1.4.1 below, we see the scatterplot of all 834 population measurements on the left. They are colored gray. The least squares line for the entire population is shown in blue.

A random sample of size $n=30$ is shown as red points in the scatterplot. The red line represents the fitted line for the sample.

The plots on the right show the histograms of the least squares estimates $b_0$ and $b_1$ from the repeated sampling. Clicking on the buttons above the plot will conduct the random sampling and update the histograms. The blue and red vertical lines in the histograms represent the population parameters and the mean of the estimates, respectively.



[Interactive figure: the population scatterplot with its least squares line $Y_i=-3.1322+0.3457 X_i$ (blue), the latest sample of $n=30$ with its fitted line $\hat{Y}_i=b_0+b_1 X_i$ (red), and histograms of the $b_0$'s and $b_1$'s from the repeated samples, along with the number of samples and the means of the $b_0$'s and $b_1$'s.]

Figure 1.4.1: Sampling Distributions of $b_0$ and $b_1$ by Repeated Sampling



We can get a good estimate of the sampling distributions by examining just a few tens of thousands of samples. The downside to repeated sampling is that in practice we usually have only one sample. Thus, we need a different approach that allows us to estimate the sampling distributions using only the information in our one sample. We will explore one way to do this next.
Bootstrapping is a method for estimating the sampling distribution of a statistic based on the observations of one sample.

This estimation is done by drawing samples of size $n$, with replacement, from the observed data. For each of these "bootstrap" samples, the estimators ($b_0$ and $b_1$ in this case) are computed. This is done many times (usually thousands), and the resulting distribution of the bootstrapped statistics provides an estimate of the sampling distribution of each statistic.
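A minimal sketch of this bootstrap procedure (Python with NumPy; the observed sample is simulated here, since the actual data are not included):

```python
# Sketch of the bootstrap: resample one observed sample with replacement many
# times and collect the least squares estimates from each bootstrap sample.
import numpy as np

rng = np.random.default_rng(4)

def least_squares(x, y):
    """Return (b0, b1) computed from (1.4)."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

# One observed sample of size n = 30 (made up for illustration)
n = 30
x = rng.uniform(10, 30, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

B = 10_000
boot_b0, boot_b1 = np.empty(B), np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)        # indices drawn with replacement
    boot_b0[b], boot_b1[b] = least_squares(x[idx], y[idx])

# The distributions of boot_b0 and boot_b1 estimate the sampling distributions
# and are centered near the estimates from the observed sample.
print(least_squares(x, y))
print(boot_b0.mean(), boot_b1.mean())
```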

Suppose we only had a sample of size $n=30$ from the handspan and height data. This sample is plotted in Figure 1.4.2 below with the least squares line (both in red).

A sample of size $n=30$ is taken from the red dots, with replacement; this bootstrap sample is shown as the black dots. Because we sample with replacement, some observations are selected more than once and others not at all, which is why some of the points remain red.

The $b_0$ and $b_1$ for the bootstrap sample are then plotted in the histograms to the right. Clicking on the buttons above the figure will generate more bootstrap samples and estimates.



[Interactive figure: the observed sample with its least squares line $\hat{Y}_i=-5.8796+0.3804 X_i$ (red), the latest bootstrap sample with its fitted line (black), and histograms of the bootstrap $b_0$'s and $b_1$'s, along with the number of bootstrap samples and the means of the $b_0$'s and $b_1$'s.]

Figure 1.4.2: Sampling Distributions of $b_0$ and $b_1$ by Bootstrap Sampling



After generating a few thousand bootstrap estimates, we can get a good estimate of the sampling distributions of $b_0$ and $b_1$. Note, however, that these estimated distributions are centered at the least squares estimates from the observed sample (the red vertical lines) and not at the true population values (the blue vertical lines).

In Section 2.1, we will add an assumption to model (1.1) that specifies the distribution of the errors. This assumption will allow us to determine the sampling distributions of $b_0$ and $b_1$ without the need for repeated sampling or bootstrap sampling.

We next discuss how to estimate the last parameter in model (1.1): the variance, $\sigma^2$, of the error term.