2.5 Estimating the Mean Response
"Be approximately right rather than exactly wrong."
- John Tukey
Now that we have fit model (2.1)
\begin{align*}
Y_i=&\beta_0+\beta_1X_i+\varepsilon_i\\
\varepsilon_i\overset{iid}{\sim}& N\left(0,\sigma^2\right)\qquad\qquad\qquad\qquad(2.1)
\end{align*}
and assessed how good a fit the model is, we can now use the model for
estimation and prediction.
Recall from (2.2)
\begin{align*}
Y_i{\sim} N\left(\beta_0+\beta_1X_i,\sigma^2\right)\qquad\qquad\qquad(2.2)
\end{align*}
that the mean of $Y_{i}$ for some value of $X_{i}$
is the population line $\beta_{0}+\beta_{1}X_{i}$ evaluated at $X_{i}$.
So we can estimate the mean of $Y_{i}$ for some value of $X_{i}$
by evaluating the model estimated with the least squares estimators:
\begin{align*}
\hat{Y}_{i} & =b_{0}+b_{1}X_{i}
\end{align*}
We say $\hat{Y}_{i}$ is a point estimator for the population mean
$\beta_{0}+\beta_{1}X_{i}$.
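As a quick numerical illustration (not part of the original notes), the following Python sketch fits the least squares line to a small made-up dataset and evaluates the point estimate at a chosen value of $X$; the data values and the choice $X_h = 8$ are invented purely for illustration.

```python
import numpy as np

# Hypothetical data, invented purely for illustration
X = np.array([4.0, 6.0, 7.0, 9.0, 11.0, 13.0])
Y = np.array([33.0, 38.0, 45.0, 52.0, 60.0, 68.0])

# Least squares estimates b1 and b0 (Section 1.3)
Xbar, Ybar = X.mean(), Y.mean()
b1 = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)
b0 = Ybar - b1 * Xbar

# Point estimate of the mean response at X = 8
X_h = 8.0
Y_hat_h = b0 + b1 * X_h
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, estimated mean at X_h = {X_h}: {Y_hat_h:.3f}")
```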
We will want to make an inference for the population mean response
at some value of the predictor variable $X_{i}$.
We have a point estimator $\hat{Y}_{i}$. We will now examine the sampling
distribution of $\hat{Y}_{i}$ and use it to make a confidence
interval for the mean response $\beta_{0}+\beta_{1}X_{i}$.
We will denote the value of $X$ at which we want to estimate the
mean response as $X_{h}$. So the value of $Y$ at $X_{h}$ will be
$Y_{h}$.
We write $\hat{Y}_{h}$ as
\begin{align*}
\hat{Y}_{h} & =\underbrace{b_{0}}_{(1.6)}+\underbrace{b_{1}}_{(1.5)}X_{h}\\
 & =\sum c_{i}Y_{i}+X_{h}\sum k_{i}Y_{i}\\
 & =\sum\left(c_{i}+k_{i}X_{h}\right)Y_{i}
\end{align*}
Thus, $\hat{Y}_{h}$ is a linear combination of the observed $Y_{i}$,
which are normally distributed.
Theorem 2.1 Sum of Independent Normal Random Variables:
If $X_i\sim N\left(\mu_i,\sigma_i^2\right)$ are independent and the $a_i$ are constants, then the linear combination $\sum_i a_iX_i$ is also normally distributed. In particular, $$ \sum_i a_iX_i \sim N\left(\sum_i a_i\mu_i,\ \sum_i a_i^2\sigma_i^2\right) $$
Then, by Theorem 2.1, $\hat{Y}_{h}$
is normally distributed.
Using Theorem 2.1, we have the mean as
\begin{align*}
E\left[\hat{Y}_{h}\right] & =\sum\left(c_{i}+k_{i}X_{h}\right)\underbrace{E\left[Y_{i}\right]}_{(2.2)}\\
 & =\sum\left(c_{i}+k_{i}X_{h}\right)\left(\beta_{0}+\beta_{1}X_{i}\right)\\
 & =\beta_{0}\underbrace{\sum c_{i}}_{(1.10)}+\beta_{1}\sum c_{i}X_{i}+\beta_{0}X_{h}\underbrace{\sum k_{i}}_{(1.7)}+\beta_{1}X_{h}\sum k_{i}X_{i}\\
 & =\beta_{0}+\beta_{1}X_{h}
\end{align*}
since $\sum c_{i}=1$, $\sum c_{i}X_{i}=0$, $\sum k_{i}=0$, and $\sum k_{i}X_{i}=1$.
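The four identities used above can be checked numerically. The sketch below, with made-up $X$ values, is only an illustration of those facts, not part of the derivation.

```python
import numpy as np

X = np.array([4.0, 6.0, 7.0, 9.0, 11.0, 13.0])  # hypothetical X values
Xbar = X.mean()
Sxx = np.sum((X - Xbar) ** 2)

k = (X - Xbar) / Sxx           # weights in b1 = sum(k_i * Y_i)
c = 1 / len(X) - Xbar * k      # weights in b0 = sum(c_i * Y_i)

# These sums drive E[Y_hat_h] = beta0 + beta1 * X_h
print(np.sum(k), np.sum(k * X))   # approximately 0 and 1
print(np.sum(c), np.sum(c * X))   # approximately 1 and 0
```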
Using Theorem 2.1, we have the variance as
\begin{align*}
Var\left[\hat{Y}_{h}\right] & =\sum\left(\underbrace{c_{i}}_{(1.6)}+\underbrace{k_{i}}_{(1.5)}X_{h}\right)^{2}\underbrace{Var\left[Y_{i}\right]}_{(2.2)}\\
& =\sum\left(\frac{1}{n}-\bar{X}\frac{\left(X_{i}-\bar{X}\right)}{\sum\left(X_{i}-\bar{X}\right)^{2}}+\frac{\left(X_{i}-\bar{X}\right)}{\sum\left(X_{i}-\bar{X}\right)^{2}}X_{h}\right)^{2}\sigma^{2}\\
& =\sigma^{2}\sum\left(\frac{1}{n}+\frac{\left(X_{i}-\bar{X}\right)\left(X_{h}-\bar{X}\right)}{\sum\left(X_{i}-\bar{X}\right)^{2}}\right)^{2}\\
& =\sigma^{2}\sum\left(\frac{1}{n^{2}}+2\left(\frac{1}{n}\right)\frac{\left(X_{i}-\bar{X}\right)\left(X_{h}-\bar{X}\right)}{\sum\left(X_{i}-\bar{X}\right)^{2}}+\frac{\left(X_{i}-\bar{X}\right)^{2}\left(X_{h}-\bar{X}\right)^{2}}{\left(\sum\left(X_{i}-\bar{X}\right)^{2}\right)^{2}}\right)\\
& =\sigma^{2}\left(\frac{1}{n}+2\left(\frac{1}{n}\right)\frac{\left(X_{h}-\bar{X}\right)\underbrace{\sum\left(X_{i}-\bar{X}\right)}_{=0}}{\sum\left(X_{i}-\bar{X}\right)^{2}}+\frac{\left(X_{h}-\bar{X}\right)^{2}\sum\left(X_{i}-\bar{X}\right)^{2}}{\left(\sum\left(X_{i}-\bar{X}\right)^{2}\right)^{2}}\right)\\
& =\sigma^{2}\left(\frac{1}{n}+\frac{\left(X_{h}-\bar{X}\right)^{2}}{\sum\left(X_{i}-\bar{X}\right)^{2}}\right)
\end{align*}
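As a sanity check on this algebra, we can compare the raw linear-combination variance $\sigma^{2}\sum\left(c_{i}+k_{i}X_{h}\right)^{2}$ with the simplified form just derived. The sketch below uses arbitrary made-up values for $X$, $X_h$, and $\sigma^2$.

```python
import numpy as np

X = np.array([4.0, 6.0, 7.0, 9.0, 11.0, 13.0])  # hypothetical X values
X_h, sigma2 = 8.0, 2.5                          # arbitrary choices
n, Xbar = len(X), X.mean()
Sxx = np.sum((X - Xbar) ** 2)

k = (X - Xbar) / Sxx
c = 1 / n - Xbar * k

# Variance of Y_hat_h two ways: raw linear combination vs. simplified formula
var_direct = sigma2 * np.sum((c + k * X_h) ** 2)
var_formula = sigma2 * (1 / n + (X_h - Xbar) ** 2 / Sxx)
print(var_direct, var_formula)   # the two values agree
```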
So the sampling distribution of $\hat{Y}_{h}$ is
\begin{align*}
\hat{Y}_{h} & \sim N\left(\beta_{0}+\beta_{1}X_{h},\sigma^{2}\left(\frac{1}{n}+\frac{\left(X_{h}-\bar{X}\right)^{2}}{\sum\left(X_{i}-\bar{X}\right)^{2}}\right)\right)\qquad\qquad(2.19)
\end{align*}
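A short simulation, with invented parameter values, illustrates (2.19): over repeated samples generated from model (2.1) with the $X_{i}$ held fixed, the $\hat{Y}_{h}$ values have approximately the stated mean and variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters and fixed X values, chosen only for illustration
beta0, beta1, sigma = 10.0, 4.0, 2.0
X = np.array([4.0, 6.0, 7.0, 9.0, 11.0, 13.0])
X_h = 8.0
n, Xbar = len(X), X.mean()
Sxx = np.sum((X - Xbar) ** 2)

# Repeatedly generate Y from model (2.1) and record Y_hat_h
Y_hat_h = []
for _ in range(20000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((X - Xbar) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * Xbar
    Y_hat_h.append(b0 + b1 * X_h)
Y_hat_h = np.array(Y_hat_h)

# Compare the simulated mean and variance with (2.19)
print(Y_hat_h.mean(), beta0 + beta1 * X_h)
print(Y_hat_h.var(), sigma**2 * (1 / n + (X_h - Xbar) ** 2 / Sxx))
```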
We will need to estimate $\sigma^{2}$ with $s^{2}$. This will mean that the confidence interval is a $t$ interval with $n-2$ degrees of freedom.
A $\left(1-\alpha\right)100\%$ confidence interval for the mean response is
\begin{align*}
\hat{Y}_{h} & \pm t_{\alpha/2}\sqrt{s^{2}\left(\frac{1}{n}+\frac{\left(X_{h}-\bar{X}\right)^{2}}{\sum\left(X_{i}-\bar{X}\right)^{2}}\right)}\qquad\qquad(2.20)
\end{align*}
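Putting the pieces together, a minimal sketch (again with made-up data) computes interval (2.20) at a chosen $X_h$, using $s^{2}=SSE/\left(n-2\right)$ from Section 1.5 and the $t$ quantile with $n-2$ degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical data, invented purely for illustration
X = np.array([4.0, 6.0, 7.0, 9.0, 11.0, 13.0])
Y = np.array([33.0, 38.0, 45.0, 52.0, 60.0, 68.0])
X_h, alpha = 8.0, 0.05

n, Xbar = len(X), X.mean()
Sxx = np.sum((X - Xbar) ** 2)
b1 = np.sum((X - Xbar) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * Xbar

# s^2 = SSE / (n - 2) estimates sigma^2
resid = Y - (b0 + b1 * X)
s2 = np.sum(resid ** 2) / (n - 2)

# Confidence interval (2.20) for the mean response at X_h
Y_hat_h = b0 + b1 * X_h
se = np.sqrt(s2 * (1 / n + (X_h - Xbar) ** 2 / Sxx))
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)   # t quantile with n - 2 df
print(Y_hat_h - t_crit * se, Y_hat_h + t_crit * se)
```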
An example is provided in Section 2.6.