Fill in Blanks
Home
1.1 Bivariate Relationships
1.2 Probabilistic Models
1.3 Estimation of the Line
1.4 Properties of the Least Squares Estimators
1.5 Estimation of the Variance
2.1 The Normal Errors Model
2.2 Inferences for the Slope
2.3 Inferences for the Intercept
2.4 Correlation and Coefficient of Determination
2.5 Estimating the Mean Response
2.6 Predicting the Response
3.1 Residual Diagnostics
3.2 The Linearity Assumption
3.3 Homogeneity of Variance
3.4 Checking for Outliers
3.5 Correlated Error Terms
3.6 Normality of the Residuals
4.1 More Than One Predictor Variable
4.2 Estimating the Multiple Regression Model
4.3 A Primer on Matrices
4.4 The Regression Model in Matrix Terms
4.5 Least Squares and Inferences Using Matrices
4.6 ANOVA and Adjusted Coefficient of Determination
4.7 Estimation and Prediction of the Response
5.1 Multicollinearity and Its Effects
5.2 Adding a Predictor Variable
5.3 Outliers and Influential Cases
5.4 Residual Diagnostics
5.5 Remedial Measures
8.3 Ordinal Regression
"The Scientist must set in order. Science is built up with facts, as a house is with stones. But a collection of facts is no more a science than a heap of stones is a house."
- Henri Poincare
When the response variable $Y$ takes on categorical responses that can be put into order, then the multinomial logistic regression discussed in Section 8.2 can still be used.
However, a more effective strategy, yielding a more parsimonious and more easily interpreted model, results if theordering of the categories is taken into account explicitly.
The model that is usually employed is called theproportional odds model
The proportional odds model for ordinal logistic regression models the cumulative probabilities $$ P(Y_i\le j) $$ rather than the specific category probabilities $$ P(Y_i=j) $$ as was the case for nominal logistic regression.
Again, we will skip the details and illustrate with R using the pregnancy data from the last section.
However, a more effective strategy, yielding a more parsimonious and more easily interpreted model, results if the
The model that is usually employed is called the
The proportional odds model for ordinal logistic regression models the cumulative probabilities $$ P(Y_i\le j) $$ rather than the specific category probabilities $$ P(Y_i=j) $$ as was the case for nominal logistic regression.
Again, we will skip the details and illustrate with R using the pregnancy data from the last section.
library(MASS)
dat = read.table("http://users.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Chapter%2014%20Data%20Sets/CH14TA13.txt")
names(dat) = c("id", "Y", "Y1", "Y2", "Y3", "X1", "X2", "X3", "X4", "X5")
#tell R the order of the factors
dat$Y = factor(dat$Y,levels = c("1", "2", "3"), ordered=T)
reg.ord = polr(Y~X1+X2+X3+X4+X5,data=dat)
summary(reg.ord)
Call:
polr(formula = Y ~ X1 + X2 + X3 + X4 + X5, data = dat)
Coefficients:
Value Std. Error t value
X1 0.04887 0.01182 4.133
X2 -1.97601 0.57616 -3.430
X3 -1.36348 0.54648 -2.495
X4 -1.66987 0.47537 -3.513
X5 -1.59154 0.45165 -3.524
Intercepts:
Value Std. Error t value
1|2 2.9301 1.4929 1.9627
2|3 5.0249 1.5445 3.2535
Residual Deviance: 173.5122
AIC: 187.5122
#to get the predicted probability for each observation
preds.ord = fitted(reg.ord)
preds.ord[1:10,]
1 2 3
1 0.056827016 0.27179270 0.67138028
2 0.239742367 0.47950086 0.28075677
3 0.150070798 0.43915298 0.41077622
4 0.560196430 0.35167985 0.08812372
5 0.423470637 0.43299765 0.14353172
6 0.459606967 0.41396070 0.12643233
7 0.009988634 0.06576689 0.92424447
8 0.384644784 0.45082830 0.16452692
9 0.751897195 0.20907082 0.03903198
10 0.121488681 0.40757743 0.47093389
# if we pedict the category with the highest probability:
ind = apply(preds.ord, 1, which.max)
preds.ord.cat = colnames(preds.ord)[ind]
preds.ord.cat
[1] "3" "2" "2" "1" "2" "1" "3" "2" "1" "3" "3" "1" "2" "1" "1" "3" "2" "2" "1" "1" "1"
[22] "1" "1" "2" "1" "1" "3" "3" "1" "2" "2" "2" "2" "2" "3" "2" "2" "3" "2" "2" "2" "2"
[43] "1" "2" "2" "2" "3" "2" "2" "1" "2" "2" "1" "3" "1" "2" "1" "1" "2" "2" "3" "3" "3"
[64] "3" "3" "3" "3" "3" "3" "3" "2" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3"
[85] "3" "3" "3" "3" "2" "3" "2" "3" "3" "3" "3" "2" "2" "3" "2" "1" "3" "2"
#compare to observed categories:
table(preds.ord.cat, dat$Y)
preds.ord.cat 1 2 3
1 13 7 1
2 8 21 7
3 5 7 33