Fill in Blanks
Home

8.3 Ordinal Regression

"The Scientist must set in order. Science is built up with facts, as a house is with stones. But a collection of facts is no more a science than a heap of stones is a house."
- Henri Poincare
When the response variable $Y$ takes on categorical responses that can be put into order, then the multinomial logistic regression discussed in Section 8.2 can still be used.

However, a more effective strategy, yielding a more parsimonious and more easily interpreted model, results if the ordering of the categories is taken into account explicitly.

The model that is usually employed is called the proportional odds model

The proportional odds model for ordinal logistic regression models the cumulative probabilities $$ P(Y_i\le j) $$ rather than the specific category probabilities $$ P(Y_i=j) $$ as was the case for nominal logistic regression.

Again, we will skip the details and illustrate with R using the pregnancy data from the last section.
library(MASS)

dat = read.table("http://users.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Chapter%2014%20Data%20Sets/CH14TA13.txt")
names(dat) = c("id", "Y", "Y1", "Y2", "Y3", "X1", "X2", "X3", "X4", "X5")


#tell R the order of the factors
dat$Y = factor(dat$Y,levels = c("1", "2", "3"), ordered=T)

reg.ord = polr(Y~X1+X2+X3+X4+X5,data=dat)
summary(reg.ord)

Call:
polr(formula = Y ~ X1 + X2 + X3 + X4 + X5, data = dat)

Coefficients:
      Value Std. Error t value
X1  0.04887    0.01182   4.133
X2 -1.97601    0.57616  -3.430
X3 -1.36348    0.54648  -2.495
X4 -1.66987    0.47537  -3.513
X5 -1.59154    0.45165  -3.524

Intercepts:
    Value   Std. Error t value
1|2  2.9301  1.4929     1.9627
2|3  5.0249  1.5445     3.2535

Residual Deviance: 173.5122 
AIC: 187.5122 


#to get the predicted probability for each observation
preds.ord = fitted(reg.ord)
preds.ord[1:10,]

             1          2          3
1  0.056827016 0.27179270 0.67138028
2  0.239742367 0.47950086 0.28075677
3  0.150070798 0.43915298 0.41077622
4  0.560196430 0.35167985 0.08812372
5  0.423470637 0.43299765 0.14353172
6  0.459606967 0.41396070 0.12643233
7  0.009988634 0.06576689 0.92424447
8  0.384644784 0.45082830 0.16452692
9  0.751897195 0.20907082 0.03903198
10 0.121488681 0.40757743 0.47093389


# if we pedict the category with the highest probability:
ind = apply(preds.ord, 1, which.max)
preds.ord.cat = colnames(preds.ord)[ind]
preds.ord.cat

  [1] "3" "2" "2" "1" "2" "1" "3" "2" "1" "3" "3" "1" "2" "1" "1" "3" "2" "2" "1" "1" "1"
 [22] "1" "1" "2" "1" "1" "3" "3" "1" "2" "2" "2" "2" "2" "3" "2" "2" "3" "2" "2" "2" "2"
 [43] "1" "2" "2" "2" "3" "2" "2" "1" "2" "2" "1" "3" "1" "2" "1" "1" "2" "2" "3" "3" "3"
 [64] "3" "3" "3" "3" "3" "3" "3" "2" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3"
 [85] "3" "3" "3" "3" "2" "3" "2" "3" "3" "3" "3" "2" "2" "3" "2" "1" "3" "2"


#compare to observed categories:
table(preds.ord.cat, dat$Y)

preds.ord.cat  1  2  3
            1 13  7  1
            2  8 21  7
            3  5  7 33