8.2 Multinomial Logistic Regression
"Never tell me the odds!" - Han Solo
Logistic regression is most frequently used to model the relationship between a dichotomous
response variable and a set of predictor variables.
On occasion, however, the response variable may have more than two levels.
Logistic regression can still be employed by means of a polytomous (or multinomial) logistic regression model.
Multinomial logistic regression models are used in many fields. In business, for instance, a market researcher may wish to relate a consumer's choice of product (product A, product B, product C) to the consumer's age, gender, geographic location, and several other potential explanatory variables.
This is an example of nominal multinomial regression because the response categories are purely qualitative and not ordered in any way.
Ordinal response categories can
also be modeled using multinomial regression.
For example, the relation between severity of disease measured on an ordinal scale (mild, moderate, severe) and age of patient, gender of patient, and some other explanatory variables may be of interest.
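As an aside, when the categories are ordered, a proportional-odds model is often preferred to the nominal model used below. A minimal sketch in R, assuming a hypothetical data frame disease with an ordered factor severity and predictors age and gender (all names here are illustrative, not from the study below):
library(MASS)
# proportional-odds (ordinal) logistic regression;
# disease, severity, age, and gender are hypothetical names used only for illustration
fit.ord = polr(severity ~ age + gender, data = disease)
summary(fit.ord)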
The response variable $Y$ must be coded in the same way as the indicator variables we have used for predictors, except that we need an indicator variable for each category of $Y$.
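For orientation, the model being fit is the baseline-category logit model, stated here in its standard form without derivation: with category 1 as the baseline and $k$ predictors, there is a separate set of coefficients for each non-baseline category $j$ of the $J$ response categories. \begin{align*} \log\left(\frac{P(Y_{i}=j)}{P(Y_{i}=1)}\right) = \beta_{j0} + \beta_{j1}X_{i1} + \cdots + \beta_{jk}X_{ik}, \qquad j = 2, \ldots, J \end{align*}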
Instead of going through the details, we will illustrate with R.
A study was undertaken to determine the strength of association between several risk factors
and the duration of pregnancies.
The risk factors considered were mother's age, nutritional status, history of tobacco use, and history of alcohol use.
The response of interest, pregnancy duration, is a three-category variable that was coded as follows: \begin{align*} Y_{i} &\quad \textbf{Pregnancy Duration Category}\\ 1 &\quad \text{Preterm (less than 36 weeks)}\\ 2 &\quad \text{Intermediate term (36 to 37 weeks)}\\ 3 &\quad \text{Full term (38 weeks or greater)} \end{align*} Relevant data for 102 women who had recently given birth at a large metropolitan hospital were obtained.
The X variables include:
- Nutritional status ($X_1$) is an index of nutritional status (higher score denotes better nutritional status).
- Age was categorized into three groups: 20 years of age or younger (coded 1), from 21 to 30 years of age (coded 2), and greater than 30 years of age (coded 3). It is represented by two indicator variables ($X_2$ and $X_3$): \begin{align*} X_{2} &\quad X_{3} & \textbf{Class}\\ 1 &\quad 0 & \text{Less than or equal to 20}\\ 0 &\quad 0 & \text{21 to 30}\\ 0 &\quad 1 & \text{Greater than 30} \end{align*} - Alcohol ($X_4$) and smoking history ($X_5$) were also qualitative predictors; the categories were "Yes" (coded 1) and "No" (coded 0).
dat = read.table("http://users.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Chapter%2014%20Data%20Sets/CH14TA13.txt")
names(dat) = c("id", "Y", "Y1", "Y2", "Y3", "X1", "X2", "X3", "X4", "X5")
head(dat)
id Y Y1 Y2 Y3 X1 X2 X3 X4 X5
1 1 1 1 0 0 150 0 0 0 1
2 2 1 1 0 0 124 1 0 0 0
3 3 1 1 0 0 128 0 0 0 1
4 4 1 1 0 0 128 1 0 0 1
5 5 1 1 0 0 133 0 0 1 1
6 6 1 1 0 0 130 0 0 1 1
Note that the Y variable is already coded with the variables Y1, Y2, and Y3 in the dataset. We do not have to do this ourselves; R will create the indicator variables for us if we tell R the variable is a factor.
library(nnet)
reg = multinom(factor(Y)~X1+X2+X3+X4+X5, data=dat)
summary(reg)
Call:
multinom(formula = factor(Y) ~ X1 + X2 + X3 + X4 + X5, data = dat)
Coefficients:
(Intercept) X1 X2 X3 X4 X5
2 -1.517009 0.01897219 -0.04355881 -0.1721573 -0.9758857 -0.2218629
3 -5.475353 0.06542052 -2.95693397 -2.0596355 -2.0428964 -2.4522828
Std. Errors:
(Intercept) X1 X2 X3 X4 X5
2 2.047984 0.01631570 0.7373261 0.6943946 0.5869073 0.5856117
3 2.271691 0.01823932 0.9644844 0.8947673 0.7097457 0.7315046
Residual Deviance: 168.6754
AIC: 192.6754
Note that the summary contains two regression fits. What we have actually done is model the log odds of being in category 2 relative to category 1,
and likewise the log odds of being in category 3 relative to category 1.
Category 1 is therefore called the baseline category.
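The multinom() summary does not report p-values. A common sketch, relying on the usual large-sample normal approximation, computes Wald z statistics from the coefficients and standard errors shown above:
# Wald z statistics and approximate two-sided p-values
# (large-sample normal approximation)
z = summary(reg)$coefficients / summary(reg)$standard.errors
p = 2 * pnorm(abs(z), lower.tail = FALSE)
round(p, 4)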
To get the multiplicative effect of each coefficient on the odds, exponentiate the estimated coefficients:
exp(coef(reg))
(Intercept) X1 X2 X3 X4 X5
2 0.219366942 1.019153 0.95737625 0.8418467 0.3768584 0.80102519
3 0.004188751 1.067608 0.05197804 0.1275004 0.1296526 0.08609682
We can use a different baseline category, such as full term (category 3):
dat$Y = as.factor(dat$Y)
dat$Y = relevel(dat$Y, ref = "3")
reg.nom = multinom(factor(Y)~X1+X2+X3+X4+X5, data=dat)
summary(reg.nom)
Call:
multinom(formula = factor(Y) ~ X1 + X2 + X3 + X4 + X5, data = dat)
Coefficients:
(Intercept) X1 X2 X3 X4 X5
1 5.475147 -0.06541919 2.957028 2.059662 2.042900 2.452362
2 3.958370 -0.04644903 2.913475 1.887550 1.067001 2.230492
Std. Errors:
(Intercept) X1 X2 X3 X4 X5
1 2.271677 0.01823916 0.9644921 0.8947727 0.7097461 0.7315106
2 1.941063 0.01488581 0.8575544 0.8088255 0.6495262 0.6681955
Residual Deviance: 168.6754
AIC: 192.6754
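As a sanity check, the coefficients comparing category 1 to the new baseline (category 3) should be the negatives of the category 3 row from the original fit, up to numerical error in the optimizer:
# rows of the coefficient matrix are named by the non-baseline categories
coef(reg.nom)["1", ] + coef(reg)["3", ]  # should be approximately zero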
# to get the predicted probability of each category for each observation
preds.nom = fitted(reg.nom)
preds.nom[1:10,]
3 1 2
1 0.62078219 0.094217907 0.28499991
2 0.18455857 0.254210377 0.56123105
3 0.34297651 0.219535548 0.43748794
4 0.02716468 0.334553819 0.63828151
5 0.13335384 0.474690286 0.39195587
6 0.11480795 0.497290719 0.38790133
7 0.95144461 0.009569423 0.03898596
8 0.09738287 0.335195611 0.56742152
9 0.02775016 0.658559402 0.31369044
10 0.40441329 0.186642234 0.40894447
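Each row of the fitted matrix is a probability distribution over the three categories, so the rows sum to 1:
# quick check: fitted probabilities in each row sum to 1
rowSums(preds.nom[1:5, ])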
# if we predict the category with the highest probability:
ind = apply(preds.nom, 1, which.max)
preds.nom.cat = colnames(preds.nom)[ind]
preds.nom.cat
[1] "3" "2" "2" "2" "1" "1" "3" "2" "1" "2" "3" "1" "2" "1" "1" "3" "2" "2" "1" "1" "1"
[22] "1" "1" "2" "1" "2" "3" "3" "2" "2" "2" "2" "2" "2" "3" "2" "2" "2" "2" "3" "2" "2"
[43] "2" "2" "2" "2" "3" "2" "1" "2" "2" "1" "2" "3" "2" "2" "2" "1" "3" "1" "3" "3" "3"
[64] "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3" "3"
[85] "2" "3" "3" "3" "1" "3" "3" "3" "3" "3" "2" "2" "2" "2" "1" "1" "3" "1"
# compare predicted to observed categories:
table(preds.nom.cat, dat$Y)
preds.nom.cat 3 1 2
1 4 12 4
2 5 10 23
3 32 4 8
The rows are the predicted categories and the columns the observed categories (in the releveled order 3, 1, 2), so the model classifies $12 + 23 + 32 = 67$ of the 102 women correctly.
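The overall misclassification rate follows directly (note this is computed on the same data used to fit the model, so it is an optimistic, training-set error rate):
# proportion of women whose predicted category differs from the observed one
mean(preds.nom.cat != dat$Y)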