8.4 Poisson Regression

"As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality."
- Albert Einstein

Introduction

We consider now another nonlinear regression model where the response outcomes are discrete.

Poisson regression is useful when the outcome is a count, with large-count outcomes being rare events.

For instance, the number of times a household shops at a particular supermarket in a week is a count, with a large number of shopping trips to the store during the week being a rare event. A researcher may wish to study the relation between a family's number of shopping trips to the store during a particular week and the family's income, number of children, distance from the store, and some other explanatory variables.

As another example, the relation between the number of hospitalizations of a member of a health maintenance organization during the past year and the member's age, income, and previous health status may be of interest.

The Poisson distribution can be utilized for outcomes that are counts ($Y_i = 0, 1,2, ...$ ), with a large count or frequency being a rare event.

8.4.1 The Poisson Distribution

The Poisson probability distribution is $$ f(Y) = \frac{\mu^Y \exp(-\mu)}{Y!}, \qquad Y = 0, 1, 2, \ldots $$ where $\mu > 0$. The mean and variance of a Poisson distribution are $$ \begin{align*} E\{Y\} &= \mu\\ \sigma^2\{Y\} &=\mu \end{align*} $$ Note that the variance equals the mean.
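As a quick check of the formula, the following R sketch evaluates the probability function directly and compares it with R's built-in `dpois`. The value `mu = 2.5` is an arbitrary choice for illustration, not a quantity from the text.

```r
# Illustrative mean; any positive value works (assumed, not from the text)
mu <- 2.5
y  <- 0:10

# Probability function written out term by term: mu^Y exp(-mu) / Y!
f_manual <- mu^y * exp(-mu) / factorial(y)

# R's built-in Poisson probability function
f_builtin <- dpois(y, lambda = mu)

# The two columns should agree up to floating-point error
print(cbind(y, f_manual, f_builtin))
print(all.equal(f_manual, f_builtin))
```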

Hence, if the number of store trips follows the Poisson distribution and the mean number of store trips for a family with three children is larger than the mean number of trips for a family with no children, the variances of the distributions of outcomes for the two families will also differ.
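To illustrate that the variance moves with the mean, this sketch computes both moments numerically from the Poisson probabilities for two hypothetical mean trip counts (1.5 and 4 are made-up values, not estimates from any data). The sum is truncated at 200, beyond which the remaining probability is negligible for means this small.

```r
# Mean and variance of a Poisson distribution, computed from its probabilities
poisson_moments <- function(mu, max_y = 200) {
  y <- 0:max_y
  p <- dpois(y, lambda = mu)
  m <- sum(y * p)            # E{Y}
  v <- sum((y - m)^2 * p)    # sigma^2{Y}
  c(mean = m, variance = v)
}

# Hypothetical mean weekly trips: no children vs. three children
no_children    <- poisson_moments(1.5)
three_children <- poisson_moments(4)

print(rbind(no_children, three_children))
```

Because the variance changes from family to family, the constant-variance assumption of ordinary least squares does not hold for data of this kind.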

8.4.2 Poisson Regression

We start with the regression model \begin{align*} Y_{i} & =E\left\{ Y_{i}\right\} +\varepsilon_{i}\qquad i=1,2,\ldots,n \end{align*} The mean response for the $i$th case, now denoted by $\mu_{i}$ for simplicity, is assumed, as always, to be a function of the set of predictor variables $X_{1},\ldots,X_{p-1}$.

We use the notation $\mu\left(\textbf{X}_{i},\boldsymbol{\beta}\right)$ to denote the function that relates the mean response $\mu_{i}$ to $\textbf{X}_{i}$, the values of the predictor variables for case $i$, and $\boldsymbol{\beta}$, the values of the regression coefficients.

Some commonly used functions for Poisson regression are \begin{align*} \mu_{i}= & \mu\left(\textbf{X}_{i},\boldsymbol{\beta}\right)=\textbf{X}_{i}^{\prime}\boldsymbol{\beta}\\ \mu_{i}= & \mu\left(\textbf{X}_{i},\boldsymbol{\beta}\right)=\exp\left(\textbf{X}_{i}^{\prime}\boldsymbol{\beta}\right)\\ \mu_{i}= & \mu\left(\textbf{X}_{i},\boldsymbol{\beta}\right)=\ln\left(\textbf{X}_{i}^{\prime}\boldsymbol{\beta}\right) \end{align*} In all three cases, the mean response $\mu_{i}$ must be nonnegative.
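The three response functions can be compared on a small made-up example. The coefficients and predictor values below are hypothetical, chosen only to show that the linear and logarithmic forms can yield negative or undefined means unless the linear predictor is restricted, whereas the exponential form always yields a positive mean.

```r
# Hypothetical coefficients and design matrix (intercept plus one predictor)
beta <- c(0.5, 0.8)
X    <- cbind(1, c(-2, 0, 1, 3))

# Linear predictor X_i' beta for each case
eta <- drop(X %*% beta)

mu_linear <- eta          # mu_i = X_i' beta
mu_exp    <- exp(eta)     # mu_i = exp(X_i' beta)
# mu_i = ln(X_i' beta); reported as NA where X_i' beta <= 0 (undefined)
mu_log    <- ifelse(eta > 0, log(pmax(eta, .Machine$double.xmin)), NA)

print(data.frame(eta, mu_linear, mu_exp, mu_log))
```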

Since the distribution of the error terms $\varepsilon_{i}$ for Poisson regression is a function of the distribution of the response $Y_{i}$, which is Poisson, it is easiest to state the Poisson regression model in the following form:
$Y_{i}$ are independent Poisson random variables with expected values $\mu_{i}$ where \begin{align*} \mu_{i} & =\mu\left(\textbf{X}_{i},\boldsymbol{\beta}\right) \end{align*} The most commonly used response function is $\mu_{i}=\exp\left(\textbf{X}_{i}^{\prime}\boldsymbol{\beta}\right)$.
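As a minimal sketch of the model with the exponential response function, the code below simulates independent Poisson responses with $\mu_i = \exp(\textbf{X}_i^{\prime}\boldsymbol{\beta})$ and recovers the coefficients with `glm`. The sample size, seed, and true coefficients are arbitrary assumptions for illustration.

```r
set.seed(1)                       # arbitrary seed, for reproducibility

n         <- 5000                 # assumed sample size
beta_true <- c(0.5, 0.3)          # assumed intercept and slope
x         <- runif(n, 0, 2)

# Mean response under the exponential response function
mu <- exp(beta_true[1] + beta_true[2] * x)

# Independent Poisson outcomes with means mu_i
y <- rpois(n, lambda = mu)

# family = poisson uses the log link by default, i.e. mu_i = exp(X_i' beta)
fit <- glm(y ~ x, family = poisson)

print(rbind(true = beta_true, estimate = unname(coef(fit))))
```

With a sample this large, the estimates should land close to the assumed coefficients, though the exact values depend on the seed.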

Example 8.4.1

The Miller Lumber Company is a large retailer of lumber and paint, as well as of plumbing, electrical, and other household supplies.

During a representative two-week period, in-store surveys were conducted and addresses of customers were obtained. The addresses were then used to identify the metropolitan area census tracts in which the customers reside.

At the end of the survey period, the total number of customers who visited the store from each census tract within a 10-mile radius was determined and relevant demographic information for each tract (average income, number of housing units, etc.) was obtained.

Several other variables expected to be related to customer counts were constructed from maps, including distance from census tract to nearest competitor and distance to store.

Initial screening of the potential predictor variables led to the retention of five predictors:
- $X_1$: Number of housing units
- $X_2$: Average income, in dollars
- $X_3$: Average housing unit age, in years
- $X_4$ : Distance to nearest competitor, in miles
- $X_5$: Distance to store, in miles

The response variable is $Y_i$, the number of customers who visited the store from census tract $i$.

# Read the Miller Lumber data; paste0() keeps the long URL as one unbroken string
dat = read.table(paste0(
  "http://users.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/",
  "textdatasets/KutnerData/Chapter%2014%20Data%20Sets/CH14TA14.txt"))
names(dat) = c("Y", "X1", "X2", "X3", "X4", "X5")

reg = glm(Y~., family = "poisson", data=dat)
summary(reg)

Call:
glm(formula = Y ~ ., family = "poisson", data = dat)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.93195  -0.58868  -0.00009   0.59269   2.23441  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.942e+00  2.072e-01  14.198  < 2e-16 ***
X1           6.058e-04  1.421e-04   4.262 2.02e-05 ***
X2          -1.169e-05  2.112e-06  -5.534 3.13e-08 ***
X3          -3.726e-03  1.782e-03  -2.091   0.0365 *  
X4           1.684e-01  2.577e-02   6.534 6.39e-11 ***
X5          -1.288e-01  1.620e-02  -7.948 1.89e-15 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 422.22  on 109  degrees of freedom
Residual deviance: 114.99  on 104  degrees of freedom
AIC: 571.02

Number of Fisher Scoring iterations: 4
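
Because the fit uses the log link, each coefficient acts multiplicatively on the mean count. The sketch below works only from the numbers printed in the summary above (copied by hand, so rounding applies): it exponentiates the estimates and compares the deviances with chi-square reference distributions. Treating the residual deviance as approximately chi-square is a large-sample approximation, used here as a rough guide rather than a formal test.

```r
# Estimates copied from the printed summary (rounded as shown there)
est <- c(X1 = 6.058e-04, X2 = -1.169e-05, X3 = -3.726e-03,
         X4 = 1.684e-01, X5 = -1.288e-01)

# Multiplicative change in the mean count per one-unit increase in each predictor
rate_ratio <- exp(est)
print(round(rate_ratio, 4))

# Deviances and degrees of freedom from the printed summary
null_dev  <- 422.22; null_df  <- 109
resid_dev <- 114.99; resid_df <- 104

# Likelihood-ratio test that all five slopes are zero
lr_stat <- null_dev - resid_dev
p_model <- pchisq(lr_stat, df = null_df - resid_df, lower.tail = FALSE)

# Rough lack-of-fit check: residual deviance against chi-square(resid_df)
p_fit <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)

print(c(lr_stat = lr_stat, p_model = p_model, p_fit = p_fit))
```

For example, `exp(0.1684)` is about 1.18, so each additional mile to the nearest competitor is associated with a roughly 18% higher expected customer count, holding the other predictors fixed. Likewise, `exp(-0.1288)` is about 0.88, a roughly 12% lower expected count per additional mile from the store.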

# First 10 fitted values on the scale of the linear predictor (log of the mean)
head(predict(reg), 10)

       1        2        3        4        5        6        7 
2.512666 2.171006 3.336689 2.129078 1.982460 2.184004 1.458190 
       8        9       10 
2.397791 2.670277 2.453965
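
Note that `predict` applied to a `glm` fit returns values on the scale of the linear predictor by default, here $\ln \mu_i$, so the numbers above are log means rather than customer counts. The sketch below back-transforms the ten printed values by hand; with the fitted object available, `predict(reg, type = "response")` returns the same quantities directly.

```r
# Linear-predictor values copied from the printed output above
eta_hat <- c(2.512666, 2.171006, 3.336689, 2.129078, 1.982460,
             2.184004, 1.458190, 2.397791, 2.670277, 2.453965)

# Estimated mean customer counts: mu_hat_i = exp(X_i' b)
mu_hat <- exp(eta_hat)
print(round(mu_hat, 2))
```

The first tract, for instance, has an estimated mean of about 12.3 customers over the survey period.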
