Some political outcomes only take non-negative integer values, \(Y \in \mathbb{Z}^+\): for example, the number of civilians killed daily/weekly/monthly/yearly in a civil war, the number of female candidates at each election, or the number of terrorist attacks within a time period. King (1988) shows that estimating a model with count outcomes using OLS leads to
nonsensical predictions (negative predicted counts)
heteroskedasticity and thus inefficient standard errors
Thus, do not use OLS for count models, though labor economists have a different view (Angrist & Pischke 2009).
These types of outcomes are often modeled using a Poisson distribution, \(Y \sim Poisson(\lambda)\). The parameter \(\lambda\) here simply represents the expected number of occurrences. Does \(\lambda\) need to be an integer value?
The probability mass function of the Poisson distribution is:
\[\begin{equation} P(Y=k|\lambda)=\frac{\lambda^k e^{-\lambda}}{k!} \end{equation}\]The mean and variance of this distribution are both \(\lambda>0\). There is a straightforward interpretation of \(\lambda\): the expected number of events in any particular time period \(t\). The Poisson distribution is also memoryless; that is, events occur independently of one another.
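Note that \(\lambda\) does not need to be an integer: it is an expected count, so any positive real value is allowed, even though the observed counts themselves are integers. A quick check in \(\mathcal{R}\):
dpois(0:5, lambda = 2.5)           # the rate parameter can be any positive real number
sum(dpois(0:100, lambda = 2.5))    # the probabilities still sum to (essentially) 1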
library(latex2exp)   # for TeX() labels in base plots

x <- 0:20
yrange <- range(dpois(x, 2), dpois(x, 4), dpois(x, 7), dpois(x, 10))

plot(range(x), yrange, type="n", xlab="Count", ylab="Probability",
     main=TeX('Poisson Distributions for different values of $\\lambda$'))
lines(x, dpois(x, 2), type="b", lwd=2, col='blue')
lines(x, dpois(x, 4), type="b", lwd=2, col='orange')
lines(x, dpois(x, 7), type="b", lwd=2, col='maroon')
lines(x, dpois(x, 10), type="b", lwd=2, col='green')
legend("topright", legend=c(TeX('$\\lambda$=2'),
                            TeX('$\\lambda$=4'),
                            TeX('$\\lambda$=7'),
                            TeX('$\\lambda$=10')),
       col=c("blue","orange","maroon","green"),
       lty=1, box.lty=0, cex=1, lwd=2)
The GLM approach extends linear estimation to outcomes that are not normally distributed. Although GLMs help us model non-linear outcomes, they are built around a linear predictor:
\[\begin{equation} \eta_i=\beta_0+\beta_1x_{1i}+\dots+\beta_px_{pi} \end{equation}\]and two functions: a link function \(g(\mu_i)=\eta_i\) that maps the mean of the outcome, \(\mu_i=E(Y_i)\), onto the linear predictor, and a variance function \(Var(Y_i)=\phi V(\mu_i)\),
where \(\phi\) is a constant parameter that measures dispersion.
The likelihood function of a Poisson model, assuming that \(\log(\lambda_i)=x_i' \beta\), i.e., \(\lambda_i=\exp(x_i' \beta)\), can be written as follows:
\[\begin{align} L(\lambda|Y)&=\prod_{i=1}^{N}\frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!} \\ \ln L(\lambda|Y)&=\ln\left[\prod_{i=1}^{N} \frac{e^{-e^{x_i'\beta}}e^{(x_i'\beta) y_i}}{y_i!}\right] \end{align}\]We can re-arrange the above equation:
\[\begin{align} \ln L(\lambda|Y)=\sum_{i=1}^N \left[-e^{x_i'\beta}+y_i(x_i'\beta)-\ln(y_i!)\right] \end{align}\]This maximum likelihood model can be estimated fairly easily by numerical optimization of the above log-likelihood.
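As a minimal sketch of this idea (using simulated data with made-up coefficients, not the replication data introduced below), we can maximize this log-likelihood directly with \({\tt optim}\) and compare the result to \({\tt glm}\):
set.seed(123)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))     # simulated counts with known coefficients

# Negative Poisson log-likelihood: -(sum over i of -exp(x_i'beta) + y_i*(x_i'beta) - log(y_i!))
negLogLik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(-exp(eta) + y * eta - lgamma(y + 1))    # lgamma(y+1) = log(y!)
}

mle <- optim(c(0, 0), negLogLik, method = "BFGS")
mle$par                                        # numerical MLE of (beta0, beta1)
coef(glm(y ~ x, family = "poisson"))           # virtually identical estimates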
The signs of the elements of the vector \(\beta\) indicate the direction of the effect of \(X\) on \(\lambda\):
Positive signs indicate a positive effect
Negative signs indicate a negative effect
The above link function/log-transformation resolves the problem of negative predicted values. Cameron and Trivedi (2005) also recommend using robust standard errors to correct for heteroskedasticity when we use a Poisson model for estimation.
Here, we use the data from “Armed intervention and civilian victimization in intrastate conflicts” by Wood, Kathman, and Gent. The full replication data is available here: https://www.prio.org/JPR/Datasets
WKGdata=read.csv("https://raw.githubusercontent.com/babakrezaee/MethodsCourses/master/DataSets/WKG_JPR_2012.csv")
hist(WKGdata$rebbest2010, col="grey", border="maroon",
main="Histogram for Conflict Deaths (SCAD data set: modified)",
xlab="Deaths",
breaks = 0:summary(WKGdata$rebbest2010)[6])
# glm() is part of base R (stats); start with an intercept-only Poisson model
poisson_ml <- glm(rebbest2010 ~ 1, family="poisson", data=WKGdata)
summary(poisson_ml)
##
## Call:
## glm(formula = rebbest2010 ~ 1, family = "poisson", data = WKGdata)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -12.77 -12.77 -12.77 -7.27 514.44
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.401508 0.003252 1353 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 596617 on 1158 degrees of freedom
## Residual deviance: 596617 on 1158 degrees of freedom
## (7 observations deleted due to missingness)
## AIC: 598550
##
## Number of Fisher Scoring iterations: 8
exp(poisson_ml$coefficients[[1]]+0)  # exp(intercept): the expected count implied by the intercept-only model
## [1] 81.57377
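With only an intercept, the Poisson MLE of \(\lambda\) is just the sample mean of the outcome, so \(exp(\hat{\beta}_0)\) reproduces it:
mean(WKGdata$rebbest2010, na.rm = TRUE)  # 81.57377, identical to exp(intercept) above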
poissonModel <- glm(rebbest2010 ~ lnintv_ratiolag, family="poisson", data=WKGdata)
summary(poissonModel)
##
## Call:
## glm(formula = rebbest2010 ~ lnintv_ratiolag, family = "poisson",
## data = WKGdata)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -15.646 -11.286 -11.286 -3.631 254.864
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.153921 0.004455 932.50 <2e-16 ***
## lnintv_ratiolag -0.126376 0.003725 -33.92 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 230394 on 824 degrees of freedom
## Residual deviance: 229372 on 823 degrees of freedom
## (341 observations deleted due to missingness)
## AIC: 231038
##
## Number of Fisher Scoring iterations: 7
exp(poissonModel$coefficients[[1]]+0)  # expected count when lnintv_ratiolag = 0
## [1] 63.68323
exp(poissonModel$coefficients[[1]]+poissonModel$coefficients[[2]]*1)  # expected count when lnintv_ratiolag = 1
## [1] 56.12299
The results show that increasing the (lagged, logged) intervention ratio from 0 to 1 decreases the expected count of rebel violence against civilians from about 63.68 to about 56.12, a drop of roughly 7.56. Because the model is multiplicative, this absolute change depends on the starting value of the covariate; what is constant is the change in the log of the expected count:
Here, \(\beta=\log(\lambda(x+1))-\log(\lambda(x))\), where \(\lambda(x)\) is the expected count at covariate value \(x\).
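Equivalently, exponentiating the coefficient gives an incidence-rate ratio, i.e., the multiplicative change in the expected count for a one-unit increase in the regressor:
exp(poissonModel$coefficients[["lnintv_ratiolag"]])  # about 0.88: each one-unit increase multiplies the expected count by ~0.88
This matches the ratio of the two fitted values above, 56.12/63.68.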
After estimating your Poisson regression, you can predict outcomes using any dataset containing the features used in the estimated model.
newData=data.frame(lnintv_ratiolag=3)
predict(poissonModel,newdata = newData, type='response')
## 1
## 43.58852
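The same call works for a grid of values, which is a quick way to trace out the fitted curve before plotting it below:
newGrid <- data.frame(lnintv_ratiolag = 0:5)
cbind(newGrid, fitted = predict(poissonModel, newdata = newGrid, type = "response"))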
library(jtools)
plot_summs(poissonModel, scale=TRUE, exp = FALSE,
model.names = c("Poisson"),
coefs = c("log(Intervention)" = "lnintv_ratiolag"))
We can also compare the coefficients of two models:
linearModel <- lm(rebbest2010 ~ lnintv_ratiolag, data=WKGdata)
plot_summs(poissonModel,linearModel, scale=TRUE, exp = FALSE,
model.names = c("Poisson","OLS"),
coefs = c("log(Intervention)" = "lnintv_ratiolag"))
library(sjPlot)
library(ggplot2)
plot_model(poissonModel, type = "pred", terms = "lnintv_ratiolag")+xlab('log(Intervention)')+ylab('Rebel violence')
See here for more info on plotting marginal effects: https://cran.r-project.org/web/packages/sjPlot/vignettes/plot_marginal_effects.html
To compute robust standard errors, as suggested by Cameron and Trivedi (2005), we can use the \({\tt sandwich}\) package:
poissonModel <- glm(rebbest2010 ~ lnintv_ratiolag, family="poisson", data=WKGdata)
library(sandwich)
cov.pois <- vcovHC(poissonModel, type="HC0")
# HC0 is White's heteroskedasticity-consistent correction
se <- sqrt(diag(cov.pois))
robust <- cbind(estimate=poissonModel$coefficients, se,
pvalue=round(2*(1-pnorm(abs(poissonModel$coefficients/se))),
5), lower=poissonModel$coefficients-1.96*se,
upper=poissonModel$coefficients+1.96*se)
robust
## estimate se pvalue lower upper
## (Intercept) 4.1539213 0.1943300 0.00000 3.7730345 4.5348082
## lnintv_ratiolag -0.1263758 0.1750579 0.47035 -0.4694894 0.2167377
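An equivalent shortcut is \({\tt coeftest}\) from the \({\tt lmtest}\) package, which prints the usual coefficient table with the robust variance matrix plugged in:
library(lmtest)
coeftest(poissonModel, vcov. = cov.pois)  # same point estimates, robust (HC0) standard errors
Note how much larger the robust standard errors are than the default ones: with them, the coefficient on the intervention ratio is no longer statistically significant.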
One of the major assumptions of the Poisson distribution/model is that the mean and variance of the data are equal: \(mean(Y)=Var(Y)=\lambda\).
However, when we check this assumption in our data, it clearly fails:
mean(WKGdata$rebbest2010, na.rm=TRUE)
## [1] 81.57377
var(WKGdata$rebbest2010,na.rm=TRUE)
## [1] 751373.2
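The variance is several thousand times larger than the mean, a dramatic violation of the equidispersion assumption:
var(WKGdata$rebbest2010, na.rm = TRUE) / mean(WKGdata$rebbest2010, na.rm = TRUE)  # variance-to-mean ratio, far above 1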
When the variance is greater than the mean of the distribution, there is overdispersion. MLE is still consistent under overdispersion, but the estimates will be:
inefficient, with downward-biased standard errors
inflated Z-scores and therefore false positives
Thus, we use a Negative Binomial MLE for overdispersed count data.
Question: what is an underdispersed distribution?
We can test for overdispersion statistically with the \({\tt dispersiontest}\) function in the \(\mathcal{R}\) package \({\tt AER}\).
library(AER)
poissonModel <- glm(rebbest2010 ~ lnintv_ratiolag, family="poisson", data=WKGdata)
dispersiontest(poissonModel,trafo=1)
##
## Overdispersion test
##
## data: poissonModel
## z = 1.3384, p-value = 0.09039
## alternative hypothesis: true alpha is greater than 0
## sample estimates:
## alpha
## 1762.699
In the Poisson regression, we assumed that \(\lambda=e^{X\beta}\). We can add a second component to \(X\beta\): a random-effect component, \(\epsilon\), which captures:
The combined effects of omitted variables, causing misspecification
Random errors associated with \(\lambda\), but uncorrelated with \(X\).
Thus, the link function of the negative binomial model is as follows:
\[\begin{align} \tilde{\lambda}_i=e^{X_i\beta+\epsilon_i}=\lambda_i e^{\epsilon_i} \end{align}\]Now we need to specify the distribution of \(\epsilon\) to identify the above model. It is common to use the Gamma distribution here, for two main reasons:
The distribution is defined over continuous and non-negative values.
This distribution leads to a nice closed form solution.
If we assume that the multiplicative term \(e^{\epsilon}\) follows a one-parameter Gamma distribution with \(E(e^{\epsilon})=1\) and \(Var(e^{\epsilon})=\alpha\), then the marginal density of \(Y\), distributed as a Negative Binomial model, is:
\[\begin{equation} P(Y|\lambda,\alpha)=[\frac{\Gamma(\alpha^{-1}+Y)}{\Gamma(\alpha^{-1})\Gamma(Y+1)}][\frac{\alpha^{-1}}{\alpha^{-1}+\lambda}]^{\alpha^{-1}}[\frac{\lambda}{\lambda+\alpha^{-1}}]^Y \end{equation}\]Under this assumption, \(E(Y)=\lambda\), but \(Var(Y)=\lambda(1+\alpha \lambda)\):
The variance is still governed by \(\lambda\), but it now also depends on \(\alpha\), the overdispersion parameter.
If \(\alpha \rightarrow 0\), then \(Var(Y) \rightarrow \lambda\), so the model converges to a Poisson distribution.
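A quick numerical illustration of this convergence, using base \(\mathcal{R}\)'s \({\tt dnbinom}\) in its mean/size parameterization (where \({\tt size}=1/\alpha\)):
lambda <- 4
alpha <- 1e-6                                     # alpha close to 0
round(dnbinom(0:5, size = 1/alpha, mu = lambda), 5)
round(dpois(0:5, lambda), 5)                      # nearly identical probabilities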
library(MASS)
negBinomModel=glm.nb(formula = rebbest2010 ~ lnratio_scale, data = WKGdata)
summary (negBinomModel) # Model summary
##
## Call:
## glm.nb(formula = rebbest2010 ~ lnratio_scale, data = WKGdata,
## init.theta = 0.05230984157, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9534 -0.8967 -0.8625 -0.1485 4.2675
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.1801 0.2478 20.902 < 2e-16 ***
## lnratio_scale 0.2766 0.0782 3.538 0.000404 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.0523) family taken to be 1)
##
## Null deviance: 578.51 on 932 degrees of freedom
## Residual deviance: 564.32 on 931 degrees of freedom
## (233 observations deleted due to missingness)
## AIC: 5017
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.05231
## Std. Err.: 0.00339
##
## 2 x log-likelihood: -5010.98200
y_hat=predict(negBinomModel, WKGdata, type="response") # predicted counts from the negative binomial model
We can also use a Likelihood Ratio Test, \(LR = 2(\ln L_1-\ln L_2)\), to test for overdispersion. Estimate one model as a Negative Binomial and the other as a Poisson with the same specification (the Poisson will naturally be without the dispersion parameter):
# Perform likelihood ratio test; lrtest() is provided by the lmtest package (loaded with AER above)
m1=glm(rebbest2010 ~ lnratio_scale, family="poisson", data=WKGdata)
m2=glm.nb(formula = rebbest2010 ~ lnratio_scale, data = WKGdata)
lrtest(m2, m1)
## Warning in modelUpdate(objects[[i - 1]], objects[[i]]): original model was
## of class "negbin", updated model is of class "glm"
## Likelihood ratio test
##
## Model 1: rebbest2010 ~ lnratio_scale
## Model 2: rebbest2010 ~ lnratio_scale
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 3 -2505
## 2 2 -265690 -1 526369 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Poisson is nested within the Negative Binomial (\(\alpha=0\)): the null hypothesis is that the restricted model (i.e., Poisson) fits the data as well as the unrestricted Negative Binomial. Here the test overwhelmingly rejects the Poisson restriction.
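The same statistic can also be computed by hand from the two log-likelihoods (a sketch; strictly speaking, because \(\alpha=0\) lies on the boundary of the parameter space, the usual \(\chi^2_1\) p-value is conservative):
LR <- 2 * (as.numeric(logLik(m2)) - as.numeric(logLik(m1)))  # 2*(lnL_NegBin - lnL_Poisson)
LR
pchisq(LR, df = 1, lower.tail = FALSE)                       # p-value under a chi-squared(1) reference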
Sometimes count data will have a preponderance of zeros and very few observed counts. Zero-inflated models assume two distinct phases of the data generating process:
a binary phase that determines whether a unit can produce a positive count at all or remains an "always-zero" case, and
a count phase (Poisson or Negative Binomial) that generates the observed count for units that enter the count process.
We can model these two stages as follows:
\[\begin{align} Pr(Y=0|X,Z)&=[1-\Phi(Z\gamma)]+\Phi(Z\gamma)e^{-\lambda} \\ Pr(Y=j>0|X,Z)&=\Phi(Z\gamma)\frac{e^{-\lambda} \lambda^j}{j!} \end{align}\]where \(\Phi\) is either the Logit or Probit probability function.
\(Z \gamma\) describes how a set of independent variables predicts whether a unit moves from the zero regime into the Poisson or Negative Binomial process.
\(\lambda\) is the parameter of the Poisson or Negative Binomial component, which we set to \(e^{X\beta}\) or \(e^{X\beta+\epsilon}\), respectively.
The \({\tt pscl}\) package contains a command to estimate zero-inflated count models:
library(pscl)
# zeroinfl(formula, data, dist=c("poisson", "negbin", "geometric"), link=c("logit", "probit", "cloglog"))
ZIP=zeroinfl(formula = rebbest2010 ~ lnratio_scale, data = WKGdata, dist=c("poisson"), link=c("logit"))
summary(ZIP)
##
## Call:
## zeroinfl(formula = rebbest2010 ~ lnratio_scale, data = WKGdata,
## dist = c("poisson"), link = c("logit"))
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.7576 -0.6734 -0.6371 -0.3330 97.9662
##
## Count model coefficients (poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.651260 0.004435 1499.6 <2e-16 ***
## lnratio_scale 0.394998 0.001895 208.5 <2e-16 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.98692 0.12612 7.825 5.08e-15 ***
## lnratio_scale 0.06297 0.03901 1.614 0.106
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 15
## Log-likelihood: -1.476e+05 on 4 Df
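After estimation, \({\tt predict}\) on a \({\tt zeroinfl}\) object can return each component separately; the following sketch uses a hypothetical covariate value of 1:
newZ <- data.frame(lnratio_scale = 1)
predict(ZIP, newdata = newZ, type = "zero")      # predicted probability from the zero-inflation component
predict(ZIP, newdata = newZ, type = "count")     # expected count from the Poisson component
predict(ZIP, newdata = newZ, type = "response")  # overall expected count, combining both components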
vuong(m2, ZIP)   # compare the negative binomial (m2) with the zero-inflated Poisson
## NA or numerical zeros or ones encountered in fitted probabilities
## dropping these 13 cases, but proceed with caution
## Vuong Non-Nested Hypothesis Test-Statistic:
## (test-statistic is asymptotically distributed N(0,1) under the
## null that the models are indistinguishible)
## -------------------------------------------------------------
## Vuong z-statistic H_A p-value
## Raw 11.86509 model1 > model2 < 2.22e-16
## AIC-corrected 11.86579 model1 > model2 < 2.22e-16
## BIC-corrected 11.86749 model1 > model2 < 2.22e-16
In the Vuong test, the null hypothesis is that the two models are indistinguishable, so a significant statistic indicates that one model fits the data better than the other. The direction of \(H_A\) in the output shows which one: here the test favors model 1 (the negative binomial) over model 2 (the zero-inflated Poisson).
Copyright 2019. This is an in-progress project; please do not cite or reproduce without my permission.