The assignment is due on May 1, 2020, by 17:00.
Submit your responses as an R Markdown file via Blackboard. Good luck!
In homework 1, you explored some questions about the number of Corona cases in the world. For this homework, I merged that data with some measures of democracy from the V-Dem project. Below, I will ask you to explore and visualize the data, estimate some regression models, and analyze your findings. Before starting to work on this assignment:
These days, some argue that the Corona pandemic has hit democracies harder because democratic institutions do not allow governments to limit the freedom of their citizens. In this assignment, we are going to do some preliminary analysis of this claim.
Let’s start by taking a look at the data to see what variables are available:
names(rawData)
## [1] "country" "ISOcode" "cases"
## [4] "deaths" "popdata2018" "popdata2018_log"
## [7] "cases_log" "deaths_log" "ccode"
## [10] "X_merge" "country_name" "country_id"
## [13] "v2x_polyarchy" "v2x_libdem" "v2x_partipdem"
head(rawData)
| | country <fctr> | ISOcode <fctr> | cases <int> | deaths <int> | popdata2018 <int> | popdata2018_log <dbl> | cases_log <dbl> |
|---|---|---|---|---|---|---|---|
| 1 | United_States_of_America | USA | 609516 | 26057 | 327167434 | 8.514770 | 5.784986 |
| 2 | Canada | CAN | 27046 | 903 | 37058856 | 7.568892 | 4.432119 |
| 3 | Cuba | CUB | 766 | 21 | 11338138 | 7.054542 | 2.884795 |
| 4 | Haiti | HTI | 40 | 3 | 11123176 | 7.046229 | 1.612784 |
| 5 | Dominican_Republic | DOM | 3286 | 183 | 10627165 | 7.026417 | 3.516800 |
| 6 | Jamaica | JAM | 105 | 5 | 2934855 | 6.467587 | 2.025306 |
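The output does not say how the `_log` columns were built. As a quick sanity check, the sketch below re-enters the six rows printed above (the data frame name `headRows` is made up for illustration) and compares `cases_log` with a base-10 log of `cases + 1`. That formula is an assumption inferred from the printed values, not something stated in the assignment.

```r
# Rows copied from the head(rawData) output above; `headRows` is a
# hypothetical name used only for this check.
headRows <- data.frame(
  country   = c("United_States_of_America", "Canada", "Cuba",
                "Haiti", "Dominican_Republic", "Jamaica"),
  cases     = c(609516, 27046, 766, 40, 3286, 105),
  cases_log = c(5.784986, 4.432119, 2.884795, 1.612784, 3.516800, 2.025306)
)

# Assumption: cases_log = log10(cases + 1); adding 1 keeps a country
# with zero cases at 0 instead of -Inf.
headRows$check <- log10(headRows$cases + 1)

# Largest absolute gap between the printed and the recomputed values
max_gap <- max(abs(headRows$check - headRows$cases_log))
max_gap
```

If the gap is negligible, coefficients on the `_log` variables can be read on a base-10 scale.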
Assume that we want to use regression analysis to explore how democracy is associated with the number of deaths caused by the Corona virus. Before running a regression, it is always a good idea to visualize the variables that you want to analyze and how they are associated.
For example:
summary(rawData$deaths_log)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.4771 1.0414 1.3170 2.0170 4.4159
library(ggplot2)
ggplot(rawData,
       aes(x = deaths_log)) +
  geom_histogram(bins = 10, col = 'navy', fill = 'maroon') +
  labs(x = "Corona deaths",
       y = "Count",
       caption = "Data sources: https://www.ecdc.europa.eu/en")
The V-Dem project offers different measures of democracy. I picked v2x_polyarchy, v2x_libdem, and v2x_partipdem, which are the electoral democracy index, the liberal democracy index, and the participatory democracy index, respectively. Let’s take a look at the distribution of electoral democracy in our sample:
summary(rawData$v2x_polyarchy)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0230 0.3115 0.5580 0.5350 0.7835 0.9000
ggplot(rawData,
       aes(x = v2x_polyarchy)) +
  geom_histogram(bins = 10, col = 'navy', fill = 'maroon') +
  labs(x = "Electoral democracy",
       y = "Count",
       caption = "Data sources: V-Dem project")
Now, let’s see how electoral democracy and Corona deaths are associated. For visualizing the association between two continuous variables, a scatter plot is often the best option:
ggplot(data = rawData, aes(x = v2x_polyarchy, y = deaths_log)) +
geom_point(alpha = 0.6, color = "navy")+
labs(x = "Electoral democracy",
y = "Corona deaths")
In the comparative politics literature, there is a debate about whether we should use continuous or categorical variables to measure democracy. Therefore, it is good to check whether a categorical measure of democracy gives us a better picture of the association between electoral democracy and Corona deaths. To do so, we need to create a new categorical variable:
# If v2x_polyarchy > .5: "Democracy"; otherwise: "Autocracy"
rawData$polyarchy_cat <- ifelse(rawData$v2x_polyarchy > .5, "Democracy", "Autocracy")
rawData$polyarchy_cat <- factor(rawData$polyarchy_cat)
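A detail worth checking is how the cutoff treats boundary and missing values. The sketch below applies the same `ifelse()` rule to a few made-up index values (`toyIndex` and `toyCat` are hypothetical names, and the values are not taken from the data). A score of exactly .5 is not greater than .5, so it is labeled "Autocracy", and a missing score stays missing.

```r
# Hypothetical index values chosen to probe the cutoff: below, exactly at,
# and above .5, plus a missing value.
toyIndex <- c(0.2, 0.5, 0.8, NA)

# Same rule as above: only values strictly greater than .5 count as democracy
toyCat <- factor(ifelse(toyIndex > .5, "Democracy", "Autocracy"))

toyCat
# Frequency table that keeps the missing value visible
table(toyCat, useNA = "ifany")
```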
First, let’s take a look at the distribution of this new categorical measure of electoral democracy.
ggplot(rawData,
       aes(x = polyarchy_cat)) +
  geom_bar(col = 'navy', fill = 'maroon') +
  labs(x = "Democracy",
       y = "Count",
       caption = "Data sources: V-Dem project")
Now, let’s check the association between our new categorical measure of electoral democracy and Corona deaths.
boxplot(deaths_log ~ polyarchy_cat, data = rawData,
        col = 'cyan',
        xlab = '', ylab = 'Corona deaths')
After visualizing the data, we can statistically evaluate the association between democracy, measured by the electoral democracy index from the V-Dem project, and the number of deaths caused by the Corona virus.
OLS_m1=lm(deaths_log~v2x_polyarchy, data=rawData)
summary(OLS_m1)
##
## Call:
## lm(formula = deaths_log ~ v2x_polyarchy, data = rawData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.66666 -0.75001 -0.07312 0.61424 2.92899
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.4632 0.1880 2.463 0.0148 *
## v2x_polyarchy 1.5961 0.3190 5.004 0.00000146 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.009 on 161 degrees of freedom
## Multiple R-squared: 0.1346, Adjusted R-squared: 0.1292
## F-statistic: 25.04 on 1 and 161 DF, p-value: 0.00000146
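A few numbers in this output can be cross-checked by hand, which helps when learning to read regression tables. The sketch below retypes values from the summary (the variable names are illustrative) and relies on standard OLS identities. Each t value is the estimate divided by its standard error. The number of observations equals the residual degrees of freedom plus the number of estimated coefficients. In a model with a single predictor, the F-statistic is the squared t value of the slope and the correlation is the signed square root of R-squared.

```r
# Values retyped from the summary(OLS_m1) output above
est_slope <- 1.5961
se_slope  <- 0.3190
resid_df  <- 161
r_squared <- 0.1346

# t value = estimate / standard error; small differences from the printed
# value come from rounding in the printed estimate and standard error
t_slope <- est_slope / se_slope

# With one predictor, F is the square of the slope's t value
f_stat <- t_slope^2

# Observations = residual df + 2 coefficients (intercept and slope)
n_obs <- resid_df + 2

# With one predictor, the correlation is the square root of R-squared,
# carrying the sign of the slope
r_xy <- sign(est_slope) * sqrt(r_squared)

round(c(t = t_slope, F = f_stat, n = n_obs, r = r_xy), 3)
```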
The results show a positive association between democracy and Corona deaths; in the following weeks, you will learn that this association is statistically significant. The coefficient of 1.596 means that if the democracy index increases by one unit, the log of Corona deaths increases by 1.596.[2]
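To get a feel for the size of this coefficient, it can help to translate it back from the log scale. The sketch below uses the rounded estimates from the output above and assumes that `deaths_log` is a base-10 logarithm. That assumption matches the footnote's figure of 39.45 but is not stated explicitly in the assignment. Because the electoral democracy index in this sample only runs from 0.023 to 0.900, the sketch also shows the effect of a 0.1 increase, which is a more realistic change than a full unit.

```r
# Rounded estimates retyped from summary(OLS_m1)
b0 <- 0.4632   # intercept
b1 <- 1.596    # slope on v2x_polyarchy

# Assumption: the outcome is on a log10 scale, so adding b1 to the log
# corresponds to multiplying the outcome by 10^b1
mult_one_unit <- 10^b1          # full one-unit increase in the index
mult_tenth    <- 10^(b1 * 0.1)  # increase of 0.1 in the index

# Fitted log deaths at the sample minimum and maximum of the index
fitted_min <- b0 + b1 * 0.023
fitted_max <- b0 + b1 * 0.900

round(c(one_unit = mult_one_unit, tenth = mult_tenth,
        fitted_min = fitted_min, fitted_max = fitted_max), 3)
```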
Now, here are the questions for your assignment:
a. Visualize the distribution of cases_log, v2x_polyarchy, v2x_libdem, and v2x_partipdem.
b. Reproduce the plot below (tip: use ggpairs from the GGally package).
c. Define categorical versions of the democracy indices: values larger than .5 are democracies, otherwise autocracies. Plot the association of these variables with cases_log.
d. Estimate the models below and discuss their results:
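$\mathrm{cases\_log} = \beta_0 + \beta_1\,\mathrm{v2x\_polyarchy} + \epsilon$
$\mathrm{cases\_log} = \beta_0 + \beta_1\,\mathrm{v2x\_libdem} + \epsilon$
$\mathrm{cases\_log} = \beta_0 + \beta_1\,\mathrm{v2x\_partipdem} + \epsilon$
$\mathrm{cases\_log} = \beta_0 + \beta_1\,\mathrm{v2x\_polyarchy} + \beta_2\,\mathrm{v2x\_libdem} + \beta_3\,\mathrm{v2x\_partipdem} + \epsilon$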
Some rightly argue that population size should be considered when analyzing the effect of democracy on the reported Corona cases and deaths. For example, the US has one of the largest numbers of cases and deaths, and it is a democracy. This pushes the estimates toward supporting the claim that democracies are less effective at fighting Corona. There are two solutions to this problem in regression analysis. First, you can directly add population as one of the control variables:
$\mathrm{cases\_log} = \beta_0 + \beta_1\,\mathrm{v2x\_polyarchy} + \beta_2\,\mathrm{popdata2018} + \epsilon$
Another solution is to normalize the number of cases by dividing the number of Corona cases by population size (cases per capita) and to use this as the dependent variable.
Apply both solutions above and compare the results. Repeat your analysis using popdata2018. Do the results change? How?
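As a sketch of the normalization step, the code below computes cases per 100,000 inhabitants for the six countries printed near the top of this assignment. The data frame `headRows6` and the column `cases_per_100k` are names invented for this illustration, and the per-100,000 scaling is one common choice rather than something the assignment prescribes.

```r
# Rows copied from the head(rawData) output; `headRows6` is a hypothetical name
headRows6 <- data.frame(
  country     = c("United_States_of_America", "Canada", "Cuba",
                  "Haiti", "Dominican_Republic", "Jamaica"),
  cases       = c(609516, 27046, 766, 40, 3286, 105),
  popdata2018 = c(327167434, 37058856, 11338138, 11123176, 10627165, 2934855)
)

# Cases per 100,000 inhabitants: divide by population, then rescale
headRows6$cases_per_100k <- headRows6$cases / headRows6$popdata2018 * 1e5

# Sort from the highest to the lowest rate
headRows6[order(-headRows6$cases_per_100k), c("country", "cases_per_100k")]
```

In this small subset the United States still ranks first, but its lead over Canada shrinks from more than twentyfold in raw counts to under threefold per capita.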
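[2] $\log_{10}(y_2) - \log_{10}(y_1) = \beta_1 (x_2 - x_1) \Rightarrow \log_{10}(y_2 / y_1) = 1.596$ for a one-unit increase, so $y_2 / y_1 = 10^{1.596} \approx 39.45$: a one-unit increase in democracy multiplies the number of deaths by about 39.45.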