Tobler (1970: 236): "I invoke the first law of geography: everything is related to everything else, but near things are more related than distant things."
Spatial Analysis has been used widely by political scientists in different sub-fields for answering a wide variety of questions. Social scientists including political scientists employ spatial analysis in their analysis to either study/evaluate the spillover of a political outcome/mechanism from a neighboring country (Salehyan & Gleditsch 2006; Braithwaite, 2006), or control the spatial correlation and its adverse effect on the estimations of regression models. Below, I briefly discuss how to use STATA to present your spatial data and conduct spatial analysis. This note is in-progress; so if you found any errors, or you had any comments or suggestions, please email me srezaeed@ASU.edu.
The data that I discuss below is a simplified version of the data used in "Food Price Volatility and Rebel Violence in Africa" (RezaeeDaryakenari, Landis, and Thies forthcomin)
The topics discussed in this note are as follow:
## Loading stata kernel in jupyter notebook, using python code. IGNORE this part if you are using STATA.
import ipystata
from ipystata.config import config_stata
config_stata('C:\Program Files (x86)\Stata14\StataSE-64.exe')
from IPython.display import display, HTML
display(HTML(data="""
<style>
div#notebook-container { width: 95%; }
div#menubar-container { width: 65%; }
div#maintoolbar-container { width: 99%; }
</style>
"""))
First, install follwoing STATA user-written packages:
%%stata
ssc install geoinpoly, replace
ssc install sppack, replace
ssc install spwmatrix, replace
ssc install splagvar, replace
*install spatgsa using findit spatgsa
Second, download required datasets from here
Then, set the wordking directory and some options to get (visually) cleaner results.
%%stata
* Set your working directory
qui cd "C:\Users\Babak-Lenovo2017\Dropbox\ASU-Babak\PoliSci\POS 604\Babak_CMPS"
clear
set cformat %5.4f
set pformat %5.3f
set more off
Assume that we want to study how civilian victimization/ violence against civilians are affected by the variations in food price, as is explored in RezaeeDaryakenari et al. (2018). I start with loading and preparing the data for the dependent variable, i.e. civilian victimzation by rebel groups/non-state actors from The Armed Conflict Location & Event Data Project (ACLED) dataset.
%%stata
***Open tha ACLED's raw data: ACLED Version 6 All Africa 1997-2015_csv_monadic.csv
import delimited "ACLED Version 6 All Africa 1997-2015_csv_dyadic.csv", clear
**limiting data to violence aganist civilians by non-state actors
keep if event_typ=="Violence against civilians"
keep if inter1==2 | inter1==3 | inter1==4
** Get year, month, and day from the event_date field
split event_date, p(/)
gen Day=real(event_date1)
gen Month=real(event_date2)
gen Year=real(event_date3)
** Most packages use y and x for the geographic coordinates, so I rename latitude and longitude
rename latitude y
rename longitude x
save "ACLED_ViolenceCivilians.dta" , replace
I want to present spatial/geographic distribution of civilian victimization. To do so, I need a map.
I load a shape file of Africa at 1st adminstration level to STATA and change its format from shape to STATA .dta.
The command that does this for us is shp2dta, as follow:
%%stata
shp2dta using "Africa_admin1", ///
database(AfricaAdmin) coordinates(coor) genid(id) gencentroids(centroid) replace
use AfricaAdmin, clear
* describe
%%stata
spmap using coor.dta, id(id) fcolor(gold) ocolor(maroon) osize(thin)
graph export GoMaroonNGold.pdf, replace
%%stata
format Cultivated %5.1f
spmap Cultivated using coor, id(id) fcolor(BuRd) ///
ocolor(black ..) osize(thin ..) legend(position(2)) ///
legtitle("Cultivated lands %") clmethod(quantile)
%%stata
spmap Cultivated using coor, id(id) fcolor(BuRd) ///
ocolor(black ..) osize(thin ..) legend(position(2)) ///
legtitle("Cultivated lands %") clmethod(quantile) ///
label( xcoord(x_centroid) ///
ycoord(y_centroid) label(HASC_1) color(black) size(*0.3))
Now, I want to show the locations that rebels attacked/targeted civilians on our map. You can use point as an option to load any geo-coded event and add it to your base map.
%%stata
spmap using "coor.dta", id(id) ///
fcolor(gold) ocolor(black) osize(thin) ///
point(data("ACLED_ViolenceCivilians.dta") x(x) y(y) ///
size(*0.8) fcolor(maroon) ocolor(white) osize(vvthin)) ///
title("Civilian Victimization by non-State Actors 1997-2015") ///
subtitle("ACLED Dataset")
%%stata
** Specific year
** limit to year 2000 (as an example)
qui use "ACLED_ViolenceCivilians.dta", clear
qui keep if Year==2000
qui save "ACLED_ViolenceCivilians_2000.dta", replace
use AfricaAdmin, clear
spmap using "coor.dta", id(id) ///
fcolor(gs12) ocolor(white) osize(thin) ///
point(data("ACLED_ViolenceCivilians_2000.dta") x(x) y(y) ///
size(*0.8) fcolor(black) ocolor(red) osize(vvthin)) ///
title("Civilian Victimization by non-State Actors in 2000")
%%stata
spmap Cultivated using "coor.dta", id(id) ///
fcolor(Blues) ocolor(black) osize(thin) clmethod(quantile) ///
point(data("ACLED_ViolenceCivilians.dta") x(x) y(y) ///
size(0.8) fcolor(gold) ocolor(maroon) ///
osize(vvthin)) legend(position(8)) ///
legtitle("Cultivated lands %")
graph export RezaeeEtAl2018.pdf, replace
%%stata
forvalues i=1997(1)2015 {
qui use "ACLED_ViolenceCivilians.dta" , clear
qui keep if Year==`i'
qui save ACLED_ViolenceCivilians_`i'.dta, replace
display as text "year " `i' ": done"
}
I want to mark each conflict event in year t with its administration ID based on coor.dta file that I made using shp2dta
%%stata
use ACLED_ViolenceCivilians_1997.dta, clear
* Match the points to census tract polygons
geoinpoly y x using "coor.dta"
collapse (count) inter1, by(_ID Year)
rename inter1 Violence_count
label var Violence_count "Violence (count)"
save "ACLED_ViolenceCivilians_1997_count.dta", replace
forvalues i=1997(1)2015 {
use ACLED_ViolenceCivilians_`i'.dta, clear
geoinpoly y x using "coor.dta"
collapse (count) inter1, by(_ID Year)
rename inter1 Violence_count
label var Violence_count "Violence (count)"
save "ACLED_ViolenceCivilians_`i'_count.dta", replace
rm "ACLED_ViolenceCivilians_`i'.dta"
display as text " year " `i' ": done"
}
use "ACLED_ViolenceCivilians_1997_count.dta", clear
forvalues i=1998(1)2015 {
append using "ACLED_ViolenceCivilians_`i'_count.dta"
rm "ACLED_ViolenceCivilians_`i'_count.dta"
}
rename _ID id
save "ACLED_ViolenceCivilians_count.dta", replace
Now, we need to merge all data sets.
***Base data
use AfricaAdmin, clear
* Changing upper-case names to lower-case names
foreach v of varlist _all {
capture rename `v' `=lower("`v'")'
}
gen Year=1997
replace Year=2015 if objectid==65
xtset objectid Year
tsfill, full
xtset objectid Year
forvalues i=-50(1)50 {
replace iso=iso[_n+`i'] if objectid[_n]==objectid[_n+`i'] & iso==""
replace name_0=name_0[_n+`i'] if objectid[_n]==objectid[_n+`i'] & name_0==""
replace type_1=type_1[_n+`i'] if objectid[_n]==objectid[_n+`i'] & type_1==""
replace engtype_1=engtype_1[_n+`i'] if objectid[_n]==objectid[_n+`i'] & engtype_1==""
replace iso_2=iso_2[_n+`i'] if objectid[_n]==objectid[_n+`i'] & iso_2==""
replace cultivated=cultivated[_n+`i'] if objectid[_n]==objectid[_n+`i'] & cultivated==.
replace area=area[_n+`i'] if objectid[_n]==objectid[_n+`i'] & area==.
replace x_centroid=x_centroid[_n+`i'] if objectid[_n]==objectid[_n+`i'] & x_centroid==.
replace y_centroid=y_centroid[_n+`i'] if objectid[_n]==objectid[_n+`i'] & y_centroid==.
replace id=id[_n+`i'] if objectid[_n]==objectid[_n+`i'] & id==.
}
***Now merge the prepared datasets with main dataset
*Violence aganst civilian data (This data comes from ACLED, but I aggregated it by month to 1st admin level)
joinby id Year using "ACLED_ViolenceCivilians_count.dta", unmatched(master)
drop _merge
replace Violence_count=0 if Violence_count==.
** Nightlight data
joinby objectid Year using "Africa_admin1_Nightlight.dta", unmatched(master)
drop _merge
** Population data
joinby objectid Year using "Africa_admin1_Population.dta", unmatched(master)
drop _merge
by objectid: ipolate Population_mean Year, gen(Population_mean_ipolate)
** Road denisty
joinby objectid using "Africa_admin1_RoadDensity.dta", unmatched(master)
drop _merge
joinby objectid using "Africa_admin1_Diamond.dta", unmatched(master)
drop _merge
joinby objectid using "Africa_admin1_InfantMR.dta", unmatched(master)
drop _merge
joinby objectid using "Africa_admin1_MineralSources.dta", unmatched(master)
drop _merge
joinby objectid using "Africa_admin1_Petroleum.dta", unmatched(master)
drop _merge
replace Petroleum=0 if Petroleum==.
** Local Foof price
joinby objectid Year using "RCK2015_Price_yearly.dta", unmatched(master)
drop _merge
xtset objectid Year
gen Violence_count_1=l.Violence_count
save "MainDataAllMerged.dta", replace
%%stata
use "MainDataAllMerged.dta", clear
set matsize 1000
spwmatrix gecon x_centroid y_centroid if Year==1997, wname(W) wtype(inv) cart row
$$\mathcal{I}=\frac{n \sum_i \sum_{j\neq i} w_{ij}(y_i-\bar{y})(y_j-\bar{y})}{(\sum_i \sum_{j \neq i} w_{ij}) \sum_i (y_i-\bar{y})^2}$$
%%stata
keep if Year==1997
spatgsa Violence_count, weights(W) moran
Based on these results, we can reject the null hypothesis that there is zero spatial autocorrelation present in the variable Violence if p_value< alpha = .05 (or, 1%, 10%).
$$ y_i=\textbf{x}_i \beta +\epsilon_i $$
$$ \epsilon_i=\phi \textbf{w}_i y_{-i}+u_i $$
$$ y_i=\textbf{x}_i \beta +\textbf{w}_i y_{-i}+u_i $$
use "MainDataAllMerged.dta", clear
keep if Year==1997
quietly splagvar Violence_count, wname(W) wfrom(Stata) moran(Violence_count)
save "MainDataAllMergedWeight_1997_1.dta", replace
************************************** NOTE: it takes about 26 minutes to calculate these spatially weighted variables **************** You may skip to the point NEXT
timer on 1
forvalues i=1997(1)2015 {
use "MainDataAllMerged.dta", clear
keep if Year==`i'
quietly splagvar Violence_count, wname(W) wfrom(Stata) moran(Violence_count)
capture noisily splagvar Violence_count_1 , wname(W) wfrom(Stata) moran(Violence_count_1)
save "MainDataAllMergedWeight_`i'.dta", replace
display as text "year " `i' ": done"
}
timer off 1
timer list 1
* Append all and start estimations
use "MainDataAllMergedWeight_1997.dta", clear
rm "MainDataAllMergedWeight_1997.dta"
forvalues i=1998(1)2015 {
append using "MainDataAllMergedWeight_`i'.dta"
rm "MainDataAllMergedWeight_`i'.dta"
}
save "MainDataAllMergedWeight.dta", replace
Using data to merge the remaining data sets
%%stata
use "MainDataAllMergedWeight.dta", clear
*** Retrieve ccode using country name
rename name_0 country
qui do CountryToCcode
***
qui xtset objectid Year
joinby ccode Year using "p4v2015.dta", unmatched(master)
drop _merge
joinby objectid using "Africa_admin1_GrowingSeason.dta", unmatched(master)
drop _merge
****
qui gen comm1_log=log(comm1)
qui gen comm1_log_1=l.comm1_log
qui replace cultivated=cultivated/100
qui xtile cultivated_quart = cultivated, nq(4)
qui gen Violence_dum=0
qui replace Violence_dum=1 if Violence_count>0
qui replace area=area/1000000
label variable Violence_dum "Violence(dummy)"
label variable Violence_count "Violence"
label variable wy_Violence_count_1 "Spatially weighted lag violence"
label variable polity2 "Polity"
label variable Nightlight_mean "Stable nightlight"
label variable RoadDensity_mean "Road density"
label variable Diamond "Diamond mines"
label variable InfantMortality_mean "Infant mortality"
label variable Mineral "Mineral mines"
label variable Petroleum "Petroleum fields"
label variable Population_mean_ipolate "Population"
label variable area "District size"
label variable cultivated "Cultivated land"
label variable comm1_log "Local food price"
%%stata
eststo clear
qui eststo: logit Violence_dum Violence_count_1 wy_Violence_count_1 c.comm1_log_1 ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area, r cluster(objectid)
qui eststo: logit Violence_dum Violence_count_1 c.comm1_log_1 ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area, r cluster(objectid)
qui eststo: logit Violence_dum Violence_count_1 wy_Violence_count_1 c.cultivated##c.comm1_log_1 ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area, r cluster(objectid)
qui eststo: logit Violence_dum Violence_count_1 c.cultivated##c.comm1_log_1 ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area, r cluster(objectid)
qui eststo: logit Violence_dum Violence_count_1 wy_Violence_count_1 c.comm1_log_1##cultivated_quart ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area, r cluster(objectid)
qui eststo: logit Violence_dum Violence_count_1 c.comm1_log_1##cultivated_quart ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area, r cluster(objectid)
esttab, s(N aic bic)
esttab using Table1.csv, s(N aic bic) replace
**poisson vs nbreg
poisson Violence_count Violence_count_1 wy_Violence_count_1 c.comm1_log_1 ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area
estat gof
Form STATA:
***The extreme significance of the goodness-of-fit chi-2 indicates that the Poisson regression model is
*inappropriate, suggesting to us that we should try a negative binomial model:
nbreg Violence_count Violence_count_1 wy_Violence_count_1 c.comm1_log_1 ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area
*\alpha is significantly different from 0, so the process being Poisson is rejected.
eststo clear
qui eststo: menbreg Violence_count Violence_count_1 c.comm1_log_1 ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area || objectid:, vce(cluster objectid)
qui eststo: menbreg Violence_count Violence_count_1 wy_Violence_count_1 c.comm1_log_1 ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area || objectid:, vce(cluster objectid)
qui eststo: menbreg Violence_count Violence_count_1 c.cultivated##c.comm1_log_1 ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area || objectid:, vce(cluster objectid)
qui eststo: menbreg Violence_count Violence_count_1 wy_Violence_count_1 c.cultivated##c.comm1_log_1 ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area || objectid:, vce(cluster objectid)
qui eststo: menbreg Violence_count Violence_count_1 c.comm1_log_1##cultivated_quart ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area || objectid:, vce(cluster objectid)
qui eststo: menbreg Violence_count Violence_count_1 wy_Violence_count_1 c.comm1_log_1##cultivated_quart ///
polity2 c.polity2#c.polity2 Nightlight_mean RoadDensity_mean Diamond InfantMortality_mean Mineral Petroleum Population_mean_ipolate area || objectid:, vce(cluster objectid)
esttab, s(N aic bic)
esttab using Table2.csv, s(N aic bic) replace