GWR is the term introduced by Brunsdon, Fotheringham, and Charlton (1997, 2002) to describe a family of regression models in which the coefficients are allowed to vary spatially. GWR uses the coordinates of each sample point or zone centroid, as a target point for a form of spatially weighted least squares regression (de Smith et al, 2015). The model takes on the following form:
\[y_i = \beta_{i0} + \beta_{i1}x_{i1} + ~... ~+ \beta_{ip}x_{ip} + \epsilon_i\]. Where \(\beta_{ip}\) is the local estimate of \(\beta_p\) at geographical location \(i\).
“Global OLS values are simply spatial averages that can hide a lot of information, we examine these summary statistics to gain a sense of how much spatial variation underlies the global OLS coefficients geographically weighted regression that attempts to untangle the apparent spatial non-stationarity in the parameter estimates.” (Cho et al, 2010).
Major decisions to make when running GWR: the kernel density function assigning weights \(w_{ij}\); the bandwidth \(h\) of the function, fixed or adaptive. A neighborhood (also known as a bandwidth) is the distance band or number of neighbors used for each local regression equation.
Spatially weighted regression with an adaptive bandwidth using nearest neighbor weighting, therefore spatial kernels varying depending on the region of interest. A neighborhood (also known as a bandwidth) is the distance band or number of neighbors used for each local regression equation and is perhaps the most important parameter to consider for Geographically Weighted Regression, as it controls the degree of smoothing in the model. Local weighting schemes include Gaussian and bisquare.
The Gaussian weighting scheme: \(w_{ij} = \exp (-\frac{d^2_{ij}}{h^2})\) where \(d_{ij}\) is the distance between locations \(i, j\) and \(h\) is the bandwidth. It assigns a weight of one to the regression feature (feature i), and weights for the surrounding features (j features) smoothly and gradually decrease as the distance from the regression feature increases. A Gaussian weighting scheme ensures that each regression feature will have many neighbors and thus increases the chance that there will be variation in the values of those neighbors. This avoids a well-known problem in geographically weighted regression called local collinearity. Use a Gaussian weighting scheme when the influence of neighboring features becomes smoothly and gradually less important but that influence is always present regardless of how far away the surrounding features are.
The Bisquare weighting scheme: \(w_{ij} = 1 - (\frac{d^3_{ij}}{h^3})^3\). It assigns a weight of one to the regression feature (feature i), and weights for the surrounding features (j features) smoothly and gradually decrease as the distance from the regression feature increases. However, all features outside of the neighborhood specified are assigned zero and do not impact the local regression for the target feature.
The cross-validation method to choose optimal kernel bandwidth: \(CV = \sum _i [y_i - \hat y_{\ne i}(\beta)]^2\). We are trying to find \(h\) that minimizes \(CV\).
First, we fit a global OLS regression predicting number of major building code violations per area in square miles (usarea) based on the following independent variables: Log median household income, log population size, percent non-Hispanic black, percent Hispanic, the unemployment rate, percent vacant units, percent of housing units built before 1970, percent of housing units built 2014 and after, and log median housing value.
library(spgwr)
library(grid)
library(gridExtra)
library(tidycensus)
library(tidyverse)
library(sf)
library(sp)
library(spdep)
library(rgdal)
library(tmap)
setwd("phil/")
#download.file(url = "https://raw.githubusercontent.com/crd230/data/master/phil_tracts.zip", #destfile = "phil_tracts.zip")
#unzip(zipfile = "phil_tracts.zip")
philly <- st_read("phil_tracts.shp")
fit.ols<-lm(usarea ~ lmhhinc + lpop + pnhblk + punemp + pvac + ph70 + lmhval +
phnew + phisp, data = philly)
summary(fit.ols)
##
## Call:
## lm(formula = usarea ~ lmhhinc + lpop + pnhblk + punemp + pvac +
## ph70 + lmhval + phnew + phisp, data = philly)
##
## Residuals:
## Min 1Q Median 3Q Max
## -177.24 -34.81 -9.91 23.03 670.17
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 534.491 164.270 3.254 0.00124 **
## lmhhinc 2.462 12.176 0.202 0.83990
## lpop -1.344 6.338 -0.212 0.83216
## pnhblk 21.158 18.077 1.170 0.24260
## punemp -5.097 63.645 -0.080 0.93622
## pvac 371.699 58.427 6.362 5.96e-10 ***
## ph70 -79.691 35.535 -2.243 0.02552 *
## lmhval -45.668 10.458 -4.367 1.64e-05 ***
## phnew 17.958 319.042 0.056 0.95514
## phisp -56.308 30.695 -1.834 0.06741 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 69.5 on 366 degrees of freedom
## Multiple R-squared: 0.3984, Adjusted R-squared: 0.3836
## F-statistic: 26.93 on 9 and 366 DF, p-value: < 2.2e-16
Fit multiple models: fixed versus adaptive weights. The first two weights are fixed in meters, while the third one is adaptive in proportion. It is the proportion of all cases which the weighting function will search, and include this fraction of observations in a model for each tract. The bandwidth distance will change according to the spatial density of features in the input feature class. The bandwidth becomes a function of the number of nearest neighbors such that each local estimation is based on the same number of features. Plug the bandwidth into the function gwr(), which runs the GWR model, using the argument adapt
philly.sp <- as(philly, "Spatial")
gwr.b1<-gwr.sel(usarea ~ lmhhinc + lpop + pnhblk + punemp + pvac + ph70 +
lmhval + phnew + phisp, philly.sp)
## Bandwidth: 14224.76 CV score: 1845565
## Bandwidth: 22993.17 CV score: 1860162
## Bandwidth: 8805.59 CV score: 1818722
## Bandwidth: 5456.356 CV score: 1750890
## Bandwidth: 3386.415 CV score: 1610987
## Bandwidth: 2107.121 CV score: 1497008
## Bandwidth: 1316.474 CV score: 1412725
## Bandwidth: 827.8277 CV score: 1787166
## Bandwidth: 1618.475 CV score: 1446078
## Bandwidth: 1129.828 CV score: 1448167
## Bandwidth: 1377.629 CV score: 1414671
## Bandwidth: 1298.316 CV score: 1413143
## Bandwidth: 1324.036 CV score: 1412698
## Bandwidth: 1322.62 CV score: 1412697
## Bandwidth: 1322.714 CV score: 1412696
## Bandwidth: 1322.708 CV score: 1412696
## Bandwidth: 1322.708 CV score: 1412696
## Bandwidth: 1322.708 CV score: 1412696
## Bandwidth: 1322.708 CV score: 1412696
gwr.fit1<-gwr(usarea ~ lmhhinc + lpop + pnhblk + punemp + pvac + ph70 + lmhval +
phnew + phisp, data = philly.sp, bandwidth = gwr.b1, se.fit=T, hatmatrix=T)
gwr.b2<-gwr.sel(usarea ~ lmhhinc + lpop + pnhblk + punemp + pvac + ph70 + lmhval +
phnew + phisp, data = philly.sp, gweight = gwr.bisquare)
## Bandwidth: 14224.76 CV score: 1797387
## Bandwidth: 22993.17 CV score: 1840913
## Bandwidth: 8805.59 CV score: 1634064
## Bandwidth: 5456.356 CV score: 1578492
## Bandwidth: 1760.637 CV score: NA
## Bandwidth: 4044.717 CV score: NA
## Bandwidth: 6735.649 CV score: 1585951
## Bandwidth: 5535.278 CV score: 1581610
## Bandwidth: 4917.158 CV score: NA
## Bandwidth: 5250.4 CV score: 1569337
## Bandwidth: 5123.113 CV score: 1563084
## Bandwidth: 5044.445 CV score: NA
## Bandwidth: 5171.732 CV score: 1565614
## Bandwidth: 5093.064 CV score: 1561374
## Bandwidth: 5074.493 CV score: NA
## Bandwidth: 5104.542 CV score: 1562042
## Bandwidth: 5085.971 CV score: NA
## Bandwidth: 5097.448 CV score: 1561631
## Bandwidth: 5090.355 CV score: NA
## Bandwidth: 5094.739 CV score: 1561473
## Bandwidth: 5092.029 CV score: NA
## Bandwidth: 5093.704 CV score: 1561412
## Bandwidth: 5092.669 CV score: NA
## Bandwidth: 5093.309 CV score: 1561389
## Bandwidth: 5092.913 CV score: 1561365
## Bandwidth: 5092.82 CV score: NA
## Bandwidth: 5092.971 CV score: 1561369
## Bandwidth: 5092.878 CV score: NA
## Bandwidth: 5092.935 CV score: 1561367
## Bandwidth: 5092.9 CV score: 1561365
## Bandwidth: 5092.891 CV score: NA
## Bandwidth: 5092.905 CV score: 1561365
## Bandwidth: 5092.897 CV score: NA
## Bandwidth: 5092.902 CV score: 1561365
## Bandwidth: 5092.899 CV score: 1561365
## Bandwidth: 5092.898 CV score: 1561364
## Bandwidth: 5092.897 CV score: NA
## Bandwidth: 5092.898 CV score: 1561365
## Bandwidth: 5092.898 CV score: 1561364
## Bandwidth: 5092.898 CV score: NA
## Bandwidth: 5092.898 CV score: 1561364
gwr.fit2<-gwr(usarea ~ lmhhinc + lpop + pnhblk + punemp + pvac + ph70 + lmhval +
phnew + phisp, data = philly.sp, bandwidth = gwr.b2, gweight = gwr.bisquare, se.fit=T,
hatmatrix=T)
# adaptive weights
gwr.b3<-gwr.sel(usarea ~ lmhhinc + lpop + pnhblk + punemp + pvac + ph70 +
lmhval + phnew + phisp, data = philly.sp, adapt = TRUE)
## Adaptive q: 0.381966 CV score: 1785033
## Adaptive q: 0.618034 CV score: 1814714
## Adaptive q: 0.236068 CV score: 1733969
## Adaptive q: 0.145898 CV score: 1650907
## Adaptive q: 0.09016994 CV score: 1565518
## Adaptive q: 0.05572809 CV score: 1471615
## Adaptive q: 0.03444185 CV score: 1401939
## Adaptive q: 0.02128624 CV score: 1384273
## Adaptive q: 0.01588534 CV score: 1431365
## Adaptive q: 0.02662582 CV score: 1380970
## Adaptive q: 0.02518866 CV score: 1380056
## Adaptive q: 0.02491844 CV score: 1380031
## Adaptive q: 0.02487775 CV score: 1380032
## Adaptive q: 0.02495914 CV score: 1380032
## Adaptive q: 0.02491844 CV score: 1380031
gwr.fit3<-gwr(usarea ~ lmhhinc + lpop + pnhblk + punemp + pvac + ph70 + lmhval +
phnew + phisp, data = philly.sp, adapt=gwr.b3, se.fit=T, hatmatrix=T)
c(gwr.b1, gwr.b2, gwr.b3)
## [1] 1.322708e+03 5.092898e+03 2.491844e-02
Map the adaptive bandwidth to geographical units and you see that smaller bandwidths are in smaller tracts primarily located in downtown Philadelphia.
philly$bwadapt <- gwr.fit3$bandwidth
tm_shape(philly, unit = "mi") +
tm_polygons(col = "bwadapt", style = "quantile",palette = "Reds",
border.alpha = 0, title = "") +
tm_scale_bar(breaks = c(0, 1, 2), size = 1, position = c("right", "bottom")) +
tm_compass(type = "4star", position = c("left", "top")) +
tm_layout(main.title = "GWR bandwidth", main.title.size = 0.95, frame = FALSE, legend.outside = TRUE)
Names of the spatial polygons data frame containing the 376 regression model estimates.
names(gwr.fit3$SDF)
## [1] "sum.w" "X.Intercept." "lmhhinc"
## [4] "lpop" "pnhblk" "punemp"
## [7] "pvac" "ph70" "lmhval"
## [10] "phnew" "phisp" "X.Intercept._se"
## [13] "lmhhinc_se" "lpop_se" "pnhblk_se"
## [16] "punemp_se" "pvac_se" "ph70_se"
## [19] "lmhval_se" "phnew_se" "phisp_se"
## [22] "gwr.e" "pred" "pred.se"
## [25] "localR2" "X.Intercept._se_EDF" "lmhhinc_se_EDF"
## [28] "lpop_se_EDF" "pnhblk_se_EDF" "punemp_se_EDF"
## [31] "pvac_se_EDF" "ph70_se_EDF" "lmhval_se_EDF"
## [34] "phnew_se_EDF" "phisp_se_EDF" "pred.se.1"
Plots of histograms of local parameter estimates and their corresponding global parameter estimate. All the global parameter estimates are in the middle of the local parameter estimates as expected.
par(mfrow = c(2, 2))
hist(gwr.fit3$SDF$pnhblk, probability = TRUE, main = "Histogram of percent non-Hispanic black", xlab = "percent non-Hispanic black")
abline(v=fit.ols$coeff["pnhblk"], col='red', lwd=3, lty='dashed')
hist(gwr.fit3$SDF$lmhhinc, probability = TRUE, main = "Histogram of Log median household income", xlab = "Log median household income")
abline(v=fit.ols$coeff["lmhhinc"], col='red', lwd=3, lty='dashed')
hist(gwr.fit3$SDF$lpop, probability = TRUE, main = "Histogram of percent log population size", xlab = "percent log population size")
abline(v=fit.ols$coeff["lpop"], col='red', lwd=3, lty='dashed')
hist(gwr.fit3$SDF$phisp, probability = TRUE, main = "Histogram of percent Hispanic", xlab = "percent Hispanic")
abline(v=fit.ols$coeff["lmhhinc"], col='red', lwd=3, lty='dashed')
hist(gwr.fit3$SDF$punemp, probability = TRUE, main = "Histogram of the unemployment rate", xlab = "the unemployment rate")
abline(v=fit.ols$coeff["punemp"], col='red', lwd=3, lty='dashed')
hist(gwr.fit3$SDF$pvac, probability = TRUE, main = "Histogram of the percent vacant units", xlab = "percent vacant units")
abline(v=fit.ols$coeff["pvac"], col='red', lwd=3, lty='dashed')
hist(gwr.fit3$SDF$ph70, probability = TRUE, main = "Histogram of percent built before 1970", xlab = "percent of housing units built before 1970")
abline(v=fit.ols$coeff["ph70"], col='red', lwd=3, lty='dashed')
hist(gwr.fit3$SDF$lmhval, probability = TRUE, main = "Histogram of log median housing value", xlab = "log median housing value")
abline(v=fit.ols$coeff["lmhval"], col='red', lwd=3, lty='dashed')
# Same thing for the following variable
# hist(gwr.fit3$SDF$phnew, probability = TRUE, main = "Histogram of percent housing built 2014 and after", xlab = "percent housing built 2014 and after")
# abline(v=fit.ols$coeff["phnew"], col='red', lwd=3, lty='dashed')
Calculate a t-statistic based p-value for the percent non-hispanic black variable parameter estimates. And map the p-values to the geographic units.
dfree<-gwr.fit3$results$edf
philly$pnhblk.t <- gwr.fit3$SDF$pnhblk/gwr.fit3$SDF$pnhblk_se
philly$pnhblk.t.p<-2*pt(-abs(philly$pnhblk.t), dfree)
breaks <- c(0,0.01,0.05,0.1,1)
tm_shape(philly, unit = "mi") +
tm_polygons(col = "pnhblk.t.p",palette = "Reds", breaks = breaks,
border.alpha = 0, title = "") +
tm_scale_bar(breaks = c(0, 1, 2), size = 1, position = c("right", "bottom")) +
tm_compass(type = "4star", position = c("left", "top")) +
tm_layout(main.title = "t-stat", main.title.size = 0.95, frame = FALSE, legend.outside = TRUE)
Assessing local multi-colinearity.
round(cor(as.data.frame(gwr.fit3$SDF[,2:11]), use ="complete.obs"),2)
## X.Intercept. lmhhinc lpop pnhblk punemp pvac ph70 lmhval phnew
## X.Intercept. 1.00 -0.38 -0.36 -0.07 -0.45 -0.59 -0.44 -0.76 0.32
## lmhhinc -0.38 1.00 -0.12 0.48 0.44 0.12 -0.02 -0.20 -0.32
## lpop -0.36 -0.12 1.00 -0.15 0.26 0.79 0.14 0.09 0.04
## pnhblk -0.07 0.48 -0.15 1.00 -0.20 -0.16 -0.56 -0.23 -0.39
## punemp -0.45 0.44 0.26 -0.20 1.00 0.63 0.48 0.10 0.02
## pvac -0.59 0.12 0.79 -0.16 0.63 1.00 0.41 0.27 -0.15
## ph70 -0.44 -0.02 0.14 -0.56 0.48 0.41 1.00 0.48 0.00
## lmhval -0.76 -0.20 0.09 -0.23 0.10 0.27 0.48 1.00 -0.17
## phnew 0.32 -0.32 0.04 -0.39 0.02 -0.15 0.00 -0.17 1.00
## phisp -0.11 0.12 0.04 0.17 -0.03 -0.02 -0.34 0.02 0.05
## phisp
## X.Intercept. -0.11
## lmhhinc 0.12
## lpop 0.04
## pnhblk 0.17
## punemp -0.03
## pvac -0.02
## ph70 -0.34
## lmhval 0.02
## phnew 0.05
## phisp 1.00
Assessingi model fit.
c(gwr.fit3$results$AICh, gwr.fit3$results$AICc, gwr.fit3$results$AICb, AIC(fit.ols))
## [1] 3964.174 4349.030 4258.020 4268.359
Brunsdon, Fotheringham & Charlton (1999, 2002) tests and Leung et al. (2000) tests. The first four model fit tests show that the GWR shows significant improvement in explanatory power over an OLS. The last model shows that the variables pvac, phnblk, and ph70 indicate statistically significant spatial heterogeneity in its GWR coefficients. These results indicate that there is spatial heterogeneity in the relationships between independent variables and major build code violations.
BFC02.gwr.test(gwr.fit3)
##
## Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
##
## data: gwr.fit3
## F = 3.0349, df1 = 366.00, df2 = 198.16, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals
## 1767753.4 582484.4
BFC99.gwr.test(gwr.fit3)
##
## Brunsdon, Fotheringham & Charlton (1999) ANOVA
##
## data: gwr.fit3
## F = 2.4024, df1 = 333.76, df2 = 260.43, p-value = 2.142e-13
## alternative hypothesis: greater
## sample estimates:
## SS GWR improvement SS GWR residuals
## 1185269.0 582484.4
LMZ.F1GWR.test(gwr.fit3)
##
## Leung et al. (2000) F(1) test
##
## data: gwr.fit3
## F = 0.6086, df1 = 260.43, df2 = 366.00, p-value = 1.108e-05
## alternative hypothesis: less
## sample estimates:
## SS OLS residuals SS GWR residuals
## 1767753.4 582484.4
LMZ.F2GWR.test(gwr.fit3)
##
## Leung et al. (2000) F(2) test
##
## data: gwr.fit3
## F = 1.4621, df1 = 233.86, df2 = 366.00, p-value = 0.0005763
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR improvement
## 1767753 1185269
LMZ.F3GWR.test(gwr.fit3)
##
## Leung et al. (2000) F(3) test
##
## F statistic Numerator d.f. Denominator d.f. Pr(>)
## (Intercept) 0.94419 122.59061 260.43 0.6369477
## lmhhinc 0.57486 105.11265 260.43 0.9993521
## lpop 0.83970 117.68318 260.43 0.8595358
## pnhblk 1.43484 89.33826 260.43 0.0151795 *
## punemp 0.92026 119.99577 260.43 0.6949096
## pvac 2.94794 106.14539 260.43 1.055e-12 ***
## ph70 1.69122 100.67109 260.43 0.0004901 ***
## lmhval 0.91483 116.24838 260.43 0.7054314
## phnew 0.22495 11.59110 260.43 0.9966467
## phisp 0.35611 40.49489 260.43 0.9999100
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
il <- get_acs(geography = "county", variables = c(tpop = "B01003_001", tpopr = "B03002_001",
nhwhite = "B03002_003", nhblk = "B03002_004",
nhasn = "B03002_006", hisp = "B03002_012",
unemptt = "B23025_003", unemp = "B23025_005",
povt = "B17001_001", pov = "B17001_002",
colt = "B15003_001", col1 = "B15003_022",
col2 = "B15003_023", col3 = "B15003_024",
col4 = "B15003_025", mobt = "B07003_001",
mob1 = "B07003_004"), state = "IL", year = 2019,output = "wide", geometry = TRUE)
il.counties <- il %>%
rename_with(~ sub("E$", "", .x), everything()) %>% #removes the E
mutate(pnhwhite = 100*(nhwhite/tpopr), pnhasn = 100*(nhasn/tpopr),
percentblack = 100*(nhblk/tpopr), percenthispanic = 100*(hisp/tpopr),
unemploymentrate = 100*(unemp/unemptt),
percentpoverty = 100*(pov/povt),
percentcollege = 100*((col1+col2+col3+col4)/colt),
mobility = 100-100*(mob1/mobt)) %>%
dplyr::select(c(GEOID,tpop, pnhwhite, pnhasn, percentblack, percenthispanic, percentpoverty, unemploymentrate, percentcollege, mobility))
olsfit <- lm(unemploymentrate ~ mobility + percentcollege + percentpoverty + percentblack + percenthispanic + log(tpop), data = il.counties)
summary(olsfit)
The F-statistic indicates that the model is statistically significant while only the coefficient for percent people in poverty is significant.
We map the residuals across the spatial units below to see whether there are any spatial patterns there.
resids <- residuals(olsfit)
map.resids <- cbind(il.counties, resids)
qtm(map.resids, fill = "resids")
locations = st_coordinates(st_centroid(il.counties$geometry))
GWRbandwidth <- gwr.sel(unemploymentrate ~ mobility + percentcollege + percentpoverty + percentblack + percenthispanic + log(tpop), data = il.counties, adapt = T, coords = locations)
## Adaptive q: 0.381966 CV score: 348.4971
## Adaptive q: 0.618034 CV score: 346.1579
## Adaptive q: 0.763932 CV score: 346.1717
## Adaptive q: 0.6891776 CV score: 346.2376
## Adaptive q: 0.527864 CV score: 346.7701
## Adaptive q: 0.5835921 CV score: 346.4808
## Adaptive q: 0.6452084 CV score: 346.1703
## Adaptive q: 0.6048784 CV score: 346.2409
## Adaptive q: 0.6284137 CV score: 346.2315
## Adaptive q: 0.613009 CV score: 346.1616
## Adaptive q: 0.6219987 CV score: 346.1888
## Adaptive q: 0.6161146 CV score: 346.1568
## Adaptive q: 0.616376 CV score: 346.1564
## Adaptive q: 0.6168197 CV score: 346.1559
## Adaptive q: 0.6172835 CV score: 346.1553
## Adaptive q: 0.6175702 CV score: 346.1549
## Adaptive q: 0.6177473 CV score: 346.1557
## Adaptive q: 0.6174805 CV score: 346.155
## Adaptive q: 0.6176378 CV score: 346.1549
## Adaptive q: 0.6176797 CV score: 346.1551
## Adaptive q: 0.6176378 CV score: 346.1549
gwrgauss <- gwr(unemploymentrate ~ mobility + percentcollege + percentpoverty + percentblack + percenthispanic + log(tpop), data = il.counties, coords = locations, adapt=GWRbandwidth, hatmatrix=T, se.fit=TRUE)
# save the results
#write.table(gwrgauss$SDF, file="gwrgauss.csv",sep=",", row.names=FALSE)
library(tmap)
results <-as.data.frame(gwrgauss$SDF)
gwr.map <- cbind(il.counties, as.matrix(results))
qtm(gwr.map, fill = "localR2")
map1 <- tm_shape(gwr.map) +
tm_fill("unemploymentrate",
n = 5,
style = "quantile", title = "unemployment rate") +
tm_layout(frame = FALSE,
legend.text.size = 0.5,
legend.title.size = 0.6)
map2 <- tm_shape(gwr.map) +
tm_fill("percentpoverty.1",
n = 5,
style = "quantile", title = "percent in poverty") +
tm_layout(frame = FALSE,
legend.text.size = 0.5,
legend.title.size = 0.6)
map3 <- tm_shape(gwr.map) +
tm_fill("percentpoverty",
n = 5,
style = "quantile", title = "Coefficient for percent in poverty") +
tm_layout(frame = FALSE,
legend.text.size = 0.5,
legend.title.size = 0.6)
map4 <- tm_shape(gwr.map) +
tm_fill("percentblack",
n = 5,
style = "quantile",
title = "Percent black") +
tm_layout(frame = FALSE,
legend.text.size = 0.5,
legend.title.size = 0.6)
grid.newpage()
pushViewport(viewport(layout=grid.layout(1,2)))
# prints a map object into a defined cell
print(map1, vp=viewport(layout.pos.col = 1, layout.pos.row =1))
print(map2, vp=viewport(layout.pos.col = 2, layout.pos.row =1))
grid.newpage()
pushViewport(viewport(layout=grid.layout(1,2)))
print(map3, vp=viewport(layout.pos.col = 1, layout.pos.row =1))
print(map4, vp=viewport(layout.pos.col = 2, layout.pos.row =1))
# 2 standard deviation intervals
int2se <- c(olsfit$coeff["(Intercept)"] - (2*summary(olsfit)$coeff[1,2]),
olsfit$coeff["(Intercept)"] + (2*summary(olsfit)$coeff[1,2]))
mobility2se <- c(olsfit$coeff["mobility"] - (2*summary(olsfit)$coeff[2,2]),
olsfit$coeff["mobility"] + (2*summary(olsfit)$coeff[2,2]))
percentcollege2se <- c(olsfit$coeff["percentcollege"] - (2*summary(olsfit)$coeff[3,2]),
olsfit$coeff["percentcollege"] + (2*summary(olsfit)$coeff[3,2]))
percentpoverty2se <- c(olsfit$coeff["percentpoverty"] - (2*summary(olsfit)$coeff[4,2]),
olsfit$coeff["percentpoverty"] + (2*summary(olsfit)$coeff[4,2]))
percentblack2se <- c(olsfit$coeff["percentblack"] - (2*summary(olsfit)$coeff[5,2]),
olsfit$coeff["percentblack"] +(2*summary(olsfit)$coeff[5,2]))
percenthispanic2se <- c(olsfit$coeff["percenthispanic"] - (2*summary(olsfit)$coeff[6,2]),
olsfit$coeff["percenthispanic"] + (2*summary(olsfit)$coeff[6,2]))
logtpop2se = c(olsfit$coeff["log(tpop)"] - (2*summary(olsfit)$coeff[7,2]),
olsfit$coeff["log(tpop)"] + (2*summary(olsfit)$coeff[7,2]))
How many local parameter estimates are outside the 2 standard deviation intervals of the corresponding global parameter estimates? Zero. You could also plot histograms of the local regression parameter estimates and see where does the global parameter estimates lie in their corresponding histograms.
temp <- read.csv("gwrgauss.csv")
sum((temp[,"X.Intercept."] < int2se[1]) |
(temp[,"X.Intercept."] > int2se[2])) / 102
## [1] 0
sum((temp[,"mobility"] < mobility2se[1]) | (temp[,"mobility"] > mobility2se[2])) / 102
## [1] 0
sum((temp[,"percentcollege"] < percentcollege2se[1]) | (temp[,"percentcollege"] > percentcollege2se[2])) / 102
## [1] 0
sum((temp[,"percentpoverty"] < percentpoverty2se[1]) | (temp[,"percentpoverty"] > percentpoverty2se[2])) / 102
## [1] 0
sum((temp[,"percentblack"] < percentblack2se[1]) | (temp[,"percentblack"] > percentblack2se[2])) / 102
## [1] 0
sum((temp[,"percenthispanic"] < percenthispanic2se[1]) | (temp[,"percenthispanic"] > percenthispanic2se[2])) / 102
## [1] 0
sum((temp[,"log.tpop."] < logtpop2se[1]) | (temp[,"log.tpop."] > logtpop2se[2])) / 102
## [1] 0
Tests as described in Fotheringham et al. (2002) and Leung et al. (2000) tests.
BFC02.gwr.test(gwrgauss)
##
## Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
##
## data: gwrgauss
## F = 1.0554, df1 = 95.000, df2 = 90.298, p-value = 0.3987
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals
## 304.2916 288.3260
BFC99.gwr.test(gwrgauss)
##
## Brunsdon, Fotheringham & Charlton (1999) ANOVA
##
## data: gwrgauss
## F = 1.0634, df1 = 67.306, df2 = 93.681, p-value = 0.3882
## alternative hypothesis: greater
## sample estimates:
## SS GWR improvement SS GWR residuals
## 15.9656 288.3260
LMZ.F1GWR.test(gwrgauss)
##
## Leung et al. (2000) F(1) test
##
## data: gwrgauss
## F = 0.99687, df1 = 93.681, df2 = 95.000, p-value = 0.4941
## alternative hypothesis: less
## sample estimates:
## SS OLS residuals SS GWR residuals
## 304.2916 288.3260
LMZ.F2GWR.test(gwrgauss)
##
## Leung et al. (2000) F(2) test
##
## data: gwrgauss
## F = 1.0601, df1 = 15.337, df2 = 95.000, p-value = 0.4036
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR improvement
## 304.2916 15.9656