Geographically weighted regression

GWR is the term introduced by Brunsdon, Fotheringham, and Charlton (1997, 2002) to describe a family of regression models in which the coefficients are allowed to vary spatially. GWR uses the coordinates of each sample point or zone centroid, as a target point for a form of spatially weighted least squares regression (de Smith et al, 2015). The model takes on the following form:

\[y_i = \beta_{i0} + \beta_{i1}x_{i1} + ~... ~+ \beta_{ip}x_{ip} + \epsilon_i\]. Where \(\beta_{ip}\) is the local estimate of \(\beta_p\) at geographical location \(i\).

“Global OLS values are simply spatial averages that can hide a lot of information, we examine these summary statistics to gain a sense of how much spatial variation underlies the global OLS coefficients geographically weighted regression that attempts to untangle the apparent spatial non-stationarity in the parameter estimates.” (Cho et al, 2010).

Major decisions to make when running GWR: the kernel density function assigning weights \(w_{ij}\); the bandwidth \(h\) of the function, fixed or adaptive. A neighborhood (also known as a bandwidth) is the distance band or number of neighbors used for each local regression equation.

Spatially weighted regression with an adaptive bandwidth using nearest neighbor weighting, therefore spatial kernels varying depending on the region of interest. A neighborhood (also known as a bandwidth) is the distance band or number of neighbors used for each local regression equation and is perhaps the most important parameter to consider for Geographically Weighted Regression, as it controls the degree of smoothing in the model. Local weighting schemes include Gaussian and bisquare.

The Gaussian weighting scheme: \(w_{ij} = \exp (-\frac{d^2_{ij}}{h^2})\) where \(d_{ij}\) is the distance between locations \(i, j\) and \(h\) is the bandwidth. It assigns a weight of one to the regression feature (feature i), and weights for the surrounding features (j features) smoothly and gradually decrease as the distance from the regression feature increases. A Gaussian weighting scheme ensures that each regression feature will have many neighbors and thus increases the chance that there will be variation in the values of those neighbors. This avoids a well-known problem in geographically weighted regression called local collinearity. Use a Gaussian weighting scheme when the influence of neighboring features becomes smoothly and gradually less important but that influence is always present regardless of how far away the surrounding features are.

The Bisquare weighting scheme: \(w_{ij} = 1 - (\frac{d^3_{ij}}{h^3})^3\). It assigns a weight of one to the regression feature (feature i), and weights for the surrounding features (j features) smoothly and gradually decrease as the distance from the regression feature increases. However, all features outside of the neighborhood specified are assigned zero and do not impact the local regression for the target feature.

The cross-validation method to choose optimal kernel bandwidth: \(CV = \sum _i [y_i - \hat y_{\ne i}(\beta)]^2\). We are trying to find \(h\) that minimizes \(CV\).

An example where spatial heterogeneity does exist.

OLS Regression

First, we fit a global OLS regression predicting number of major building code violations per area in square miles (usarea) based on the following independent variables: Log median household income, log population size, percent non-Hispanic black, percent Hispanic, the unemployment rate, percent vacant units, percent of housing units built before 1970, percent of housing units built 2014 and after, and log median housing value.

library(spgwr)
library(grid)
library(gridExtra)
library(tidycensus)
library(tidyverse)
library(sf)
library(sp)
library(spdep)
library(rgdal)
library(tmap)

setwd("phil/")
#download.file(url = "https://raw.githubusercontent.com/crd230/data/master/phil_tracts.zip", #destfile = "phil_tracts.zip")
#unzip(zipfile = "phil_tracts.zip")

philly <- st_read("phil_tracts.shp")

fit.ols<-lm(usarea ~ lmhhinc   + lpop + pnhblk + punemp + pvac  + ph70 + lmhval + 
     phnew + phisp, data = philly) 
summary(fit.ols)

## 
## Call:
## lm(formula = usarea ~ lmhhinc + lpop + pnhblk + punemp + pvac + 
##     ph70 + lmhval + phnew + phisp, data = philly)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -177.24  -34.81   -9.91   23.03  670.17 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  534.491    164.270   3.254  0.00124 ** 
## lmhhinc        2.462     12.176   0.202  0.83990    
## lpop          -1.344      6.338  -0.212  0.83216    
## pnhblk        21.158     18.077   1.170  0.24260    
## punemp        -5.097     63.645  -0.080  0.93622    
## pvac         371.699     58.427   6.362 5.96e-10 ***
## ph70         -79.691     35.535  -2.243  0.02552 *  
## lmhval       -45.668     10.458  -4.367 1.64e-05 ***
## phnew         17.958    319.042   0.056  0.95514    
## phisp        -56.308     30.695  -1.834  0.06741 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 69.5 on 366 degrees of freedom
## Multiple R-squared:  0.3984, Adjusted R-squared:  0.3836 
## F-statistic: 26.93 on 9 and 366 DF,  p-value: < 2.2e-16

GWR models: fixed versus adaptive weights

Fit multiple models: fixed versus adaptive weights. The first two weights are fixed in meters, while the third one is adaptive in proportion. It is the proportion of all cases which the weighting function will search, and include this fraction of observations in a model for each tract. The bandwidth distance will change according to the spatial density of features in the input feature class. The bandwidth becomes a function of the number of nearest neighbors such that each local estimation is based on the same number of features. Plug the bandwidth into the function gwr(), which runs the GWR model, using the argument adapt

philly.sp <- as(philly, "Spatial")

gwr.b1<-gwr.sel(usarea ~ lmhhinc   + lpop + pnhblk + punemp + pvac  + ph70 + 
                  lmhval + phnew + phisp, philly.sp)

## Bandwidth: 14224.76 CV score: 1845565 
## Bandwidth: 22993.17 CV score: 1860162 
## Bandwidth: 8805.59 CV score: 1818722 
## Bandwidth: 5456.356 CV score: 1750890 
## Bandwidth: 3386.415 CV score: 1610987 
## Bandwidth: 2107.121 CV score: 1497008 
## Bandwidth: 1316.474 CV score: 1412725 
## Bandwidth: 827.8277 CV score: 1787166 
## Bandwidth: 1618.475 CV score: 1446078 
## Bandwidth: 1129.828 CV score: 1448167 
## Bandwidth: 1377.629 CV score: 1414671 
## Bandwidth: 1298.316 CV score: 1413143 
## Bandwidth: 1324.036 CV score: 1412698 
## Bandwidth: 1322.62 CV score: 1412697 
## Bandwidth: 1322.714 CV score: 1412696 
## Bandwidth: 1322.708 CV score: 1412696 
## Bandwidth: 1322.708 CV score: 1412696 
## Bandwidth: 1322.708 CV score: 1412696 
## Bandwidth: 1322.708 CV score: 1412696

gwr.fit1<-gwr(usarea ~ lmhhinc   + lpop + pnhblk + punemp + pvac  + ph70 + lmhval + 
     phnew + phisp, data = philly.sp, bandwidth = gwr.b1, se.fit=T, hatmatrix=T)

gwr.b2<-gwr.sel(usarea ~ lmhhinc   + lpop + pnhblk + punemp + pvac  + ph70 + lmhval + 
     phnew + phisp, data = philly.sp, gweight = gwr.bisquare)

## Bandwidth: 14224.76 CV score: 1797387 
## Bandwidth: 22993.17 CV score: 1840913 
## Bandwidth: 8805.59 CV score: 1634064 
## Bandwidth: 5456.356 CV score: 1578492 
## Bandwidth: 1760.637 CV score: NA 
## Bandwidth: 4044.717 CV score: NA 
## Bandwidth: 6735.649 CV score: 1585951 
## Bandwidth: 5535.278 CV score: 1581610 
## Bandwidth: 4917.158 CV score: NA 
## Bandwidth: 5250.4 CV score: 1569337 
## Bandwidth: 5123.113 CV score: 1563084 
## Bandwidth: 5044.445 CV score: NA 
## Bandwidth: 5171.732 CV score: 1565614 
## Bandwidth: 5093.064 CV score: 1561374 
## Bandwidth: 5074.493 CV score: NA 
## Bandwidth: 5104.542 CV score: 1562042 
## Bandwidth: 5085.971 CV score: NA 
## Bandwidth: 5097.448 CV score: 1561631 
## Bandwidth: 5090.355 CV score: NA 
## Bandwidth: 5094.739 CV score: 1561473 
## Bandwidth: 5092.029 CV score: NA 
## Bandwidth: 5093.704 CV score: 1561412 
## Bandwidth: 5092.669 CV score: NA 
## Bandwidth: 5093.309 CV score: 1561389 
## Bandwidth: 5092.913 CV score: 1561365 
## Bandwidth: 5092.82 CV score: NA 
## Bandwidth: 5092.971 CV score: 1561369 
## Bandwidth: 5092.878 CV score: NA 
## Bandwidth: 5092.935 CV score: 1561367 
## Bandwidth: 5092.9 CV score: 1561365 
## Bandwidth: 5092.891 CV score: NA 
## Bandwidth: 5092.905 CV score: 1561365 
## Bandwidth: 5092.897 CV score: NA 
## Bandwidth: 5092.902 CV score: 1561365 
## Bandwidth: 5092.899 CV score: 1561365 
## Bandwidth: 5092.898 CV score: 1561364 
## Bandwidth: 5092.897 CV score: NA 
## Bandwidth: 5092.898 CV score: 1561365 
## Bandwidth: 5092.898 CV score: 1561364 
## Bandwidth: 5092.898 CV score: NA 
## Bandwidth: 5092.898 CV score: 1561364

gwr.fit2<-gwr(usarea ~ lmhhinc   + lpop + pnhblk + punemp + pvac  + ph70 + lmhval + 
     phnew + phisp, data = philly.sp, bandwidth = gwr.b2, gweight = gwr.bisquare, se.fit=T, 
     hatmatrix=T)

# adaptive weights
gwr.b3<-gwr.sel(usarea ~ lmhhinc   + lpop + pnhblk + punemp + pvac  + ph70 + 
                  lmhval + phnew + phisp, data = philly.sp, adapt = TRUE)

## Adaptive q: 0.381966 CV score: 1785033 
## Adaptive q: 0.618034 CV score: 1814714 
## Adaptive q: 0.236068 CV score: 1733969 
## Adaptive q: 0.145898 CV score: 1650907 
## Adaptive q: 0.09016994 CV score: 1565518 
## Adaptive q: 0.05572809 CV score: 1471615 
## Adaptive q: 0.03444185 CV score: 1401939 
## Adaptive q: 0.02128624 CV score: 1384273 
## Adaptive q: 0.01588534 CV score: 1431365 
## Adaptive q: 0.02662582 CV score: 1380970 
## Adaptive q: 0.02518866 CV score: 1380056 
## Adaptive q: 0.02491844 CV score: 1380031 
## Adaptive q: 0.02487775 CV score: 1380032 
## Adaptive q: 0.02495914 CV score: 1380032 
## Adaptive q: 0.02491844 CV score: 1380031

gwr.fit3<-gwr(usarea ~ lmhhinc   + lpop + pnhblk + punemp + pvac  + ph70 + lmhval + 
     phnew + phisp, data = philly.sp, adapt=gwr.b3, se.fit=T, hatmatrix=T)
c(gwr.b1, gwr.b2, gwr.b3)

## [1] 1.322708e+03 5.092898e+03 2.491844e-02

Map the adaptive bandwidth to geographical units and you see that smaller bandwidths are in smaller tracts primarily located in downtown Philadelphia.

philly$bwadapt <- gwr.fit3$bandwidth

tm_shape(philly, unit = "mi") +
  tm_polygons(col = "bwadapt", style = "quantile",palette = "Reds", 
              border.alpha = 0, title = "") +
  tm_scale_bar(breaks = c(0, 1, 2), size = 1, position = c("right", "bottom")) +
  tm_compass(type = "4star", position = c("left", "top")) + 
  tm_layout(main.title = "GWR bandwidth",  main.title.size = 0.95, frame = FALSE, legend.outside = TRUE)

Inspecting the GWR results

Names of the spatial polygons data frame containing the 376 regression model estimates.

names(gwr.fit3$SDF)

##  [1] "sum.w"               "X.Intercept."        "lmhhinc"            
##  [4] "lpop"                "pnhblk"              "punemp"             
##  [7] "pvac"                "ph70"                "lmhval"             
## [10] "phnew"               "phisp"               "X.Intercept._se"    
## [13] "lmhhinc_se"          "lpop_se"             "pnhblk_se"          
## [16] "punemp_se"           "pvac_se"             "ph70_se"            
## [19] "lmhval_se"           "phnew_se"            "phisp_se"           
## [22] "gwr.e"               "pred"                "pred.se"            
## [25] "localR2"             "X.Intercept._se_EDF" "lmhhinc_se_EDF"     
## [28] "lpop_se_EDF"         "pnhblk_se_EDF"       "punemp_se_EDF"      
## [31] "pvac_se_EDF"         "ph70_se_EDF"         "lmhval_se_EDF"      
## [34] "phnew_se_EDF"        "phisp_se_EDF"        "pred.se.1"

Plots of histograms of local parameter estimates and their corresponding global parameter estimate. All the global parameter estimates are in the middle of the local parameter estimates as expected.

par(mfrow = c(2, 2))
hist(gwr.fit3$SDF$pnhblk, probability = TRUE, main = "Histogram of percent non-Hispanic black", xlab = "percent non-Hispanic black")
abline(v=fit.ols$coeff["pnhblk"], col='red', lwd=3, lty='dashed')

hist(gwr.fit3$SDF$lmhhinc, probability = TRUE, main = "Histogram of Log median household income", xlab = "Log median household income")
abline(v=fit.ols$coeff["lmhhinc"], col='red', lwd=3, lty='dashed')

hist(gwr.fit3$SDF$lpop, probability = TRUE, main = "Histogram of percent log population size", xlab = "percent log population size")
abline(v=fit.ols$coeff["lpop"], col='red', lwd=3, lty='dashed')

hist(gwr.fit3$SDF$phisp, probability = TRUE, main = "Histogram of percent Hispanic", xlab = "percent Hispanic")
abline(v=fit.ols$coeff["lmhhinc"], col='red', lwd=3, lty='dashed')

hist(gwr.fit3$SDF$punemp, probability = TRUE, main = "Histogram of the unemployment rate", xlab = "the unemployment rate")
abline(v=fit.ols$coeff["punemp"], col='red', lwd=3, lty='dashed')

hist(gwr.fit3$SDF$pvac, probability = TRUE, main = "Histogram of the percent vacant units", xlab = "percent vacant units")
abline(v=fit.ols$coeff["pvac"], col='red', lwd=3, lty='dashed')

hist(gwr.fit3$SDF$ph70, probability = TRUE, main = "Histogram of percent built before 1970", xlab = "percent of housing units built before 1970")
abline(v=fit.ols$coeff["ph70"], col='red', lwd=3, lty='dashed')

hist(gwr.fit3$SDF$lmhval, probability = TRUE, main = "Histogram of log median housing value", xlab = "log median housing value")
abline(v=fit.ols$coeff["lmhval"], col='red', lwd=3, lty='dashed')

# Same thing for the following variable
# hist(gwr.fit3$SDF$phnew, probability = TRUE, main = "Histogram of percent housing built 2014 and after", xlab = "percent housing built 2014 and after")
# abline(v=fit.ols$coeff["phnew"], col='red', lwd=3, lty='dashed')

Calculate a t-statistic based p-value for the percent non-hispanic black variable parameter estimates. And map the p-values to the geographic units.

dfree<-gwr.fit3$results$edf
philly$pnhblk.t <- gwr.fit3$SDF$pnhblk/gwr.fit3$SDF$pnhblk_se
philly$pnhblk.t.p<-2*pt(-abs(philly$pnhblk.t), dfree)
breaks <- c(0,0.01,0.05,0.1,1)

tm_shape(philly, unit = "mi") +
  tm_polygons(col = "pnhblk.t.p",palette = "Reds", breaks = breaks,
              border.alpha = 0, title = "") +
  tm_scale_bar(breaks = c(0, 1, 2), size = 1, position = c("right", "bottom")) +
  tm_compass(type = "4star", position = c("left", "top")) + 
  tm_layout(main.title = "t-stat",  main.title.size = 0.95, frame = FALSE, legend.outside = TRUE)

Assessing local multi-colinearity.

round(cor(as.data.frame(gwr.fit3$SDF[,2:11]), use ="complete.obs"),2)

##              X.Intercept. lmhhinc  lpop pnhblk punemp  pvac  ph70 lmhval phnew
## X.Intercept.         1.00   -0.38 -0.36  -0.07  -0.45 -0.59 -0.44  -0.76  0.32
## lmhhinc             -0.38    1.00 -0.12   0.48   0.44  0.12 -0.02  -0.20 -0.32
## lpop                -0.36   -0.12  1.00  -0.15   0.26  0.79  0.14   0.09  0.04
## pnhblk              -0.07    0.48 -0.15   1.00  -0.20 -0.16 -0.56  -0.23 -0.39
## punemp              -0.45    0.44  0.26  -0.20   1.00  0.63  0.48   0.10  0.02
## pvac                -0.59    0.12  0.79  -0.16   0.63  1.00  0.41   0.27 -0.15
## ph70                -0.44   -0.02  0.14  -0.56   0.48  0.41  1.00   0.48  0.00
## lmhval              -0.76   -0.20  0.09  -0.23   0.10  0.27  0.48   1.00 -0.17
## phnew                0.32   -0.32  0.04  -0.39   0.02 -0.15  0.00  -0.17  1.00
## phisp               -0.11    0.12  0.04   0.17  -0.03 -0.02 -0.34   0.02  0.05
##              phisp
## X.Intercept. -0.11
## lmhhinc       0.12
## lpop          0.04
## pnhblk        0.17
## punemp       -0.03
## pvac         -0.02
## ph70         -0.34
## lmhval        0.02
## phnew         0.05
## phisp         1.00

Assessingi model fit.

c(gwr.fit3$results$AICh, gwr.fit3$results$AICc, gwr.fit3$results$AICb, AIC(fit.ols))

## [1] 3964.174 4349.030 4258.020 4268.359

Brunsdon, Fotheringham & Charlton (1999, 2002) tests and Leung et al. (2000) tests. The first four model fit tests show that the GWR shows significant improvement in explanatory power over an OLS. The last model shows that the variables pvac, phnblk, and ph70 indicate statistically significant spatial heterogeneity in its GWR coefficients. These results indicate that there is spatial heterogeneity in the relationships between independent variables and major build code violations.

BFC02.gwr.test(gwr.fit3)

## 
##  Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
## 
## data:  gwr.fit3
## F = 3.0349, df1 = 366.00, df2 = 198.16, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals 
##        1767753.4         582484.4

BFC99.gwr.test(gwr.fit3)

## 
##  Brunsdon, Fotheringham & Charlton (1999) ANOVA
## 
## data:  gwr.fit3
## F = 2.4024, df1 = 333.76, df2 = 260.43, p-value = 2.142e-13
## alternative hypothesis: greater
## sample estimates:
## SS GWR improvement   SS GWR residuals 
##          1185269.0           582484.4

LMZ.F1GWR.test(gwr.fit3)

## 
##  Leung et al. (2000) F(1) test
## 
## data:  gwr.fit3
## F = 0.6086, df1 = 260.43, df2 = 366.00, p-value = 1.108e-05
## alternative hypothesis: less
## sample estimates:
## SS OLS residuals SS GWR residuals 
##        1767753.4         582484.4

LMZ.F2GWR.test(gwr.fit3)

## 
##  Leung et al. (2000) F(2) test
## 
## data:  gwr.fit3
## F = 1.4621, df1 = 233.86, df2 = 366.00, p-value = 0.0005763
## alternative hypothesis: greater
## sample estimates:
##   SS OLS residuals SS GWR improvement 
##            1767753            1185269

LMZ.F3GWR.test(gwr.fit3)

## 
## Leung et al. (2000) F(3) test
## 
##             F statistic Numerator d.f. Denominator d.f.     Pr(>)    
## (Intercept)     0.94419      122.59061           260.43 0.6369477    
## lmhhinc         0.57486      105.11265           260.43 0.9993521    
## lpop            0.83970      117.68318           260.43 0.8595358    
## pnhblk          1.43484       89.33826           260.43 0.0151795 *  
## punemp          0.92026      119.99577           260.43 0.6949096    
## pvac            2.94794      106.14539           260.43 1.055e-12 ***
## ph70            1.69122      100.67109           260.43 0.0004901 ***
## lmhval          0.91483      116.24838           260.43 0.7054314    
## phnew           0.22495       11.59110           260.43 0.9966467    
## phisp           0.35611       40.49489           260.43 0.9999100    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A negative example that shows that spatial heterogeneity does not exist

il <- get_acs(geography = "county", variables = c(tpop = "B01003_001", tpopr = "B03002_001", 
                                   nhwhite = "B03002_003", nhblk = "B03002_004",
                                   nhasn = "B03002_006", hisp = "B03002_012",
                                   unemptt = "B23025_003", unemp = "B23025_005",
                                   povt = "B17001_001", pov = "B17001_002", 
                                   colt = "B15003_001", col1 = "B15003_022", 
                                   col2 = "B15003_023", col3 = "B15003_024", 
                                   col4 = "B15003_025", mobt = "B07003_001", 
                                   mob1 = "B07003_004"), state = "IL", year = 2019,output = "wide", geometry = TRUE)

il.counties <- il %>% 
  rename_with(~ sub("E$", "", .x), everything()) %>%  #removes the E 
  mutate(pnhwhite = 100*(nhwhite/tpopr), pnhasn = 100*(nhasn/tpopr), 
         percentblack = 100*(nhblk/tpopr), percenthispanic = 100*(hisp/tpopr),
         unemploymentrate = 100*(unemp/unemptt),
         percentpoverty = 100*(pov/povt), 
         percentcollege = 100*((col1+col2+col3+col4)/colt), 
         mobility = 100-100*(mob1/mobt)) %>%
  dplyr::select(c(GEOID,tpop, pnhwhite, pnhasn, percentblack, percenthispanic, percentpoverty, unemploymentrate, percentcollege, mobility))


olsfit <- lm(unemploymentrate ~ mobility + percentcollege + percentpoverty + percentblack + percenthispanic + log(tpop), data = il.counties)

summary(olsfit)

The F-statistic indicates that the model is statistically significant while only the coefficient for percent people in poverty is significant.

We map the residuals across the spatial units below to see whether there are any spatial patterns there.

resids <- residuals(olsfit)

map.resids <- cbind(il.counties, resids) 

qtm(map.resids, fill = "resids")

locations = st_coordinates(st_centroid(il.counties$geometry))

GWRbandwidth <- gwr.sel(unemploymentrate ~ mobility + percentcollege + percentpoverty + percentblack + percenthispanic + log(tpop), data = il.counties, adapt = T, coords = locations)

## Adaptive q: 0.381966 CV score: 348.4971 
## Adaptive q: 0.618034 CV score: 346.1579 
## Adaptive q: 0.763932 CV score: 346.1717 
## Adaptive q: 0.6891776 CV score: 346.2376 
## Adaptive q: 0.527864 CV score: 346.7701 
## Adaptive q: 0.5835921 CV score: 346.4808 
## Adaptive q: 0.6452084 CV score: 346.1703 
## Adaptive q: 0.6048784 CV score: 346.2409 
## Adaptive q: 0.6284137 CV score: 346.2315 
## Adaptive q: 0.613009 CV score: 346.1616 
## Adaptive q: 0.6219987 CV score: 346.1888 
## Adaptive q: 0.6161146 CV score: 346.1568 
## Adaptive q: 0.616376 CV score: 346.1564 
## Adaptive q: 0.6168197 CV score: 346.1559 
## Adaptive q: 0.6172835 CV score: 346.1553 
## Adaptive q: 0.6175702 CV score: 346.1549 
## Adaptive q: 0.6177473 CV score: 346.1557 
## Adaptive q: 0.6174805 CV score: 346.155 
## Adaptive q: 0.6176378 CV score: 346.1549 
## Adaptive q: 0.6176797 CV score: 346.1551 
## Adaptive q: 0.6176378 CV score: 346.1549

gwrgauss <- gwr(unemploymentrate ~ mobility + percentcollege + percentpoverty + percentblack + percenthispanic + log(tpop), data = il.counties, coords = locations, adapt=GWRbandwidth, hatmatrix=T, se.fit=TRUE)

# save the results
#write.table(gwrgauss$SDF, file="gwrgauss.csv",sep=",", row.names=FALSE)

Map the result coefficients

library(tmap)

results <-as.data.frame(gwrgauss$SDF)
gwr.map <- cbind(il.counties, as.matrix(results))

qtm(gwr.map, fill = "localR2")

map1 <- tm_shape(gwr.map) + 
  tm_fill("unemploymentrate",
          n = 5,
          style = "quantile", title = "unemployment rate")  +
  tm_layout(frame = FALSE,
            legend.text.size = 0.5,
            legend.title.size = 0.6)

map2 <- tm_shape(gwr.map) + 
  tm_fill("percentpoverty.1",
          n = 5,
          style = "quantile", title = "percent in poverty")  +
  tm_layout(frame = FALSE,
            legend.text.size = 0.5,
            legend.title.size = 0.6)

map3 <- tm_shape(gwr.map) +
  tm_fill("percentpoverty",
          n = 5,
          style = "quantile", title = "Coefficient for percent in poverty") +
  tm_layout(frame = FALSE,
            legend.text.size = 0.5,
            legend.title.size = 0.6)

map4 <- tm_shape(gwr.map) +
  tm_fill("percentblack",
          n = 5,
          style = "quantile",
          title = "Percent black") +
  tm_layout(frame = FALSE,
            legend.text.size = 0.5,
            legend.title.size = 0.6)

grid.newpage()
pushViewport(viewport(layout=grid.layout(1,2)))

# prints a map object into a defined cell   
print(map1, vp=viewport(layout.pos.col = 1, layout.pos.row =1))
print(map2, vp=viewport(layout.pos.col = 2, layout.pos.row =1))

grid.newpage()
pushViewport(viewport(layout=grid.layout(1,2)))
print(map3, vp=viewport(layout.pos.col = 1, layout.pos.row =1))
print(map4, vp=viewport(layout.pos.col = 2, layout.pos.row =1))

compare OLS and GWR

# 2  standard deviation intervals

int2se <- c(olsfit$coeff["(Intercept)"] - (2*summary(olsfit)$coeff[1,2]),
  olsfit$coeff["(Intercept)"] + (2*summary(olsfit)$coeff[1,2]))

mobility2se <- c(olsfit$coeff["mobility"] - (2*summary(olsfit)$coeff[2,2]),
  olsfit$coeff["mobility"] + (2*summary(olsfit)$coeff[2,2]))

percentcollege2se <- c(olsfit$coeff["percentcollege"] - (2*summary(olsfit)$coeff[3,2]),
  olsfit$coeff["percentcollege"] + (2*summary(olsfit)$coeff[3,2]))
  
percentpoverty2se <- c(olsfit$coeff["percentpoverty"] - (2*summary(olsfit)$coeff[4,2]),
  olsfit$coeff["percentpoverty"] + (2*summary(olsfit)$coeff[4,2]))
  
percentblack2se <- c(olsfit$coeff["percentblack"] - (2*summary(olsfit)$coeff[5,2]),
  olsfit$coeff["percentblack"] +(2*summary(olsfit)$coeff[5,2]))
  
percenthispanic2se <- c(olsfit$coeff["percenthispanic"] - (2*summary(olsfit)$coeff[6,2]),
  olsfit$coeff["percenthispanic"] + (2*summary(olsfit)$coeff[6,2]))
  
logtpop2se = c(olsfit$coeff["log(tpop)"] - (2*summary(olsfit)$coeff[7,2]),
  olsfit$coeff["log(tpop)"] + (2*summary(olsfit)$coeff[7,2]))

How many local parameter estimates are outside the 2 standard deviation intervals of the corresponding global parameter estimates? Zero. You could also plot histograms of the local regression parameter estimates and see where does the global parameter estimates lie in their corresponding histograms.

temp <- read.csv("gwrgauss.csv")

sum((temp[,"X.Intercept."] < int2se[1]) |
    (temp[,"X.Intercept."] > int2se[2])) / 102

## [1] 0

sum((temp[,"mobility"] < mobility2se[1]) | (temp[,"mobility"] > mobility2se[2])) / 102

## [1] 0

sum((temp[,"percentcollege"] < percentcollege2se[1]) | (temp[,"percentcollege"] > percentcollege2se[2])) / 102

## [1] 0

sum((temp[,"percentpoverty"] < percentpoverty2se[1]) | (temp[,"percentpoverty"] > percentpoverty2se[2])) / 102

## [1] 0

sum((temp[,"percentblack"] < percentblack2se[1]) | (temp[,"percentblack"] > percentblack2se[2])) / 102

## [1] 0

sum((temp[,"percenthispanic"] < percenthispanic2se[1]) | (temp[,"percenthispanic"] > percenthispanic2se[2])) / 102

## [1] 0

sum((temp[,"log.tpop."] < logtpop2se[1]) | (temp[,"log.tpop."] > logtpop2se[2])) / 102

## [1] 0

Tests as described in Fotheringham et al. (2002) and Leung et al. (2000) tests.

BFC02.gwr.test(gwrgauss)

## 
##  Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
## 
## data:  gwrgauss
## F = 1.0554, df1 = 95.000, df2 = 90.298, p-value = 0.3987
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals 
##         304.2916         288.3260

BFC99.gwr.test(gwrgauss)

## 
##  Brunsdon, Fotheringham & Charlton (1999) ANOVA
## 
## data:  gwrgauss
## F = 1.0634, df1 = 67.306, df2 = 93.681, p-value = 0.3882
## alternative hypothesis: greater
## sample estimates:
## SS GWR improvement   SS GWR residuals 
##            15.9656           288.3260

LMZ.F1GWR.test(gwrgauss)

## 
##  Leung et al. (2000) F(1) test
## 
## data:  gwrgauss
## F = 0.99687, df1 = 93.681, df2 = 95.000, p-value = 0.4941
## alternative hypothesis: less
## sample estimates:
## SS OLS residuals SS GWR residuals 
##         304.2916         288.3260

LMZ.F2GWR.test(gwrgauss)

## 
##  Leung et al. (2000) F(2) test
## 
## data:  gwrgauss
## F = 1.0601, df1 = 15.337, df2 = 95.000, p-value = 0.4036
## alternative hypothesis: greater
## sample estimates:
##   SS OLS residuals SS GWR improvement 
##           304.2916            15.9656