Lecture 4 - Mapping and Modelling Geographic Data in R

Richard Harris

Created on November 17, 2023

Mapping and Modelling Geographic Data in R

Lecture 4

Geographically Weighted Statistics
Part 1: Intro to R, intro to statistics, intro to regression
Part 2: Mapping in R
Geographical Data Science
Part 3: Spatial analysis in R

A statistical note

Statistics is concerned with variation in data. Variation can arise:

  • Because of measurement errors
  • Because what we are measuring differs between groups, or places, or at different times, or...
Spatial statistics are concerned with spatial (geographic) variations in data (geographic patterns in the data; the differences between places; spatially varying relationships; etc.)

A statistical note

Often, when we generate a statistic, we also consider the possibility of a null ('nothing') hypothesis. For example:

  • The difference between the average of two samples of data is zero
  • The effect of an independent variable (X) on a dependent variable (Y) is zero (they are unrelated)
  • There is no correlation between the values recorded at locations and the average value of each location's neighbours (no spatial autocorrelation)

A statistical note

The probability that a difference, an effect size or a correlation (etc.) is exactly zero is tiny. The question then becomes whether it is far enough away from zero for us to reject the null hypothesis with confidence. Classically, 'far enough' depends upon:

  • How far the value (the test statistic) deviates from zero
  • How much data we have (the degrees of freedom)
  • How variable ('noisy') the data are
  • How confident we want to be (e.g. 95% confidence, 99% confidence, 99.9% confidence)
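The four factors above can be illustrated with a minimal Python sketch (Python is used here only for illustration; the function names and the data are hypothetical). It tests the null hypothesis that a mean is zero using a large-sample normal approximation, an assumption made for simplicity where a t-test would be the classical choice for small samples. The average deviation from zero is held at 0.2 while the noise, the amount of data and the required confidence are varied.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_statistic(sample):
    """Large-sample z statistic for the null hypothesis 'the mean is zero'."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # grows with noise, shrinks with more data
    return mean(sample) / standard_error      # distance from zero, in standard errors


def reject_null(sample, confidence=0.95):
    """Two-sided test: reject if |z| exceeds the critical value for the chosen confidence."""
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return abs(z_statistic(sample)) > critical


# Hypothetical data: the same average deviation from zero (0.2) in every case
base = [-1.0, 1.0] * 50                            # 100 values of 'noise'
print(reject_null([0.2 + d for d in base]))        # True: z is about 1.99
print(reject_null([0.2 + d for d in base], 0.99))  # False: stricter confidence
print(reject_null([0.2 + 3 * d for d in base]))    # False: noisier data
print(reject_null([0.2 + d for d in base[:10]]))   # False: less data
```

The same deviation from zero is 'far enough' in one case but not in the others, which is the point of the list above.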

A statistical note

This is where the idea of a confidence interval and also 'p' (or Pr) values come from.

  • Treating the data as a sample from some underlying 'population',
  • and given assumptions about how the test statistic would be distributed with repeated random sampling from that underlying population,
  • then the range within which, say, 95% of values would arise 'by chance' through the process of sampling can be determined,
  • and that knowledge can be used to create a confidence interval around what we have calculated (so, for example, instead of stating that the Moran correlation coefficient is exactly 0.577, the 95% confidence interval could be used to suggest it lies between 0.495 and 0.658).
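The repeated-sampling idea can be simulated directly. The minimal Python sketch below uses a sample mean rather than the Moran coefficient, to keep it short, and assumes a normal-approximation interval (mean ± 1.96 standard errors). It draws many random samples from a hypothetical normal 'population' and checks how often the 95% interval contains the true mean; the population values, the seed and the function names are all assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = stdev(sample) / sqrt(len(sample))
    m = mean(sample)
    return m - z * se, m + z * se


def coverage(true_mean=10.0, sd=2.0, n=50, repeats=2000, seed=1):
    """Draw repeated random samples from a hypothetical normal 'population'
    and return the share of intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        low, high = confidence_interval(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / repeats


print(confidence_interval([9.1, 10.4, 11.2, 9.8, 10.6]))  # interval centred on 10.22
print(coverage())                                          # close to 0.95
```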

A statistical note

Typically, if the confidence interval does not include zero, then:

  • the null hypothesis is rejected at the given level of confidence, and
  • the result is said to be 'statistically significant'.

Commonly used levels of confidence are:

  • 95% confidence (p < 0.05)
  • 99% confidence (p < 0.01)
  • 99.9% confidence (p < 0.001)

The effect of age on No_schooling is negative and statistically significant at (more than) 95% confidence.
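The thresholds listed above can be applied to a p-value computed from a test statistic. The minimal Python sketch below assumes the statistic is standard normal under the null hypothesis (a simplification; regression coefficients are classically tested against a t distribution) and reports the highest of the conventional confidence levels at which the null is rejected. R's regression summaries flag the same thresholds with asterisks. The function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a statistic that is standard normal under the null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def significance(p):
    """Highest of the conventional confidence levels at which the null is rejected."""
    for threshold, level in ((0.001, "99.9%"), (0.01, "99%"), (0.05, "95%")):
        if p < threshold:
            return level
    return "not significant"


for z in (1.0, 2.0, 3.0, 4.0):
    p = two_sided_p(z)
    print(f"z = {z}: p = {p:.4f}, {significance(p)}")
```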

'Local' vs 'Global'

  • Local: e.g. the average values for sub-spaces of the map
  • Global: e.g. the average value for the whole of the map

The local values can then be compared with one another and with the global value.
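The local/global distinction can be sketched in a few lines of Python. The observations, the 10-by-10 map and the split of the map into four quadrants are hypothetical choices made for illustration; the quadrants simply stand in for 'sub-spaces of the map'.

```python
from statistics import mean

# Hypothetical observations (x, y, value) on a 10-by-10 map
points = [(1, 1, 5.0), (2, 3, 7.0), (8, 2, 1.0), (9, 4, 3.0),
          (2, 8, 10.0), (3, 9, 12.0), (7, 7, 4.0), (8, 9, 6.0)]


def global_mean(pts):
    """One summary value for the whole map."""
    return mean(v for _, _, v in pts)


def local_means(pts, split=5):
    """One summary value per quadrant (sub-space) of the map."""
    groups = {}
    for x, y, v in pts:
        key = ("west" if x < split else "east") + "-" + ("south" if y < split else "north")
        groups.setdefault(key, []).append(v)
    return {k: mean(vs) for k, vs in groups.items()}


g = global_mean(points)
for region, m in local_means(points).items():
    print(region, m, "difference from global:", m - g)
```

The global mean (6.0) hides the fact that the north-west quadrant averages 11.0 while the south-east averages 2.0.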

Geographically weighted statistics

Source: GWmodel: An R Package for Exploring Spatial Heterogeneity Using Geographically Weighted Models

Geographically weighted statistics

Need to consider:

  • the type of neighbours: k-nearest neighbours (adaptive) or within a fixed distance
  • the number of neighbours (or the distance): can be specified or set by an optimisation/calibration procedure
  • the shape of the kernel (how the weights decline with distance): the default is bisquare
  • where to calculate (interpolate) the values: typically at the centroids of polygons
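The first three choices can be sketched in Python: adaptive (k-nearest) versus fixed-distance neighbours, the bandwidth, and the bisquare kernel, w = (1 - (d/b)^2)^2 for d < b and 0 otherwise. The function names, the rule of taking the adaptive bandwidth as the distance to the k-th nearest location, and the example locations are assumptions for illustration, not GWmodel's code.

```python
from math import dist  # Python 3.8+


def bisquare(d, bandwidth):
    """Bisquare kernel: weight 1 at distance 0, falling smoothly to 0 at the bandwidth."""
    if d >= bandwidth:
        return 0.0
    return (1 - (d / bandwidth) ** 2) ** 2


def kernel_weights(target, locations, bw, adaptive=True):
    """Weight of each location relative to a target point.

    adaptive=True:  bw is a number of neighbours k; the bandwidth is the distance
                    to the k-th nearest location (assumes 1 <= k <= len(locations)).
    adaptive=False: bw is a fixed distance.
    """
    distances = [dist(target, loc) for loc in locations]
    bandwidth = sorted(distances)[bw - 1] if adaptive else bw
    return [bisquare(d, bandwidth) for d in distances]


line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]  # hypothetical locations
print(kernel_weights((0, 0), line, bw=4, adaptive=False))  # fixed distance of 4
print(kernel_weights((0, 0), line, bw=3))                  # 3 nearest, counting the target
```

In this sketch, the k-th nearest location sits exactly at the bandwidth and so receives zero weight. Implementations may treat this boundary differently.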

Geographically weighted statistics

  • Geographically weighted mean
  • Geographically weighted standard deviation
  • Geographically weighted correlation
  • Geographically weighted regression
  • ...
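The first three statistics in the list can be sketched as kernel-weighted versions of the ordinary mean, standard deviation and correlation, evaluated at a target location. This Python sketch assumes a fixed-distance bisquare kernel and uses hypothetical function names and data; it is not GWmodel's implementation and may not match its exact formulas. Evaluating at each data location gives a smoothed surface. Evaluating where there is no observation, such as (4.5, 0) below, is a form of interpolation.

```python
from math import dist, sqrt


def bisquare_weights(target, locations, bandwidth):
    """Fixed-distance bisquare kernel weights of each location relative to the target."""
    weights = []
    for loc in locations:
        d = dist(target, loc)
        weights.append((1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0)
    return weights


def gw_summary(target, locations, x, y, bandwidth):
    """Geographically weighted mean and SD of x, and correlation of x with y.

    Assumes at least one location lies within the bandwidth and that x and y
    both vary locally (otherwise the correlation divides by zero).
    """
    w = bisquare_weights(target, locations, bandwidth)
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    cxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total
    return {"mean": mx, "sd": sqrt(vx), "corr": cxy / sqrt(vx * vy)}


locations = [(i, 0) for i in range(10)]  # hypothetical points on a line
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 10, 9, 8, 7, 6]      # the relationship reverses half-way along
print(gw_summary((1, 0), locations, x, y, bandwidth=3))    # local correlation near +1
print(gw_summary((8, 0), locations, x, y, bandwidth=3))    # local correlation near -1
print(gw_summary((4.5, 0), locations, x, y, bandwidth=3))  # no observation at this point
```

A single global correlation would average over the two opposing local relationships.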

Applications include:

  • Spatial smoothing
  • Spatial interpolation
  • Examining spatially varying relationships

Not all of these correlations are necessarily statistically significant (nor are all the geographically weighted means or other statistics generated).

A statistical note

Confidence intervals and p-values are usually derived from assumptions about how the test statistic would be distributed with repeated random sampling from an underlying 'population'.

But there are other ways...

Permutation
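The slide gives no further detail, so what follows is a minimal Python sketch of one common form of permutation test, applied to Moran's I (the 'Moran correlation coefficient' mentioned earlier). The observed values are randomly reshuffled across the locations many times and Moran's I is recomputed for each reshuffle. The pseudo p-value is the share of results, counting the observed one, that are at least as large as the observed value. This avoids relying on an assumed sampling distribution. The 4-by-4 grid, the binary rook-contiguity weights, the values, the seed and the function names are all hypothetical.

```python
import random


def rook_neighbours(nrows, ncols):
    """Binary rook-contiguity neighbours for the cells of a regular grid (row-major order)."""
    nbrs = []
    for r in range(nrows):
        for c in range(ncols):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    cell.append(rr * ncols + cc)
            nbrs.append(cell)
    return nbrs


def morans_i(values, nbrs):
    """Global Moran's I with binary (not row-standardised) weights."""
    n = len(values)
    xbar = sum(values) / n
    z = [v - xbar for v in values]
    s0 = sum(len(cell) for cell in nbrs)  # sum of all weights
    cross = sum(z[i] * z[j] for i, cell in enumerate(nbrs) for j in cell)
    return (n / s0) * cross / sum(zi * zi for zi in z)


def permutation_test(values, nbrs, n_perm=999, seed=42):
    """Observed Moran's I and a one-sided pseudo p-value for positive autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, nbrs)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reassign the values to locations
        if morans_i(shuffled, nbrs) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


nbrs = rook_neighbours(4, 4)
clustered = [10, 10, 0, 0] * 4                    # left half high, right half low
checkerboard = [10, 0, 10, 0, 0, 10, 0, 10] * 2   # neighbours always differ
print(permutation_test(clustered, nbrs))     # I of about 0.667, small pseudo p-value
print(permutation_test(checkerboard, nbrs))  # I of -1, pseudo p-value of 1.0
```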

The 'invisibility' problem

Map insert
'Balanced cartogram'
More about cartograms
'Hexogram'

Any questions?