Last June I did a blog post about building dot-density maps in R using UK Census data. It has proven to be a fairly popular post, most likely because the maps look like something you’d expect to see in the Tate Modern…
IMAGE
Not only do these maps look beautiful, but there is a strong argument that they do a better job of representing data than the more common choropleth method of filling geographical regions with one colour based on one variable.
The pièce de résistance of dot-density mapping is that it does not over-emphasise the influence of large yet sparsely populated areas, because colour coverage is dictated by count, not area size.
When applied to election mapping, this gives a fairer assessment of the ‘popular vote’ when compared to a standard choropleth map that will fill entire constituencies with the colour of the winning party, regardless of how close the contest was or how many people voted.
A good example of this is the webcomic xkcd and cartographer Kenneth Field’s recent interpretations of the 2016 US Presidential Election (see images below, respectively), both of which set Twitter alight with debate.
IMAGE
You can also have a gander at the Mapping tool we built last year to look at UK election results from a dot-density context.
And for a more enlightened discussion on the troubles and strifes of accurate mapping, check out the latest FT chart doctor article, where the FT’s Alan Smith talks with Professor Mark Monmonier, author of the classic book How to Lie With Maps. (While you’re at it, have a look at Steven Bernard’s piece on how the FT’s always-on-point maps are made here.)
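To put rough numbers on the area-versus-count argument, here is a minimal sketch in base R. The two seats, their areas and their vote counts are entirely made up for illustration. It compares how much of a choropleth would be filled with party A's colour against party A's share of the dots on a dot-density map.

```r
# Two hypothetical seats: one large and sparse, one small and dense
seats <- data.frame(
  seat     = c("Rural", "Urban"),
  area_km2 = c(900, 100),
  votes_a  = c(12000, 8000),
  votes_b  = c(10000, 30000)
)

# A choropleth colours each seat by its winner
winner <- ifelse(seats$votes_a > seats$votes_b, "A", "B")

# Share of the map area filled with party A's colour
area_share_a <- sum(seats$area_km2[winner == "A"]) / sum(seats$area_km2)

# Share of the dots (votes) that belong to party A
dot_share_a <- sum(seats$votes_a) / sum(seats$votes_a + seats$votes_b)

area_share_a
dot_share_a
```

With these invented numbers, party A fills 90% of the choropleth while holding only a third of the votes, which is exactly the distortion a dot-density map avoids.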
R’s New Spatial Workflow
With all of this in mind, I thought it would be a good time to update the previous blog post, this time utilising the relatively new simple features (sf) R package. sf makes it a lot easier to do geospatial analysis within a tidy framework, so it works seamlessly with the tidyverse: each geospatial element is bundled into a list and treated as a single observation of a geographic variable in a data frame. No more fortifying malarkey.
This means we can go from raw data -> dot density map with a lot less code and stress than ever before. So here’s a quick demo of how to get it done, this time as a map of 2017 UK General Election results in London Constituencies…
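As a rough illustration of what "geometry as a list-column" means in practice, here is a small sketch with two invented square "constituencies". The `square()` helper, the names and the vote counts are all hypothetical and only there to build toy data.

```r
library(sf)
library(dplyr)

# Hypothetical helper: a closed square polygon with its lower-left corner at (x0, y0)
square <- function(x0, y0, side) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
    c(x0, y0 + side), c(x0, y0)
  )))
}

# An sf object is a data frame with one geometry per row
toy <- st_sf(
  name     = c("Big and sparse", "Small and dense"),
  votes    = c(1000, 50000),
  geometry = st_sfc(square(0, 0, 10), square(10, 0, 1))
)

# Ordinary dplyr verbs work, and the geometry column sticks to the result
dense <- toy %>%
  filter(votes > 10000) %>%
  select(name)

class(toy$geometry)
names(dense)
```

Note that `select(name)` still returns the geometry column: sf keeps it attached unless you explicitly drop it.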
Load Packages and Get Some Data
First let’s get some election data and a constituency-level shapefile, then select/rename the columns we need in each and join them together.
I filter each dataset to the London region but if you’re doing this yourself and want to map another region, you can simply switch London out for the region of your choice and continue on.
library(tidyverse) # dev version of ggplot2 required devtools::install_github('hadley/ggplot2')
library(sf)
extrafont::loadfonts("win")
# election results filtered to London region
ge_data <- read_csv("http://researchbriefings.files.parliament.uk/documents/CBP-7979/HoC-GE2017-constituency-results.csv") %>%
filter(region_name == "London") %>%
select(ons_id, constituency_name, first_party, Con = con, Lab = lab, LD = ld, UKIP = ukip, Green = green)
# shapefile filtered to London region
# data available here: https://www.dropbox.com/s/4iajcx25grpx5qi/uk_650_wpc_2017_full_res_v1.8.zip?dl=0
uk <- st_read("../../data/blog_data/uk_650_wpc_2017_full_res_v1.8.shp", stringsAsFactors = FALSE, quiet = TRUE) %>%
st_transform(4326) %>%
filter(REGN == "London") %>%
select(ons_id = PCONCODE)
# merge the data
sf_data <- left_join(ge_data, uk, by = "ons_id") %>%
st_as_sf() # joining onto a tibble drops the sf class, so make it an sf object again
head(sf_data)
## Simple feature collection with 6 features and 8 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -0.2050868 ymin: 51.34552 xmax: 0.2176442 ymax: 51.56706
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 6 x 9
## ons_id constituency_name first_party Con Lab LD UKIP Green
## <chr> <chr> <chr> <int> <int> <int> <int> <int>
## 1 E140005~ Barking Lab 10711 32319 599 3031 724
## 2 E140005~ Battersea Lab 22876 25292 4401 357 866
## 3 E140005~ Beckenham Con 30632 15545 4073 0 1380
## 4 E140005~ Bermondsey and Old S~ Lab 7581 31161 18189 838 639
## 5 E140005~ Bethnal Green and Bow Lab 7576 42969 2982 894 1516
## 6 E140005~ Bexleyheath and Cray~ Con 25113 16040 1201 1944 601
## # ... with 1 more variable: geometry <MULTIPOLYGON [°]>
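A side note on why the sf class goes missing after the join, sketched with made-up toy data (the `shapes` and `results` objects and their IDs are hypothetical). As far as I can tell, the result of a dplyr join takes the class of its first argument, so putting the tibble first gives back a plain tibble, while putting the sf object first keeps the sf class.

```r
library(sf)
library(dplyr)

# Toy stand-ins for the shapefile and the results table
shapes <- st_sf(
  ons_id   = c("X1", "X2"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)))
)
results <- tibble(ons_id = c("X1", "X2"), Lab = c(100L, 200L))

tibble_first <- left_join(results, shapes, by = "ons_id")  # tibble on the left
sf_first     <- left_join(shapes, results, by = "ons_id")  # sf object on the left

inherits(tibble_first, "sf")
inherits(sf_first, "sf")

# Converting back, as in the code above, restores the class
restored <- st_as_sf(tibble_first)
inherits(restored, "sf")
```

Either route ends up in the same place; swapping the argument order just changes the column order and saves the `st_as_sf()` call.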
Generating Coordinates for each Dot
Here we create a data frame with the number of dots we want plotted in each constituency for each party. Dividing total vote count by 100 means that each dot will represent 100 votes. We then apply a random rounding algorithm to the resulting floats to avoid any systematic bias in overall dot counts. Then we plug this data into a purrr::map_df call and let it pipe its way to a nice tidy tibble with coordinate columns and a categorical column for the political party assignment of each dot. Finally we randomise the order of rows with slice, again to avoid any bias in plotting order.
It took me a while to figure out how to do the final stage in one pipe. The tricky part was realising that the ‘geometry set’ produced after the st_sample stage (generation of coordinates) has the top-level ‘geometry type’ of GEOMETRY, but in order for us to be able to scrape the coordinates with the st_coordinates function, we must first simplify the geometry type to POINT with the st_cast function…
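To see why random rounding matters, here is a small base R sketch. The `random_round_sketch()` function is a compact stand-in written for this illustration, not the function used in the post, and the seed and vote figures are arbitrary. It rounds up with probability equal to the fractional part, so the totals stay right on average.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Round down, then add 1 with probability equal to the fractional part
random_round_sketch <- function(x) {
  base <- floor(x)
  base + (runif(length(x)) < (x - base))
}

# 10,000 hypothetical areas, each with 749 votes at 100 votes per dot
dots <- rep(7.49, 10000)

true_total   <- sum(dots)                       # what we would like to plot
plain_total  <- sum(round(dots))                # every area rounds down to 7
random_total <- sum(random_round_sketch(dots))  # some areas get 7, some get 8

c(true = true_total, plain = plain_total, random = random_total)
```

Plain rounding silently drops every fractional dot in this setup, whereas the randomly rounded total should land very close to the true one.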
# credit to Jens von Bergmann for this algo https://github.com/mountainMath/dotdensity/blob/master/R/dot-density.R
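random_round <- function(x) {
  v <- as.integer(x)                  # whole part
  r <- x - v                          # fractional part
  test <- runif(length(r), 0.0, 1.0)
  add <- rep(as.integer(0), length(r))
  add[r > test] <- as.integer(1)      # round up with probability r
  value <- v + add
  value <- ifelse(is.na(value) | value < 0, 0, value)  # NA or negative counts become 0
  return(value)
}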
# data frame of number of dots to plot for each party (1 for every 100 votes)
num_dots <- as.data.frame(sf_data) %>%
select(Con:Green) %>%
mutate_all(funs(. / 100)) %>%
mutate_all(random_round)
# generates data frame with coordinates for each point + what party it is associated with
sf_dots <- map_df(names(num_dots),
~ st_sample(sf_data, size = num_dots[,.x], type = "random") %>% # generate the points in each polygon
st_cast("POINT") %>% # cast the geom set as 'POINT' data
st_coordinates() %>% # pull out coordinates into a matrix
as_tibble() %>% # convert to tibble
setNames(c("lon","lat")) %>% # set column names
mutate(Party = .x) # add categorical party variable
) %>%
slice(sample(1:n())) # once map_df binds rows randomise order to avoid bias in plotting order
head(sf_dots)
## # A tibble: 6 x 3
## lon lat Party
## <dbl> <dbl> <chr>
## 1 -0.0721 51.5 Lab
## 2 -0.431 51.6 Lab
## 3 -0.0686 51.6 Green
## 4 -0.0118 51.5 Lab
## 5 -0.293 51.6 Lab
## 6 -0.321 51.6 Con
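If the sample, cast and coordinates chain feels opaque, it may help to run it on a single toy polygon first. This is only a sketch: the unit square stands in for a constituency, and the seed and the request for 50 points are arbitrary choices.

```r
library(sf)

set.seed(1)  # arbitrary seed for reproducibility

# A unit square standing in for one constituency (hypothetical geometry)
unit_square <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)
))))

pts    <- st_sample(unit_square, size = 50, type = "random")  # random points inside the polygon
pts    <- st_cast(pts, "POINT")                               # make sure the geometry type is POINT
coords <- st_coordinates(pts)                                 # plain numeric matrix with X and Y columns

head(coords)
```

In the full pipeline the same three calls run once per party, with `size` being a vector holding one dot count per constituency.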
We’re now ripe for plotting with ggplot2.
Visualise the Votes
Here’s my ggplot2 code for the map output. Plotting this many points on a standard-sized plot image won’t be particularly insightful, as there will be severe over-plotting. So play around with your image size until it’s looking good, then adjust the text and legend sizes to compensate for the enlarged plot.
# colour palette for our party points
pal <- c("Con" = "#0087DC", "Lab" = "#DC241F", "LD" = "#FCBB30", "UKIP" = "#70147A", "Green" = "#78B943")
# plot it and save as png big enough to avoid over-plotting of the points
p <- ggplot() +
geom_sf(data = sf_data, fill = "transparent",colour = "white") +
geom_point(data = sf_dots, aes(lon, lat, colour = Party)) +
scale_colour_manual(values = pal) +
coord_sf(crs = 4326, datum = NA) +
theme_void(base_family = "Iosevka", base_size = 48) +
labs(x = NULL, y = NULL,
title = "UK General Election 2017\n",
subtitle = "London Constituencies\n1 dot = 100 votes",
caption = "Map by Culture of Insight @PaulCampbell91 | Data Sources: House of Commons Library, Alasdair Rae") +
guides(colour = guide_legend(override.aes = list(size = 18))) +
theme(legend.position = c(0.82, 1.03), legend.direction = "horizontal",
plot.background = element_rect(fill = "#212121", color = NA),
panel.background = element_rect(fill = "#212121", color = NA),
legend.background = element_rect(fill = "#212121", color = NA),
legend.key = element_rect(fill = "#212121", colour = NA),
plot.margin = margin(1, 1, 1, 1, "cm"),
text = element_text(color = "white"),
title = element_text(color = "white"),
plot.title = element_text(hjust = 0.5),
plot.caption = element_text(size = 32)
)
ggsave("../../static/img/party_points.png", plot = p, dpi = 320, width = 80, height = 70, units = "cm")
The results should look something like this…
IMAGE
For comparison, here’s the same election as a standard choropleth, with each constituency filled with the colour of the winning party.
ggplot() +
geom_sf(data = sf_data, aes(fill = first_party), colour = "white") +
scale_fill_manual(values = pal, name = "Seat Winner") +
coord_sf(crs = 4326, datum = NA) +
theme_void() +
theme(legend.position = c(0.8, 0.9), legend.direction = "horizontal")
IMAGE
Which do we think is the most insightful map? Luckily we don’t have to choose one or the other, we can use both! No one map will be able to give you all the answers, so I find that it’s best to combine techniques for maximum insight. The choropleth gives us a clear indication as to who won where, and the dot-density looks under the hood and gives us an idea of the count and diversity of votes within each constituency.
That’s all for now. I know I didn’t go into great detail about the code, so if you have any questions or want to kick off a heated mapping debate, please do leave a comment below or catch me on Twitter.
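To make the "who won" versus "how close was it" distinction concrete, here is a small base R sketch using the first three rows of the results table printed earlier. The margin is computed only across the five parties shown, so it is an approximation that ignores any other candidates.

```r
# First three constituencies from the results table above
results <- data.frame(
  constituency = c("Barking", "Battersea", "Beckenham"),
  Con   = c(10711, 22876, 30632),
  Lab   = c(32319, 25292, 15545),
  LD    = c(599, 4401, 4073),
  UKIP  = c(3031, 357, 0),
  Green = c(724, 866, 1380)
)

parties <- c("Con", "Lab", "LD", "UKIP", "Green")
votes   <- as.matrix(results[parties])

# What the choropleth shows: the winning party in each seat
results$winner <- parties[max.col(votes, ties.method = "first")]

# What it hides: the gap between first and second place, as a % of votes cast
sorted <- t(apply(votes, 1, sort, decreasing = TRUE))
results$margin_pct <- round(100 * (sorted[, 1] - sorted[, 2]) / rowSums(votes), 1)

results[c("constituency", "winner", "margin_pct")]
```

Barking and Battersea get the same fill colour on the choropleth, yet one was a landslide and the other a margin of under five points, which is the kind of difference the dots make visible.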
A few shout outs
Thanks to the FT data visualisation team for always inspiring with stellar maps and graphics
The Trafford Data Lab for getting in touch regarding their use of our old dot-density tutorial, and for their code that helped a lot in converting it to this sf-friendly version
And Donald Trump, whose love of a certain choropleth map has triggered the desire for better election maps everywhere