Introduction
For this blog post, I decided to try to find a dataset covering an issue I feel quite strongly about - homelessness. I managed to find a fairly large dataset from the Cambridgeshire Insight website.
For a while I’ve wanted to try out R’s mapping potential and hopefully generate a heatmap, so I deliberately looked for a dataset where I could try this out. It’s worth saying that this has been by far the most difficult and frustrating project I’ve taken on. It took me 6 or 7 sessions to produce this post, the first of which I spent trying to install gganimate (which I ended up not using) and figuring out where to start with mapping.
Data wrangling
Let’s load the required packages and read the data in:
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.3.0
## ✔ tibble 2.0.1 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.3.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(gifski)
library(sf)
## Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
data <- read_csv("https://data.cambridgeshireinsight.org.uk/sites/default/files/P1E-national-homelessness-CLG-tab784-to-201718-csv.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## `ONS code` = col_character(),
## `Local authority area` = col_character(),
## Region = col_character(),
## `2009/10: Numbers accepted as homeless and in priority need who are White` = col_character(),
## `2009/10: Numbers accepted as homeless and in priority need who are Black or Black British` = col_character(),
## `2009/10: Numbers accepted as homeless and in priority need who are Asian or Asian British` = col_character(),
## `2009/10: Numbers accepted as homeless and in priority need who are Mixed` = col_character(),
## `2009/10: Numbers accepted as homeless and in priority need who are Other ethnic origin` = col_character(),
## `2009/10: Numbers accepted as homeless and in priority need who are Ethnic Group not Stated` = col_character(),
## `2009/10: Total decisions where eligible homeless & in priority need but intentionally` = col_character(),
## `2009/10: Total decisions where eligible & homeless but not in priority need` = col_character(),
## `2009/10: Total decisions where eligible but not homeless` = col_character(),
## `2009/10: Total homelessness decisions` = col_character(),
## `31 March 2010: Total households in B&B (including shared annex)` = col_character(),
## `31 March 2010: Total households in hostels` = col_character(),
## `31 March 2010: Total households in LA/HA stock` = col_character(),
## `31 March 2010: Total households in private sector leased (by LA or HA)` = col_character(),
## `31 March 2010: Total households in other temp (including private landlord)` = col_character(),
## `2015/16: Numbers accepted as homeless and in priority need who are White` = col_character(),
## `2015/16: Numbers accepted as homeless and in priority need who are Black or Black British` = col_character()
## # ... with 58 more columns
## )
## See spec(...) for full column specifications.
names(data)
## [1] "ONS code"
## [2] "Local authority area"
## [3] "Region"
## [4] "2009/10: Thousands of households 2006 mid-year estimate"
## [5] "2009/10: Numbers accepted as homeless and in priority need who are White"
## [6] "2009/10: Numbers accepted as homeless and in priority need who are Black or Black British"
## [7] "2009/10: Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [8] "2009/10: Numbers accepted as homeless and in priority need who are Mixed"
## [9] "2009/10: Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [10] "2009/10: Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [11] "2009/10: Numbers accepted as homeless and in priority need total"
## [12] "2009/10: Number accepted per 1000 households"
## [13] "2009/10: Total decisions where eligible homeless & in priority need but intentionally"
## [14] "2009/10: Total decisions where eligible & homeless but not in priority need"
## [15] "2009/10: Total decisions where eligible but not homeless"
## [16] "2009/10: Total homelessness decisions"
## [17] "31 March 2010: Total households in B&B (including shared annex)"
## [18] "31 March 2010: Total households in hostels"
## [19] "31 March 2010: Total households in LA/HA stock"
## [20] "31 March 2010: Total households in private sector leased (by LA or HA)"
## [21] "31 March 2010: Total households in other temp (including private landlord)"
## [22] "31 March 2010: Total households in temporary accommodation"
## [23] "31 March 2010: Number in temp per 1000 households"
## [24] "2009/10: Duty owed but no accommodation has been secured at end of March 2010"
## [25] "2010/11: Thousands of households 2008 mid-year estimate"
## [26] "2010/11: Numbers accepted as homeless and in priority need who are White"
## [27] "2010/11: Numbers accepted as homeless and in priority need who are Black or Black British"
## [28] "2010/11: Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [29] "2010/11: Numbers accepted as homeless and in priority need who are Mixed"
## [30] "2010/11: Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [31] "2010/11: Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [32] "2010/11: Numbers accepted as homeless and in priority need total"
## [33] "2010/11: Number accepted per 1000 households"
## [34] "2010/11: Total decisions where eligible homeless & in priority need but intentionally"
## [35] "2010/11: Total decisions where eligible & homeless but not in priority need"
## [36] "2010/11: Total decisions where eligible but not homeless"
## [37] "2010/11: Total homelessness decisions"
## [38] "31 March 2011: Total households in B&B (including shared annex)"
## [39] "31 March 2011: Total households in hostels"
## [40] "31 March 2011: Total households in LA/HA stock"
## [41] "31 March 2011: Total households in private sector leased (by LA or HA)"
## [42] "31 March 2011: Total households in other temp (including private landlord)"
## [43] "31 March 2011: Total households in temporary accommodation"
## [44] "31 March 2011: Number in temp per 1000 households"
## [45] "2010/11: Duty owed but no accommodation has been secured at end of March 2011"
## [46] "2011/12: Thousands of households 2008 mid-year estimate"
## [47] "2011/12: Numbers accepted as homeless and in priority need who are White"
## [48] "2011/12: Numbers accepted as homeless and in priority need who are Black or Black British"
## [49] "2011/12: Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [50] "2011/12: Numbers accepted as homeless and in priority need who are Mixed"
## [51] "2011/12: Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [52] "2011/12: Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [53] "2011/12: Numbers accepted as homeless and in priority need total"
## [54] "2011/12: Number accepted per 1000 households"
## [55] "2011/12: Total decisions where eligible homeless & in priority need but intentionally"
## [56] "2011/12: Total decisions where eligible & homeless but not in priority need"
## [57] "2011/12: Total decisions where eligible but not homeless"
## [58] "2011/12: Total homelessness decisions"
## [59] "31 March 2012: Total households in B&B (including shared annex)"
## [60] "31 March 2012: Total households in hostels"
## [61] "31 March 2012: Total households in LA/HA stock"
## [62] "31 March 2012: Total households in private sector leased (by LA or HA)"
## [63] "31 March 2012: Total households in other temp (including private landlord)"
## [64] "31 March 2012: Total households in temporary accommodation"
## [65] "31 March 2012: Number in temp per 1000 households"
## [66] "2011/12: Duty owed but no accommodation has been secured at end of March 2012"
## [67] "2012/13: Thousands of households 2008-based interim projections for 2012"
## [68] "2012/13: Numbers accepted as homeless and in priority need who are White"
## [69] "2012/13: Numbers accepted as homeless and in priority need who are Black or Black British"
## [70] "2012/13: Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [71] "2012/13: Numbers accepted as homeless and in priority need who are Mixed"
## [72] "2012/13: Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [73] "2012/13: Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [74] "2012/13: Numbers accepted as homeless and in priority need total"
## [75] "2012/13: Number accepted per 1000 households"
## [76] "2012/13: Total decisions where eligible homeless & in priority need but intentionally"
## [77] "2012/13: Total decisions where eligible & homeless but not in priority need"
## [78] "2012/13: Total decisions where eligible but not homeless"
## [79] "2012/13: Total homelessness decisions"
## [80] "31 March 2013: Total households in B&B (including shared annex)"
## [81] "31 March 2013: Total households in hostels"
## [82] "31 March 2013: Total households in LA/HA stock"
## [83] "31 March 2013: Total households in private sector leased (by LA or HA)"
## [84] "31 March 2013: Total households in other temp (including private landlord)"
## [85] "31 March 2013: Total households in temporary accommodation"
## [86] "31 March 2013: Number in temp per 1000 households"
## [87] "2012/13: Duty owed but no accommodation has been secured at end of March 2013"
## [88] "2013/14: Thousands of households 2012-based interim projections for 2013"
## [89] "2013/14: Numbers accepted as homeless and in priority need who are White"
## [90] "2013/14: Numbers accepted as homeless and in priority need who are Black or Black British"
## [91] "2013/14: Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [92] "2013/14: Numbers accepted as homeless and in priority need who are Mixed"
## [93] "2013/14: Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [94] "2013/14: Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [95] "2013/14: Numbers accepted as homeless and in priority need total"
## [96] "2013/14: Number accepted per 1000 households"
## [97] "2013/14: Total decisions where eligible homeless & in priority need but intentionally"
## [98] "2013/14: Total decisions where eligible & homeless but not in priority need"
## [99] "2013/14: Total decisions where eligible but not homeless"
## [100] "2013/14: Total homelessness decisions"
## [101] "31 March 2014: Total households in B&B (including shared annex)"
## [102] "31 March 2014: Total households in hostels"
## [103] "31 March 2014: Total households in LA/HA stock"
## [104] "31 March 2014: Total households in private sector leased (by LA or HA)"
## [105] "31 March 2014: Total households in other temp (including private landlord)"
## [106] "31 March 2014: Total households in temporary accommodation"
## [107] "31 March 2014: Number in temp per 1000 households"
## [108] "2013/14: Duty owed but no accommodation has been secured at end of March 2014"
## [109] "2014/15: Thousands of households 2012-based interim projections for 2014"
## [110] "2014/15: Numbers accepted as homeless and in priority need who are White"
## [111] "2014/15: Numbers accepted as homeless and in priority need who are Black or Black British"
## [112] "2014/15: Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [113] "2014/15: Numbers accepted as homeless and in priority need who are Mixed"
## [114] "2014/15: Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [115] "2014/15: Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [116] "2014/15: Numbers accepted as homeless and in priority need total"
## [117] "2014/15: Number accepted per 1000 households"
## [118] "2014/15: Total decisions where eligible homeless & in priority need but intentionally"
## [119] "2014/15: Total decisions where eligible & homeless but not in priority need"
## [120] "2014/15: Total decisions where eligible but not homeless"
## [121] "2014/15: Total homelessness decisions"
## [122] "31 March 2015: Total households in B&B (including shared annex)"
## [123] "31 March 2015: Total households in hostels"
## [124] "31 March 2015: Total households in LA/HA stock"
## [125] "31 March 2015: Total households in private sector leased (by LA or HA)"
## [126] "31 March 2015: Total households in other temp (including private landlord)"
## [127] "31 March 2015: Total households in temporary accommodation"
## [128] "31 March 2015: Number in temp per 1000 households"
## [129] "2014/15: Duty owed but no accommodation has been secured at end of March 2015"
## [130] "2015/16: Thousands of households 2012-based interim projections for 2015"
## [131] "2015/16: Numbers accepted as homeless and in priority need who are White"
## [132] "2015/16: Numbers accepted as homeless and in priority need who are Black or Black British"
## [133] "2015/16: Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [134] "2015/16: Numbers accepted as homeless and in priority need who are Mixed"
## [135] "2015/16: Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [136] "2015/16: Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [137] "2015/16: Numbers accepted as homeless and in priority need total"
## [138] "2015/16: Number accepted per 1000 households"
## [139] "2015/16: Total decisions where eligible homeless & in priority need but intentionally"
## [140] "2015/16: Total decisions where eligible & homeless but not in priority need"
## [141] "2015/16: Total decisions where eligible but not homeless"
## [142] "2015/16: Total homelessness decisions"
## [143] "31 March 2016: Total households in B&B (including shared annex)"
## [144] "31 March 2016: Total households in hostels"
## [145] "31 March 2016: Total households in LA/HA stock"
## [146] "31 March 2016: Total households in private sector leased (by LA or HA)"
## [147] "31 March 2016: Total households in other temp (including private landlord)"
## [148] "31 March 2016: Total households in temporary accommodation"
## [149] "31 March 2016: Number in temp per 1000 households"
## [150] "2015/16: Duty owed but no accommodation has been secured at end of March 2015"
## [151] "2016/17: Thousands of households 2012-based interim projections for 2016"
## [152] "2016/17: Numbers accepted as homeless and in priority need who are White"
## [153] "2016/17: Numbers accepted as homeless and in priority need who are Black or Black British"
## [154] "2016/17: Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [155] "2016/17: Numbers accepted as homeless and in priority need who are Mixed"
## [156] "2016/17: Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [157] "2016/17: Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [158] "2016/17: Numbers accepted as homeless and in priority need total"
## [159] "2016/17: Number accepted per 1000 households"
## [160] "2016/17: Total decisions where eligible homeless & in priority need but intentionally"
## [161] "2016/17: Total decisions where eligible & homeless but not in priority need"
## [162] "2016/17: Total decisions where eligible but not homeless"
## [163] "2016/17: Total homelessness decisions"
## [164] "31 March 2017: Total households in B&B (including shared annex)"
## [165] "31 March 2017: Total households in hostels"
## [166] "31 March 2017: Total households in LA/HA stock"
## [167] "31 March 2017: Total households in private sector leased (by LA or HA)"
## [168] "31 March 2017: Total households in other temp (including private landlord)"
## [169] "31 March 2017: Total households in temporary accommodation"
## [170] "31 March 2017: Number in temp per 1000 households"
## [171] "2016/17: Duty owed but no accommodation has been secured at end of March 2017"
## [172] "2017/18: Thousands of households 2012-based interim projections for 2017"
## [173] "2017/18: Numbers accepted as homeless and in priority need who are White"
## [174] "2017/18: Numbers accepted as homeless and in priority need who are Black or Black British"
## [175] "2017/18: Numbers accepted as homeless and in priority need who are Asian or Asian British"
## [176] "2017/18: Numbers accepted as homeless and in priority need who are Mixed"
## [177] "2017/18: Numbers accepted as homeless and in priority need who are Other ethnic origin"
## [178] "2017/18: Numbers accepted as homeless and in priority need who are Ethnic Group not Stated"
## [179] "2017/18: Numbers accepted as homeless and in priority need total"
## [180] "2017/18: Number accepted per 1000 households"
## [181] "2017/18: Total decisions where eligible homeless & in priority need but intentionally"
## [182] "2017/18: Total decisions where eligible & homeless but not in priority need"
## [183] "2017/18: Total decisions where eligible but not homeless"
## [184] "2017/18: Total homelessness decisions"
## [185] "31 March 2018: Total households in B&B (including shared annex)"
## [186] "31 March 2018: Total households in hostels"
## [187] "31 March 2018: Total households in LA/HA stock"
## [188] "31 March 2018: Total households in private sector leased (by LA or HA)"
## [189] "31 March 2018: Total households in other temp (including private landlord)"
## [190] "31 March 2018: Total households in temporary accommodation"
## [191] "31 March 2018: Number in temp per 1000 households"
## [192] "2017/18: Duty owed but no accommodation has been secured at end of March 2018"
The first thing to do is home in on the data I’d like to use. From a quick scan of the columns, “Local authority area” looks critical, and I’d like to see whether I have yearly data for “Numbers accepted as homeless and in priority need total”:
ind <- str_detect(names(data), "priority need total")
names(data)[ind]
## [1] "2009/10: Numbers accepted as homeless and in priority need total"
## [2] "2010/11: Numbers accepted as homeless and in priority need total"
## [3] "2011/12: Numbers accepted as homeless and in priority need total"
## [4] "2012/13: Numbers accepted as homeless and in priority need total"
## [5] "2013/14: Numbers accepted as homeless and in priority need total"
## [6] "2014/15: Numbers accepted as homeless and in priority need total"
## [7] "2015/16: Numbers accepted as homeless and in priority need total"
## [8] "2016/17: Numbers accepted as homeless and in priority need total"
## [9] "2017/18: Numbers accepted as homeless and in priority need total"
This looks to fit the bill. Now that I’ve homed in on the columns I need, let’s have a look at the structure and distribution of the data:
data_trim <- data %>% select(2, names(data)[ind])
str(data_trim, give.attr = FALSE)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 65447 obs. of 10 variables:
## $ Local authority area : chr "ENGLAND" "Adur" "Allerdale" "Amber Valley" ...
## $ 2009/10: Numbers accepted as homeless and in priority need total: num 40020 71 102 30 52 ...
## $ 2010/11: Numbers accepted as homeless and in priority need total: num 44160 90 104 46 79 ...
## $ 2011/12: Numbers accepted as homeless and in priority need total: num 50290 58 63 53 100 ...
## $ 2012/13: Numbers accepted as homeless and in priority need total: num 53770 37 41 61 129 ...
## $ 2013/14: Numbers accepted as homeless and in priority need total: num 52290 10 26 64 109 ...
## $ 2014/15: Numbers accepted as homeless and in priority need total: num 54430 7 30 117 191 ...
## $ 2015/16: Numbers accepted as homeless and in priority need total: chr "57740" "16" "32" "101" ...
## $ 2016/17: Numbers accepted as homeless and in priority need total: chr "59110" "31" "17" "81" ...
## $ 2017/18: Numbers accepted as homeless and in priority need total: chr "56580" "38" "22" "76" ...
summary(data_trim)
## Local authority area
## Length:65447
## Class :character
## Mode :character
##
##
##
##
## 2009/10: Numbers accepted as homeless and in priority need total
## Min. : 1.0
## 1st Qu.: 30.0
## Median : 63.0
## Mean : 244.8
## 3rd Qu.: 136.0
## Max. :40020.0
## NA's :65120
## 2010/11: Numbers accepted as homeless and in priority need total
## Min. : 1.0
## 1st Qu.: 36.5
## Median : 73.0
## Mean : 270.1
## 3rd Qu.: 149.0
## Max. :44160.0
## NA's :65120
## 2011/12: Numbers accepted as homeless and in priority need total
## Min. : 0.0
## 1st Qu.: 41.0
## Median : 85.0
## Mean : 307.6
## 3rd Qu.: 168.0
## Max. :50290.0
## NA's :65120
## 2012/13: Numbers accepted as homeless and in priority need total
## Min. : 0.0
## 1st Qu.: 38.0
## Median : 78.0
## Mean : 326.4
## 3rd Qu.: 178.5
## Max. :53770.0
## NA's :65120
## 2013/14: Numbers accepted as homeless and in priority need total
## Min. : 0.0
## 1st Qu.: 38.5
## Median : 82.0
## Mean : 319.8
## 3rd Qu.: 174.5
## Max. :52290.0
## NA's :65120
## 2014/15: Numbers accepted as homeless and in priority need total
## Min. : 0.0
## 1st Qu.: 39.0
## Median : 87.0
## Mean : 332.9
## 3rd Qu.: 185.0
## Max. :54430.0
## NA's :65120
## 2015/16: Numbers accepted as homeless and in priority need total
## Length:65447
## Class :character
## Mode :character
##
##
##
##
## 2016/17: Numbers accepted as homeless and in priority need total
## Length:65447
## Class :character
## Mode :character
##
##
##
##
## 2017/18: Numbers accepted as homeless and in priority need total
## Length:65447
## Class :character
## Mode :character
##
##
##
##
I can see that apart from the annoyingly long column names, I seem to have the totals for the whole of England in the first row. So let’s fix these issues:
data_trim <- data_trim %>%
slice(-1) %>%
set_names("LAA", 2009:2017)
head(data_trim, 20)
## # A tibble: 20 x 10
## LAA `2009` `2010` `2011` `2012` `2013` `2014` `2015` `2016` `2017`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr>
## 1 Adur 71 90 58 37 10 7 16 31 38
## 2 Allerdale 102 104 63 41 26 30 32 17 22
## 3 Amber Va… 30 46 53 61 64 117 101 81 76
## 4 Arun 52 79 100 129 109 191 228 220 204
## 5 Ashfield 42 25 16 26 85 87 93 100 123
## 6 Ashford 178 194 161 199 166 152 154 136 160
## 7 Aylesbur… 93 112 126 133 116 160 177 161 150
## 8 Babergh 37 46 78 100 86 86 94 61 72
## 9 Barking … 232 221 199 664 853 764 941 543 512
## 10 Barnet 232 251 339 595 674 677 422 640 444
## 11 Barnsley 95 56 38 23 14 13 14 15 41
## 12 Barrow-i… 40 26 29 29 19 17 18 11 13
## 13 Basildon 191 232 255 282 302 351 208 180 202
## 14 Basingst… 1 1 2 11 22 54 46 107 81
## 15 Bassetlaw 18 27 48 75 41 91 65 84 77
## 16 Bath and… 68 100 86 86 65 48 68 86 84
## 17 Bedford … 141 107 211 242 174 164 287 252 224
## 18 Bexley 128 204 346 349 420 498 483 508 500
## 19 Birmingh… 3371 4207 3929 3957 3160 3140 3524 3479 3386
## 20 Blaby 2 7 2 1 0 6 11 17 32
That’s looking a bit better. I notice that there seems to be a stray “UA” at the end of some LAAs. From the output of the summary() function above, I can also see that the 2015-2017 columns have been parsed as character, so there’s probably some non-numeric character in there. Let’s see how many places these issues affect:
data_trim %>% filter(str_detect(LAA, " UA")) %>% select(LAA)
## # A tibble: 56 x 1
## LAA
## <chr>
## 1 Bath and North East Somerset UA
## 2 Bedford UA
## 3 Blackburn with Darwen UA
## 4 Blackpool UA
## 5 Bournemouth UA
## 6 Bracknell Forest UA
## 7 Brighton and Hove UA
## 8 Bristol City of UA
## 9 Central Bedfordshire UA
## 10 Cheshire East UA
## # … with 46 more rows
data_trim %>% filter(str_detect(`2015`, "[^0-9]+")) %>% select(LAA, `2015`)
## # A tibble: 5 x 2
## LAA `2015`
## <chr> <chr>
## 1 Chorley -
## 2 Eden -
## 3 Hyndburn -
## 4 Isles of Scilly UA -
## 5 Waverley -
56 place names ending in “UA” and five places without data in 2015! Let’s update our trimmed data to fix these issues, and make the data tidy by gathering the year headers into their own column:
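data_tidy <- data_trim %>%
  # drop the trailing " UA" suffix; "$" anchors the match to the end of the name
  mutate(LAA = str_replace(LAA, " UA$", "")) %>%
  # "-" marks a missing value: turn it into NA, then convert the column to integer
  mutate(`2015` = str_replace(`2015`, "-", NA_character_) %>% as.integer()) %>%
  mutate(`2016` = str_replace(`2016`, "-", NA_character_) %>% as.integer()) %>%
  mutate(`2017` = str_replace(`2017`, "-", NA_character_) %>% as.integer()) %>%
  # gather the year columns into year/value pairs
  gather(year, num_homeless, -LAA) %>%
  mutate(year = as.integer(year))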
str(data_tidy)
## Classes 'tbl_df', 'tbl' and 'data.frame': 589014 obs. of 3 variables:
## $ LAA : chr "Adur" "Allerdale" "Amber Valley" "Arun" ...
## $ year : int 2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
## $ num_homeless: num 71 102 30 52 42 178 93 37 232 232 ...
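The three near-identical mutate() calls could probably be collapsed into one. Here is a minimal sketch on a toy table, assuming dplyr 1.0+ (for across()) and tidyr 1.0+ (for pivot_longer()), both newer than the versions loaded in this post. The name toy_trim is made up, and Chorley's 2014 figure is invented for illustration; the other numbers come from the head() output above.

```r
library(dplyr)
library(tidyr)
library(readr)

# Toy stand-in for data_trim (hypothetical name). Chorley's 2014 value is
# invented; the remaining figures are taken from the output shown earlier.
toy_trim <- tibble(
  LAA    = c("Adur", "Bedford UA", "Chorley"),
  `2014` = c(7, 164, 50),
  `2015` = c("16", "287", "-")
)

toy_tidy <- toy_trim %>%
  # strip a trailing " UA" from the area name
  mutate(LAA = sub(" UA$", "", LAA)) %>%
  # parse every 2015-2017 column in one go, treating "-" as missing
  mutate(across(matches("^201[5-7]$"),
                ~ parse_integer(.x, na = c("", "NA", "-")))) %>%
  # one row per area and year
  pivot_longer(-LAA, names_to = "year", values_to = "num_homeless") %>%
  mutate(year = as.integer(year))

toy_tidy
```

Telling the parser that "-" means missing avoids the separate str_replace() step, and matches() picks up all three affected columns without naming each one.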
Initial analysis
Now that I have the data in a more manageable format, let’s quickly plot the six highest homelessness figures in each year:
data_tidy %>%
group_by(year) %>%
arrange(year, desc(num_homeless)) %>%
top_n(6) %>%
ggplot(aes(x = LAA, y = num_homeless)) +
geom_bar(stat = "identity") +
coord_flip() +
facet_wrap(~ year, ncol=2, scales="free_y")
## Selecting by num_homeless
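We can see that Birmingham has by far the highest figures. Bear in mind that these are raw counts rather than rates per 1,000 households, so the largest authorities will naturally rank highly. I’m not sure how accurate these figures are, but if they are correct that is truly horrifying, and things don’t seem to have got any better up to 2017. Which areas have seen the most drastic improvement or deterioration over the 8 years?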
extremes <- data_tidy %>%
drop_na() %>%
filter(year %in% c(2009, 2017)) %>%
group_by(LAA) %>%
mutate(homeless2009 = lag(num_homeless),
change = num_homeless - homeless2009) %>%
ungroup() %>%
drop_na() %>%
arrange(change)
bind_rows(head(extremes, 8), tail(extremes, 8))
## # A tibble: 16 x 5
## LAA year num_homeless homeless2009 change
## <chr> <int> <dbl> <dbl> <dbl>
## 1 Sheffield 2017 481 946 -465
## 2 North Tyneside 2017 179 502 -323
## 3 Tower Hamlets 2017 437 690 -253
## 4 Hillingdon 2017 264 452 -188
## 5 Lambeth 2017 467 625 -158
## 6 Herefordshire County of 2017 53 201 -148
## 7 Gateshead 2017 219 365 -146
## 8 Leeds 2017 281 427 -146
## 9 Bexley 2017 500 128 372
## 10 Wandsworth 2017 822 426 396
## 11 Bristol City of 2017 721 285 436
## 12 Kensington and Chelsea 2017 709 255 454
## 13 Enfield 2017 786 241 545
## 14 Milton Keynes 2017 679 84 595
## 15 Manchester 2017 1222 482 740
## 16 Newham 2017 1143 97 1046
Sheffield was the most improved, with a reduction of 465, while Newham saw a massive increase of over 1,000.
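The lag() approach above works because the rows happen to be sorted by year within each area. A slightly more explicit alternative is to spread the two years into their own columns and subtract. This is only a sketch, assuming tidyr 1.0+ for pivot_wider(); the toy table reuses the Sheffield and Newham figures from the output above.

```r
library(dplyr)
library(tidyr)

# Two areas, two years each, using the figures printed above
toy <- tibble(
  LAA          = c("Sheffield", "Sheffield", "Newham", "Newham"),
  year         = c(2009L, 2017L, 2009L, 2017L),
  num_homeless = c(946, 481, 97, 1143)
)

changes <- toy %>%
  # one column per year: homeless2009 and homeless2017
  pivot_wider(names_from = year, values_from = num_homeless,
              names_prefix = "homeless") %>%
  # negative change = fewer acceptances in 2017 than in 2009
  mutate(change = homeless2017 - homeless2009) %>%
  arrange(change)

changes
```

With the years side by side, the subtraction no longer depends on row order, and an area missing either year simply ends up with an NA change.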
The painful part
So, having never done any geospatial analysis or mapping before, I tried some Google searches to see if I could find any code I could use. I quickly discovered that to map UK regions I would need some shapefiles.
I managed to download some from the UK Data Service website. I also had enormous trouble getting st_read() to read the data from within this blog post, but I managed to make it work using the here package, which I’ve since heard good things about on Twitter.
shapes <- st_read(dsn = paste(here::here(), "content/post/data/homelessness/BoundaryData", sep = "/"),
                  layer = "infuse_dist_lyr_2011") %>%
  arrange(name)
## Reading layer `infuse_dist_lyr_2011' from data source `/home/jamie/Documents/R/r-house/content/post/data/homelessness/BoundaryData' using driver `ESRI Shapefile'
## Simple feature collection with 324 features and 5 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 82643.6 ymin: 5333.602 xmax: 655989 ymax: 657599.5
## epsg (SRID): NA
## proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
str(shapes)
## Classes 'sf' and 'data.frame': 324 obs. of 6 variables:
## $ name : Factor w/ 324 levels "Adur","Allerdale",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ label : Factor w/ 324 levels "E92000001E06000001",..: 243 64 70 244 195 136 55 220 292 293 ...
## $ geo_labelw: Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...
## $ geo_label : Factor w/ 324 levels "Adur","Allerdale",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ geo_code : Factor w/ 324 levels "E06000001","E06000002",..: 243 64 70 244 195 136 55 220 292 293 ...
## $ geometry :sfc_MULTIPOLYGON of length 324; first list element: List of 1
## ..$ :List of 1
## .. ..$ : num [1:2718, 1:2] 515970 515950 515901 515901 515855 ...
## ..- attr(*, "class")= chr "XY" "MULTIPOLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
## ..- attr(*, "names")= chr "name" "label" "geo_labelw" "geo_label" ...
With the intent of joining my data frames together, I identified an inconsistency in the areas given in each table (setdiff() is a very handy function!):
n_distinct(data_tidy$LAA)
## [1] 327
n_distinct(shapes$name)
## [1] 324
data_diff <- setdiff(data_tidy$LAA, shapes$name)
shapes_diff <- setdiff(shapes$name, data_tidy$LAA)
tibble(data = data_diff,
       shapes = c(shapes_diff, "", "", ""))
## # A tibble: 10 x 2
##    data                       shapes
##    <chr>                      <chr>
##  1 Bristol City of            Bristol, City of
##  2 City of London             City of London,Westminster
##  3 Cornwall                   Cornwall,Isles of Scilly
##  4 Durham                     County Durham
##  5 Herefordshire County of    Herefordshire, County of
##  6 Isles of Scilly            Kingston upon Hull, City of
##  7 Kingston upon Hull City of St. Helens
##  8 St Helens                  ""
##  9 Westminster                ""
## 10 <NA>                       ""
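For anyone unfamiliar with setdiff(), here is a small sketch of the comparison on a handful of names taken from the output above. The variable names are made up for illustration. Note that setdiff() is directional, which is why it is called twice, and that the two results are independent lists rather than matched pairs.

```r
library(dplyr)

# A few area names as spelt in each source
data_names  <- c("Adur", "Bristol City of", "St Helens", "Durham")
shape_names <- c("Adur", "Bristol, City of", "St. Helens", "County Durham")

# Names present in the homelessness data but not in the shapefile
only_in_data <- setdiff(data_names, shape_names)

# Names present in the shapefile but not in the homelessness data
only_in_shapes <- setdiff(shape_names, data_names)

# The same check on data frames: rows of x with no match in y
unmatched <- anti_join(tibble(LAA = data_names),
                       tibble(name = shape_names),
                       by = c("LAA" = "name"))

only_in_data
only_in_shapes
unmatched
```

anti_join() is handy when the names live in a data frame with other columns attached, because it returns the whole unmatched rows rather than just the names.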
You can see from the output above that my homelessness data has split out Westminster from the City of London, and the Isles of Scilly from Cornwall. (The two columns are independent lists, so the rows don’t pair up exactly.) There are also some punctuation inconsistencies that need to be sorted out. Let’s clean it up by renaming the mismatched areas and combining the split rows:
data_final <- data_tidy %>%
  # areas that the shapefile merges into a single polygon
  mutate(LAA = ifelse(LAA %in% c("City of London", "Westminster"),
                      "City of London,Westminster",
                      LAA)) %>%
  mutate(LAA = ifelse(LAA %in% c("Cornwall", "Isles of Scilly"),
                      "Cornwall,Isles of Scilly",
                      LAA)) %>%
  # spelling and punctuation differences between the two sources
  mutate(LAA = ifelse(LAA == "Bristol City of", "Bristol, City of", LAA)) %>%
  mutate(LAA = ifelse(LAA == "Durham", "County Durham", LAA)) %>%
  mutate(LAA = ifelse(LAA == "Herefordshire County of", "Herefordshire, County of", LAA)) %>%
  mutate(LAA = ifelse(LAA == "Kingston upon Hull City of", "Kingston upon Hull, City of", LAA)) %>%
  mutate(LAA = ifelse(LAA == "St Helens", "St. Helens", LAA)) %>%
  mutate(LAA = ifelse(LAA == "St. Albans", "St Albans", LAA)) %>%
  mutate(LAA = ifelse(LAA == "St. Edmundsbury", "St Edmundsbury", LAA)) %>%
  mutate(LAA = as.factor(LAA)) %>%
  # add up the rows that now share a name
  group_by(LAA, year) %>%
  summarise(total_homeless = sum(num_homeless)) %>%
  ungroup()
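The chain of ifelse() calls could also be written as a single lookup from old name to new name. The following is a sketch only: the lookup covers just a few of the renames, and the City of London and Westminster figures are invented for illustration (the Bristol and Adur values for 2017 are from the outputs above).

```r
library(dplyr)

# Old name -> name used in the shapefile (a subset of the renames above)
lookup <- c(
  "City of London"  = "City of London,Westminster",
  "Westminster"     = "City of London,Westminster",
  "Bristol City of" = "Bristol, City of",
  "St Helens"       = "St. Helens"
)

# Toy data: the first two num_homeless values are hypothetical
toy <- tibble(
  LAA          = c("City of London", "Westminster", "Bristol City of", "Adur"),
  year         = 2017L,
  num_homeless = c(20, 400, 721, 38)
)

toy_final <- toy %>%
  # names not in the lookup are left unchanged
  mutate(LAA = recode(LAA, !!!lookup)) %>%
  group_by(LAA, year) %>%
  summarise(total_homeless = sum(num_homeless)) %>%
  ungroup()

toy_final
```

Keeping the renames in one named vector makes it easier to see which areas are merged. One thing to be aware of with either version: sum() returns NA if any of the combined rows is missing, so a merged area gets NA for a year in which one of its parts had no data, unless na.rm = TRUE is used.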
Next, I created a function that takes a year and a set of regions and generates a heatmap. The function filters the homelessness data, joins it with the shape data, and then plots the result. I’ve included regions as an argument so that Birmingham can be filtered out, as it dominates the heatmap.
heatmap <- function(inp_year, regions) {
  # homelessness figures for the requested year and regions, joined to the boundaries;
  # right_join() keeps every shape, so areas without data are still drawn
  data_joined <- data_final %>%
    filter(year == inp_year) %>%
    filter(LAA %in% regions) %>%
    right_join(shapes, by = c("LAA" = "name"))
  # use the maximum across all years so the colour scale is the same in every frame
  max_scale <- max(data_final %>%
                     filter(LAA %in% regions) %>%
                     select(total_homeless), na.rm = TRUE)
  p <- ggplot() +
    geom_sf(data = data_joined, aes(fill = total_homeless), col = "black") +
    theme_void() + coord_sf(datum = NA) +
    scale_fill_viridis_c(name = NULL, option = "magma",
                         limits = c(0, max_scale),
                         breaks = c(0, max_scale/2, max_scale)) +
    labs(title = paste0("Total number of people accepted as homeless and in priority need in England in ", inp_year),
         caption = "Data obtained from http://opendata.cambridgeshireinsight.org.uk/dataset/homelessness-england")
  print(p)
}
regions_to_include <- unique(setdiff(data_final$LAA, "Birmingham"))
save_gif(walk(min(data_final$year):max(data_final$year), heatmap, regions = regions_to_include),
delay = 0.7, gif_file = "animation.gif")
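For anyone who wants to try the join-and-plot step without downloading the boundary files, here is a self-contained sketch using two made-up square "areas". Everything in it (the helper make_square(), the area names A and B, and the counts) is hypothetical. It joins from the sf side with left_join(), which keeps the result an sf object.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Hypothetical helper: a unit square whose lower-left corner is at (x0, 0)
make_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# Two toy "areas" standing in for the shapefile
toy_shapes <- st_sf(name = c("A", "B"),
                    geometry = st_sfc(make_square(0), make_square(1)))

# Made-up counts standing in for one year of data_final
toy_counts <- tibble(LAA = c("A", "B"), total_homeless = c(10, 40))

# Joining onto the sf object keeps the geometry column and the sf class
toy_joined <- left_join(toy_shapes, toy_counts, by = c("name" = "LAA"))

p <- ggplot(toy_joined) +
  geom_sf(aes(fill = total_homeless), col = "black") +
  scale_fill_viridis_c(name = NULL, option = "magma") +
  theme_void()

# print(p) draws the map
```

Starting from the shapes and joining the counts onto them has the same effect as the right_join() in the function above (every polygon is kept), but the result stays an sf object, which other sf functions can then work with directly.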

Homelessness heatmap
I certainly feel this project has been a bit of a hack job. It’s taken me over a month to write because it’s been so challenging, and I’ve had to leave it and come back to it so many times. I’m not proud of it, mainly because I rushed the end just to get it done.
I’ve since used Tableau, which seems to make heatmaps a bit easier. If I were to do it again in R, however, I think I’d take the courses on DataCamp first!