Scaling Your Analysis with Functions and Loops
Building Community Map Files
Recall the example from CPP 526: Ben Balter, GitHub’s official government evangelist, created a project to make Washington DC’s open GIS files more accessible and useful by converting them all to geoJSON, a format more amenable to open-source projects.
Ben wrote a script that downloaded all of Washington DC’s open data files, converted them to better formats, then uploaded them to GitHub so others have access:
https://github.com/benbalter/dc-maps
The geoJSON files can also be read into R directly from GitHub, making it easy to incorporate the spatial maps and data into a wide variety of projects:
library( geojsonio )
library( sp )
github <- "https://raw.githubusercontent.com/benbalter/dc-maps/master/maps/2006-traffic-volume.geojson"
traffic <- geojson_read( x=github, what="sp" )
plot( traffic, col="steelblue" )
Dorling Lab from CPP 529
Recall the lab where you created one Dorling cartogram for your neighborhood clustering project:
library( geojsonio ) # read geoJSON files
library( sp ) # work with shapefiles
library( sf ) # work with shapefiles - simple features format
library( cartogram ) # convert maps to Dorling cartograms
library( tmap ) # theme maps
library( dplyr ) # data wrangling
library( pander ) # nice tables
crosswalk <- "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv"
crosswalk <- read.csv( crosswalk, stringsAsFactors=F, colClasses="character" )
# search for city names by strings, use the ^ anchor for "begins with"
grep( "^MIN", crosswalk$msaname, value=TRUE )
# select all FIPS for Minneapolis
these.minneapolis <- crosswalk$msaname == "MINNEAPOLIS-ST. PAUL, MN-WI"
these.fips <- crosswalk$fipscounty[ these.minneapolis ]
these.fips <- na.omit( these.fips )
state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )
dat <- data.frame( name="MINNEAPOLIS-ST. PAUL, MN-WI",
state.fips, county.fips, fips=these.fips )
dat
name | state.fips | county.fips | fips |
---|---|---|---|
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 003 | 27003 |
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 019 | 27019 |
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 025 | 27025 |
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 037 | 27037 |
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 053 | 27053 |
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 059 | 27059 |
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 123 | 27123 |
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 139 | 27139 |
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 141 | 27141 |
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 163 | 27163 |
MINNEAPOLIS-ST. PAUL, MN-WI | 27 | 171 | 27171 |
MINNEAPOLIS-ST. PAUL, MN-WI | 55 | 093 | 55093 |
MINNEAPOLIS-ST. PAUL, MN-WI | 55 | 109 | 55109 |
Now download shapefiles with Census data:
library( tidycensus )
# census_api_key("YOUR KEY GOES HERE")
# key <- "abc123"
# census_api_key( key )
# Minneapolis metro area spans two states -
# Minnesota = 27
# Wisconsin = 55
msp.pop1 <-
get_acs( geography = "tract", variables = "B01003_001",
state = "27", county = county.fips[state.fips=="27"], geometry = TRUE ) %>%
select( GEOID, estimate ) %>%
rename( POP=estimate )
msp.pop2 <-
get_acs( geography = "tract", variables = "B01003_001",
state = "55", county = county.fips[state.fips=="55"], geometry = TRUE ) %>%
select( GEOID, estimate ) %>%
rename( POP=estimate )
msp.pop <- rbind( msp.pop1, msp.pop2 )
plot( msp.pop )
Convert to a Dorling cartogram:
# convert sf map object (msp.pop, created above) to an sp version
msp.sp <- as_Spatial( msp.pop )
class( msp.sp )
# project map and remove empty tracts
msp.sp <- spTransform( msp.sp, CRS("+init=epsg:3395"))
msp.sp <- msp.sp[ msp.sp$POP != 0 & (! is.na( msp.sp$POP )) , ]
# convert census tract polygons to dorling cartogram
# k=0.05 was chosen by trial and error - default is k=5
# scale population by 9000 so the largest weight is roughly 1.5
# (dividing by max(msp.sp$POP) instead would scale the largest weight to 1)
msp.sp$pop.w <- msp.sp$POP / 9000
msp_dorling <- cartogram_dorling( x=msp.sp, weight="pop.w", k=0.05 )
plot( msp_dorling )
Instructions:
- Create an R script that will convert all US Metro Area shapefiles into Dorling cartograms, one new shapefile for each metro area.
- Save each Dorling cartogram as a geoJSON file.
- Create a dorling-msa-geojson GitHub repository.
- Upload the files and add instructions to the README for people to use them as alternatives to regular Census tract maps to improve the visualization of demographic data in urban environments.
For example, once you have finished it will be possible to do the following:
# dorling cartogram of Phoenix Census Tracts
github.url <- "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/phx_dorling.geojson"
phx <- geojson_read( x=github.url, what="sp" )
plot( phx )
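Saving one file per metro area requires a consistent, URL-friendly file name for each city. The sketch below shows one way to build those names from the crosswalk labels. The helper make_file_name() and the "-dorling.geojson" suffix are hypothetical choices for illustration, not part of the lab materials. The commented lines at the end assume msp_dorling exists from the steps above and that geojsonio and sp are loaded.

```r
# hypothetical helper: turn an MSA label into a lowercase, hyphenated file name
make_file_name <- function( msa.name )
{
  x <- tolower( msa.name )
  x <- gsub( "[^a-z0-9]+", "-", x )   # replace runs of punctuation and spaces with one hyphen
  x <- gsub( "^-|-$", "", x )         # drop any leading or trailing hyphen
  paste0( x, "-dorling.geojson" )
}

file.name <- make_file_name( "MINNEAPOLIS-ST. PAUL, MN-WI" )
file.name
# "minneapolis-st-paul-mn-wi-dorling.geojson"

# Not run: write the cartogram to disk.
# geoJSON expects longitude/latitude coordinates, so the projected
# map is converted back to WGS84 before writing.
# msp.wgs84 <- spTransform( msp_dorling, CRS( "+proj=longlat +datum=WGS84" ) )
# geojson_write( msp.wgs84, file = file.name )
```

Lowercase names without spaces or punctuation are easier to use in raw GitHub URLs like the Phoenix example above.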
Start with pseudo-code and write down the steps. I recommend writing a few functions:
- Select and parse state and county FIPS codes based upon a city name, return a data frame.
- Using the MSA data frame you just created, download the census data and shapefile.
- Convert a current MSA object to a Dorling cartogram object.
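As a starting point, the first function in the list might look like the sketch below. It wraps the Minneapolis steps from earlier in a function that takes any MSA name. The function name get_msa_fips() is hypothetical, and the toy crosswalk contains made-up city labels so the example runs without downloading anything. The real crosswalk is assumed to have the msaname and fipscounty columns used above.

```r
# hypothetical helper: look up state and county FIPS codes for one metro area
get_msa_fips <- function( msa.name, crosswalk )
{
  # which() drops rows where msaname is NA
  these.rows <- which( crosswalk$msaname == msa.name )
  these.fips <- crosswalk$fipscounty[ these.rows ]
  these.fips <- unique( these.fips[ ! is.na( these.fips ) ] )

  # fail loudly if the name does not match anything
  if( length( these.fips ) == 0 )
  { stop( paste0( "No counties found for: ", msa.name ) ) }

  data.frame( name = msa.name,
              state.fips = substr( these.fips, 1, 2 ),
              county.fips = substr( these.fips, 3, 5 ),
              fips = these.fips,
              stringsAsFactors = FALSE )
}

# toy crosswalk with made-up city labels, for demonstration only
toy <- data.frame(
  msaname = c( "CITY A, XX", "CITY A, XX", "CITY B, YY", NA ),
  fipscounty = c( "27003", "55093", "04013", "99999" ),
  stringsAsFactors = FALSE )

dat <- get_msa_fips( "CITY A, XX", toy )
dat
```

The FIPS codes must stay as character strings, which is why the crosswalk is read with colClasses="character". Otherwise leading zeros are lost and substr() returns the wrong pieces.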
Test your code with a single city until it is functional:
these.minneapolis <- crosswalk$msaname == "MINNEAPOLIS-ST. PAUL, MN-WI"
At that point you can scale your steps by generalizing the city name.
# use the same column that was used to select Minneapolis above
city.names <- unique( crosswalk$msaname )
for( i in city.names )
{
# your code here
}
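When looping over hundreds of metro areas, a single failed download or conversion would normally halt the whole loop. One possible pattern, sketched below, wraps each iteration in tryCatch() so that failures are recorded and the loop moves on. Here build_dorling() is a hypothetical stand-in for your own functions, and the city names are made up. It fails on purpose for one city to show the behavior.

```r
# hypothetical stand-in for the real work; fails for one city on purpose
build_dorling <- function( city )
{
  if( city == "CITY B" ) { stop( "download failed" ) }
  paste0( city, " done" )
}

city.names <- c( "CITY A", "CITY B", "CITY C" )
results <- list()
failed <- character( 0 )

for( i in city.names )
{
  # return NULL instead of stopping when an error occurs
  out <- tryCatch( build_dorling( i ), error = function( e ) { NULL } )

  # keep track of the cities that need a second look
  if( is.null( out ) )
  {
    failed <- c( failed, i )
    next
  }

  results[[ i ]] <- out
}

names( results )   # cities that finished
failed             # cities that raised an error
```

After the loop finishes, the failed vector gives you a short list of cities to debug one at a time.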