Overview

In this lab you will practice working with the US census data searching for variables, loading them into R, performing basic calculations and data wrangling tasks, visualizing the data with figures and maps, and offer substantive feedback on the data analysis.

The topic for our analysis will be to compare the house-price-to-income ratio across US counties and over time. The ratio tells the number of years it would take for the median income household to buy the median household price. Under healthy economic conditions, the rule of thumb is that a buyer can afford a house if its price is equivalent to a house-price-to-income ratio of 2.6. Read the Citylab report report by Richard Florida for more background on the ratio and its importance and US county rankings.

For loading, manipulating, and plotting Census data in R, refer to Lecture 2 video and notes, and in particular, the part of lecture that introduces tidycensus. In that portion of lecture, the exact codes are already given for you. All you need to do typically is to change variable name or year value, and re-run the code.

Required Packages

library(tidycensus)
library(tidyverse)
library(viridis)

Remember to set census_api_key("your_key_here") to access data

You can get a Census API Key at: https://api.census.gov/data/key_signup.html

Instructions

Step 1: Getting the Data

  • Use tidycensus::get_acs() to download information on median household income and median household value for ACS 5-year estimate, 2013-2017 period. The notation tidycensus::get_acs() means the get_acs() function is located inside the tidycensus package, i.e. you need that package loaded to use the function.

  • Call the dataset CenDF

Notes:

Use load_variables to view and search for variables. In filter, search for Median value and use first row to get median household price variable name.

Remember that year=2017 and survey="acs5" reports 5-year estimates for the last five years from the specified date (i.e. year=2017 uses data from 2013-2017).

#Edit me

Step 2: Transforming the Data

  • Use spread() to transform the dataset from long to wide and save the dataset back into CenDF

  • Use mutate() to create the ratio of median household value divided by household median income, and call it HHInc_HousePrice_Ratio

  • Notes

    • It will be helpful to relabel variable names from census name to a more easy to understand name, such as HHIncome and HouseValue. You can do this using mutate() and case_when().
    • It will also be helpful to remove MOE variable using select()
#Edit me

Step 3: Data Exploration

  • Use base R function order() and rev(order) to explore which counties have the lowest and highest house-price-to-income ratio.

    • Example: CenDF[ order(CenDF$HHInc_HousePrice_Ratio) , ] sorts the data by HHInc_HousePrice_Ratio low to high
    • Example: CenDF[ rev(order(CenDF$HHInc_HousePrice_Ratio)) , ] sorts the data by HHInc_HousePrice_Ratio high to low
  • There is always more than one way to accomplish the same task in R. For example, to get the lowest and highest house-price-to-income ratios

    • dplyr: arrange( CenDF, HHInc_HousePrice_Ratio ) sorts the df low to high
    • dplyr: arrange( CenDF, desc(HHInc_HousePrice_Ratio) ) sorts the df high to low
  • Bonus: Use datatable() function in the DT library to get interactive table that allows you to order columns

    • datatable(CenDF)
#Edit me

Step 4: Map the Data

#Edit me

Questions

Question 1: Exploration.

1a. In step 3 above, what is the county with the highest (and lowest) house-price-to-income ratio? What is the average house-price-to-income ratio across all US counties over the time period?

- Answer: 

1b. Where is Maricopa County on the list? How many years of median income will it take to buy a home in Maricopa County?

- Answer: 

1c. Where is Los Angeles on the list and how does its ranking compare to the list reported in the Citylab report report by Richard Florida?

- Answer: 

Question 2: Temporal analysis.

1a. Go back to Step 1 above, and enter in 2012 as the year value, which refers to the 2008-2012 5-year ACS estimate. The time period corresponds to the aftermath of the 2007/08 Great Financial Crisis. Repeat Steps 2-4, and re-answer Question 1 above for the 2008-2012 time period.

- Answer: 

2b. Compare and contrast the findings over the different time periods. Calculate the change in the housing value-to-income ratio from 2008-2012 to 2013-2017. What is the change? Did the average house-price-to-income ratio increase or decrease? Did the house-price-to-income ratio increase (or decrease) for Maricopa County and Los Angeles county?

- Answer: 

Question 3: High-Resolution Analysis.

3a. Go back to Step 1 above and keep 2017 as the year value. This time change geography value from county to tract. Also add state = "AZ" and county = "Maricopa County" within the get_acs function. Repeat Steps 2-4. What are the summary statistics (min, max, median, mean, sd) for the house-price-to-income ratio in Maricopa county?

- Answer: 

3b. Compare and contrast the findings for the different levels of geography. How does the minimum, maxim, and mean value for census tracts within Maricopa county compare to results found in Question 1 looking at county-level data?

- Answer: