This portion of the Neighborhood Change project examines national trends in median home values between 2000 and 2010. It then takes a closer look at how those trends played out across metro areas nationally and in three specific metro areas. This section also applies statistical techniques to address skew and multicollinearity, and incorporates fixed effects so that the models are estimated more accurately. Although the model produced unexpected results, the team believes it captures the disparity between the national experience from 2000 to 2010 and the experience of individual metro areas.
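Home values increased nearly across the board between 2000 and 2010. The exceptions sit at the bottom of the distribution: the minimum median home value fell from $11,167 to $9,999, and the largest single-tract decline (the minimum of MHV.Change.00.to.10) was a loss of more than $1.2 million.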
Statistic | Min | Pctl(25) | Median | Mean | Pctl(75) | Max |
---|---|---|---|---|---|---|
MedianHomeValue2000 | 11,167 | 105,661 | 154,903 | 187,129 | 224,337 | 1,288,551 |
MedianHomeValue2010 | 9,999 | 123,200 | 193,200 | 246,570 | 312,000 | 1,000,001 |
MHV.Change.00.to.10 | -1,228,651 | 7,187 | 36,268 | 60,047 | 94,881 | 1,000,001 |
MHV.Growth.00.to.12 | -97 | 6 | 25 | 33 | 50 | 6,059 |
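The summary table reports, for each variable, the minimum, 25th percentile, median, mean, 75th percentile, and maximum. A minimal Python sketch of how one such row could be computed is shown below. The input values are hypothetical, and `summary_row` is an illustrative helper, not part of the project's code. It interpolates linearly between order statistics (`method="inclusive"`), which matches the default of R's `quantile()`.

```python
import statistics


def summary_row(values):
    """Return (min, p25, median, mean, p75, max) for a list of numbers.

    method="inclusive" interpolates linearly between order statistics,
    the same rule as R's default quantile() (type 7).
    """
    data = sorted(values)
    p25, median, p75 = statistics.quantiles(data, n=4, method="inclusive")
    return (data[0], p25, median, statistics.fmean(data), p75, data[-1])


# Hypothetical tract-level median home values (not from the project data)
mhv_2000 = [50_000, 90_000, 120_000, 150_000, 400_000]
print(summary_row(mhv_2000))
```

A single very expensive tract pulls the mean (162,000) above the 75th percentile (150,000), the same kind of gap between mean and median that appears in the table.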
## Percent Change in MHV 2000 to 2010
While the graphs above show an increase in home values, it is critical to examine the percent change as well: the same dollar gain is a much larger relative change in a low-value tract than in a high-value one.
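As a minimal sketch, assuming MHV growth is the percent change relative to the 2000 value (consistent with its minimum of -97, since a value cannot fall by more than 100%), the dollar change and percent growth for a tract could be computed as follows. The helper names and both tracts are hypothetical.

```python
def mhv_change(v2000, v2010):
    """Dollar change in median home value from 2000 to 2010."""
    return v2010 - v2000


def mhv_growth(v2000, v2010):
    """Percent change in median home value, relative to the 2000 value."""
    return 100 * (v2010 - v2000) / v2000


# Two hypothetical tracts with the same $30,000 gain
low = (50_000, 80_000)
high = (500_000, 530_000)
print(mhv_change(*low), mhv_growth(*low))    # 30000 60.0
print(mhv_change(*high), mhv_growth(*high))  # 30000 6.0
```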
While median home values increased nationally, housing costs did not change equally across metro areas. It was therefore necessary to include metro-level changes in the model, so that those changes are not wrongly attributed to federal housing programs.
cbsaname | metro.mhv.change | metro.mhv.growth |
---|---|---|
Abilene, TX | 12667 | 2182 |
Akron, OH | -10634 | -887.3 |
Albany-Schenectady-Troy, NY | 54413 | 4176 |
Albany, GA | 5547 | 645.6 |
Albuquerque, NM | 27947 | 1849 |
Alexandria, LA | 23329 | 3274 |
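The section does not state exactly how the metro-level values in the table were aggregated from tracts. One plausible approach, sketched here under that assumption, is to group tracts by metro area, take the median of the tract-level values, and then attach the metro value back to each tract so it can serve as a control in a tract-level model. The records and the `metro_medians` helper are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical tract records: (metro name, tract MHV change in dollars)
tracts = [
    ("Metro A", 10_000), ("Metro A", 20_000), ("Metro A", 60_000),
    ("Metro B", -5_000), ("Metro B", 15_000),
]


def metro_medians(records):
    """Group tract-level values by metro area and take each group's median."""
    groups = defaultdict(list)
    for metro, value in records:
        groups[metro].append(value)
    return {metro: statistics.median(vals) for metro, vals in groups.items()}


metro_change = metro_medians(tracts)
# Attach the metro-level value to every tract in that metro, so it can
# enter a tract-level regression as a control variable.
tract_rows = [(metro, value, metro_change[metro]) for metro, value in tracts]
print(metro_change)
```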
Variable selection was a challenging part of this project because each variable can be measured at two geographic levels (census tract or metro area) and in two forms (a count or a percentage). The number of candidate variables therefore grows quickly: 10 base variables become 40 potential variables, and choosing 3 of those 40 already yields 9,880 possible three-variable models. If every census variable were considered, the number of possible models would exceed 10 million. Rather than searching all of these combinations, the analysts relied on the theory described below to guide how the model was built.
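The counts in the paragraph above follow from simple combinatorics: 10 variables times 2 geographies times 2 forms gives 40 candidates, and the number of ways to choose 3 of 40 is C(40, 3) = 40 × 39 × 38 / 6. A quick check in Python (the variable names are illustrative):

```python
from math import comb

base_variables = 10
geographies = 2   # census tract or metro level
forms = 2         # count or percentage

candidates = base_variables * geographies * forms
print(candidates)            # 40
print(comb(candidates, 3))   # 9880
```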
When the poverty rate is plotted as a histogram, the data are clearly right-skewed. This is problematic because the long right tail inflates the standard deviation, and a few extreme values can bias regression slopes. One tool to mitigate this problem is to transform the data with the log function.
After the log transformation, the poverty rate is much closer to normally distributed, which makes it more useful in the model.
The same is true for the vacancy rate variable.
The benefits of the log transformation become more obvious when the poverty and vacancy variables are compared.
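To make the effect of the log transformation concrete, the sketch below computes the sample skewness of a small, hypothetical set of right-skewed poverty rates before and after transforming them. It uses `log1p` (the log of 1 + x) on the assumption that some tracts may report a 0% rate, where a plain log is undefined; the section does not say which variant the analysts used.

```python
import math
import statistics


def skewness(values):
    """Fisher-Pearson coefficient of skewness (g1), using population moments."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed poverty rates (percent) for a handful of tracts
poverty = [1, 2, 2, 3, 3, 3, 4, 5, 8, 15, 30, 60]
# log1p(x) = log(1 + x), so a tract with a 0% rate would stay defined
log_poverty = [math.log1p(p) for p in poverty]

print(round(skewness(poverty), 2))      # strongly positive: long right tail
print(round(skewness(log_poverty), 2))  # much smaller after the transform
```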
## Addressing Skew in Census Data
The problem of skewness is even more apparent in a correlation plot: most of the data points are concentrated in the bottom-left corner of each panel.
Once the variables are log-transformed, it is far easier to see the statistical relationships between them.
Outliers were another issue the analysts faced. The graph below appears to show a linear relationship, but that appearance is driven mainly by a few outliers. A log transformation helps in this scenario because it compresses extreme values and reduces their influence on the fit.
## Multicollinearity Challenges
The analysts assessed the variables for multicollinearity. Multicollinearity occurs when predictors are highly correlated with one another, often because several variables measure the same underlying concept. It inflates the standard errors of the affected coefficients and makes the coefficient estimates unstable: they can shrink, change sign, or lose significance depending on which other variables are in the model.
Dependent variable: mhv.growth

Variable | (1) | (2) | (3) | (4) |
---|---|---|---|---|
Constant | 27.61*** (0.44) | 28.27*** (3.25) | 17.43*** (0.46) | -53.33*** (6.62) |
Vacancy | 2.27*** (0.52) | | | -2.78*** (0.60) |
% of Aged 25+ | | 0.65 (1.79) | | 26.54*** (2.16) |
% Unemployed | | | 15.57*** (0.56) | 21.82*** (0.80) |
Household Income | | | | 4.31*** (1.13) |
Observations | 58,810 | 58,839 | 58,801 | 58,790 |
Adjusted R2 | 0.0003 | -0.0000 | 0.01 | 0.02 |
Residual Std. Error | 35.17 (df = 58808) | 35.17 (df = 58837) | 34.94 (df = 58799) | 34.89 (df = 58785) |

Note: *p<0.1; **p<0.05; ***p<0.01. Standard errors in parentheses.
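The standard errors change only modestly between the single-variable models (1) to (3) and the combined model (4) (for example, from 0.52 to 0.60 for vacancy), so the analysts found no strong indication of multicollinearity and kept these variables together in the model.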
It is critical to evaluate trends in the individual cities during the study period. Each metro area experienced its own period of growth or decline as a result of local, state, or other factors.
This is another way to visualize the median home values and household incomes in the metro areas of interest during the 2000 to 2010 period. Because of these individual differences, the analysis needs to include each metro area's unique baseline.
A fixed effects model is well suited to this, because it gives each metro area its own intercept, or starting point.
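A minimal sketch of why metro-specific intercepts matter, using hypothetical data: subtracting each metro's mean from both variables (the "within" transformation) gives the same slope as fitting a separate intercept for every metro, and in this example it reverses the sign of the pooled slope. This is an illustration only. A reversal of this kind may help explain why the household income coefficient moves from 4.31 in M1 to -20.84 in M2, but the toy data do not show that this is the cause. The helper names are hypothetical.

```python
import statistics
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    members = defaultdict(list)
    for v, g in zip(values, groups):
        members[g].append(v)
    means = {g: statistics.fmean(vs) for g, vs in members.items()}
    return [v - means[g] for v, g in zip(values, groups)]


# Hypothetical tracts in two metros with very different baselines
metro = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]       # a tract characteristic
y = [10, 11, 12, 1, 2, 3]    # home value growth

pooled = ols_slope(x, y)  # one shared intercept: the slope comes out negative
within = ols_slope(demean_by_group(x, metro),
                   demean_by_group(y, metro))  # metro intercepts: 1.0
print(pooled, within)
```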
Including metro fixed effects makes a better-fitting regression model possible (adjusted R2 rises from 0.02 in M1 to 0.42 in M2). The selected variables are used in three models:

- M1: the selected variables, estimated nationally
- M2: the selected variables with metro fixed effects
- M3: the selected variables plus metro-level median home value growth
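Written as equations, the three specifications might look like the following. The notation is ours, not the analysts': $i$ indexes census tracts, $m(i)$ is the metro area containing tract $i$, $\mathbf{x}_i$ collects the selected tract variables (household income, % vacant lots, % aged 25+, % unemployed), and $\alpha_{m(i)}$ is the metro-specific intercept in M2.

$$
\begin{aligned}
\text{M1:}\quad & \text{growth}_i = \beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i + \varepsilon_i \\
\text{M2:}\quad & \text{growth}_i = \alpha_{m(i)} + \boldsymbol{\beta}^\top \mathbf{x}_i + \varepsilon_i \\
\text{M3:}\quad & \text{growth}_i = \beta_0 + \boldsymbol{\beta}^\top \mathbf{x}_i + \gamma \, \text{MetroGrowth}_{m(i)} + \varepsilon_i
\end{aligned}
$$

M2 replaces the single intercept $\beta_0$ with one intercept per metro area, while M3 keeps one intercept and instead adds a single metro-level growth term.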
Dependent variable: mhv.growth

Variable | (1) | (2) | (3) |
---|---|---|---|
Constant | -53.33*** (6.62) | 109.86*** (6.82) | 77.74*** (5.13) |
Household Income | 4.31*** (1.13) | -20.84*** (0.95) | -16.64*** (0.87) |
Metro MHV Growth | | | 0.01*** (0.0000) |
% Vacant Lots | -2.78*** (0.60) | 7.46*** (0.52) | 6.19*** (0.47) |
% age 25+ | 26.54*** (2.16) | 2.97* (1.75) | -0.36 (1.67) |
% Unemployed | 21.82*** (0.80) | -3.16*** (0.69) | -2.23*** (0.62) |
Metro Fixed Effects | NO | YES | NO |
Observations | 58,790 | 58,790 | 58,790 |
Adjusted R2 | 0.02 | 0.42 | 0.42 |
Residual Std. Error | 34.89 (df = 58785) | 26.78 (df = 58407) | 26.85 (df = 58784) |

Note: *p<0.1; **p<0.05; ***p<0.01. Standard errors in parentheses.
With metro fixed effects (M2), tracts with higher household income experienced lower home value growth (coefficient -20.84), while tracts with a higher percentage of vacant lots experienced higher growth (7.46). The share of residents aged 25+ has a small positive association that is significant only at the 10% level (2.97), and a higher unemployment rate is associated with slightly lower growth (-3.16). Several of these signs are the reverse of those in the national model (M1), which suggests that although median home values increased overall, the pattern of growth within metro areas differed from the national picture.