Descriptive Statistics

The average median home value in 2000 is $187,129
Descriptives in 2000 & 2010
Histogram of the MHV Change 2000-2010
Challenges in Variable Selection
Addressing the Variable Skew
Group Structure
Discussion

This portion of the Neighborhood Change analysis examines national trends in median home values between 2000 and 2010, then takes a closer look at metro areas nationwide and at three specific metro areas. It also applies statistical techniques to address skew and multicollinearity, and incorporates fixed effects so that the models can be estimated more accurately. Although the model produced unexpected results, the team believes it captured the disparity between the national experience from 2000 to 2010 and the experience of the metro areas.

The average median home value in 2000 is $187,129

Descriptives in 2000 & 2010

Home values increased nearly across the board between 2000 and 2010. The largest single decline, however, was a loss of more than $1.2 million (the minimum of MHV.Change.00.to.10 in the table below).

## 
## ==========================================================================
## Statistic              Min     Pctl(25) Median   Mean   Pctl(75)    Max   
## --------------------------------------------------------------------------
## MedianHomeValue2000   11,167   105,661  154,903 187,129 224,337  1,288,551
## MedianHomeValue2010   9,999    123,200  193,200 246,570 312,000  1,000,001
## MHV.Change.00.to.10 -1,228,651  7,187   36,268  60,047   94,881  1,000,001
## MHV.Growth.00.to.12    -97        6       25      33       50      6,059  
## --------------------------------------------------------------------------

Histogram of the MHV Change 2000-2010

Percent Change in MHV 2000 to 2010

While the graphs above show an increase in home values, it is critical to examine the percent change to gain a clear perspective.
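A minimal sketch of why the percent change matters, assuming growth is defined as the dollar change divided by the 2000 value, times 100. The function names and the two tracts are hypothetical and are not taken from the project data.

```python
def mhv_change(value_2000: float, value_2010: float) -> float:
    """Dollar change in median home value between 2000 and 2010."""
    return value_2010 - value_2000


def mhv_growth(value_2000: float, value_2010: float) -> float:
    """Percent change relative to the 2000 value (assumed definition)."""
    return 100.0 * mhv_change(value_2000, value_2010) / value_2000


# Hypothetical tracts: both gain $50,000, but from different starting values.
tracts = {
    "modest_tract": (100_000.0, 150_000.0),
    "expensive_tract": (500_000.0, 550_000.0),
}

for name, (v2000, v2010) in tracts.items():
    change = mhv_change(v2000, v2010)
    growth = mhv_growth(v2000, v2010)
    print(f"{name}: change = ${change:,.0f}, growth = {growth:.1f}%")
```

The same dollar gain is a much larger percent change for the lower-valued tract, which is why the dollar and percent views can tell different stories.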

While median home values increased nationally, the change in housing costs was not the same across metro areas. It was therefore necessary to include metro-level changes in the model so that they are not wrongly attributed to federal housing programs.

cbsaname	metro.mhv.change	metro.mhv.growth
Abilene, TX	12667	2182
Akron, OH	-10634	-887.3
Albany-Schenectady-Troy, NY	54413	4176
Albany, GA	5547	645.6
Albuquerque, NM	27947	1849
Alexandria, LA	23329	3274

Challenges in Variable Selection

Variable selection was a challenging part of this project because each variable can be measured at two geographic levels (census tract or metro) and as either a count or a percentage. The number of potential variables grows very quickly: 10 variables became 40 potential variables. With 40 variables there are 9,880 possible 3-variable models. If every census variable is included, the total dimension of the analysis is more than 10 million. Fortunately, the analysts relied on the theory described below to decide how best to build the model.
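The counts above can be checked with a short calculation. This sketch assumes the 40 candidates come from 10 base variables, each available at two geographic levels and in two forms. The variable names are placeholders.

```python
import math
from itertools import combinations

base_variables = 10
geographic_levels = 2   # census tract or metro (assumed breakdown)
measurement_forms = 2   # count or percentage (assumed breakdown)

candidate_variables = base_variables * geographic_levels * measurement_forms

# Number of distinct models that use exactly 3 of the candidates: C(40, 3).
three_variable_models = math.comb(candidate_variables, 3)

# Cross-check by enumerating the combinations explicitly.
placeholder_names = [f"var_{i}" for i in range(candidate_variables)]
enumerated = sum(1 for _ in combinations(placeholder_names, 3))

print(candidate_variables, three_variable_models, enumerated)
```

Both the closed-form count and the explicit enumeration give 9,880 three-variable models.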

Addressing the Variable Skew

When the poverty rate is plotted as a basic histogram, the data are clearly skewed. This is problematic because skew can artificially increase the standard deviation and bias the slopes. One tool for mitigating this problem is transforming the data with the log function.
After the transformation, the data are much closer to a normal distribution and more useful in our model.
The same is true for the Vacancy Rate variable. The benefits of the log transformation become more obvious when the poverty and vacancy variables are compared.

Addressing Skew in Census Data

The problem of skewness is far more apparent in a correlation plot. Specifically, the majority of the data points are clustered in the bottom-left section of each plot.

Once the variables are transformed using the log function, it is far easier to see the statistical relationship between them.
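The effect of the log transformation on a right-skewed variable can be illustrated with simulated data. This is only a sketch: the values below are randomly generated stand-ins, not the census poverty or vacancy rates, and the `skewness` helper is written for this example.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment over the variance to the 3/2 power."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(0)

# Simulated right-skewed, strictly positive variable (a stand-in for a rate).
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Log transform. Real rates that contain zeros would need an offset first,
# for example log(x + 1); that choice is an assumption, not the report's.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"skewness before: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

A skewness near zero after the transformation indicates a roughly symmetric distribution, which is the property the histograms are meant to show.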

Another issue the analysts faced was outliers. The graph below appears to show a linear relationship, but that appearance is mainly due to the outliers. The log transformation is helpful in this scenario because it compresses extreme values.

Multicollinearity Challenges

The analysts assessed the variables for multicollinearity, which occurs when multiple variables measure the same underlying concept. Its effect is to inflate the standard errors and shrink the coefficients of the affected variables.

## 
## ===============================================================================================
##                                                 Dependent variable:                            
##                     ---------------------------------------------------------------------------
##                                                     mhv.growth                                 
##                            (1)                (2)                (3)                (4)        
## -----------------------------------------------------------------------------------------------
## Constant                 27.61***           28.27***           17.43***          -53.33***     
##                           (0.44)             (3.25)             (0.46)             (6.62)      
##                                                                                                
## Vacancy                  2.27***                                                  -2.78***     
##                           (0.52)                                                   (0.60)      
##                                                                                                
## % of Aged 25+                                 0.65                                26.54***     
##                                              (1.79)                                (2.16)      
##                                                                                                
## % Unemployed                                                   15.57***           21.82***     
##                                                                 (0.56)             (0.80)      
##                                                                                                
## Household Income                                                                  4.31***      
##                                                                                    (1.13)      
##                                                                                                
## -----------------------------------------------------------------------------------------------
## Observations              58,810             58,839             58,801             58,790      
## Adjusted R2               0.0003            -0.0000              0.01               0.02       
## Residual Std. Error 35.17 (df = 58808) 35.17 (df = 58837) 34.94 (df = 58799) 34.89 (df = 58785)
## ===============================================================================================
## Note:                                                               *p<0.1; **p<0.05; ***p<0.01

The regression models above do not show indications of multicollinearity, so the variables can be included in the model.
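The analysts checked for multicollinearity by comparing coefficients across regressions. A complementary diagnostic, which this report does not use, is the variance inflation factor (VIF). The sketch below covers the two-predictor case, where the VIF reduces to 1 / (1 - r^2), and runs on simulated predictors rather than the census variables. All names are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(xs, ys):
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r ** 2)


random.seed(1)
n = 2000
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [a + 0.1 * random.gauss(0.0, 1.0) for a in x1]   # nearly a copy of x1
x3 = [random.gauss(0.0, 1.0) for _ in range(n)]       # unrelated to x1

vif_collinear = vif_two_predictors(x1, x2)
vif_independent = vif_two_predictors(x1, x3)
print(f"VIF (x1, x2): {vif_collinear:.1f}, VIF (x1, x3): {vif_independent:.3f}")
```

A VIF close to 1 means a predictor shares little variance with the other; large values indicate that the two are measuring nearly the same thing.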

Group Structure

It is critical to evaluate the trends in the cities during the study period. Each of the specific metro areas experienced its own period of growth or decline as a result of local, state, or other factors.

This is another way to visualize the median home values and household incomes in the metro areas of interest during the 2000-2010 period. Because of these individual differences, the analysis needs to include the unique baseline of each metro area.

A fixed effects model is well suited to this problem because it gives each metro area its own intercept, or starting point.
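The idea of a separate intercept per metro area can be sketched with simulated data. The example removes each group's mean from both variables (the within transformation), which yields the same slope as fitting one dummy intercept per group. The metro labels, intercepts, and slope are invented for illustration and are not the project's estimates.

```python
import random
from collections import defaultdict


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transformation)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


random.seed(42)
TRUE_SLOPE = -2.0
# Hypothetical metros: higher baselines coincide with higher predictor values.
metro_intercepts = {"metro_a": 10.0, "metro_b": 40.0, "metro_c": 80.0}
metro_x_centers = {"metro_a": 1.0, "metro_b": 3.0, "metro_c": 5.0}

groups, xs, ys = [], [], []
for metro, intercept in metro_intercepts.items():
    for _ in range(300):
        x = random.gauss(metro_x_centers[metro], 1.0)
        y = intercept + TRUE_SLOPE * x + random.gauss(0.0, 1.0)
        groups.append(metro)
        xs.append(x)
        ys.append(y)

pooled_slope = ols_slope(xs, ys)  # ignores metro baselines
fe_slope = ols_slope(demean_by_group(xs, groups), demean_by_group(ys, groups))
print(f"pooled slope: {pooled_slope:.2f}, fixed effects slope: {fe_slope:.2f}")
```

In this simulation the pooled slope has the wrong sign because it absorbs the differences between metro baselines, while the fixed effects slope recovers the within-metro relationship.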

After the fixed effects are included in the analysis, a better regression model is possible. The selected variables are used in three models:
M1 = The selected variables nationally
M2 = The selected variables in metro areas (metro fixed effects)
M3 = The selected variables & national metro growth

## 
## =============================================================================
##                                        Dependent variable:                   
##                      --------------------------------------------------------
##                                             mhv.growth                       
##                             (1)                (2)                (3)        
## -----------------------------------------------------------------------------
## Constant                 -53.33***          109.86***           77.74***     
##                            (6.62)             (6.82)             (5.13)      
##                                                                              
## Household Income          4.31***           -20.84***          -16.64***     
##                            (1.13)             (0.95)             (0.87)      
##                                                                              
## Metro MHV Growth                                                0.01***      
##                                                                 (0.0000)     
##                                                                              
## % Vacant Lots             -2.78***           7.46***            6.19***      
##                            (0.60)             (0.52)             (0.47)      
##                                                                              
## % age 25+                 26.54***            2.97*              -0.36       
##                            (2.16)             (1.75)             (1.67)      
##                                                                              
## % Unemployed              21.82***           -3.16***           -2.23***     
##                            (0.80)             (0.69)             (0.62)      
##                                                                              
## -----------------------------------------------------------------------------
## Metro Fixed Effects:         NO                YES                 NO        
## Observations               58,790             58,790             58,790      
## Adjusted R2                 0.02               0.42               0.42       
## Residual Std. Error  34.89 (df = 58785) 26.78 (df = 58407) 26.85 (df = 58784)
## =============================================================================
## Note:                                             *p<0.1; **p<0.05; ***p<0.01

Discussion

With metro fixed effects (M2), the signs of several coefficients reverse relative to the national model (M1): household income and unemployment become negative predictors of home value growth, the percentage of vacant lots becomes positive, and the percentage of residents aged 25+ remains positive but is much smaller. The team interprets this as evidence that, although median home values increased overall, the reality for residents living in the metro areas was an increase in unemployment and in the percentage of vacant lots.