Foundations in Data Science III
CPP 528 is the third course in the Foundations of Data Science sequence. This semester synthesizes and extends work from 526 and 527 by introducing project management frameworks to your workflow.
You will apply this knowledge in a data project examining neighborhood change in US metro areas. The project is framed as if you were hired by the federal government to evaluate two large federal programs designed to revitalize distressed communities. Your final deliverable will be a report detailing your conclusions. The report will link to a GitHub repository that provides all of the data and code needed to reproduce the results from your study.
You will be assigned to teams. Each team will work on the class project independently. The goal of working in a team is to put the project management principles into practice, and to gain experience collaborating on a project large enough that tasks must be split among members and redundancy can be used for quality assurance.
Project Management Skills
The course is designed to teach standard frameworks for organizing large data projects and coordinating team efforts using tools in GitHub and RStudio.
Some easy heuristics to test whether the project management system is working:
- Can you easily reproduce the results from your project working from raw data to final models with a single script?
- Can you identify what changes have been made to your project, by whom, and when?
- Can someone who was not a member of your team easily use your project?
- Are you building institutional capacity (libraries) to do future projects faster and better?
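The first heuristic can be sketched in a few lines. The example below is a minimal illustration written in Python (the course itself works in R and RStudio). The step names `load_raw`, `clean`, `fit_model`, and `run_pipeline`, the inline data, and the tract IDs are all hypothetical stand-ins, not part of the course materials. The only point is that one entry point runs every step in order, so results can be regenerated from raw data without manual intervention.

```python
"""Minimal sketch of a single-script pipeline: raw data -> clean data -> model.

All step names and values are hypothetical placeholders for illustration.
"""
import csv
import io
from statistics import mean

# Hypothetical raw data; a real project would read files from a raw-data folder.
RAW_CSV = """tract_id,home_value_2000,home_value_2010
T001,100000,130000
T002,80000,
T003,150000,180000
"""


def load_raw(text):
    """Step 1: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 2: drop rows with missing values and convert strings to numbers."""
    cleaned = []
    for row in rows:
        if row["home_value_2000"] and row["home_value_2010"]:
            cleaned.append({
                "tract_id": row["tract_id"],
                "v2000": float(row["home_value_2000"]),
                "v2010": float(row["home_value_2010"]),
            })
    return cleaned


def fit_model(rows):
    """Step 3: a stand-in 'model' that reports average growth in home value."""
    growth = [(r["v2010"] - r["v2000"]) / r["v2000"] for r in rows]
    return {"n_tracts": len(rows), "mean_growth": mean(growth)}


def run_pipeline(raw_text=RAW_CSV):
    """Single entry point: runs every step in order from raw data to results."""
    return fit_model(clean(load_raw(raw_text)))


if __name__ == "__main__":
    print(run_pipeline())
```

Because every step is a function called from `run_pipeline`, a teammate can rerun the whole analysis with one command and get the same numbers.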
Examples of Project Repositories Built for Reproducibility:
These GitHub repositories are examples of work designed to be reproducible and extensible. Most provide access to the data and code used for the analysis. The BBC repository instead provides a set of reporting templates that journalists across the organization can reuse and adapt, building institutional knowledge and making the organization more efficient.
- City of Chicago Food Inspection Evaluation
- US EPA Modeling Lake Trophic State
- BBC Visual and Data Journalism cookbook for R graphics
- Traffic Stops Across Connecticut
Neighborhood Change Project
CPP 528 is organized around a single large project that allows you to revisit and practice skills from 523, 524, 526 and 527.
For the project, your team has been hired by the federal government to provide a rigorous assessment of program impact. You will assess whether two large federal programs designed to revitalize distressed neighborhoods in US cities have been successful.
You need to compile the data necessary for the analysis, run some models, and provide your client with a final report stating your assessment of program impact.
Low Income Housing Tax Credits
Low Income Housing Tax Credits (LIHTC) are one of the primary policy instruments used to incentivize the construction of new affordable housing units in the United States.
New Markets Tax Credits
New Markets Tax Credits (NMTC) are mechanisms designed to catalyze economic development in distressed communities by attracting investments from private developers.
Has each federal program been successful in facilitating economic development in distressed communities?
We will use 2000 to 2010 as the study period and look at broad trends in neighborhood change over this decade, then examine whether neighborhoods targeted by the programs achieved more success than they would have without the billions of dollars in federal subsidies.
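One common way to frame the comparison described above is a difference-in-differences calculation: the change in targeted neighborhoods minus the change in comparable neighborhoods that were not targeted. The sketch below is illustrative only and is written in Python rather than the R used in the course. The text above does not specify this estimator, and the function name `diff_in_diff` and all home values are made-up numbers, not program data.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Return (treated change) - (control change) using group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical median home values (in dollars) for a handful of tracts.
treated_2000 = [90000, 110000]
treated_2010 = [120000, 140000]
control_2000 = [95000, 105000]
control_2010 = [115000, 125000]

estimate = diff_in_diff(treated_2000, treated_2010, control_2000, control_2010)
print(estimate)
```

Here the untargeted tracts stand in for what would have happened without the subsidy, so the estimate is only as credible as the choice of comparison group.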
Course Cadence
The project will be split into the following steps:
- Week 1: Neighborhood revitalization background
- Week 2: Build the census dataset
- Week 3: Descriptive analysis
- Week 4: Model neighborhood change
- Week 5: Estimate program impact
- Week 6: Introduce a new parameter to existing analysis
- Week 7: Finalize deliverables
These analyses will mirror the following project management steps:
- Week 01: Introduction to project management
- Week 02: Introduction to data management
- Week 03: Descriptive analysis of neighborhood change
- Week 04: Predicting median home value change, 2000 to 2010
- Week 05: Adding federal program data to your predictive models
- Week 06: Test reproducible workflow with a parameter change
- Week 07: Finalize project website and project requirements
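The Week 6 step, re-running the analysis after a parameter change, can be illustrated with a small sketch. It is written in Python for brevity (the course uses R), and the function `median_growth`, the parameter `min_value_2000`, and the tract values are all hypothetical. The idea is that when analysis choices are exposed as parameters, changing one requires no other edits to the code.

```python
from statistics import median

# Hypothetical tract-level home values: (value in 2000, value in 2010).
TRACTS = [(60000, 90000), (100000, 120000), (200000, 220000), (20000, 50000)]


def median_growth(tracts, min_value_2000=0):
    """Median percent growth in home value among tracts whose 2000 value
    is at or above the given floor."""
    growth = [
        100 * (v2010 - v2000) / v2000
        for v2000, v2010 in tracts
        if v2000 >= min_value_2000
    ]
    return median(growth)


# Baseline run, then the same analysis with one parameter changed.
baseline = median_growth(TRACTS)
revised = median_growth(TRACTS, min_value_2000=50000)
print(baseline, revised)
```

If the workflow is reproducible, the second call is the only change needed to regenerate every downstream result under the new assumption.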