- Executive Summary
- Project Objective
- Business Use Cases
- Data
- Tools
- Data Cleanup
- Design Considerations I: Data
- Design Considerations II: Implementation
- EER Diagram
- Dimensional Model
- Preliminary Expectations
- Actual Results
- Distance from Housing Developments to One Instance of Public Transportation
- Number of Units by Bus and L Distance
- Most units did have access to nearby parks
- Affordable Housing Developments and Percentage of Neighborhood Below Poverty Line (Centered at 13%)
- Walkscore (Binned by # of Units)
- Developments by Per Capita Income
- Abandoned Buildings VS Affordable Housing
- Conclusion
- Areas of Improvement for Project
- References
Affordable housing developments help cities promote diversity and inclusion, improve quality of life, enrich neighborhoods, and grow the local economy. In this study, we investigate whether the locations of affordable housing developments provide positive features for their residents: access to public transportation and amenities, an active commercial scene, walkability, and general neighborhood activity.
Through our investigations, we discovered stark discrepancies between neighborhoods with many affordable housing developments and those without. Generally, most affordable housing developments are located in areas with scarce access to transit and weaker socio-economic indicators.
This project seeks to develop an end-to-end data pipeline for affordable housing data to:
- Examine the relationship between the location, price, and size of affordable housing developments and the standard-of-life features available to tenants.
- Foster a better understanding of:
  - Factors that affect the location, prices, amenities, and quality of affordable housing developments.
  - The distribution of socio-economic indicators, economic activity, and access to transit and parks in each neighborhood.
- Potentially identify correlations that may confirm or deny the team’s initial hypotheses related to affordable housing.
Understanding how the characteristics of affordable housing developments relate to socio-economic features could inform more conscious policies and drive significant investments that give residents of affordable housing a better quality of life (e.g., additional public transportation in isolated neighborhoods or investment in local businesses).
Data sources included:
- City of Chicago: Affordable Housing Development Dataset, Business Licenses Dataset, Park Dataset, Abandoned Property Datasets
- Zillow and Cook County Records API
- Neighborhood Shapefiles and Boundary files for GIS-type applications including generation of Table Keys (Neighborhoods), Public Transit and City Park GIS data points.
- Web-Scraping Walk-Scores of Chicago neighborhoods

Data Profile:
- 428 data points, 75 columns, 85.5K rows
- Data was scraped via Python (Walk-Scores), downloaded from online sources (City of Chicago for Housing Developments, Commerce, Park) and gathered using APIs (Google Geocoding, Zillow, Cook County).
- Pandas was used for data cleanup.
- R was used to build an additional geocoding script that calls Google’s Geocoding API to obtain coordinates from addresses.
- Our database was built using MySQL. DDL and DML scripts were created for each table.
- Tableau was used to generate tables and charts to draw insights from our data.
- ETL Scripts written in Python and R
- Keys were generated for tables (neighborhood from latitude and longitude, or from address):
- Chicago neighborhood shapefiles
- Geocodes via GeoPy and Google’s Geocoding API via GCP to obtain Latitude and Longitude from address
- Web-scraping Walk-Scores to calculate “walkability” of area
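The neighborhood key derived from latitude and longitude comes down to a point-in-polygon lookup against the neighborhood boundaries. The sketch below shows one way to do this with a plain ray-casting test. It is an illustration only: the source does not say which spatial library performed the lookup, and the names `point_in_polygon`, `neighborhood_key`, and `BOUNDARIES`, as well as the rectangular boundaries, are hypothetical stand-ins for the real shapefile polygons.

```python
from typing import Optional

Point = tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray heading east."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses the point's latitude.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def neighborhood_key(point: Point, boundaries: dict[str, list[Point]]) -> Optional[str]:
    """Return the first neighborhood whose boundary contains the point."""
    for name, polygon in boundaries.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical rectangles, NOT real Chicago neighborhood boundaries.
BOUNDARIES = {
    "neighborhood_a": [(-87.70, 41.90), (-87.65, 41.90), (-87.65, 41.95), (-87.70, 41.95)],
    "neighborhood_b": [(-87.65, 41.90), (-87.60, 41.90), (-87.60, 41.95), (-87.65, 41.95)],
}

print(neighborhood_key((-87.68, 41.92), BOUNDARIES))
```

A geospatial library would normally read the shapefile and do the spatial join directly. The hand-written test is used here only to keep the example dependency-free.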
- The number of tables and joins is not extensive, so this pipeline can be implemented locally given enough disk space.
- Due to the archival nature of our data, our application can be used as an OLAP system.
- Snowflake data model
- The one-to-many relationship between a neighborhood and its individual addresses requires defining the granularity at the neighborhood level
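As a rough illustration of the neighborhood-level granularity described above, the sketch below rolls address-level development rows up to one row per neighborhood. It uses Python's built-in `sqlite3` module as a stand-in for the project's MySQL database. The table names, column names, and sample rows are hypothetical and are not taken from the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one neighborhood dimension, many developments per neighborhood.
conn.executescript("""
CREATE TABLE dim_neighborhood (
    neighborhood_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL
);
CREATE TABLE development (
    development_id  INTEGER PRIMARY KEY,
    neighborhood_id INTEGER NOT NULL REFERENCES dim_neighborhood(neighborhood_id),
    address         TEXT NOT NULL,
    units           INTEGER NOT NULL
);
""")

# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO dim_neighborhood VALUES (?, ?)",
    [(1, "neighborhood_a"), (2, "neighborhood_b")],
)
conn.executemany(
    "INSERT INTO development VALUES (?, ?, ?, ?)",
    [
        (1, 1, "100 Example St", 40),
        (2, 1, "200 Example St", 60),
        (3, 2, "300 Example Ave", 25),
    ],
)


def neighborhood_summary(connection: sqlite3.Connection) -> list[tuple]:
    """Aggregate address-level rows to the neighborhood grain."""
    return connection.execute("""
        SELECT n.name,
               COUNT(d.development_id)   AS development_count,
               COALESCE(SUM(d.units), 0) AS total_units
        FROM dim_neighborhood AS n
        LEFT JOIN development AS d
               ON d.neighborhood_id = n.neighborhood_id
        GROUP BY n.neighborhood_id, n.name
        ORDER BY n.name
    """).fetchall()


for row in neighborhood_summary(conn):
    print(row)
```

Aggregating to the neighborhood level first means each neighborhood contributes exactly one row when it is joined to other neighborhood-level measures, which avoids double counting across the one-to-many relationship.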
We initially expected affordable housing developments to be located:
- In less affluent areas
- Near abandoned buildings
- In highly-commercialized areas
- Near public transit
- Near parks
Actual results for each expectation:
- In less affluent areas ✔
- Near abandoned buildings ~
- In highly-commercialized areas ~
- Near public transit ~
- Near parks X
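Checking proximity to transit or parks relies on a distance from each development to the nearest stop or park. The sketch below assumes great-circle (haversine) distance between coordinate pairs; the source does not state which distance measure was used. The function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_distance_km(
    origin: tuple[float, float], places: list[tuple[float, float]]
) -> float:
    """Distance from origin to the closest place; all points are (lat, lon)."""
    return min(haversine_km(origin[0], origin[1], lat, lon) for lat, lon in places)


# Hypothetical coordinates for illustration only.
development = (41.90, -87.65)
transit_stops = [(41.91, -87.65), (41.95, -87.70)]
print(round(nearest_distance_km(development, transit_stops), 2), "km")
```

Straight-line distance understates actual walking distance, so a threshold based on it should be treated as a lower bound on how far residents travel.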
Through the course of the project, we discovered strong correlations between the count of affordable housing developments per neighborhood and:
- Socioeconomic status of the neighborhood, which could perhaps be explained by higher land costs in more affluent neighborhoods
- Transit scores of the neighborhood for L trains, which could perhaps further isolate neighborhoods with lower household income per capita

Additionally, weaker correlations were identified between the count of affordable housing developments per neighborhood and:
- Business density of each neighborhood
- Count of abandoned buildings in each neighborhood

It is difficult to draw conclusions from these weaker correlations.
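For readers who want to reproduce this kind of comparison, the sketch below computes a Pearson correlation coefficient between two neighborhood-level series. It assumes Pearson's r as the measure, which the source does not confirm. The input values are toy numbers chosen to be perfectly linear and are not the project's data or results.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(var_x * var_y)


# Toy values, NOT project data: per-capita income (thousands of dollars)
# and development counts for four hypothetical neighborhoods.
income_thousands = [10.0, 20.0, 30.0, 40.0]
development_counts = [8.0, 6.0, 4.0, 2.0]

r = pearson_r(income_thousands, development_counts)
print(f"r = {r:.2f}")
```

A coefficient near -1 or 1 indicates a strong linear relationship, while values near 0 are the kind of weak relationship from which few conclusions can be drawn.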
- With enough computing resources and time, the analysis could move to a finer granularity and map block-level impacts using address-level development, assessment, and transportation data, rather than working at the neighborhood or area level. This could yield further insight and inform policy by visualizing ripple effects related to affordable housing developments and the blocks in which they are built.
- Obtain data from the 2020 U.S. Census and re-evaluate our findings.
- Study additional datasets (type of housing, family size, health background, etc.).
- City of Chicago Data Portal: https://data.cityofchicago.org/
- Institute for Housing Studies at DePaul University: https://www.housingstudies.org/
- Google Maps Platform: https://developers.google.com/maps/documentation
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 1st Edition by Wes McKinney