Albany Hub: Week 9

Today marks the completion of the CDS program. Boy, does time fly.

It feels like just yesterday we met with our PI, Dr. Asensio, to go over initial designs of the database we constructed over the last 10 weeks. From the beginning, the objective of this project was to build a comprehensive database to help city officials and Georgia Tech evaluate the impacts of housing investment on utility consumption. The main challenge we faced was that city data were spread across many different departments and entities, many of which had different data entry practices. We also obtained a lot of the data from sources outside the city, such as the Census, NOAA, and a private real estate data company, since this information is not housed within Albany’s databases. Collecting these data turned out to be harder than expected, as each dataset posed its own difficulties with access, standardization, or volume.

To wrangle all these disparate datasets into a workable structure, much of our work this summer focused on using automatic processing methods to merge data and evaluate performance in new ways that were not previously possible. This involved standardizing housing addresses within Albany (spelling, street endings, cardinal directions), geocoding all those addresses, parsing data from HUD reports, converting datasets to time series format, and then linking all of these datasets into a relational database structure. In the end, we were able to build a SQL Server database hosted on Azure that links information on utilities, taxes, each housing project, Census data at the block group and tract levels, weather, and real estate information. We used Python to clean and merge the data, ArcGIS for some spatial exploration, and RStudio for preliminary analysis. While we didn’t come away with many tangible insights to share with the city, we created the infrastructure necessary to transition into the analysis phase of the project. The data have come a long way, and we can’t thank everyone involved with the CDS program enough for giving us the opportunity to work with real data that will be used to make a significant impact.
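To give a flavor of the address standardization step, here is a minimal sketch in Python. It is not our actual script: the lookup tables and the function name `standardize_address` are made up for illustration, and real address data needs many more rules than this.

```python
import re

# Hypothetical lookup tables; the mappings used in the project are not shown here.
STREET_SUFFIXES = {
    "STREET": "ST", "STR": "ST", "AVENUE": "AVE", "AV": "AVE",
    "ROAD": "RD", "DRIVE": "DR", "LANE": "LN", "BOULEVARD": "BLVD",
}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def standardize_address(raw):
    """Uppercase an address, strip punctuation, and abbreviate
    directions and street endings to one consistent form."""
    tokens = re.sub(r"[^\w\s]", " ", raw.upper()).split()
    last = len(tokens) - 1
    out = []
    for i, tok in enumerate(tokens):
        # Treat a direction word as a prefix/suffix only when it sits right
        # after the house number or at the very end of a longer address.
        if tok in DIRECTIONS and i in (1, last) and len(tokens) > 3:
            tok = DIRECTIONS[tok]
        # Abbreviate a street ending only near the end of the address.
        elif tok in STREET_SUFFIXES and i >= 2 and i >= last - 1:
            tok = STREET_SUFFIXES[tok]
        out.append(tok)
    return " ".join(out)


print(standardize_address("123 North Main Street."))
print(standardize_address("123 n. main st"))
```

Both calls should produce the same string, which is the point: once two spellings of an address collapse to one form, records from different departments can be matched on it.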

We presented our work in its final form to ESRI and Albany city officials on Wednesday ahead of the CDS end-of-program presentations that same day. They were excited to see the work we had done, and were interested in scheduling a meeting with city officials. We can’t wait to hand the database off to see what kinds of stories will be told. All of our scripts are commented, our process has been documented, and we have constructed a visual schema and data dictionary for the database. This will allow the city to more easily maintain the database and add data in the future. Hopefully, the database will help the city make better-informed policy decisions and initiate conversations between the city and its citizens in the future regarding energy efficiency, housing investment, and neighborhood blight.

Albany Hub: Week 2

Image source: https://community.data.gov.in/

On Thursday and Friday, we finished up research on Albany’s housing programs. These include the Community HOME Investment Program, the Tenant Based Rental Assistance Program, and others, which can be found here.

After we met with Dr. Asensio on Monday, we dove into the housing data with exploratory data analysis. The housing data records each housing project funded by HOME or Community Development Block Grants and the amount that was funded. First, we looked for identifiers in the dataset; these are attributes that uniquely identify each record. Later, we will use the identifiers to link across different datasets. We also found the range of dates, counted the number of observations, and checked that the data was formatted correctly. Ultimately, the data was relatively clean and comprehensive. We will perform the same review on the other datasets over the next few days.
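The checks described above can be sketched in a few lines of plain Python. This is only an illustration: the records, the field names (`project_id`, `program`, `funded_on`, `amount`), and the helper functions are all invented for the example and do not come from Albany's data.

```python
from collections import Counter
from datetime import date

# Made-up sample records standing in for the housing dataset.
records = [
    {"project_id": 1, "program": "HOME", "funded_on": date(2015, 3, 1), "amount": 12000.0},
    {"project_id": 2, "program": "CDBG", "funded_on": date(2016, 7, 15), "amount": 8000.0},
    {"project_id": 3, "program": "HOME", "funded_on": date(2017, 1, 9), "amount": 15000.0},
]


def duplicate_keys(rows, key_fields):
    """Return key values that appear more than once.
    An empty result means the fields work as an identifier."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return {key: n for key, n in counts.items() if n > 1}


def summarize(rows, date_field):
    """Count observations and report the range of dates."""
    dates = [r[date_field] for r in rows]
    return {"n_obs": len(rows), "first": min(dates), "last": max(dates)}


print(duplicate_keys(records, ["project_id"]))  # no repeats: usable identifier
print(duplicate_keys(records, ["program"]))     # repeats: not an identifier
print(summarize(records, "funded_on"))
```

A field (or combination of fields) with no duplicates is a candidate for linking to other datasets later.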

In addition to performing some initial analysis on the data, we brainstormed some response variables to measure the success of the various housing programs. This would involve bringing outside data from organizations like the U.S. Census Bureau, the Centers for Disease Control and Prevention, and NeighborWorks America. Some measures could be median income, unemployment rate, property values, and percent homeownership.

Yesterday, we joined a group call with Albany’s Technology and Communications (TAC) Department and a support team from ESRI to discuss the construction of the ArcGIS Hub. In this call, TAC assigned us some tasks that align closely with our project’s near-term goals. These include performing analytics on the housing and utilities data as well as reviewing other GeoHubs to devise a design for Albany’s ArcGIS Hub. Some examples of other GeoHubs we’re looking at are here and here.

Before we start developing the ArcGIS Hub, we must first build a database to house all necessary datasets, such as those for housing, utilities, and weather. We’ll have a lot to do in the next week, like finding linkages across the datasets and looking into different database frameworks. We also plan to investigate Albany spatially, visualizing income level, political boundaries, and the distribution of projects across these boundaries. By our next meeting, we will determine which success measures are possible to calculate and decide on a linking strategy.
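As a rough picture of what one linking strategy could look like, the sketch below uses Python's built-in `sqlite3` module purely as a stand-in, since we have not settled on a database framework yet. The table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE property (
    property_id INTEGER PRIMARY KEY,
    address     TEXT UNIQUE
);
CREATE TABLE housing_project (
    project_id  INTEGER PRIMARY KEY,
    property_id INTEGER REFERENCES property(property_id),
    program     TEXT,
    amount      REAL
);
CREATE TABLE utility_usage (
    property_id INTEGER REFERENCES property(property_id),
    month       TEXT,
    kwh         REAL
);
""")

# Made-up rows: one property, one funded project, two months of usage.
conn.execute("INSERT INTO property VALUES (1, '123 N MAIN ST')")
conn.execute("INSERT INTO housing_project VALUES (10, 1, 'HOME', 15000.0)")
conn.executemany(
    "INSERT INTO utility_usage VALUES (?, ?, ?)",
    [(1, "2018-01", 900.0), (1, "2018-02", 850.0)],
)

# Link projects to utility readings through the shared property key.
rows = conn.execute("""
SELECT p.address, h.program, u.month, u.kwh
FROM housing_project AS h
JOIN property AS p ON p.property_id = h.property_id
JOIN utility_usage AS u ON u.property_id = h.property_id
ORDER BY u.month
""").fetchall()

for row in rows:
    print(row)
```

The idea is that every dataset carries the same key for a property, so housing, utilities, and other tables can be joined without matching on free-text addresses each time.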