This week has centered on preparing the database to be shipped off to ESRI and the Albany team. We’ve written documentation, commented our code, and consolidated our many cleaning scripts into a few larger scripts that anyone can run to reproduce our results. We were lucky enough to meet with Clayton Feustel, a Ph.D. student working with Amanda and Ellen on a similar housing database for Atlanta. He reviewed our current database and suggested future directions for documentation and database best practices.
We also expanded the Census table at Albany’s request. Our database now contains data going as far back as 2009, rather than a single year, along with demographic measures such as race and age for each block group. Including data from multiple years will allow us to evaluate the success of the housing programs over time and improve the accuracy of our analysis.
To conduct an accurate analysis, we must also normalize utility consumption for fluctuations in weather; in other words, we must control for the weather. If we take the data at face value, some instances of high gas consumption (for heating) or electricity consumption (for cooling) may simply be the result of a particularly cold or hot month. Extreme or anomalous weather therefore has to be accounted for before we compare utility consumption. Without this step, we would be misinterpreting our data and could not say with confidence whether a housing project actually made a difference in utility consumption. We are still working on a weather normalization process. Many utility management companies measure changes in a single house’s consumption by setting a base year for weather and consumption, then applying adjustments to the consumption data of the later years they want to compare. This is the strategy we have been considering so far; the remaining work is to generalize it beyond a single house, and that is what we will continue to work toward.
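As a rough illustration of the base-year idea, here is a minimal Python sketch. It assumes monthly heating degree days (HDD) are available for each billing period and uses a simple linear degree-day model, which is one common choice but not necessarily the method we will settle on. The `Month` record and the functions `fit_baseline`, `normalize`, and `normalize_year` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


# Hypothetical monthly record. HDD for a day is max(0, 65°F - mean daily
# temperature); the 65°F base is a common US convention, assumed here.
@dataclass
class Month:
    hdd: float    # heating degree days in the billing month
    usage: float  # gas consumption in the billing month (e.g. therms)


def fit_baseline(base_year):
    """Least-squares fit of usage = baseload + slope * hdd over the base year."""
    n = len(base_year)
    mean_x = sum(m.hdd for m in base_year) / n
    mean_y = sum(m.usage for m in base_year) / n
    sxx = sum((m.hdd - mean_x) ** 2 for m in base_year)
    sxy = sum((m.hdd - mean_x) * (m.usage - mean_y) for m in base_year)
    slope = sxy / sxx                 # extra usage per heating degree day
    baseload = mean_y - slope * mean_x  # usage that does not depend on weather
    return baseload, slope


def normalize(month, base_hdd, slope):
    """Adjust a later month's usage to the base year's weather for that month
    by removing the weather-driven difference, slope * (hdd - base_hdd)."""
    return month.usage - slope * (month.hdd - base_hdd)


def normalize_year(base_year, later_year):
    """Normalize a later year month by month; both lists are assumed to be
    aligned by calendar month (January first, December last)."""
    _, slope = fit_baseline(base_year)
    return [normalize(later, base.hdd, slope)
            for base, later in zip(base_year, later_year)]
```

For example, if a later winter is colder than the base year, the colder months' usage is reduced by the amount the base-year slope attributes to the extra degree days, so any remaining drop can more plausibly be credited to the housing project. Cooling and electricity would work the same way with cooling degree days in place of HDD.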
In the final week of the CDS program, now that the database is largely complete, we will conduct a preliminary analysis and prepare a poster detailing some of our findings as well as the process we went through to construct the database. Our goal with the preliminary analysis is to show which kinds of programs receive the most funding and what return, if any, housing investment has on utility consumption. We look forward to finally using the database to tell a story about the effectiveness of federal housing policies in Albany.
