Albany Hub: Week 8

This week centered on preparing the database to be shipped off to ESRI and the Albany team. We’ve written documentation, commented our code, and compiled our many cleaning scripts into larger chunks that anyone could run to reproduce our results. We were lucky enough to meet with Clayton Feustel, a Ph.D. student working with Amanda and Ellen on a similar housing database for Atlanta. He reviewed our current database and suggested some future directions for our documentation and database best practices.

We also added to the Census table, per Albany’s request. Instead of a single year, our database now contains data going as far back as 2009, along with demographic measures such as race and age for each block group. Including data over multiple years will allow us to evaluate the success of the housing programs over time and improve the accuracy of our analysis.

To conduct our analysis accurately, we must also normalize utility consumption for fluctuations in weather, that is, control for the weather. If we take the data at face value, there may be instances of high consumption of gas (for heating) or electricity (for cooling) that are the result of a particularly cold or hot month. Without accounting for these extremes, we would misinterpret our data and could not say confidently whether a housing project actually made a difference in utility consumption. At the moment, we’re still working on a weather normalization process. Many utility management companies test changes in consumption at a single house by setting a base year for weather and consumption and then adjusting the consumption data of the later years they’d like to compare. This is the strategy we’ve been considering so far; we just need to generalize it.
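
The base-year idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the process we settled on: it assumes degree days are measured against a 65°F balance point (a common convention), that each bill splits into a fixed baseload plus a part that scales with degree days, and every number and function name below is hypothetical.

```python
def degree_days(daily_mean_temps_f, balance_point_f=65.0):
    """Return (heating, cooling) degree days for a list of daily mean temperatures.

    The 65 F balance point is an assumed convention, not a value from our data.
    """
    heating = sum(max(0.0, balance_point_f - t) for t in daily_mean_temps_f)
    cooling = sum(max(0.0, t - balance_point_f) for t in daily_mean_temps_f)
    return heating, cooling


def normalize_to_base_year(consumption, observed_dd, base_dd, baseload=0.0):
    """Rescale the weather-sensitive part of a bill to base-year degree days.

    Assumes consumption = baseload + (something proportional to degree days).
    """
    if observed_dd <= 0:
        # No heating or cooling demand in this period, so nothing to adjust.
        return consumption
    weather_sensitive = consumption - baseload
    return baseload + weather_sensitive * (base_dd / observed_dd)


# Hypothetical example: a mild base period versus a colder comparison period.
base_hdd, _ = degree_days([50.0, 55.0, 60.0])
cold_hdd, _ = degree_days([40.0, 45.0, 50.0])
adjusted = normalize_to_base_year(120.0, cold_hdd, base_hdd, baseload=20.0)
print(adjusted)
```

In this sketch, gas use would be adjusted with heating degree days and electricity with cooling degree days. How the baseload is estimated for each house is left open.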

In the final week of the CDS program, we will conduct preliminary analysis now that the database is largely complete and prepare a poster that details some of our findings as well as the process we went through to construct the database. Our goal with the preliminary analysis is to show what kinds of programs receive the most funding and the return, if any, of housing investment on utility consumption. We look forward to finally using the database to tell a story about the effectiveness of federal housing policies in Albany.

Missing our time in Albany!

Albany Hub: Week 7

What a busy weekend it was! Last week, we had Thursday and Friday off to celebrate the Fourth. We all went to different events in Atlanta to celebrate: some of us went to Ponce City Market and others went to Centennial Olympic Park, all to watch some incredible fireworks displays. If you go to Centennial to watch fireworks next year, make sure to arrive early! On Sunday, Dr. Asensio took us to an Atlanta United soccer game. It was a close finish, but in the end, the game was a draw, 3-3. It was the first MLS game for some of us, so we were really happy to have had the opportunity to go.

The next day, we got ourselves together, started our data dictionary, and finally drove down to Albany. After our three-hour drive down south, we checked into the hotel and grabbed some dinner at Harvest Moon. The next morning, before our busy day of meetings, we stopped in The Cookie Shoppe for breakfast, mostly featuring plenty of biscuits. Afterward, we walked to our meeting location. Here’s the basic schedule of our day: 

  • 9 – 10:30 Albany’s DCED department, a discussion of the housing projects
  • 10:30 – 12 conversations with Albany’s utilities department
  • 1 – 2:30 Georgia Smart Communities working meeting
  • 2:30 – 4 meeting with Fight Albany Blight (FAB)

Through these meetings, we feel that we’ve gained a much better understanding of the context of the project and the places and people we are working with. It was lovely to meet everyone in Albany. Following our busy day of meetings, Ms. Shuronda Hawkins was kind enough to give us a tour of downtown Albany. Before we left for Atlanta, we grabbed dinner at The Flint with our team and Ms. Hawkins. 

Since returning to Atlanta, we’ve felt refreshed and excited to finish up our work on the project. We look forward to finalizing our database so it can be passed on to Albany and they can continue their incredible work.

Albany Hub: Week 6

This week we put the final touches on the database. This involved cleaning addresses, cleaning the census data, and pulling in housing data from ATTOM’s data API. The ATTOM dataset contains attributes of each property in Albany, such as square footage, number of rooms, flooring style, and the date of the most recent major improvements. We hope to use these fields to identify reference groups across Albany. In our context, a reference group consists of households with characteristics similar to those of the households that received project funding. These reference groups will allow us to analyze the difference in means between households that did and did not receive funding.

To begin this analysis, we constructed tables for each of our utility types (gas, electric, water, and sewage) and looked at the number of projects funded, the number of unique addresses with each utility type, and mean consumption by block group. These tables offer a preliminary look at potential differences in utility consumption between funded and nonfunded homes. We hope to investigate these tables further by looking at outliers, normalizing by square footage, and running t-tests between the two groups of houses.
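
As a rough sketch of the planned comparison, the snippet below normalizes consumption by square footage and computes a Welch's t statistic (a two-sample t-test that does not assume equal variances) with its approximate degrees of freedom. Choosing Welch's variant is an assumption made for illustration, and all values and names are hypothetical placeholders rather than Albany data.

```python
from math import sqrt
from statistics import mean, variance


def per_square_foot(consumption, square_feet):
    """Divide each household's consumption by its square footage."""
    return [c / s for c, s in zip(consumption, square_feet)]


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances
    n_a, n_b = len(group_a), len(group_b)
    se_sq = var_a / n_a + var_b / n_b
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se_sq)
    dof = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical monthly electricity use (kWh) and square footage per household.
funded = per_square_foot([900.0, 1100.0, 1000.0], [1000.0, 1100.0, 1250.0])
nonfunded = per_square_foot([1300.0, 1250.0, 1400.0], [1000.0, 1250.0, 1000.0])
t_stat, dof = welch_t(funded, nonfunded)
print(t_stat, dof)
```

A negative t statistic here would mean the funded group uses less per square foot on average. Turning it into a p-value requires the t distribution, which a statistics library would normally supply.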

Finally, we geared up for our two days in Albany. We met with Amanda Meng, a research scientist working on the open government data aspect of our project. She will be bringing us to Albany, where we can ask staff clarifying questions about the programs (eligibility requirements, monitoring of programs, direction and motivation of the programs) while she conducts interviews with staff and participants of the housing projects.

We’ll be here soon!

Cheeeeeers from Albany! (in a few days)

Albany Hub: Week 5

It’s week 5, which means we’re already over halfway through the program! The rate of work has been picking up. This week, we gave our midterm presentation to Dr. Le Dantec and a group of researchers and visualization experts. It was a great opportunity to share our research and gain some valuable feedback. We also got to watch the presentations of GwinNETTwork and FloodBud and learn about their work. 

Last week, we worked on the database and finally incorporated Census data. The data we obtained contains information such as employment rate, median income, and vacancy status for the block groups and Census tracts within Albany. This information will be invaluable in helping us evaluate the success of the housing projects. After downloading the data from American FactFinder, we restructured it so that each row corresponds to a unique tract-block group combination, joined the data sets by tract and block group, deleted irrelevant and repeated fields, and then renamed the fields to make analysis easier. We named the resulting table census_blockgroup.
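
The join-and-rename step above can be sketched with plain Python dictionaries. This is a simplified illustration, not our actual cleaning script: the field names (`est_income`, `est_vacant`), the rename map, and the values are all hypothetical, and the join keeps every tract-block group combination that appears in any input (an outer join).

```python
KEY_FIELDS = ("tract", "block_group")

# Hypothetical mapping from downloaded column codes to readable names.
RENAMES = {"est_income": "median_income", "est_vacant": "vacant_units"}


def join_by_blockgroup(*tables):
    """Merge lists of row dicts so each (tract, block_group) appears once."""
    merged = {}
    for table in tables:
        for row in table:
            key = (row["tract"], row["block_group"])
            fields = {k: v for k, v in row.items() if k not in KEY_FIELDS}
            merged.setdefault(key, {}).update(fields)
    return [
        {"tract": tract, "block_group": block_group, **fields}
        for (tract, block_group), fields in sorted(merged.items())
    ]


def rename_fields(rows, renames):
    """Return copies of the rows with selected field names replaced."""
    return [{renames.get(k, k): v for k, v in row.items()} for row in rows]


# Hypothetical rows standing in for two downloaded Census tables.
income = [{"tract": "000100", "block_group": "1", "est_income": 30000}]
vacancy = [
    {"tract": "000100", "block_group": "1", "est_vacant": 12},
    {"tract": "000100", "block_group": "2", "est_vacant": 5},
]
census_blockgroup = rename_fields(join_by_blockgroup(income, vacancy), RENAMES)
for row in census_blockgroup:
    print(row)
```

Block groups missing from one table simply lack that field in the output, which makes gaps in coverage easy to spot before loading the table into the database.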

We integrated all of our tables into a SQL database, which will make it simple to query and retrieve data. This involved picking the column names wisely and ensuring that every column was encoded as the data type that minimizes storage size without truncation. The five tables we have are utilities, housing projects, weather, Census, and addresses. The next step will be to use this for statistical evaluation; hopefully by next week, we will have summary tables to characterize the key variables. 
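
To give a feel for how such a database is queried, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. SQLite is swapped in purely for illustration; the engine, the column names, and the rows shown are assumptions, and only two of the five tables are modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, trimmed-down schema: addresses link every other table.
conn.executescript(
    """
    CREATE TABLE addresses (
        address_id     INTEGER PRIMARY KEY,
        street_address TEXT NOT NULL,
        tract          TEXT NOT NULL,
        block_group    TEXT NOT NULL
    );
    CREATE TABLE utilities (
        address_id   INTEGER NOT NULL REFERENCES addresses(address_id),
        utility_type TEXT NOT NULL,
        bill_month   TEXT NOT NULL,
        consumption  REAL NOT NULL
    );
    """
)

# Made-up rows for demonstration only.
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?, ?)",
    [(1, "100 EXAMPLE ST", "000100", "1"), (2, "200 SAMPLE AVE", "000100", "2")],
)
conn.executemany(
    "INSERT INTO utilities VALUES (?, ?, ?, ?)",
    [
        (1, "electric", "2018-01", 900.0),
        (1, "electric", "2018-02", 700.0),
        (2, "electric", "2018-01", 500.0),
    ],
)


def mean_consumption_by_block_group(connection, utility_type):
    """Average consumption per (tract, block_group) for one utility type."""
    query = """
        SELECT a.tract, a.block_group, AVG(u.consumption)
        FROM utilities AS u
        JOIN addresses AS a ON a.address_id = u.address_id
        WHERE u.utility_type = ?
        GROUP BY a.tract, a.block_group
        ORDER BY a.tract, a.block_group
    """
    return connection.execute(query, (utility_type,)).fetchall()


rows = mean_consumption_by_block_group(conn, "electric")
print(rows)
```

Storing tract and block group as text rather than integers is a deliberate choice in this sketch, since Census identifiers often carry leading zeros.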

Here is a short snapshot of our data; since there are so many columns, it wouldn’t all fit on screen!

From the utilities:

From the census data:

An image of the query editor, which lets us retrieve information from the database in real time:

Albany Hub: Week 4

We’ve faced two big challenges this past week: getting to know Albany spatially and preparing all data for our research-grade database. Because of the nature of these problems, our team has split into our specializations and tackled them in groups of two.

On Thursday and Friday, Olivia and David worked to map Albany’s housing projects on ArcGIS, layered with various attributes like median household income and political ward boundaries. At the moment, we only have access to tract-level data for indicators we’ve obtained. We’ve even contacted the Census to request more granular data, but unfortunately, they can only offer it at the tract level. With this median household income data, we’ve created a rough draft of what Albany looks like spatially with respect to median household income. All shading is the result of the default settings of ArcGIS. The dots represent different projects, colors represent each project’s federal funding source, and the size of the dot represents the amount of money invested in the project. Again, the sizes of the dots are all the result of default settings in ArcGIS and do not accurately reflect the full picture of Albany. The tract shapes displayed are those either fully or partially within Albany; they do not reflect the boundaries of the city.

We used the same dots of the projects for the next visual but layered with political ward boundaries rather than Census tracts. While the previous map does not contain the boundaries of Albany, this map does.

While Olivia and David developed these maps, Mirabel and Billy worked to clean up all of the sources of data for the database. These datasets include the weather of Albany, information on the housing projects, utility billing, and data from the Census. Most of their work has gone into standardizing addresses: ensuring that streets are recorded as streets rather than drives or avenues, checking that address numbers line up across datasets, and running other tests to verify consistency across datasets. Essentially, we need the addresses to match in all datasets so that, when the datasets are merged, records line up precisely and we don’t lose any valuable information. Mirabel also took the time to geocode all addresses in Albany, which associates each address with a physical location on a map. This addition will be indispensable as we continue our spatial analysis.
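
A minimal sketch of this kind of address cleanup is shown below. It is not the team's actual script: the suffix table is a tiny hypothetical subset, the sample addresses are invented, and real address data needs many more rules (unit numbers, directionals, misspellings).

```python
import re

# Hypothetical, deliberately small table of street-suffix spellings.
SUFFIXES = {
    "STREET": "ST", "ST": "ST",
    "AVENUE": "AVE", "AVE": "AVE",
    "DRIVE": "DR", "DR": "DR",
    "ROAD": "RD", "RD": "RD",
}


def standardize_address(raw):
    """Uppercase, strip punctuation, collapse spaces, and unify the suffix."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.upper())
    tokens = cleaned.split()
    if tokens and tokens[-1] in SUFFIXES:
        tokens[-1] = SUFFIXES[tokens[-1]]
    return " ".join(tokens)


def unmatched_addresses(left, right):
    """Standardized addresses present in `left` but missing from `right`."""
    right_set = {standardize_address(a) for a in right}
    return sorted({standardize_address(a) for a in left} - right_set)


# Invented examples: the same house written two ways, plus a suffix mismatch.
utility_addresses = ["123 Main Street.", "45  Oak Ave"]
project_addresses = ["123 MAIN ST", "45 Oak Dr"]
print(unmatched_addresses(utility_addresses, project_addresses))
```

Listing the addresses that fail to match after standardization is one way to surface cases where a street has been recorded as a drive or an avenue in one dataset but not the other, so they can be checked by hand before merging.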

Besides cleanup and geocoding, they set up the basic structure of our database. Our internal database will be queried using SQL behind the Georgia Tech firewall. With this database, we will be able to answer our own and our advisor’s research questions through statistical analysis. All of our questions will be centered around the following motivation: to evaluate the effectiveness of Albany’s energy efficiency housing projects. Our focus for this week will be on finalizing this database. We’re expected to present the first draft to Dr. Asensio tomorrow.

Also this past week, Olivia attended the Georgia Smart Communities Challenge press conference in Macon to represent our project. Albany Hub is part of the current lineup of projects for this past year’s winners. At the conference, the newly awarded communities were announced. Congratulations to the winning communities! You can find more information about the challenge and the winning proposals here.

That’s all for now. Talk to you next week!

Albany Hub: Week 3

This week, our team had the privilege of attending the Machine Learning in Science and Engineering Conference. The conference focused on showcasing machine learning research in interdisciplinary fields – our group mostly attended the talks on public policy and computer engineering. Some of the details were difficult to understand for undergraduates, but we got a sense of the depths of research going on in this field. We also attended the Women in Data Science workshop and got to hear from some inspiring women working with data nationwide.

(from left: David Reynolds, Olivia Fiol, Mirabel Reid, Billy Jang)

We focused on a particular research question: what is the impact of housing programs in Albany geared toward energy efficiency on utilities consumption? In order to build the database that we would use to answer this question, we investigated and cleaned weather and utilities data of Albany, answering the same 8 questions from the week prior. We also tried to incorporate census data, but ran into a lot of challenges; the website was confusing, and it seemed impossible to download all the data we needed at once. Ultimately, we decided to push this data collection off until we can access more resources.

Next, we all gathered in front of the whiteboard to brainstorm how we could link the different sheets together in a relational database format. This involved drawing connections between similar fields in the utilities and housing datasheets. We decided that the primary identifier would be the address (which would then be tied to parcel, block group, tract, and XY coordinate). This whiteboard map serves as our plan of action when we create the database.

Lastly, we got access to ArcGIS, which will allow us to investigate Albany spatially. Hopefully, by next week we’ll have created some enlightening maps to share with you.

Albany Hub

Hello! Our names are Olivia Fiol, Billy Jang, Mirabel Reid, and David Reynolds, and we’re working to bring Smart Housing Analytics to Albany, Georgia. Georgia Tech has partnered with the city of Albany to build a housing data inventory in order to better manage housing investments and address issues of neighborhood blight. Under the direction of Dr. Omar Asensio and in conjunction with Georgia Tech’s Smart Communities Challenge, we will:

  • provide data-driven analysis of housing policies in Albany, Georgia and
  • utilize the ArcGIS Hub tool to visualize housing data and engage with the community.

From this work, local policymakers and community leaders will be given greater access to housing data, allowing them to make better-informed, data-driven decisions.

(source: https://www.digitalcommonwealth.org/search/commonwealth:79407z14m)

Through the partnership with Albany, 20 years’ worth of housing data has been aggregated. This gives us ample opportunities to perform statistical and computational analysis to evaluate and better understand the performance of Albany’s housing investments. This week, we are performing background research to learn more about housing policies in Albany. We will then run descriptive statistics in Python to get a better understanding of the housing data.

Ultimately, we hope to provide Albany with an interactive tool that allows community members and researchers to better understand the past 20 years of Albany’s housing investments. The functionality and design of the tool will be shaped by community workshops held in Albany.

By the end of the summer, we hope to get to know the city of Albany and make a visible positive impact on the community.