To recap, our project team is working with the Atlanta Fire Rescue Department (AFRD) to help them understand which factors are most predictive of fires, so they can make more informed decisions about where to conduct fire inspections. Over the past week, our team has been grappling with how best to understand, clean, and merge the various data we have about the many buildings in the city of Atlanta. Our contact at AFRD has been very helpful in providing a large number of datasets covering fire incidents, fire inspections, and building information. However, before we can build any sort of predictive model, we need to know that the buildings referred to in one database, such as the Fire Incidents in Atlanta, are the same buildings referred to in another database, which may hold specific building information such as the year of construction, building material, occupancy, usage, zoning, etc. This process has been spearheaded by Wenwen Zhang, a PhD student at Georgia Tech conducting research on geographic information systems, or GIS. The diagram below shows the different datasets and the method we are using to join them together.
We have been working to join the datasets using 3 types of location data: building addresses, X/Y GPS coordinates, and a GIS identifier known as the “parcel ID.” Parcels are divisions of land used by the Tax Commissioner’s office for tax valuation, but for our purposes they are a useful way to join location data that is sometimes inconsistent, vague, or incomplete. Once this process is complete, we will have a unified dataset of all the buildings where fire incidents have occurred, which we can use to build our model of predictive factors of fire incidents. In the above diagram, “AFRD” is a database of Fire Incidents in Atlanta from 2011 to 2015; “FSAF” is a database of Fire Inspections in Atlanta; “CoStar” is a property assessment of 7,000 commercial properties in Atlanta, with specific details of building construction; “SCI” is another building information assessment; and “CO” is the Certificate of Occupancy, which businesses are required to obtain before allowing people inside their premises.
Below is an example of the method of joining parcels (in tan) with geocoordinates and addresses (in red) and with fire incidents (in green). However, the most useful immediate output for our purposes is not a map but a large table, saved as a CSV file, listing all of the commercial properties, their building information (from CoStar), and whether or not each one had a fire (from AFRD).
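To make that output concrete, here is a minimal sketch of the final join step in Python, assuming every record has already been assigned a parcel ID. The field names (parcel_id, address, year_built, incident_id), the helper flag_fires, and all of the sample rows are made up for illustration; they are not the real CoStar or AFRD schemas, and this is not necessarily how our GIS workflow performs the join.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; real CoStar and AFRD fields are named differently.
costar_rows = [
    {"parcel_id": "P-001", "address": "100 Example St", "year_built": 1985},
    {"parcel_id": "P-002", "address": "250 Sample Ave", "year_built": 1962},
    {"parcel_id": "P-003", "address": "12 Placeholder Rd", "year_built": 2004},
]
incident_rows = [
    {"incident_id": "F-1", "parcel_id": "P-002"},
    {"incident_id": "F-2", "parcel_id": "P-002"},
    {"incident_id": "F-3", "parcel_id": "P-999"},  # parcel not in the property list
]


def flag_fires(properties, incidents, key="parcel_id"):
    """Attach a fire count and a had_fire flag to each property row."""
    # Count incidents per parcel, skipping incidents with no parcel ID.
    counts = Counter(row[key] for row in incidents if row.get(key))
    joined = []
    for prop in properties:
        n = counts.get(prop[key], 0)
        joined.append({**prop, "fire_count": n, "had_fire": n > 0})
    return joined


joined_rows = flag_fires(costar_rows, incident_rows)

# Write the joined table as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(joined_rows[0].keys()))
writer.writeheader()
writer.writerows(joined_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping every property in the output, including those with no incidents, matters here: a predictive model needs examples of buildings without fires as well as buildings with them.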
Also, in the process of understanding what data we actually have in these 6 different (and very large) datasets, we have made a codebook for each of them, explaining what the various attributes mean, since many of the attribute names are highly abbreviated, highly specific, or use non-obvious terminology. Additionally, we have been doing some data cleaning and exploratory data analysis, making sure that, for instance, the mean year of building construction doesn’t appear to be 1432 because many missing entries contain zeroes instead of NA. (It turns out 1432 wasn’t the average year of building construction in Atlanta.) This is all part of the necessary process of data cleaning before we can begin building our predictive model.
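The zero-instead-of-NA problem is easy to reproduce. The sketch below uses a handful of made-up construction years (not our data) and a hypothetical helper, mean_year_built, to show how zeroes standing in for missing values drag the mean down, and how excluding them fixes it.

```python
from statistics import mean


def mean_year_built(years, missing_marker=0):
    """Average construction year, ignoring None and the missing-value marker."""
    valid = [y for y in years if y is not None and y != missing_marker]
    # Return None rather than raising if every entry is missing.
    return mean(valid) if valid else None


# Made-up values: two of the five entries are zeroes that really mean "unknown".
years = [1950, 0, 1980, 0, 2010]

naive_mean = mean(years)               # zeroes counted as real years
cleaned_mean = mean_year_built(years)  # zeroes treated as missing

print(naive_mean, cleaned_mean)
```

With these sample values the naive mean is 1188 and the cleaned mean is 1980, the same kind of gap that made us suspicious of 1432.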
We are meeting today with a group of fire inspectors at the AFRD Central Office, and we’ll be working with them to make sense of the City of Atlanta Fire Ordinance Codes, to better understand which buildings in the city need inspection permits based on the materials stored or processes carried out in each business. We will be trying to draw on their hard-earned experience and tacit knowledge about fire risk factors to help inform our predictive model.
Other blog posts from our team:
Week 1 – Hello from the DSSG-ATL fire team!
Week 2 – Update from the Fire Team
Week 3 – A Day with Fire Inspectors
Week 4 – Understanding Fire Inspections