Reflections from the DSSG 2015 Fire Team

(posted on behalf of the DSSG Fire team students)


A year after the culmination of our DSSG 2015 summer program, we wanted to share a bit about how the work we did with DSSG last summer has developed since then.

We spent the summer of 2015 working with the Atlanta Fire Rescue Department (AFRD), helping them combine disparate data sources from various city departments to identify new properties that required fire inspection, and building a predictive model to help them prioritize fire inspections according to the fire risk of commercial properties.

You can see some of our blog posts from last summer on beginning the project, understanding and joining our data sources, riding along with fire inspectors to understand their existing processes, conducting preliminary analyses of the data, and building a predictive model of fire risk.

We created a framework, which we call Firebird, to describe this process of property discovery and risk prediction, as seen below.


As a more permanent home for this work, we have created a website for Firebird, which provides a high-level overview of the project, and includes a link to our code on Github.

At the end of last summer, we presented our work to an audience of local data scientists at the final summer presentation at General Assembly, garnering interest from several firefighters from neighboring counties who were in attendance. Following that presentation, Fire Chief Joel Baker, the head of AFRD, invited our team to speak at a meeting of the AFRD executive staff, including the battalion chiefs for each of the 7 battalions that comprise the city of Atlanta.

Following that meeting, AFRD has already begun to implement our recommendations, from giving the properties at highest risk of fire priority in inspections to beginning conversations about allocating inspection personnel and resources to reflect the distribution of commercial properties requiring inspection across the city.

In September 2015, we submitted and presented a short paper describing our work and its outcomes to the Bloomberg Data for Good Exchange, a conference on applications of data science for problems of social good, involving participants from academia, industry, government, and NGOs.

Then, wanting to further the impact of this work, we submitted a full paper to the 2016 Knowledge Discovery and Data Mining (KDD) conference, a top conference in the data mining field. It has recently been accepted, and we will be presenting the work there in August. A pre-print draft of the paper can be found here.

Finally, two representatives from our project, Dr. Bistra Dilkina and Dr. Matt Hinds-Aldrich, presented this work at the National Fire Protection Association (NFPA) Annual Conference this June. The NFPA magazine also recently published an article on Embracing Analytics, with a nice description of our work, explaining our process and its results to a wider audience of fire professionals.

A Predictive Model for Fire Risk in Atlanta

As part of our deliverables to the Atlanta Fire Rescue Department (AFRD), we are giving them a list of potential properties to inspect. However, we needed to prioritize this list based on fire risk so that AFRD can best allocate their inspection resources. To do so, we created a model that predicts fire risk based on certain characteristics of properties in Atlanta. The model was built in the R statistical programming language using a support vector machine (SVM) algorithm, with 58 independent variables predicting fire as the outcome variable. Data sources for features in the model include the Costar properties dataset, parcel data and SCI data from the City of Atlanta, demographic data from the U.S. Census Bureau, and fire incident and inspection data from AFRD. Features were based on property location, land or property use, financial factors, time-based factors such as year built, condition, occupancy, size, building details, owner information, demographics of the property's location, and inspection data.
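The original model was written in R; as a rough illustration of the approach, here is a minimal Python/scikit-learn sketch of fitting an SVM that outputs per-property risk scores. The data are synthetic stand-ins (we don't reproduce the 58 real features here), and the modeling choices shown are assumptions, not the team's actual configuration.

```python
# Minimal sketch of the modeling approach, not the team's original R code.
# The real model used 58 property-level features; the data here are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_properties, n_features = 1000, 58

X = rng.normal(size=(n_properties, n_features))    # stand-in property features
y = (rng.random(n_properties) < 0.06).astype(int)  # ~6% of properties have fires

# Scale features, then fit an SVM; probability=True yields risk scores in [0, 1].
model = make_pipeline(StandardScaler(),
                      SVC(kernel="rbf", probability=True, random_state=0))
model.fit(X, y)

risk_scores = model.predict_proba(X)[:, 1]         # per-property fire risk
```

The `probability=True` flag matters here: it is what lets the SVM produce a continuous score suitable for ranking properties, rather than only a yes/no prediction.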

Prediction Model Validation

Our model proved to be highly predictive of fires. We validated it in two ways:

First, we validated our model using a time-based approach. The model would be easy to validate if, after predicting which buildings would catch fire in the next year, we could look into the future to see which actually did. Because we can't look into the future, we simulated this approach by using data from 2011–2014 to predict fires in the last year of data, 2014–2015. We trained on 10 bootstrapped random samples and averaged the results across them. This model did very well, with an average accuracy of 0.77 and an average area under the curve (AUC) of 0.75. Here is a confusion matrix of the results:


Figure 1: Confusion matrix for time-based model validation approach.

The most important metric in this case is the true positive rate – that is, how many of the properties that actually had a fire the model predicted would have one. Of the properties in the last year of data that did have a fire, our model was able to predict 73.31% of them. This means that for every 10 fires, our model would have predicted approximately seven of them. Considering how few fires occur (only about 6% of properties have fires), this is much better than if you were guessing by chance at which properties would catch on fire.
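The time-based validation above can be sketched as follows, again in Python rather than the original R, with synthetic data standing in for the 2011–2014 training period and the 2014–2015 test period. The loop structure (10 bootstrap resamples of the training data, metrics averaged over the resulting models) follows the description in the text; everything else is illustrative.

```python
# Sketch of the time-based validation: train on 2011-2014 data, score the
# 2014-2015 data, and average metrics over 10 bootstrapped training samples.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(1)

# Synthetic stand-ins for the two time periods (~6% fire rate).
X_train = rng.normal(size=(800, 58))
y_train = (rng.random(800) < 0.06).astype(int)
X_test = rng.normal(size=(200, 58))
y_test = (rng.random(200) < 0.06).astype(int)

accuracies, aucs = [], []
for _ in range(10):
    # Draw one bootstrap sample: resample training rows with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    model = make_pipeline(StandardScaler(), SVC(probability=True))
    model.fit(X_train[idx], y_train[idx])
    accuracies.append(accuracy_score(y_test, model.predict(X_test)))
    aucs.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

print(f"mean accuracy: {np.mean(accuracies):.2f}, mean AUC: {np.mean(aucs):.2f}")
```

On the real data this procedure produced the 0.77 accuracy and 0.75 AUC reported above; on the synthetic data here the numbers are meaningless and serve only to show the mechanics.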

We also validated our model using 10-fold cross-validation, a more standard machine learning validation approach. This model also did quite well, with an average accuracy of 0.78 and an average AUC of 0.73. Here is a confusion matrix of the results:


Figure 2: Confusion matrix for 10-fold cross-validation approach.

In this validation, our model identified 67.56% of the properties that actually had a fire. This means that for every 10 fires, our model would have predicted almost seven of them.
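The cross-validation step can be sketched in a few lines with scikit-learn. This is a hedged Python illustration of the 10-fold procedure described above, not the original R code; the data are synthetic, and stratified folds are an assumption we make so that each fold contains some of the rare fire cases.

```python
# Sketch of 10-fold cross-validation with accuracy and AUC, on synthetic data.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import StratifiedKFold, cross_validate

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 58))
y = (rng.random(1000) < 0.06).astype(int)   # ~6% positive class

model = make_pipeline(StandardScaler(), SVC(probability=True))

# Stratified folds keep the ~6% fire rate roughly constant across folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])

print(f"mean accuracy: {results['test_accuracy'].mean():.2f}")
print(f"mean AUC: {results['test_roc_auc'].mean():.2f}")
```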

It is worth briefly discussing the implications of the false positives in this model. In both validation approaches, we had a substantial number of false positives – that is, properties that our model predicted would have a fire, but did not actually have one. Though many predictive models try to maximize specificity (the ratio of true negatives to all actual negatives) by reducing false positives, in the context of determining which properties to inspect, false positives are actually quite valuable. False positives represent properties that share many characteristics with the properties that did catch fire. Because they have these characteristics, they may be at high risk of catching fire, and should be inspected by AFRD. Additionally, because in a sense our training set and the data set we ultimately apply the model to are the same (that is, the list of commercial properties in Atlanta), a perfect model with no false positives would do nothing more than tell us which buildings had previously caught fire. While this is useful to know, it is data AFRD already has. False positives give us the added value of flagging properties that have not caught fire, but are at risk of fire due to their characteristics.

We want to give the caveat that this particular model is not necessarily the best fit for the data. Although we tried many other algorithms and configurations of factors and found this model to be the most predictive, further experimentation may well yield a more predictive model. We encourage AFRD or others to build upon our methods to improve the model if they wish.

Applying the predictive model to potential inspections

After we built the predictive model, we applied it to the list of current and potential inspections so that AFRD could prioritize inspections to focus on the properties most at risk of fire. To do this, we first computed the raw output of the prediction model on this list of properties, which generated a score between 0 and 1 for each property (see Figure 3 below). To make the scores easier to interpret, we translated them to a 1–10 scale. Then we divided these scores into low risk (1), medium risk (2–5), and high risk (6–10).



Figure 3: Transforming model output to risk scores.
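The transformation from raw model output to risk categories can be sketched as below. The exact mapping the team used is not spelled out in the text, so this assumes a simple linear rescaling from [0, 1] to the 1–10 scale; the binning into low/medium/high follows the cutoffs given above.

```python
# Sketch of the score transformation: raw model output in [0, 1] is mapped
# to an integer 1-10 risk score, then binned into low/medium/high risk.
# The linear rescaling here is an assumption, not the team's exact mapping.

def to_risk_score(raw):
    """Map a raw model score in [0, 1] to an integer risk score from 1 to 10."""
    return min(10, max(1, int(raw * 10) + 1))

def risk_category(score):
    """Bin a 1-10 risk score into the categories used in the text."""
    if score == 1:
        return "low"      # risk score 1
    if score <= 5:
        return "medium"   # risk scores 2-5
    return "high"         # risk scores 6-10

for raw in (0.03, 0.42, 0.97):
    s = to_risk_score(raw)
    print(f"raw={raw:.2f} -> score={s} ({risk_category(s)})")
```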

We then applied these risk scores to the list of current and potential properties to inspect, and included them on the interactive map.

As a result of this work, AFRD will be able to focus their inspection efforts on those commercial properties in Atlanta that are most at risk of fire. We hope that this focused inspection will result in fewer fires, fewer fire-related injuries, and fewer fire-related deaths in Atlanta.

Thanks for following our blog posts this summer! It’s been a pleasure to work with Dr. Matt Hinds-Aldrich and the rest of our contacts at AFRD. Please feel free to contact me with any questions about this blog post or the project in general.

– Oliver Haimson