
9.5 Predictive Model of Fire Risk

However, 19,397 new properties (or even the shorter list of 6,096) is far more than AFRD can add to its annual property inspections, and not all of those properties are likely to need inspection with the same priority. We therefore created a predictive model to generate a fire risk score based on the building- and parcel-level characteristics of properties that had fire incidents in the last five years. This model was built using the scikit-learn machine learning package in Python [188]. The model uses 58 independent variables to predict fire as the outcome variable for each property.

9.5.1 Data Cleaning

After joining the various datasets to obtain building- and parcel-level information, we still needed to do a significant amount of data cleaning. The bulk of this process involved finding the extent of the missing data and deciding how to handle it. Our missingness procedures were designed to minimize the deletion of properties with missing data, because a significant number of the properties in our model had NA (not available) values for many variables (such as the structure condition of a building, which is only known if the CoStar Group had previously inspected it). Where appropriate, we replaced missing values with 0, and for each feature we added a binary indicator of whether the value was missing for that property.

We used a log transformation for variables with a large numerical range, such as the “for sale” price of properties.
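The following is a minimal sketch of this cleaning step, assuming the joined data sits in a pandas DataFrame; the column names are hypothetical stand-ins for the actual CoStar and parcel fields:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the joined parcel/CoStar data; real column names differ.
df = pd.DataFrame({
    "structure_condition": [3.0, np.nan, 4.0],
    "for_sale_price": [250000.0, np.nan, 1200000.0],
})

def clean_features(frame, cols):
    """Replace missing values with 0 and add a per-feature missingness flag."""
    frame = frame.copy()
    for col in cols:
        frame[col + "_missing"] = frame[col].isna().astype(int)
        frame[col] = frame[col].fillna(0)
    return frame

df = clean_features(df, ["structure_condition", "for_sale_price"])

# Log-transform wide-ranging variables such as the "for sale" price;
# log1p keeps the imputed zeros well-defined.
df["for_sale_price"] = np.log1p(df["for_sale_price"])
```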

9.5.2 Feature Selection

After merging datasets, we had a total of 252 variables for each property. We manually examined each variable to determine whether it might be relevant to fire prediction, and in this initial pass excluded many obviously non-predictive variables (such as the phone number of the property owner or property ID numbers). We then used forward and backward feature selection to determine each variable’s contribution to the model, and removed the variables that did not improve predictive accuracy. Our final model includes only 58 variables. We then expanded categorical variables into binary features. For example, the zip code variable was expanded into 37 binary features, and for each property only one zip code was coded as 1 (all zip code indicators were set to 0 if a property’s zip code was missing). After expansion, we had 1,127 features in total.
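The expansion of categorical variables can be sketched with pandas one-hot encoding; the column names below are hypothetical, and the `dummy_na=False` setting reproduces the all-zeros coding for missing zip codes described above:

```python
import pandas as pd

# Toy example: zip_code is categorical and may be missing.
df = pd.DataFrame({"zip_code": ["30303", "30318", None, "30303"],
                   "floor_size": [12000, 5000, 800, 2400]})

# get_dummies expands zip_code into one binary column per observed value;
# rows with a missing zip code get 0 in every indicator column.
expanded = pd.get_dummies(df, columns=["zip_code"], dummy_na=False)
print(expanded.columns.tolist())
```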

9.5.3 Evaluation of the Models

We chose to validate our model using a time-partitioned approach. A fire risk model would ideally be tested in practice by predicting which properties would have a fire incident in the following year, and then waiting a year to verify which properties actually did catch fire.

Because we wanted to effectively evaluate the accuracy of our model without waiting a year to collect data on new fires, we simulated this approach by using data from fire incidents in July 2011 to March 2014 as training data to predict fires in the last year of our data, April 2014 to March 2015.
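Schematically, the labels for this split could be built as follows, assuming an incident table with one row per fire (the table layout and column names are hypothetical):

```python
import pandas as pd

# Hypothetical incident table: one row per fire incident.
incidents = pd.DataFrame({
    "property_id": [101, 102, 101, 103],
    "incident_date": pd.to_datetime(
        ["2012-05-01", "2013-11-20", "2014-07-09", "2015-01-15"]),
})
properties = pd.DataFrame({"property_id": [101, 102, 103, 104]})

def fire_label(props, incs, start, end):
    """1 if the property had at least one fire in [start, end), else 0."""
    in_window = incs[(incs["incident_date"] >= start) & (incs["incident_date"] < end)]
    return props["property_id"].isin(in_window["property_id"]).astype(int)

# Training labels: fires from July 2011 through March 2014.
y_train = fire_label(properties, incidents, "2011-07-01", "2014-04-01")
# Testing labels: fires from April 2014 through March 2015.
y_test = fire_label(properties, incidents, "2014-04-01", "2015-04-01")
```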

We used grid search with 10-fold cross validation on the training dataset to select the best models and parameters. The models we tried included Logistic Regression [189], Gradient Boosting [190], Support Vector Machine (SVM) [174], and Random Forest [175]. SVM and Random Forest performed the best, with comparable performances (see Table 9.3).

For SVM, the best configuration used an RBF kernel with C = 0.5 and γ = 10/#features. For Random Forest, restricting the maximum depth of each tree to 10 gave the best performance. Increasing the number of trees generally improves performance, but we used only 200 trees, since adding more yielded only negligible improvement.
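A condensed sketch of this selection step with scikit-learn; the parameter grids shown are illustrative rather than the exact grids we searched, and synthetic data stands in for the Firebird feature matrix:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic placeholder data so the sketch runs end to end.
X_train, y_train = make_classification(n_samples=500, n_features=20, random_state=0)

svm_search = GridSearchCV(
    SVC(kernel="rbf", probability=True),
    param_grid={"C": [0.1, 0.5, 1.0],
                "gamma": [1.0 / X_train.shape[1], 10.0 / X_train.shape[1]]},
    scoring="roc_auc", cv=10)
svm_search.fit(X_train, y_train)

rf_search = GridSearchCV(
    RandomForestClassifier(n_estimators=200),
    param_grid={"max_depth": [5, 10, 20]},
    scoring="roc_auc", cv=10)
rf_search.fit(X_train, y_train)

print(svm_search.best_params_, rf_search.best_params_)
```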

We then trained SVM and Random Forest on the whole training set using the best parameters and generated predictions on the testing set. Note that the training and testing sets contain the same set of properties, but their labels correspond to fires in different periods of time. This is a valid approach because we did not use any information that would only become known after the training period, i.e., fires in the 2014-2015 testing year.

[Figure 9.3: ROC curves (true positive rate vs. false positive rate) for (a) Random Forest, with 10-fold CV AUC 0.8131 and testing AUC 0.8246, and (b) SVM, with 10-fold CV AUC 0.8052 and testing AUC 0.8079.]


The ROC curves for the training and testing performances are shown in Figure 9.3. All the results are averaged over 10 trials. The most important metric in this case is the true positive rate (TPR), i.e., how many fires were correctly predicted as positive in our model.

The SVM model was able to predict 71.36% of the fires in 2014-2015 at a false positive rate (FPR) of 20%, which was deemed practically useful for AFRD: the potential to save lives by achieving a higher TPR significantly outweighs the increase in FPR. At the same time, a higher FPR results in more inspections of potentially risky buildings, which is also beneficial. In practice, AFRD can adjust the TPR/FPR trade-off to match its risk aversion and inspection capacity. The Random Forest model achieved a slightly lower TPR of 69.28% at the same FPR, but had a higher area under the ROC curve (AUC). Considering how few fires occur (only about 6% of the properties in our total dataset had fires), these results are far more predictive than guessing by chance.
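The operating point is chosen after training by thresholding the model's scores; the sketch below shows how a threshold near a 20% FPR could be read off the ROC curve, with synthetic data (roughly 6% positive rate) standing in for the real features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Placeholder data and model; in practice these are the Firebird features.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.94], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, max_depth=10).fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)

# Choose the threshold whose false positive rate is closest to 20%,
# then read off the corresponding true positive rate.
idx = np.argmin(np.abs(fpr - 0.20))
print(f"threshold={thresholds[idx]:.3f}, FPR={fpr[idx]:.2f}, TPR={tpr[idx]:.2f}")
```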

False positives (FPs) provide important information to AFRD. As our testing period was the final year in our dataset, it is possible that some of those FP properties may actually catch fire in the near future. These properties share many characteristics with those that did catch fire, and should likely be inspected by AFRD.

9.5.4 Further Discussion of the Models

In this section, we discuss some insights we gained while conducting these experiments.

First, there is a mismatch between the meaning of the labels in the training and testing datasets. The training labels represent fires that happened over a relatively long period of time, whereas the testing labels represent fires in a single year. One way to address this issue would be to expand each property into multiple examples, one for each year; each example then represents a property in a particular year, and its label indicates whether that property had a fire in that year. Using this approach, however, did not improve performance in our experiments. The reason is that most of our variables are static, such as floor size and zip code, and only a few are time-dependent, such as the age of the building and the time since its last inspection. Expanding the properties therefore only produces many near-identical examples. This approach could become helpful in the future, after collecting other dynamic information such as violations of health codes, sanitation ordinances, or other information from relevant city agencies.
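A small sketch of this property-year expansion, with hypothetical column names:

```python
import pandas as pd

properties = pd.DataFrame({"property_id": [101, 102], "floor_size": [12000, 5000]})
fires = pd.DataFrame({"property_id": [101], "fire_year": [2013]})

years = pd.DataFrame({"year": range(2012, 2015)})
# Cross join: one row per (property, year) combination.
expanded = properties.merge(years, how="cross")

# Label each property-year with whether a fire occurred in that year.
expanded = expanded.merge(
    fires.rename(columns={"fire_year": "year"}).assign(fire=1),
    on=["property_id", "year"], how="left").fillna({"fire": 0})
print(expanded)
```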

Another important question is whether fire prediction performance is consistent across different testing periods. To test this, we tried different training time windows and, for each window, evaluated its prediction performance on the subsequent year.

For each time window, we repeated the process described in Section 9.5.3, including grid search and cross-validation, and then used the best model to predict fires in the following year. The results are shown in Table 9.3. Performance decreases slightly for shorter training windows because they contain fewer positive training examples, especially the 2011-2012 window, which consists of only eight months of data (July 2011 to March 2012).

However, this is still significantly better than guessing by chance, which demonstrates that we were not just “lucky” in predicting fires for a particular year.

Finally, it is helpful for us and for AFRD to know which features are the most effective predictors.

Training window   Testing AUC of the following year
                  Random Forest      SVM
2011-2012         0.7624             0.7614
2011-2013         0.8030             0.7914
2011-2014         0.8246             0.8079

Table 9.3: Testing AUC of each year

The Random Forest model provides a natural way to evaluate feature importance: for each decision tree in the forest, the importance of a feature is calculated from the ratio of examples split on that feature, and the final importance is averaged over all trees.
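In scikit-learn, per-feature importances are available directly on a fitted forest; the sketch below uses the library's default impurity-weighted importances, which are closely related to the example-ratio measure described above (the feature names and data are placeholders):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=0).fit(X, y)

# feature_importances_ averages each tree's importance scores across the forest.
ranking = pd.Series(forest.feature_importances_, index=names).sort_values(ascending=False)
print(ranking.head(10))
```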

The top ten most predictive features are displayed in Table 9.4. Collectively, they capture the intuitive insight that buildings of a larger size or those containing more units (thus more people) would have higher probability of catching fire, and those of higher appraised value and higher taxes would have a lower probability of catching fire. The impact of higher appraised property value may be due to more developed fire prevention practices or infrastructure, but this hypothesis has not been empirically validated.

We also tried logistic regression, a linear model, to estimate each feature’s importance based on its weight coefficient in the model. We found that the top features in the logistic regression were very different from those in the Random Forest: all were binary features indicating either a particular neighborhood or a particular property owner. Some neighborhoods have either very high or very low fire rates, and logistic regression tends to assign them large positive or negative weights, respectively. However, since each of these features predicts well only for a small number of properties within a certain area and not for the data as a whole, they are not chosen in the first few splits of a decision tree.
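The corresponding check for the linear model simply ranks features by the magnitude of their learned weights, after standardizing so that coefficients are comparable; again with placeholder data:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
logit = LogisticRegression(max_iter=1000).fit(X_scaled, y)

weights = pd.Series(logit.coef_[0], index=names)
# Rank by absolute weight; the sign indicates higher or lower fire risk.
print(weights.reindex(weights.abs().sort_values(ascending=False).index).head(10))
```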

9.5.5 Assignment of Risk Scores

After building the predictive model, we applied its fire risk scores to the lists of currently inspected and potentially inspectable properties, so that AFRD could focus on inspecting the properties most at risk of fire. To do this, we first computed the raw output of our predictive model for the properties used to train and test the model.

1. floor size
2. land area
3. number of units
4. appraised value
5. number of buildings
6. total taxes
7. property type is multi-family
8. lot size
9. number of living units
10. percent leased

Table 9.4: Top-10 features in Random Forest

This generated a score between 0 and 1, which we then mapped to a discrete range of 1 to 10 that is easier for our AFRD colleagues to work with. Then, based on a visual examination of the clustering of the risk scores, we categorized them into low risk (1), medium risk (2-5), and high risk (6-10). These categorizations were intended to give AFRD a manageable number of medium-risk (N = 402) and high-risk (N = 69) properties to prioritize.
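A minimal sketch of this mapping, assuming raw model outputs in [0, 1]; the ceiling-based conversion to the 1-10 scale is an illustrative choice rather than the exact mapping we used, while the category cut points are the ones stated above:

```python
import numpy as np
import pandas as pd

raw_scores = np.array([0.03, 0.12, 0.48, 0.91])  # raw model outputs in [0, 1]

# Map [0, 1] onto the discrete 1-10 scale used by AFRD (illustrative mapping).
risk_score = np.ceil(raw_scores * 10).clip(1, 10).astype(int)

# Bucket into the low (1) / medium (2-5) / high (6-10) categories described above.
category = pd.cut(risk_score, bins=[0, 1, 5, 10],
                  labels=["low", "medium", "high"])
print(list(zip(risk_score, category)))
```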

We then needed to determine which of the properties with risk scores appeared in the list of 2,573 properties currently inspected annually and the list of 6,096 potentially inspectable properties.

Because of the lack of a consistent property ID across the various datasets used to develop the risk model, the currently inspected and potentially inspectable properties were spatially joined with the properties in the risk model, based on their geo-coordinates or addresses.
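A hedged illustration of this join using GeoPandas, assuming both lists carry longitude/latitude coordinates (all dataset and column names here are hypothetical):

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

def to_gdf(df):
    """Build a GeoDataFrame with point geometries from lon/lat columns."""
    geometry = [Point(xy) for xy in zip(df["lon"], df["lat"])]
    return gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")

inspectable = to_gdf(pd.DataFrame(
    {"address": ["10 Main St", "22 Oak Ave"], "lon": [-84.39, -84.41], "lat": [33.75, 33.76]}))
risk_model = to_gdf(pd.DataFrame(
    {"risk_score": [7, 2], "lon": [-84.3901, -84.45], "lat": [33.7501, 33.80]}))

# Attach each inspectable property to the nearest scored property; a small
# max_distance (in degrees here) guards against matching far-away parcels.
joined = gpd.sjoin_nearest(inspectable, risk_model, how="left", max_distance=0.001)
print(joined[["address", "risk_score"]])
```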

After joining, we were able to assign risk scores to 5,022 of the 8,669 total commercial properties on the inspection list (both currently inspected [2,573] and potentially inspectable [6,096]).